[Israel.pm] Encoding Question

ynon perek ynonperek at gmail.com
Fri Oct 12 04:41:55 PDT 2012


In MongoDB the UTF-8 flag is on by default

https://metacpan.org/module/FRIEDO/MongoDB-0.46.1/lib/MongoDB/BSON.pm#___pod

(which just makes the whole thing even stranger)
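
As a rough sketch of the double encoding described in the thread quoted below (assuming only the core Encode module; the variable names are illustrative and not taken from the original code), encoding an already-encoded byte string grows it again, and decoding first avoids that:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string (code points), matching the Data::Dumper output in the thread.
my $chars = "\x{5d4}\x{5e6}\x{5d2}\x{5ea}-\x{5de}\x{5e4}";

# Encoding once gives the UTF-8 byte string a browser expects.
my $bytes = encode('UTF-8', $chars);

# Encoding those bytes again treats every byte as a Latin-1 character,
# which is the double encoding discussed in the thread.
my $double = encode('UTF-8', $bytes);

# Decoding the byte string first restores the characters, so a later
# encode step (done by a framework, for instance) yields correct bytes.
my $fixed = encode('UTF-8', decode('UTF-8', $bytes));

printf "chars: %d, bytes: %d, double: %d, fixed: %d\n",
    length $chars, length $bytes, length $double, length $fixed;
```

The doubly encoded string is longer than the correct one, which is a quick way to spot the problem when the output looks garbled in the browser.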


On 12 October 2012 13:20, Meir Guttman <meir at guttman.co.il> wrote:

> Hi Gaal,
>
> I am sorry to say, but I am not familiar with MongoDB, only with MySQL.
>
> In MySQL you have to specify which encoding you store text in, which
> encoding your current input uses, etc., although one can specify a default
> encoding, usually UTF-8. In particular, when you use DBI and create a
> “connection” to the DB, you must also enable UTF-8 in the “connect
> attributes”, like this:
>
> my %conn_attrs = (RaiseError        => 1,
>                   PrintError        => 0,
>                   AutoCommit        => 1,
>                   mysql_enable_utf8 => 1);
>
> Discovering this was long and frustrating, and took a lot of time!
>
> Maybe there is a similar attribute in MongoDB?
>
> I am afraid that this is the only help I can provide...
>
> Meir
>
> From: perl-bounces at perl.org.il [mailto:perl-bounces at perl.org.il] On
> Behalf Of ynon perek
> Sent: Friday, 12 October 2012 12:56
> To: Perl in Israel
> Subject: Re: [Israel.pm] Encoding Question
>
> Hi,
>
> (here's the long story)
>
> Printing the string yields the correct result; the problem comes afterwards.
>
> I used this code inside a Dancer route handler. When I just printed the
> string to a file or to the screen, everything worked great.
>
> But when I returned it to the browser, I got the wrong encoding.
>
> Moreover, if I wrote it to a file and then used the 'send_file' method to
> send the file, everything was OK (correct encoding).
>
> So that got me thinking it's a Dancer issue, which led me to sawyer. He
> explained that Dancer tries to detect the encoding of strings, and if it's
> not UTF-8 it will encode it to UTF-8.
>
> He suggested I try decoding my string before returning it to Dancer,
> which worked very well.
>
> We ended up wondering why Dancer failed to detect that my string was already
> UTF-8 encoded.
>
> I got the string from a MongoDB query, and then used lib::XML to create a
> sitemap with it.
>
> I tried to reproduce it, but found that if I declare the string in my Perl
> code everything works, so it's probably related to the MongoDB query
> (perhaps Mongo returns just the bytes, so the string wasn't marked as UTF-8
> and Dancer then failed to detect that it was already encoded).
>
> Around this step I was happy to have a working sitemap.xml for my website (
> mobileweb.ynonperek.com/sitemap.xml) and moved on :)
>
> Cheers,
>
>   Ynon
>
> On 12 October 2012 09:10, Gaal Yahas <gaal at forum2.org> wrote:
>
> Hold on. The string you already had, the dump of which you gave us, was
> already okay, or close enough to it. What happens if you try just
> printing it (not with Data::Dumper)?
>
> I'm asking because I don't see any UTF-8 specifically, I just see a bunch
> of code points. The string is "הצגת-מפ", which you can easily see by
> looking up some characters in a Unicode table. You didn't show us any
> evidence of UTF-8 overencoding; if there was some, we'd be seeing the
> values 0xd7 0x94 etc. (the UTF-8 encoding of the abstract code point
> U+05d4).
>
> I think it's Dumper that was escaping things because it wasn't sure your
> terminal could display them or whatever. Just try "print $buf".
>
> On Fri, Oct 12, 2012 at 12:40 AM, ynon perek <ynonperek at gmail.com> wrote:
>
> Hi All,
>
> Thanks for all the help.
>
> The problem was in fact the opposite: double encoding (it turned out both
> lib::XML and Dancer encode to UTF-8...)
>
> I ended up using decode('utf-8') on the data before passing it on, and
> this solved the issue (so now I have an encode -> decode -> encode chain...
> which is why abstractions are evil).
>
> Have a great weekend,
>
>   Ynon
>
> On 11 October 2012 18:49, Meir Guttman <meir at guttman.co.il> wrote:
>
> Hey Gaal,
>
> I would look up Data::Dumper::AutoEncode (
> http://search.cpan.org/~bayashi/Data-Dumper-AutoEncode-0.102/lib/Data/Dumper/AutoEncode.pm).
> You can then use ‘eDumper’ rather than Dumper to actually see letters. This
> package also enables you to use any encoding you want. (The default, though,
> is utf8.)
>
> Meir
>
> From: perl-bounces at perl.org.il [mailto:perl-bounces at perl.org.il] On
> Behalf Of Gaal Yahas
> Sent: Thursday, 11 October 2012 17:03
> To: Perl in Israel
> Subject: Re: [Israel.pm] Encoding Question
>
> U+05d4 is HEBREW LETTER HE etc. -- your buffer is already in Unicode.
>
> On Thu, Oct 11, 2012 at 4:51 PM, ynon perek <ynonperek at gmail.com> wrote:
>
> Hi All,
>
> Quick encoding question: I have a text string that I think is in cp1255,
> because when I print it with Data::Dumper I get:
>
> \x{5d4}\x{5e6}\x{5d2}\x{5ea}-\x{5de}\x{5e4}
>
> But when I try to decode it using:
>
> my $decoded = decode('CP1255', $text);
>
> I get this error:
>
> Wide character in subroutine entry at /Users/ynonperek/perl5/perlbrew/perls/perl-5.14.2/lib/5.14.2/darwin-2level/Encode.pm line 174, <DATA> line 16.
>
> Ideas?
>
> --
>
> Writing talks? Speaking in front of an audience? My blog, "Learning to
> Speak" <http://publicspeakr.blogspot.com/>, is written especially for you.
>
> _______________________________________________
> Perl mailing list
> Perl at perl.org.il
> http://mail.perl.org.il/mailman/listinfo/perl
>
> --
> Gaal Yahas <gaal at forum2.org>
> http://gaal.livejournal.com/
>
>
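
The error in the original question at the bottom of the thread can be reproduced with a short sketch. It assumes only the core Encode module; the variable names are illustrative, and the exact wording of the error may differ between Encode versions:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The string from the Data::Dumper output in the original question.
my $text = "\x{5d4}\x{5e6}\x{5d2}\x{5ea}-\x{5de}\x{5e4}";

# Code points above 0xFF cannot be single bytes, so this is already
# a character string rather than CP1255-encoded data.
my $is_chars = $text =~ /[^\x00-\xFF]/ ? 1 : 0;

# decode() expects bytes; handed wide characters, it dies.
my $decoded = eval { decode('CP1255', $text) };
my $error   = $@;

# Real CP1255 data has one byte per Hebrew letter and decodes cleanly.
my $cp1255_bytes = encode('CP1255', $text);
my $round_trip   = decode('CP1255', $cp1255_bytes);

print "already characters: ", ($is_chars ? "yes" : "no"), "\n";
print "decode failed: ", ($error ? "yes" : "no"), "\n";
print "round trip ok: ", ($round_trip eq $text ? "yes" : "no"), "\n";
```

This matches the diagnosis given in the thread: the buffer already held Unicode code points, so there was nothing left to decode.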



-- 

Writing talks? Speaking in front of an audience? My blog, "Learning to
Speak" <http://publicspeakr.blogspot.com/>, is written especially for you.