[Israel.pm] Encoding Question

ynon perek ynonperek at gmail.com
Fri Oct 12 03:55:57 PDT 2012


Hi,

(here's the long story)

Printing the string yields the correct result; the problem comes afterwards.

I used this code inside a Dancer route handler. When I just printed the
string to a file or to the screen, everything worked great.

But when I returned it to the browser, I got the wrong encoding.
Moreover, if I wrote it into a file and then used the 'send_file' method to
send the file, everything was OK (correct encoding).

So that got me thinking it's a Dancer issue, which led me to Sawyer. He
explained that Dancer tries to detect the encoding of strings, and if a
string is not UTF-8 it encodes it to UTF-8.
He suggested I try decoding my string before returning it to Dancer,
which worked very well.
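
A minimal sketch of the effect described above, using only the core Encode
module. It is not Dancer's actual code; it only models what happens when
bytes that are already UTF-8 get encoded a second time, and why decoding
first avoids that. The sample string is the one from the Data::Dumper output
quoted later in this thread; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Character string: six Hebrew letters and a hyphen (7 characters).
my $chars = "\x{5d4}\x{5e6}\x{5d2}\x{5ea}-\x{5de}\x{5e4}";

# What a layer that already emits UTF-8 would hand back: a byte string.
my $bytes = encode('UTF-8', $chars);

# Encoding those bytes again treats every byte as a Latin-1 character,
# so each non-ASCII byte grows into two bytes: double encoding.
my $double = encode('UTF-8', $bytes);

# Decoding first turns the bytes back into characters, so the next
# encode produces the same bytes as the single encoding.
my $fixed = encode('UTF-8', decode('UTF-8', $bytes));

printf "chars:  %d characters\n", length $chars;
printf "bytes:  %d bytes\n",      length $bytes;
printf "double: %d bytes\n",      length $double;
printf "fixed:  %s\n", $fixed eq $bytes ? 'same as single encoding' : 'differs';
```

The length of `$double` compared with `$bytes` is the tell-tale sign: the
ASCII hyphen stays one byte, while every other byte is expanded.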

We ended up wondering why Dancer failed to detect that my string was already
UTF-8 encoded.
I got the string from a MongoDB query, and then used XML::LibXML to create a
sitemap with it.

I tried to reproduce it, but found that if I declare the string in my Perl
code everything works, so it's probably related to the MongoDB query
(perhaps Mongo returns just the bytes, so the string wasn't marked as UTF-8
and then Dancer failed to detect that it was already encoded).
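
One way to check a guess like this is to look at how Perl itself sees the
string. The sketch below is only an illustration: `describe` is a
hypothetical helper, and `$from_db` is a stand-in built with `encode`, not
real output from the MongoDB driver.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: report the length and the internal UTF-8 flag of a string.
sub describe {
    my ($s) = @_;
    return sprintf "length=%d utf8_flag=%s",
        length $s, utf8::is_utf8($s) ? 'on' : 'off';
}

# A string declared in code with \x{...} escapes is a character string.
my $literal = "\x{5d4}\x{5e6}";

# Stand-in for raw UTF-8 bytes, as a query might return them (assumption).
my $from_db = encode('UTF-8', $literal);

print describe($literal), "\n";                    # two characters, flag on
print describe($from_db), "\n";                    # four bytes, flag off
print describe(decode('UTF-8', $from_db)), "\n";   # two characters again
```

If the string coming back from the query looks like the second case, it is
a byte string, and decoding it once before handing it on is the fix
described above.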

At this point I was happy to have a working sitemap.xml for my website
(mobileweb.ynonperek.com/sitemap.xml) and moved on :)

Cheers,
  Ynon




On 12 October 2012 09:10, Gaal Yahas <gaal at forum2.org> wrote:

> Hold on. The string you already had, the dump of which you gave us, was
> already okay, or close enough to it. What happens if you try just
> printing it (not with Data::Dumper)?
>
> I'm asking because I don't see any UTF-8 specifically, I just see a bunch
> of code points. The string is "הצגת-מפ", which you can easily see by
> looking up some characters in a Unicode table. You didn't show us any
> evidence of UTF-8 overencoding; if there was some, we'd be seeing the
> values 0xd7 0x94 etc. (the UTF-8 encoding of the abstract code point
> U+05d4).
>
> I think it's Dumper that was escaping things because it wasn't sure your
> terminal could display them or whatever. Just try "print $buf".
>
>
> On Fri, Oct 12, 2012 at 12:40 AM, ynon perek <ynonperek at gmail.com> wrote:
>
>> Hi All,
>> Thanks for all the help.
>>
>> The problem was in fact the opposite - double encoding (turned out both
>> XML::LibXML and Dancer encode to utf-8...)
>>
>> I ended up using decode('utf-8') on the data before passing it on, and
>> this solved the issue (so now I have encode -> decode -> encode chain...
>> which is why abstractions are evil).
>>
>> Have a great weekend,
>>   Ynon
>>
>>
>> On 11 October 2012 18:49, Meir Guttman <meir at guttman.co.il> wrote:
>>
>>> Hey Gaal,
>>>
>>> I would look up Data::Dumper::AutoEncode (
>>> http://search.cpan.org/~bayashi/Data-Dumper-AutoEncode-0.102/lib/Data/Dumper/AutoEncode.pm).
>>> You can then use ‘eDumper’ rather than Dumper to actually see letters. This
>>> package also enables you to use any encoding you want. (The default, though,
>>> is utf8.)
>>>
>>> Meir
>>>
>>> From: perl-bounces at perl.org.il [mailto:perl-bounces at perl.org.il] On
>>> Behalf Of Gaal Yahas
>>> Sent: Thursday, 11 October 2012 17:03
>>> To: Perl in Israel
>>> Subject: Re: [Israel.pm] Encoding Question
>>>
>>> U+05d4 is HEBREW LETTER HE etc. -- your buffer is already in Unicode.
>>>
>>> On Thu, Oct 11, 2012 at 4:51 PM, ynon perek <ynonperek at gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> Quick encoding question: I have a text string that I think is in
>>> cp1255, because when I print it with Data::Dumper I get:
>>>
>>> \x{5d4}\x{5e6}\x{5d2}\x{5ea}-\x{5de}\x{5e4}
>>>
>>> But, when I try to decode it using:
>>>
>>> my $decoded = decode('CP1255', $text);
>>>
>>> I get this error:
>>>
>>> Wide character in subroutine entry at /Users/ynonperek/perl5/perlbrew/perls/perl-5.14.2/lib/5.14.2/darwin-2level/Encode.pm line 174, <DATA> line 16.
>>>
>>> Ideas?
>>>
>>> --
>>>
>>> Writing lectures? Speaking in front of an audience? My blog, Learning to
>>> Speak <http://publicspeakr.blogspot.com/>, is written especially for you.
>>>
>>>
>>> _______________________________________________
>>> Perl mailing list
>>> Perl at perl.org.il
>>> http://mail.perl.org.il/mailman/listinfo/perl
>>>
>>> --
>>> Gaal Yahas <gaal at forum2.org>
>>> http://gaal.livejournal.com/
>>>
>>
>>
>>
>> --
>>
>> Writing lectures? Speaking in front of an audience? My blog, Learning to
>> Speak <http://publicspeakr.blogspot.com/>, is written especially for you.
>>
>
>
>
> --
> Gaal Yahas <gaal at forum2.org>
> http://gaal.livejournal.com/



-- 

Writing lectures? Speaking in front of an audience? My blog, Learning to
Speak <http://publicspeakr.blogspot.com/>, is written especially for you.