[Israel.pm] utf-8 hebrew

Omer Zak omerz at actcom.co.il
Sun May 2 02:43:32 PDT 2004


On Sun, 2 May 2004, Gabor Szabo wrote:

> Hey Unicode wizards, here is a question for you:
>
> I got two strings:
>
> "\x{5db}\x{5dc}\x{5d1}";
> "\x{d7}\x{9b}\x{d7}\x{9c}\x{d7}\x{91}";
>
> The first I got by reading a file using utf8, and
> the second I got from a browser via a CGI script.
>
> They are both supposed to be the same word.
> I got the above representation by dumping their variables
> using Data::Dumper.

The first string holds one Unicode code point per character (U+05DB,
U+05DC and U+05D1), each of which fits in 16 bits, as in UCS-2.
Characters beyond U+FFFF would need surrogate pairs in UTF-16 [this is
only my guess about how they would be represented, as the example string
contains no such character].
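
A minimal Perl sketch of what the first string contains, assuming Perl 5.8 or later: length counts characters rather than bytes, and ord gives each character's code point. It also shows that Perl's own strings keep a character above U+FFFF (\x{1F600} is just an arbitrary example) as a single character rather than a surrogate pair.

```perl
use strict;
use warnings;

# The first string from the question: three Hebrew letters as code points.
my $chars = "\x{5db}\x{5dc}\x{5d1}";

# Each element is one character, not one byte.
printf "length = %d\n", length $chars;           # length = 3
printf "U+%04X\n", ord $_ for split //, $chars;   # U+05DB, U+05DC, U+05D1

# A code point above U+FFFF is still a single character in a Perl string.
printf "length = %d\n", length "\x{1F600}";       # length = 1
```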

The second string is in UTF-8 encoding: each Hebrew letter takes two
bytes (for example, \x{d7}\x{9b} is U+05DB).  But somehow you are using
16 bits to represent each character?
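
A sketch of how the two forms relate, assuming the CGI input is simply raw UTF-8 bytes that nothing has decoded yet: decode from the core Encode module turns the six bytes into the same three characters as the first string, and encode goes the other way.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The string as it arrived from the browser: UTF-8 bytes that were
# never decoded, so Perl sees six one-byte characters.
my $bytes = "\x{d7}\x{9b}\x{d7}\x{9c}\x{d7}\x{91}";

# The string read from the file through a UTF-8 layer: three characters.
my $chars = "\x{5db}\x{5dc}\x{5d1}";

printf "bytes: %d, chars: %d\n", length $bytes, length $chars;   # bytes: 6, chars: 3

# Decoding the CGI input turns each two-byte sequence into one character.
my $decoded = decode('UTF-8', $bytes);
print "equal after decode\n" if $decoded eq $chars;

# Encoding the characters reproduces the raw bytes.
print "equal after encode\n" if encode('UTF-8', $chars) eq $bytes;
```

In a CGI script the decode step would go wherever the form parameter is read; exactly where depends on the CGI library in use.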
                                             --- Omer
My opinions, as expressed in this E-mail message, are mine alone.
They do not represent the official policy of any organization with which
I may be affiliated in any way.
WARNING TO SPAMMERS:  at http://www.zak.co.il/spamwarning.html



