[Israel.pm] utf-8 hebrew

Omer Zak omerz at actcom.co.il
Sun May 2 02:43:32 PDT 2004

On Sun, 2 May 2004, Gabor Szabo wrote:

> Hey Unicode wizards, here is a question for you:
> I got two strings:
> "\x{5db}\x{5dc}\x{5d1}";
> "\x{d7}\x{9b}\x{d7}\x{9c}\x{d7}\x{91}";
> The first I got by reading a file using utf8, and
> the second I got from a browser via a CGI script.
> They are both supposed to be the same word.
> I got the above representation by Dumping their variables
> using Data::Dumper.

The first string looks like UCS-2: each character is held as one 16-bit
value, which here is simply its Unicode code point (U+05DB, U+05DC,
U+05D1).  Characters beyond U+FFFF would need surrogate pairs, as in
UTF-16 [this is only my guess, as there is no such character in the
example].

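The second string is in UTF-8 encoding: d7 9b, d7 9c and d7 91 are the
two-byte UTF-8 sequences for U+05DB, U+05DC and U+05D1.  But somehow each
of those bytes is being held as a separate character, instead of each
pair being decoded into one character?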
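
If that is what happened, decoding the CGI string as UTF-8 should make the
two strings compare equal.  Here is a minimal sketch, assuming the core
Encode module is available (it ships with Perl 5.8) and that the CGI
parameter really does arrive as raw UTF-8 bytes.  The variable names are
only illustrative; they are not from the original script.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# String as read from the file: three Hebrew characters (code points).
my $from_file = "\x{5db}\x{5dc}\x{5d1}";

# String as received via CGI (assumed): six bytes, the UTF-8 encoding
# of the same word, with each byte held as a separate character.
my $from_cgi = "\x{d7}\x{9b}\x{d7}\x{9c}\x{d7}\x{91}";

# Decode the UTF-8 bytes into characters before comparing.
my $decoded = decode('UTF-8', $from_cgi);

# Prints "lengths: 3 6 3"
print "lengths: ", length($from_file), " ", length($from_cgi), " ",
      length($decoded), "\n";

# Prints "same word"
print $decoded eq $from_file ? "same word\n" : "different words\n";

# Going the other way: encode the characters into UTF-8 bytes.
my $encoded = encode('UTF-8', $from_file);

# Prints "same bytes"
print $encoded eq $from_cgi ? "same bytes\n" : "different bytes\n";
```

Decoding once at the point where the data enters the program (here, right
after reading the CGI parameter) keeps everything else working on
characters rather than bytes.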
