[Israel.pm] utf-8 hebrew

Gabor Szabo gabor at perl.org.il
Sun May 2 04:29:39 PDT 2004


Hey Unicode wizards, here is a question for you:

I have two strings:

"\x{5db}\x{5dc}\x{5d1}";
"\x{d7}\x{9b}\x{d7}\x{9c}\x{d7}\x{91}";

The first I got by reading a file using utf8, and
the second I got from a browser via a CGI script.

They are both supposed to be the same word.
I got the above representations by dumping the variables
with Data::Dumper.

When I tried to compare them (using a regex and index)
they seemed to be different.

When I applied
use Encode;
$x = decode("utf-8", STRING);
to the second string, the result was equal to the first string,
so I thought maybe the second is not really utf-8.
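
To make the comparison reproducible, here is a minimal sketch of it. It
assumes the two dumped values above are used as plain string literals; the
variable names ($from_file, $from_cgi, $decoded) are made up for this example
and are not from the real script.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The word as three Hebrew characters (the value read from the file).
my $from_file = "\x{5db}\x{5dc}\x{5d1}";

# The same word as six characters, one per UTF-8 byte (the value from CGI).
my $from_cgi = "\x{d7}\x{9b}\x{d7}\x{9c}\x{d7}\x{91}";

# Compared directly, the strings differ (3 characters vs. 6).
my $equal_before = ($from_file eq $from_cgi) ? 1 : 0;

# Treating the six values as UTF-8 bytes and decoding them gives 3 characters.
my $decoded     = decode("utf-8", $from_cgi);
my $equal_after = ($from_file eq $decoded) ? 1 : 0;

printf "equal before decode: %d, after decode: %d\n",
    $equal_before, $equal_after;
printf "lengths: %d %d %d\n",
    length($from_file), length($from_cgi), length($decoded);
```

The byte pair 0xD7 0x9B is the UTF-8 encoding of U+05DB, and likewise for the
other two pairs, which is why the decoded value matches the first string.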

But when I checked whether the originals are utf8
using utf8::is_utf8(STRING), both were reported to be utf8.
But then again it was at night...
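
As far as I understand the docs, utf8::is_utf8 only reports how Perl stores
the string internally. The sketch below is a guess at the situation, not what
the CGI script really does: it uses utf8::upgrade to show that a string can
carry the flag and still hold six byte-valued characters. The name
$bytes_as_chars is made up for the example.

```perl
use strict;
use warnings;

# Six characters whose values are the UTF-8 bytes of the Hebrew word.
my $bytes_as_chars = "\x{d7}\x{9b}\x{d7}\x{9c}\x{d7}\x{91}";

# Switch the internal storage to UTF-8; the characters themselves
# (and therefore the length) are not changed by this.
utf8::upgrade($bytes_as_chars);

my $flag = utf8::is_utf8($bytes_as_chars) ? 1 : 0;
my $len  = length($bytes_as_chars);

printf "is_utf8: %d, length: %d\n", $flag, $len;
```

If that is what happens, both of my strings can report true from
utf8::is_utf8 and still be different strings.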

So can someone explain to me why I got different representations,
and what these two representations are?

thanks
  Gabor


