[Israel.pm] unicode characters in your code
Shlomo Yona
shlomo at cs.haifa.ac.il
Sat Mar 13 23:07:54 PST 2004
I'm confused.
I have a file in XML format which contains data in Hebrew,
encoded in UTF-8, and I want to sort it.
It seems that sort at the command line (regardless of the
locale I set, hmmm... that's weird) fails to do it properly,
and it also seems that Perl misinterprets the bytes and
doesn't recognize them as Unicode characters in UTF-8.
Can someone please suggest a code snippet to sort a file
which looks like this? Of course, I only wish to sort
according to the text data (and not also according to the
element/attribute data).
<elem>some Hebrew text</elem>
<elem>some more Hebrew text</elem>
<elem>yet some more Hebrew text</elem>
<elem>yet even more Hebrew text</elem>
<elem>and finally, some more Hebrew text</elem>
(of course, I cannot send UTF-8 encoded data to the mailing
list... so I represented it in English/ASCII text).
Thanks.
I wasn't able to find the right way to do it so far... I'm
using XML::Twig for the xml processing, if that matters...
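
For illustration only, here is a minimal sketch of the kind of
sort described above. It is not a tested solution for the real
file. It assumes one <elem> per line with no attributes or
nesting, and it uses a plain regex rather than XML::Twig. The
names elem_text and sort_elem_lines are hypothetical. Strings are
compared by code point once they are decoded, which may differ
from proper Hebrew collation; the core module Unicode::Collate
might be a better fit for that.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text inside <elem>...</elem> on one line.
# Assumes no attributes and no nested elements.
sub elem_text {
    my ($line) = @_;
    return $line =~ m{<elem>(.*?)</elem>}s ? $1 : '';
}

# Sort lines by their element text only (Schwartzian transform).
# The strings are already decoded characters, so cmp compares
# code points rather than raw UTF-8 bytes.
sub sort_elem_lines {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ elem_text($_), $_ ] } @_;
}

# Sample data written with escapes: gimel, alef, bet.
my @lines = (
    "<elem>\x{05D2}</elem>",
    "<elem>\x{05D0}</elem>",
    "<elem>\x{05D1}</elem>",
);

# Encode characters as UTF-8 on output.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for sort_elem_lines(@lines);
```

For a real file, the lines would be read through a decoding
layer, for example open(my $in, '<:encoding(UTF-8)', $path),
so that Perl sees characters instead of bytes.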
--
Shlomo Yona
shlomo at cs.haifa.ac.il
http://cs.haifa.ac.il/~shlomo/