[Israel.pm] unicode characters in your code

Shlomo Yona shlomo at cs.haifa.ac.il
Sat Mar 13 23:07:54 PST 2004

I'm confused.

I have an XML file which contains Hebrew text encoded in
UTF-8, and I want to sort it.

It seems that sort at the command line (regardless of the
locale I set, hmmm... that's weird) fails to do it properly,
and it also seems that Perl misinterprets the bytes and
doesn't recognize them as Unicode characters encoded in UTF-8.

Can someone please suggest a code snippet to sort a file
which looks like this? Of course, I only wish to sort
according to the text data (and not also according to the
element/attribute data).

	<elem>some Hebrew text</elem>
	<elem>some more Hebrew text</elem>
	<elem>yet some more Hebrew text</elem>
	<elem>yet even more Hebrew text</elem>
	<elem>and finally, some more Hebrew text</elem>

(of course, I cannot send UTF-8 encoded data to the mailing
list... so I represented it in English/ASCII text).


I wasn't able to find the right way to do it so far... I'm
using XML::Twig for the XML processing, if that matters...
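
For reference, here is a minimal sketch of the kind of sort being
asked for. It is not a tested solution for the real file. It assumes
one <elem>...</elem> per line, as in the sample above, and it skips
XML::Twig in favour of a plain regular expression, which is only safe
for input that simple. The name sort_elem_lines is made up for
illustration. The two ideas it shows are decoding the input as UTF-8,
so Perl sees characters rather than bytes, and comparing with
Unicode::Collate instead of the locale-dependent sort.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Collate;

# Hypothetical helper: takes decoded (character) lines, each assumed to
# hold one <elem>...</elem>, and returns them ordered by the element
# text only; the tag and any attributes are ignored.
sub sort_elem_lines {
    my @lines = @_;
    my $collator = Unicode::Collate->new;

    # Pair each line with a collation key built from its text content.
    my @keyed = map {
        my ($text) = m{<elem[^>]*>(.*?)</elem>}s;
        [ $collator->getSortKey(defined $text ? $text : ''), $_ ];
    } @lines;

    # Sort keys are byte strings that compare correctly with cmp.
    return map { $_->[1] } sort { $a->[0] cmp $b->[0] } @keyed;
}

# Usage: perl sort_elems.pl input.xml > sorted.xml
if (@ARGV) {
    # Decode on read and encode on write, so no raw bytes are compared.
    open my $in, '<:encoding(UTF-8)', $ARGV[0]
        or die "Cannot open $ARGV[0]: $!";
    binmode STDOUT, ':encoding(UTF-8)';
    print sort_elem_lines(<$in>);
    close $in;
}
```

If the real file has nested elements, or elements that span several
lines, the regular expression would have to give way to a proper
parser such as XML::Twig; the decoding and collation steps would stay
the same.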

Shlomo Yona
shlomo at cs.haifa.ac.il

