[Israel.pm] sorting unicode Hebrew characters

Shlomo Yona shlomo at cs.haifa.ac.il
Mon Apr 19 11:11:01 PDT 2004


I have text which is encoded in UTF8.
The text contains various unicode characters.
Whenever I run sort on the text, I get an ordering of
the tokens which is not the lexicographic
ordering of Hebrew characters.

What I think is happening is that sort doesn't see the
unicode characters but instead sees bytes, and therefore
sorts according to plain ASCII lexicographic order.
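
A minimal sketch of the byte/character distinction suspected
here, assuming the input arrives as raw UTF-8 bytes and that
the core Encode module is available. The sample word is a
hypothetical input, not taken from the actual text. Note that
byte-wise comparison of valid UTF-8 gives the same order as
code point comparison, so decoding alone may not change the
result of sort.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the word "shalom" as raw UTF-8 bytes.
# Each Hebrew letter occupies two bytes in UTF-8.
my $bytes = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

# Decode the bytes into a Perl character string.
my $chars = decode('UTF-8', $bytes);

# Without decoding, Perl treats the string as 8 separate bytes;
# after decoding, it is 4 characters.
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
```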

Can anyone suggest a code snippet which actually can sort
Hebrew text in UTF8 encoded texts?
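
One possible shape for such a snippet, offered as a sketch
rather than a definitive answer: decode the input and sort
with the Unicode::Collate module, which implements the Unicode
Collation Algorithm. It assumes Unicode::Collate is installed
together with its collation key data, and the three words
below are hypothetical sample data.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Collate;

# Hypothetical input: three Hebrew words as raw UTF-8 bytes.
my @raw = (
    "\xD7\x92\xD7\x9F",          # gimel, final nun
    "\xD7\x90\xD7\x91",          # alef, bet
    "\xD7\x91\xD7\x99\xD7\xAA",  # bet, yod, tav
);

# Decode to character strings before comparing.
my @words = map { decode('UTF-8', $_) } @raw;

# Sort with the default Unicode collation order.
my $collator = Unicode::Collate->new;
my @sorted   = $collator->sort(@words);

# Encode the output as UTF-8 when printing.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @sorted;
```

The collator compares by collation weights rather than by raw
bytes or code points, which matters mostly when the text mixes
letters with points, punctuation, or other scripts.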

Of course, it is possible that I'm misunderstanding
something in the way Perl interprets my code and the input
text, so any insights about how things work will also help.

I've already been to perldoc perlunicode and to perldoc
-f sort, so referrals to these two documents will probably
not help me unless they are accompanied by some text
explaining what I was missing.


Shlomo Yona
shlomo at cs.haifa.ac.il
