[Israel.pm] sorting unicode Hebrew characters

Yuval Kogman lists at woobling.org
Mon Apr 19 12:30:09 PDT 2004

On Mon, Apr 19, 2004 at 21:11:01 +0300, Shlomo Yona wrote:
> Hello,
> I have text which is encoded in UTF8.
> The text contains various unicode characters.
> Whenever I try to run sort on the text I get an ordering of
> the tokens in the text which is not the lexicographic
> ordering of Hebrew characters.
> What I think is happening is that the sort doesn't see the
> unicode characters but instead it sees bytes and therefore
> sorts according to plain ASCII lexicographical order.

       locale - Perl pragma to use and avoid POSIX locales for built-in operations

           @x = sort @y;       # ASCII sorting order
           {
               use locale;
               @x = sort @y;   # Locale-defined sorting order
           }
           @x = sort @y;       # ASCII sorting order again

See perldoc locale and perldoc perllocale, and set your LANG environment
variable, methinks. IIRC there's something like the 'he_IL.UTF-8' locale
on most GNU operating systems ('locale -a' lists the names installed).
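
A rough sketch of the locale approach, in case it helps. It assumes the
locale is installed under the name 'he_IL.UTF-8' (that name is a guess, see
above) and that the words are plain UTF-8 byte strings, which is what you get
from literals when 'use utf8' is not in effect. The sample words are made up
for illustration.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Assumption: the locale exists under this name on your system.
my $wanted = 'he_IL.UTF-8';
my $got    = setlocale(LC_COLLATE, $wanted);
warn "locale $wanted not available, keeping the current collation\n"
    unless defined $got;

# No 'use utf8' here, so these literals are raw UTF-8 bytes.
my @y = ('גמל', 'אבא', 'בית');

my @x;
{
    use locale;          # only affects this block
    @x = sort @y;        # collation order of the current LC_COLLATE locale
}

print "$_\n" for @x;
```

If setlocale() returns undef, the sort still runs, just with whatever
collation the process already had.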

Alternatively you can probably 'use utf8' for literals in your source,
and make sure the data you read in is (using perlio layers, etc) indeed
decoded into UTF8 character strings in perl, rather than left as raw
bytes. A recent perl should then work as you expect, unless you
'use bytes'.
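
Something along these lines shows the difference between bytes and characters.
It is only a sketch: the sample words are made up, and $raw stands in for data
read from a handle that has no encoding layer on it.

```perl
use strict;
use warnings;
use utf8;                              # literals below are UTF-8 source text
use Encode qw(decode);

binmode(STDOUT, ':encoding(UTF-8)');   # perlio layer: encode characters on output

# Bytes as they would arrive from a handle opened without an encoding layer.
my $raw  = "\xd7\x92\xd7\x9e\xd7\x9c";
my $word = decode('UTF-8', $raw);      # now a character string

# prints "6 bytes, 3 characters"
printf "%d bytes, %d characters\n", length($raw), length($word);

my @y = ('גמל', 'אבא', 'בית');
my @x = sort @y;                       # compares code points, not bytes
print "$_\n" for @x;
```

For file input, opening the handle with a layer such as
'<:encoding(UTF-8)' does the decode step for you.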


