[Israel.pm] unicode characters in your code

Mikhael Goikhman migo at homemail.com
Sat Mar 13 16:05:52 PST 2004


On 14 Mar 2004 00:39:32 +0200, Yosef Meller wrote:
> 
> Mikhael Goikhman wrote:
> |
> | It is not clear from your question what is "unicode character".
> |
> | Do you mean that you want $str = "binary_data"; to be interpreted by Perl
> | as utf8 string? I think (please someone correct me), Perl code itself is
> | considered ascii, so this is not possible, you should either use \x{...}
> | notation or read the unicode data from stdin/file using utf8 encoding.
> 
> If you put "use utf8;" at the beginning of your code you can write it
> (or any string in it) directly as utf8. But I haven't tried it yet.
> Check out perldoc utf8.

Having looked at utf8.pm from 5.8.0 and 5.6.0, I see you are right.
I just tried it, and it works as documented. Here is test.pl, 3 lines:

  $str = qq(Pure hebrew text in utf8);
  @chars = split(//, $str);
  print $chars[0], "\n";

Running it under different perl versions, locales and switches produces
the following:

  % env LANG=POSIX perl5.8.1 test.pl
  (1 byte printed)

  % env LANG=POSIX perl5.8.1 -Mutf8 test.pl
  Wide character in print at aaa line 5.
  (2 bytes printed)

  % env LANG=he_IL.utf8 perl5.8.1 -Mutf8 test.pl
  Wide character in print at aaa line 5.
  (2 bytes printed)

  % env LANG=he_IL.utf8 perl5.8.1 -Mutf8 -CO test.pl
  (2 bytes printed)         # no warning

  % env LANG=he_IL.utf8 perl5.8.1 test.pl
  (1 byte printed)

  % env LANG=he_IL.utf8 perl5.8.0 test.pl
  (2 bytes printed)         # see the difference?

  % env LANG=he_IL.utf8 perl5.6.0 test.pl
  (1 byte printed)
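
To illustrate the 1-byte versus 2-byte difference without depending on how
the script file itself is encoded, here is a minimal sketch. It is not part
of the runs above. It assumes perl 5.8 or later (for the core Encode module),
and the two sample letters are my own choice. It decodes the bytes by hand
instead of relying on "use utf8", and uses binmode on STDOUT, which should be
similar in effect to the -CO switch:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Two Hebrew letters (shin U+05E9, lamed U+05DC) written as raw UTF-8 bytes,
# so this file stays pure ASCII. The sample text is hypothetical.
my $bytes = "\xD7\xA9\xD7\x9C";

# Without "use utf8" a literal behaves like $bytes: split yields single bytes.
my @octets = split(//, $bytes);

# With "use utf8" a literal behaves like the decoded string:
# split yields whole characters.
my $text  = decode('UTF-8', $bytes);
my @chars = split(//, $text);

# Encode characters on output so that printing a wide character
# does not trigger the "Wide character in print" warning.
binmode(STDOUT, ':encoding(UTF-8)');

printf "byte string: %d elements, first is %d byte(s)\n",
    scalar(@octets), length($octets[0]);
printf "char string: %d elements, first is %d byte(s) in UTF-8\n",
    scalar(@chars), length(encode('UTF-8', $chars[0]));
print $chars[0], "\n";
```

The first split gives four one-byte elements and the second gives two
characters, each of which is two bytes long once encoded as UTF-8. That is
the same 1 byte / 2 bytes split seen in the runs above.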

Regards,
Mikhael.

-- 
perl -e 'print+chr(64+hex)for+split//,d9b815c07f9b8d1e'


