[Israel.pm] unicode characters in your code
Mikhael Goikhman
migo at homemail.com
Sat Mar 13 16:05:52 PST 2004
On 14 Mar 2004 00:39:32 +0200, Yosef Meller wrote:
>
> Mikhael Goikhman wrote:
> |
> | It is not clear from your question what is "unicode character".
> |
> | Do you mean that you want $str = "binary_data"; to be interpreted by Perl
> | as utf8 string? I think (please someone correct me), Perl code itself is
> | considered ascii, so this is not possible, you should either use \x{...}
> | notation or read the unicode data from stdin/file using utf8 encoding.
>
> If you put "use utf8;" at the beginning of your code you can write it
> (or any string in it) directly as utf8. But I haven't tried it yet.
> Check out perldoc utf8.
Taking a look at utf8.pm from 5.8.0 and 5.6.0, I see you are right.
I just tried it, and it works as documented. Here is test.pl (3 lines):
$str = qq(Pure hebrew text in utf8);
@chars = split(//, $str);
print $chars[0], "\n";
Now, running it under different perl versions, locales and switches
produces the following:
% env LANG=POSIX perl5.8.1 test.pl
(1 byte printed)
% env LANG=POSIX perl5.8.1 -Mutf8 test.pl
Wide character in print at aaa line 5.
(2 bytes printed)
% env LANG=he_IL.utf8 perl5.8.1 -Mutf8 test.pl
Wide character in print at aaa line 5.
(2 bytes printed)
% env LANG=he_IL.utf8 perl5.8.1 -Mutf8 -CO test.pl
(2 bytes printed) # no warning
% env LANG=he_IL.utf8 perl5.8.1 test.pl
(1 byte printed)
% env LANG=he_IL.utf8 perl5.8.0 test.pl
(2 bytes printed) # see the difference?
% env LANG=he_IL.utf8 perl5.6.0 test.pl
(1 byte printed)
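For what it is worth, here is a minimal sketch of what I believe explains
the 1 byte vs. 2 bytes difference. It is not part of test.pl above. It
assumes the first character is HEBREW LETTER ALEF (U+05D0), written as raw
bytes so the file stays ascii, and it uses Encode::decode() to stand in
for what "use utf8" does to a literal, and binmode() to stand in for -CO.
Both substitutions are my assumptions about equivalent behaviour, not
something taken from the runs above:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 encoding of HEBREW LETTER ALEF (U+05D0), given as raw bytes.
# This is an illustrative stand-in for the Hebrew literal in test.pl.
my $bytes = "\xD7\x90";

# Without "use utf8" the literal is a byte string, so split // yields
# single bytes and $octets[0] is only half of the character.
my @octets = split //, $bytes;

# With "use utf8" the literal is a character string. decode() performs
# the same bytes-to-characters step explicitly.
my $text  = decode('UTF-8', $bytes);
my @chars = split //, $text;

# The character has to be encoded again on output: 1 character, 2 bytes.
my $out = encode('UTF-8', $chars[0]);

# Declare STDOUT as utf8 (assumed here to match what -CO does), so that
# print does not emit "Wide character in print".
binmode(STDOUT, ':utf8');
print $chars[0], "\n";
```

Here scalar(@octets) is 2 and scalar(@chars) is 1, and length($out) is 2,
which matches the "(1 byte printed)" and "(2 bytes printed)" cases when
only $chars[0] is printed.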
Regards,
Mikhael.
--
perl -e 'print+chr(64+hex)for+split//,d9b815c07f9b8d1e'