[Israel.pm] use utf8 and hebrew from file

Meir Guttman meir at guttman.co.il
Fri Dec 28 02:12:49 PST 2012

The now-accepted way to call ‘open’ is the three-argument form, in which the
second argument carries both the mode and the file’s encoding:

      open my $fh, "<:encoding(utf8)", $file_name
          or die "Cannot open $file_name: $!";
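
As a minimal sketch of what the encoding layer buys you, the script below
writes four Hebrew letters to a scratch file as raw UTF-8 bytes and reads them
back through the layer. The temporary file and the variable names are made up
for the illustration; only the three-argument open comes from the text above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical test file: the Hebrew word "shalom", 4 letters stored as 8 bytes.
my ($tmp_fh, $file_name) = tempfile(UNLINK => 1);
binmode($tmp_fh);                                     # write raw bytes
print {$tmp_fh} "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D\n";
close $tmp_fh or die "Cannot close $file_name: $!";

# Read it back through a decoding layer.
open my $fh, '<:encoding(utf8)', $file_name
    or die "Cannot open $file_name: $!";
my $line = <$fh>;
close $fh;
chomp $line;

# After decoding, length() counts characters rather than bytes.
my $char_count = length $line;
printf "%d characters\n", $char_count;
```

With a plain "<" mode the same read would hand back the eight undecoded bytes.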

You can also use a pragma to make ALL open statements in its scope use a
given encoding by default, as in:

        use open ':encoding(iso-8859-1)';
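
A rough sketch of the pragma's scoping, assuming a UTF-8 file rather than the
Latin-1 one above (so that the effect shows up in the string length) and
assuming no PERL_UNICODE or PERLIO setting in the environment. The scratch
file and the helper slurp_default are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical test file: 4 Hebrew letters stored as 8 UTF-8 bytes.
my ($tmp_fh, $file_name) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";
close $tmp_fh or die "Cannot close $file_name: $!";

# Defined outside the pragma's scope, so this open gets no default layer.
sub slurp_default {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                       # slurp mode
    my $data = <$fh>;
    close $fh;
    return $data;
}

my $decoded_length;
{
    use open ':encoding(UTF-8)';    # lexically scoped default layer
    open my $fh, '<', $file_name or die "Cannot open $file_name: $!";
    local $/;
    my $data = <$fh>;
    close $fh;
    $decoded_length = length $data; # characters
}

my $raw_length = length slurp_default($file_name);   # bytes
print "with pragma: $decoded_length, without: $raw_length\n";
```

The pragma only affects open calls compiled inside its lexical scope, which is
why the helper still sees bytes.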

But it should be used carefully; see
https://www.socialtext.net/perl5/the_utf8_perlio_layer . The author of that
page strongly suggests using it together with ‘:encoding(utf8)’ for extra
validation of the input.

Also, if you emit Unicode characters to a file handle, say STDOUT, you are
going to see warnings such as “Wide character in print”. This is because, by
default, the handle expects just one-byte characters (e.g., Latin-1). To avoid
this you can set an encoding layer on the filehandle with binmode, such as:

        binmode(STDOUT, ":utf8");
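
To see the warning come and go without depending on a terminal, here is a
small sketch that prints to in-memory filehandles instead of STDOUT and
collects warnings through a __WARN__ handler. The in-memory handles and the
variable names are assumptions of the example; the last two lines apply the
same binmode call to STDOUT.

```perl
use strict;
use warnings;

my $shalom = "\x{5E9}\x{5DC}\x{5D5}\x{5DD}";    # four Hebrew letters

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# 1. A handle with no encoding layer: printing wide characters warns.
my $plain = '';
open my $plain_fh, '>', \$plain or die "Cannot open in-memory file: $!";
print {$plain_fh} $shalom;
close $plain_fh;
my $warned_without_layer = grep { /Wide character/ } @warnings;

# 2. The same print after binmode: no warning.
@warnings = ();
my $encoded = '';
open my $utf8_fh, '>', \$encoded or die "Cannot open in-memory file: $!";
binmode($utf8_fh, ':utf8');
print {$utf8_fh} $shalom;
close $utf8_fh;
my $warned_with_layer = grep { /Wide character/ } @warnings;

# The same call on STDOUT, as in the text above.
binmode(STDOUT, ':utf8');
print "$shalom\n";
```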

There are millions of books, articles, blogs and FAQs about Perl and Unicode
lurking on the net. One not too shabby is
http://perlgeek.de/en/article/encodings-and-unicode and the links there.


Happy Perling…



From: perl-bounces at perl.org.il [mailto:perl-bounces at perl.org.il] On Behalf
Of sawyer x
Sent: Friday, 28 December 2012 10:20
To: Perl in Israel
Subject: Re: [Israel.pm] use utf8 and hebrew from file


"use utf8;" basically means "I have UTF-8 characters in my source code". This
is useful if, for example, you define a string in the script with Hebrew
characters. You're basically telling perl to read your source code file as
UTF-8 characters.
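
A short sketch of that effect, assuming the script itself is saved as UTF-8:
the same Hebrew literal is measured once without the pragma and once with it.
The variable names are illustrative only.

```perl
use strict;
use warnings;

my $length_without_pragma;
{
    # No "use utf8" here: perl treats the literal as a sequence of bytes.
    $length_without_pragma = length "שלום";
}

my $length_with_pragma;
{
    use utf8;    # lexically scoped: the literal is decoded into characters
    $length_with_pragma = length "שלום";
}

print "without: $length_without_pragma, with: $length_with_pragma\n";
```

The pragma changes how perl reads the source text only; it says nothing about
data that arrives from files or other handles.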

On Fri, Dec 28, 2012 at 2:52 AM, Shmuel Fomberg <shmuelfomberg at gmail.com> wrote:

Hi Moshe.


It is not clear to me what you mean by 'get them' and 'move it out'.

But generally, when processing a UTF-8 file, use ":utf8" in the open command.
Declaring "use utf8;" won't have any effect on reading your files.
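
One way to check this, sketched under the assumption that the script is saved
as UTF-8: with "use utf8;" in effect, the same file is read once with a plain
"<" and once with "<:utf8", and each result is compared to a Hebrew literal.
The scratch file and the helper read_with are invented for the example.

```perl
use strict;
use warnings;
use utf8;                            # affects only the literal below
use File::Temp qw(tempfile);

my $word = "שלום";                   # decoded literal: 4 characters

# Hypothetical test file holding the same word, encoded as UTF-8.
my ($tmp_fh, $file_name) = tempfile(UNLINK => 1);
binmode($tmp_fh, ':utf8');
print {$tmp_fh} $word;
close $tmp_fh or die "Cannot close $file_name: $!";

# Slurp a file using the given open mode.
sub read_with {
    my ($mode, $path) = @_;
    open my $fh, $mode, $path or die "Cannot open $path: $!";
    local $/;
    my $data = <$fh>;
    close $fh;
    return $data;
}

my $raw     = read_with('<',      $file_name);   # undecoded bytes
my $decoded = read_with('<:utf8', $file_name);   # decoded characters

my $raw_matches     = $raw     eq $word ? 1 : 0;
my $decoded_matches = $decoded eq $word ? 1 : 0;
print "raw matches: $raw_matches, decoded matches: $decoded_matches\n";
```

Only the read that goes through the layer compares equal to the literal, even
though "use utf8;" is active for both.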




On Fri, Dec 28, 2012 at 7:28 AM, moshe nahmias <moshegrey at ubuntu.com> wrote:

I was trying to get some strings in Hebrew from a file (the file is UTF-8, at
least as far as I know, since I converted it with iconv) but wasn't able to get
them while "use utf8" was in effect. When I tried removing it, it suddenly
worked like a charm.
Isn't it supposed to be the other way around? Why is utf8 the troublemaker
in this case?


Perl mailing list
Perl at perl.org.il

