[Israel.pm] \w for utf8

Yuval Kogman nothingmuch at woobling.org
Mon Aug 20 05:44:59 PDT 2007


use utf8;

This tells perl that the source file itself is encoded in UTF-8, so
string literals in it are decoded as UTF-8 (as opposed to being treated
as latin1). It has no effect on data read from files or other input.
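
For the literal in your example, a minimal sketch of what this looks
like, assuming the script itself is saved as UTF-8. The binmode on
STDOUT is my addition so the result prints cleanly; it is not needed
for the match itself.

```perl
use strict;
use warnings;
use utf8;    # string literals in this file are decoded from UTF-8

# Encode characters back to UTF-8 on output (assumes a UTF-8 terminal).
binmode(STDOUT, ':encoding(UTF-8)');

my $str = 'N-Größe';                  # a character string, thanks to "use utf8"
my @res = ( $str =~ /(\w+)/g );       # \w now matches ö and ß as well
print join( ' ++ ', @res ), "\n";     # N ++ Größe
```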

Since your string is likely coming from elsewhere, look into
binmode($fh, ":utf8") and open($fh, "<:utf8", $file), and also
Encode::decode.
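
A rough sketch of two of those options. The byte string and the
in-memory filehandle are stand-ins I made up for real input, and I use
the stricter ":encoding(UTF-8)" layer rather than ":utf8"; either layer
gives you a decoded string for valid input.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 encoded bytes for "N-Größe", as they might arrive from a file or socket.
my $bytes = "N-Gr\xC3\xB6\xC3\x9Fe";

# Option 1: decode an existing byte string explicitly.
my $decoded = decode('UTF-8', $bytes);

# Option 2: let an I/O layer decode while reading. An in-memory
# filehandle stands in for a real file here.
open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "open failed: $!";
my $line = <$fh>;
close($fh);

# The byte string is longer than the decoded strings, because ö and ß
# each take two bytes in UTF-8.
printf "bytes: %d, decoded: %d, via layer: %d\n",
    length($bytes), length($decoded), length($line);
```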

These are the common ways to get a string marked as Unicode in memory,
at which point the regex engine treats \w+ as matching all Unicode
word characters, not only [a-zA-Z0-9_].
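
To see the difference side by side, here is a small sketch that runs
the same regex over the undecoded bytes and over the decoded string.
The variable names are mine, and it assumes a perl with default
settings (no locale and no other pragma changing how \w behaves).

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "N-Gr\xC3\xB6\xC3\x9Fe";        # undecoded UTF-8 bytes
my $chars = decode('UTF-8', $bytes);        # decoded character string

# On the byte string, the high bytes are not word characters, so the
# match breaks up around them. On the decoded string it does not.
my @from_bytes = ( $bytes =~ /(\w+)/g );
my @from_chars = ( $chars =~ /(\w+)/g );

binmode(STDOUT, ':encoding(UTF-8)');
print "bytes: ", join( ' ++ ', @from_bytes ), "\n";
print "chars: ", join( ' ++ ', @from_chars ), "\n";
```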

There is a tutorial by Juerd somewhere; it's supposed to be pretty
good. Try Google, perhaps.

On Mon, Aug 20, 2007 at 15:39:58 +0300, Pinkhas Nisanov wrote:
> Hi,
> 
> I need catch string that may include 'utf8' characters:
> e.g.:
> 
>   my $str_utf8 = 'N-Größe';
>   my @res = ( $str_utf8 =~ /(\w+)/g );
>   print join( " ++ ", @res ), "\n";
> 
> 
> it prints:
> 
>  N ++ Gr ++ e
> 
> but I need:
> 
> N ++ Größe
> 
> 
> thanks
> Pinkhas Nisanov
> _______________________________________________
> Perl mailing list
> Perl at perl.org.il
> http://perl.org.il/mailman/listinfo/perl

-- 
  Yuval Kogman <nothingmuch at woobling.org>
http://nothingmuch.woobling.org  0xEBD27418



