[Israel.pm] Unicode un-handling

Mikhael Goikhman migo at homemail.com
Fri Apr 11 01:30:01 PDT 2008


On 11 Apr 2008 02:29:59 +0300, Shmuel Fomberg wrote:
> 
> Shmuel Fomberg wrote:
> 
> > I think that I'll back up my code, and go utf-8 bonanza.
> > That means:
> > 1. Clone a custom Template Toolkit driver for
> > CGI::Application::Plugin::AnyTemplate that handles utf8 templates
> 
> I looked into this, and it seems that the whole chain isn't utf8-ready.
> CGI::Application::Plugin::AnyTemplate does not accept an encoding
> parameter, and even if it did, Template::Toolkit ignores the binmode
> parameter. So nothing really works.

Is there any practical, unsolvable problem with always working with
non-utf8-flagged data only (input from or output to a file, socket, CGI,
DB, or other modules)? Whenever you need to operate on multibyte
characters, you can write a function for each such case, for example
"trim" or "cut", that does "decode_utf8", then a regexp or "substr", then
"encode_utf8" back. And if you like, your function may also support both
cases (checking the flag with "is_utf8") and return the output in the
same form as the input (with or without the utf8 flag).
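
A minimal sketch of such helpers, assuming the unflagged input is valid
UTF-8 encoded bytes. The names cut_chars and trim_chars are hypothetical,
chosen only for illustration, and the flag check uses is_utf8 from the
Encode module:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8 is_utf8);

# Hypothetical helper: keep at most $max characters (not bytes) of $text.
# Accepts either UTF-8 encoded bytes or a utf8-flagged character string
# and returns the result in the same form it was given.
sub cut_chars {
    my ($text, $max) = @_;
    my $flagged = is_utf8($text);
    my $chars   = $flagged ? $text : decode_utf8($text);
    $chars = substr($chars, 0, $max);
    return $flagged ? $chars : encode_utf8($chars);
}

# Hypothetical helper: strip leading and trailing whitespace, working on
# characters internally and preserving the flagged/unflagged form.
sub trim_chars {
    my ($text) = @_;
    my $flagged = is_utf8($text);
    my $chars   = $flagged ? $text : decode_utf8($text);
    $chars =~ s/^\s+|\s+$//g;
    return $flagged ? $chars : encode_utf8($chars);
}
```

With this approach, everything outside the helpers keeps passing plain
byte strings around, and only the few character-level operations pay for
a decode/encode round trip.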

I often do such things, depending on the data sources and modules used.
There are usually not many places where you need to worry about
converting between utf8-flagged and non-flagged data, and nothing
unsolvable. And no, I don't think there is any fundamental problem in
current perl, because I often work with real binary data or with text in
other encodings too, not only with utf8 characters.

Regards,
Mikhael.

-- 
perl -e 'print+chr(64+hex)for+split//,d9b815c07f9b8d1e'


