[Israel.pm] Unicode un-handling
Mikhael Goikhman
migo at homemail.com
Fri Apr 11 01:30:01 PDT 2008
On 11 Apr 2008 02:29:59 +0300, Shmuel Fomberg wrote:
>
> Shmuel Fomberg wrote:
>
> > I think that I'll back up my code, and go utf-8 bonanza.
> > That means:
> > 1. Clone custom Template Toolkit driver for
> > CGI::Application::Plugin::AnyTemplate that handles utf8 templates
>
> I looked into this, and it seems that the whole chain isn't utf8-ready.
> CGI::Application::Plugin::AnyTemplate does not accept an encoding
> parameter, and even if it did, Template::Toolkit ignores the binmode
> parameter. So nothing really works.

Is there any practical, unsolvable problem with always working only with
non-utf8-flagged data (input from or output to a file, socket, CGI, DB, or
other modules)? Whenever you need to operate on multibyte characters, you
can write a function for each such case, for example "trim" or "cut", that
does "decode_utf8", then a regexp or "substr", then "encode_utf8" back. If
you like, your function may also support both cases (using "is_utf8") and
return the output in the same form as the input (with or without the utf8
flag).
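
A minimal sketch of such functions, assuming the standard Encode module.
The names cut_chars and trim_text are hypothetical, chosen only for
illustration. Unflagged input is assumed to hold UTF-8 encoded bytes, as
in the approach described above, and the result comes back in the same
form as the input.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8 is_utf8);

# Hypothetical helper: keep at most $max characters (not bytes).
# Assumption: input without the utf8 flag is UTF-8 encoded bytes.
sub cut_chars {
    my ($text, $max) = @_;
    my $was_flagged = is_utf8($text);

    # Work on characters: decode only if we were given bytes.
    my $chars = $was_flagged ? $text : decode_utf8($text);
    $chars = substr($chars, 0, $max);

    # Return the result in the same form as the input.
    return $was_flagged ? $chars : encode_utf8($chars);
}

# Hypothetical helper: strip leading and trailing whitespace,
# using the same decode / operate / encode pattern.
sub trim_text {
    my ($text) = @_;
    my $was_flagged = is_utf8($text);

    my $chars = $was_flagged ? $text : decode_utf8($text);
    $chars =~ s/^\s+|\s+$//g;

    return $was_flagged ? $chars : encode_utf8($chars);
}
```

Given the eight UTF-8 bytes of a four-letter Hebrew word, cut_chars($bytes, 2)
returns the first two letters as four unflagged bytes; given the same word
as a flagged character string, it returns a flagged two-character string.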

I often do such things, depending on the data sources and modules used.
There are usually not many places where you need to worry about converting
between utf8-flagged and non-flagged data; nothing unsolvable. And no, I
don't think there is any fundamental problem in current perl, because I
often work with real binary data or with text in another encoding too, not
always with utf8 characters.

Regards,
Mikhael.
--
perl -e 'print+chr(64+hex)for+split//,d9b815c07f9b8d1e'