[Israel.pm] Unicode un-handling

Shmuel Fomberg semuelf at 012.net.il
Fri Apr 11 05:10:15 PDT 2008


Mikhael Goikhman wrote:

> Is there any practical unsolvable problem to always work with non utf-8
> flagged data only (input from or output to file, socket, cgi, db, other
> modules)? And whenever you need to operate on multibyte characters you
> may write a function for each such case, for example "trim" or "cut" that
> does "decode_utf8", then regexp or "substr", then "encode_utf8" back. And
> if you like, your function may also support both cases (using _is_utf8)
> and return the output in the same manner (with or without utf8 flag).

Well, that's how it works right now. I'm just worried that Template
Toolkit will get confused and handle UTF-8 data as Latin-1 data. But
that's very unlikely.
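
For illustration, here is a minimal sketch of the kind of "cut" function
described in the quoted text. The name cut_chars and the sample string are
made up for this example, and it assumes the Encode module's decode_utf8,
encode_utf8 and is_utf8; it is not taken from any existing code.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8 is_utf8);

# Hypothetical helper: keep the first $n characters (not bytes) of $text.
# Accepts either UTF-8 encoded bytes or a utf8-flagged string and returns
# the result in the same form it was given.
sub cut_chars {
    my ($text, $n) = @_;
    my $was_flagged = is_utf8($text);
    my $chars = $was_flagged ? $text : decode_utf8($text);
    my $cut   = substr($chars, 0, $n);
    return $was_flagged ? $cut : encode_utf8($cut);
}

# Four Hebrew letters (shin, lamed, vav, final mem) as 8 UTF-8 bytes.
my $bytes = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";
my $out   = cut_chars($bytes, 2);

printf "%d bytes\n", length $out;   # prints "4 bytes": two whole letters
```

A plain substr($bytes, 0, 3) on the undecoded data would instead split the
second letter in the middle.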

Don't forget that doing it this way will introduce weird characters
everywhere. For example, the Hebrew letter alef (U+05D0) is the two
bytes 0xD7 0x90 in UTF-8, and if those bytes are read as Latin-1 you
suddenly have a C1 control character (0x90) in your stream and weird
things happen.
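
A quick way to check which bytes actually end up in the stream is to dump
the UTF-8 encoding of a Hebrew letter. This is only a sketch, using alef
(U+05D0) as an arbitrary example and the Encode module's encode_utf8; the
variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

my $alef  = "\x{05D0}";           # Hebrew letter alef, one character
my $bytes = encode_utf8($alef);   # the same letter as UTF-8 bytes

# Show each byte in hex.
my @hex = map { sprintf "0x%02X", ord($_) } split //, $bytes;
print "@hex\n";                   # prints "0xD7 0x90"

# Count ASCII control bytes (such as \r) and C1 control bytes (0x80-0x9F,
# which is how Latin-1-aware code would see them).
my $ascii_ctrl = () = $bytes =~ /[\x00-\x1F\x7F]/g;
my $c1_ctrl    = () = $bytes =~ /[\x80-\x9F]/g;

print "ASCII control bytes: $ascii_ctrl, C1 control bytes: $c1_ctrl\n";
```

Every byte of a multibyte UTF-8 sequence is 0x80 or above, so ASCII
controls like \r cannot appear this way, but bytes in the C1 range can.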

So now the question is whether all the modules can handle
weird/control characters in the text, or whether we should just go all
the way and treat it as binary.

I'll go test a few modules...

Shmuel.


