[Israel.pm] Unicode un-handling
semuelf at 012.net.il
Fri Apr 11 05:10:15 PDT 2008
Mikhael Goikhman Wrote:
> Is there any practical unsolvable problem to always work with non utf-8
> flagged data only (input from or output to file, socket, cgi, db, other
> modules)? And whenever you need to operate on multibyte characters you
> may write a function for each such case, for example "trim" or "cut" that
> does "decode_utf8", then regexp or "substr", then "encode_utf8" back. And
> if you like, your function may also support both cases (using _is_utf8)
> and return the output in the same manner (with or without utf8 flag).
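For reference, the kind of helper described in the quote might look roughly like this. It is only a sketch: cut_chars and trim_chars are made-up names, and it assumes "_is_utf8" refers to Encode::is_utf8 and that unflagged input is always valid UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8 is_utf8);

# Hypothetical helper: return the first $len characters (not bytes) of $str.
# Accepts UTF-8 encoded bytes or an already-decoded (flagged) string and
# returns the result in the same form it was given.
sub cut_chars {
    my ($str, $len) = @_;
    my $was_flagged = is_utf8($str);
    # Assumption: an unflagged string holds UTF-8 bytes, not latin1 text.
    my $chars = $was_flagged ? $str : decode_utf8($str);
    my $head  = substr($chars, 0, $len);
    return $was_flagged ? $head : encode_utf8($head);
}

# Hypothetical helper: strip leading and trailing whitespace, same contract.
sub trim_chars {
    my ($str) = @_;
    my $was_flagged = is_utf8($str);
    my $chars = $was_flagged ? $str : decode_utf8($str);
    $chars =~ s/^\s+|\s+$//g;
    return $was_flagged ? $chars : encode_utf8($chars);
}

# Four Hebrew letters as UTF-8 bytes (U+05E9 U+05DC U+05D5 U+05DD).
my $bytes = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";
my $first_two = cut_chars($bytes, 2);   # still bytes, now 4 of them
```

A plain substr($bytes, 0, 2) would have returned only the first letter, since each of these letters takes two bytes in UTF-8.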
Well, that's how it works right now. I'm just worried that Template
Toolkit will get confused and handle utf8 data as latin1 data. But that's a
Don't forget that doing it this way will introduce weird characters
everywhere. Theoretically, a Hebrew char can be 0x5D + 0x10, and then
suddenly you have \r in your stream and weird things happen.
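Whether a given string really contains a control byte depends on the encoding in use, so it may be worth dumping the actual bytes. A minimal sketch, assuming UTF-8 and the Encode module; hex_bytes and has_control_bytes are made-up names, not part of any module:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: space-separated hex dump of a byte string.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

# Hypothetical helper: 1 if the byte string contains an ASCII control
# byte (0x00-0x1F or 0x7F), which includes \r and \n; 0 otherwise.
sub has_control_bytes {
    my ($bytes) = @_;
    return $bytes =~ /[\x00-\x1F\x7F]/ ? 1 : 0;
}

my $alef = encode_utf8("\x{5D0}");   # HEBREW LETTER ALEF as UTF-8 bytes
print hex_bytes($alef), "\n";        # prints "d7 90"
print has_control_bytes($alef) ? "control bytes found\n"
                               : "no control bytes\n";
```

The same check can be run on real data before handing it to a module that is not expected to be utf8-aware.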
And now we have the question whether all the modules can handle
weird/control chars in the text, or just go all the way and treat it as
I'll go test a few modules...