[Israel.pm] Re: utf-8 and character semantics
roeyalmog at 013.net
Tue Jan 24 03:53:39 PST 2006
>>The database is utf-8. The manipulation, I must do includes isolating
>>a specific number of bytes (!) in a field.
Can you give an example?
Anyway, you can always change the encoding so that you end up with byte =
character, e.g. convert utf-8 to Hebrew iso-8859-8 and handle it as characters
that are one byte each. See from_to in the Encode module.
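A minimal sketch of the from_to idea, assuming the field arrives as raw utf-8 octets and contains only characters that exist in iso-8859-8 (anything else would be lost in the conversion). The sample word and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical sample: utf-8 octets of a 4-letter Hebrew word,
# two bytes per letter.
my $octets = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

my $utf8_len = length $octets;    # byte count while still utf-8

# from_to converts the octets in place and returns the new length
# in octets (undef on failure).
my $new_len = from_to($octets, "utf-8", "iso-8859-8");

# From here on every Hebrew letter is a single byte, so byte offsets
# and character offsets coincide.
my $first_two = substr($octets, 0, 2);

print "utf-8 bytes: $utf8_len, iso-8859-8 bytes: $new_len\n";
```

To write the data back to a utf-8 database you would convert in the other direction with from_to($octets, "iso-8859-8", "utf-8").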
>>I know for sure that this number of bytes I use in a substr is
>>coordinated well to the utf-8 wide character boundaries.
substr actually works on characters, which in utf-8 can be made of one or
more bytes each,
so substr($string,0,4) returns the first 4 chars and not necessarily the first 4
bytes.
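To make the character/byte difference concrete, here is a small sketch. It assumes you hold the field both as raw utf-8 octets and as a decoded character string; is_valid_utf8 is a hypothetical helper (not part of Encode) that checks whether a byte slice ends on a character boundary.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: utf-8 octets of four Hebrew letters (2 bytes each).
my $octets = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";
my $chars  = decode("UTF-8", $octets);     # decoded character string

my $char_count = length $chars;            # counts characters
my $byte_count = length $octets;           # counts bytes

my $two_chars  = substr($chars,  0, 2);    # first two letters
my $four_bytes = substr($octets, 0, 4);    # first four bytes

# Hypothetical helper: true if the byte string is complete, valid utf-8.
# The argument is copied because decode may modify its input when a
# CHECK value is given.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval { decode("UTF-8", $bytes, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

print "chars: $char_count, bytes: $byte_count\n";
print "4-byte slice ok: ", is_valid_utf8($four_bytes), "\n";
print "3-byte slice ok: ", is_valid_utf8(substr($octets, 0, 3)), "\n";
```

Slicing the octets and then validating the slice is one way to confirm that a byte count really does fall on a character boundary before writing anything back.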
To manipulate things at a lower level you can use vec.
vec reads fixed-width bit fields from a string and ignores utf-8, so
vec($string,0,32) returns the first four bytes (= 32 bits).
The return value is a number,
so vec($string,0,8) returns a number between 0 and 255.
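A short sketch of vec on a plain byte string; the sample data is an assumption. Note that the offset is counted in elements of the given width, and that widths of 16 bits or more are read in big-endian order. If the string holds decoded characters above 255, it is probably safer to encode it to octets first, since vec's behaviour on such strings has changed between Perl versions.

```perl
use strict;
use warnings;

# Hypothetical sample: a plain byte string (0x41 0x42 0x43 0x44).
my $bytes = "ABCD";

my $first_byte  = vec($bytes, 0, 8);     # element 0, 8 bits wide
my $second_byte = vec($bytes, 1, 8);     # element 1, i.e. the second byte
my $first_four  = vec($bytes, 0, 32);    # four bytes as one big-endian number

printf "%d %d 0x%08X\n", $first_byte, $second_byte, $first_four;
# prints "65 66 0x41424344"
```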
Another option is to use pack/unpack with C (chars = bytes scheme):
$foo = pack("CCCC",65,66,67,68);
creates a 4-byte string "ABCD".
As someone who has been burned by utf-8 so many times: do you really need byte
manipulation of a utf-8 string?
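The reverse direction, unpack, may be the more useful half here. A minimal sketch, assuming the data is a string of raw octets (variable names are illustrative):

```perl
use strict;
use warnings;

# Build a 4-byte string from byte values, as in the line above.
my $foo = pack("CCCC", 65, 66, 67, 68);    # "ABCD"

# unpack with C* returns every byte as a number between 0 and 255.
my @all_bytes = unpack("C*", $foo);

# A count after C limits how many bytes are taken.
my @first_two = unpack("C2", $foo);

# The same works on utf-8 octets: one Hebrew letter is two bytes.
my @letter = unpack("C*", "\xD7\xA9");

print "@all_bytes | @first_two | @letter\n";
# prints "65 66 67 68 | 65 66 | 215 169"
```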
From: perl-bounces at perl.org.il [mailto:perl-bounces at perl.org.il]On Behalf Of
Sent: Wednesday, January 18, 2006 7:09 PM
To: Perl in Israel; Jerusalem pm
Subject: [Israel.pm] Re: utf-8 and character semantics
I have the following problem:
I select records from a table using dbi. I would like to manipulate
the selected data in the memory, and write it back.
So far no problem.
The database is utf-8. The manipulation, I must do includes isolating
a specific number of bytes (!) in a field.
I know for sure that this number of bytes I use in a substr is
coordinated well to the utf-8 wide character boundaries.
The question is: how can I make sure that substr will relate to number
of bytes and not to number of characters? Is use bytes(); enough? Can
I force 8 bit ascii retrieve from the db?
If I consistently use 8 bit ascii when reading and writing back (if
it's possible) I shouldn't destroy anything? Any ideas?
"Computer Science is no more about computers than astronomy is about
telescopes." (Edsger Wybe Dijkstra)