[Israel.pm] Re: utf-8 and character semantics

Roey Almog roeyalmog at 013.net
Tue Jan 24 03:53:39 PST 2006


Hi Eitan,

>>The database is utf-8. The manipulation, I must do includes isolating
>>a specific number of bytes (!) in a field.

Can you give an example?

Anyway, you can always change the encoding so that you end up with a
byte = character scheme,
e.g. convert utf-8 to the Hebrew iso-8859-8 and handle the data as characters
that are actually single bytes.

See from_to in the Encode module.
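
A minimal sketch of that conversion, assuming the field holds only characters that exist in iso-8859-8 (anything else would be lost in the conversion). The sample value and the variable names are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical field value: the UTF-8 octets of a 4-letter Hebrew word
# (each letter takes 2 bytes in UTF-8).
my $field = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";

# from_to converts the octets in place and returns the new length in
# bytes (undef on failure).
my $len = from_to($field, "utf-8", "iso-8859-8");

# In iso-8859-8 every Hebrew letter is one byte, so substr counts letters.
my $first_two = substr($field, 0, 2);

# Convert the piece back to UTF-8 octets before writing it to the database.
from_to($first_two, "iso-8859-8", "utf-8");

printf "%d bytes in iso-8859-8, %d bytes back in utf-8\n",
    $len, length $first_two;
```

Note that from_to works on octets and changes its first argument in place, so keep a copy if the original value is still needed.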

>>I know for sure that this number of bytes I use in a substr is
>>coordinated well to the utf-8 wide character boundaries.

substr actually works on characters, and in a utf-8 string each character can
be made up of one or more bytes,

so substr($string,0,4) returns the first 4 chars and not necessarily the first 4
bytes.
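
To see the difference, here is a small sketch comparing the same data as raw octets and as a decoded character string. Whether DBI hands back octets or decoded strings depends on the driver and its settings, so the sample value below is only an assumption:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical value as raw UTF-8 octets:
# "ab" followed by two Hebrew letters (2 bytes each).
my $octets = "ab\xD7\xA9\xD7\x9C";

# On a byte string, substr and length count bytes.
my $first4_bytes = substr($octets, 0, 4);   # "ab" plus one Hebrew letter

# After decoding, the same functions count characters.
my $chars        = decode("UTF-8", $octets);
my $first4_chars = substr($chars, 0, 4);    # "ab" plus both Hebrew letters

printf "bytes: %d, chars: %d\n", length($octets), length($chars);
printf "first 4 chars re-encoded: %d bytes\n",
    length encode("UTF-8", $first4_chars);
```

So if the data is kept as undecoded octets all the way through, substr already counts bytes.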

To manipulate things at a lower level you can use vec.
vec returns bits from a string and works on bytes rather than on utf-8
characters (give it the raw octets, not a decoded string), so
vec($string,0,32) returns the first four bytes (= 32 bits).
The return value is a number,
so vec($string,0,8) returns a number between 0 and 255.
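
A short sketch of both calls on a byte string (the sample octets are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical byte string: the UTF-8 octets of one Hebrew letter plus "AB".
my $octets = "\xD7\xA9AB";

# Element 0 at width 8 is the first byte, as a number.
my $first_byte = vec($octets, 0, 8);

# Element 0 at width 32 is the first four bytes read as one
# big-endian integer.
my $first_four = vec($octets, 0, 32);

printf "%d 0x%08X\n", $first_byte, $first_four;
```

Since vec returns a number and not a string, it is more useful for inspecting bytes than for cutting out a piece of the field.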

Another option is to use pack/unpack with C (chars = bytes scheme):
$foo = pack("CCCC",65,66,67,68);
creates a 4-byte string "ABCD".
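
The reverse direction with unpack might look like this; a sketch only, with made-up sample octets:

```perl
use strict;
use warnings;

# Build a 4-byte string from byte values, as in the pack call above.
my $foo = pack("CCCC", 65, 66, 67, 68);

# Go the other way: split a byte string into its byte values.
# Hypothetical input: UTF-8 octets of one Hebrew letter followed by "A".
my @bytes = unpack("C*", "\xD7\xA9A");

# "a2" extracts the first two bytes as a string rather than as numbers.
my ($first_two) = unpack("a2", "\xD7\xA9A");

print "$foo @bytes\n";
```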

As someone who has been burned by utf-8 so many times: do you really need
byte manipulation of a utf-8 string?

Regards

Roey

-----Original Message-----
From: perl-bounces at perl.org.il [mailto:perl-bounces at perl.org.il]On Behalf Of
Eitan Schuler
Sent: Wednesday, January 18, 2006 7:09 PM
To: Perl in Israel; Jerusalem pm
Subject: [Israel.pm] Re: utf-8 and character semantics


Dear All,
I have the following problem:
I select records from a table using DBI. I would like to manipulate
the selected data in memory, and write it back.
So far no problem.
The database is utf-8. The manipulation, I must do includes isolating
a specific number of bytes (!) in a field.
I know for sure that this number of bytes I use in a substr is
coordinated well to the utf-8 wide character boundaries.
The question is: how can I make sure that substr will relate to the number
of bytes and not to the number of characters? Is use bytes(); enough? Can
I force 8-bit ASCII retrieval from the db?
If I consistently use 8-bit ASCII when reading and writing back (if
it's possible), I shouldn't destroy anything, right? Any ideas?

--

Thank you
Eitan

--
"Computer Science is no more about computers than astronomy is about
telescopes." (Edsger Wybe Dijkstra)

_______________________________________________
Perl mailing list
Perl at perl.org.il
http://perl.org.il/mailman/listinfo/perl