[Israel.pm] utf-8 hebrew
Zohar Kelrich
lumi at musicgenome.com
Sun May 2 09:24:04 PDT 2004
On Sun, 2 May 2004 18:13:25 -0200 (GMT+2)
Gabor Szabo <gabor at perl.org.il> wrote:
>
> On Sun, 2 May 2004, Omer Zak wrote:
> >
> > Check how the charset encoding was specified.
> > The HTML standard allows you to specify the charset encoding used by a Web
> > page.
>
> Both the HTTP header and the HTML header say utf-8 for the page the
> server sends to the browser. The question is why the browser sends
> back this (broken?) encoding and not something else.
Short answer first:
The string you got IS utf-8 encoded; Perl just doesn't know it. I guess
using the Encode module is the proper way to tell it.
(I looked here: http://www.cl.cam.ac.uk/~mgk25/unicode.html)
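To make the short answer concrete, here is a minimal sketch in Python, used only as a language-neutral illustration of bytes versus characters. In Perl, the corresponding step would be Encode's decode('UTF-8', $octets), which takes bytes in and returns characters. The sketch assumes the incoming value is exactly the two bytes D7 9B for kaf. The variable names are made up for the example.

```python
# Raw bytes as they arrive from the browser: Hebrew kaf (U+05DB) in UTF-8.
raw = b"\xd7\x9b"

# Undecoded, the data is just two separate bytes.
print(len(raw))  # 2

# Decoding declares "these bytes are UTF-8" and yields one real character.
text = raw.decode("utf-8")
print(len(text), hex(ord(text)))  # 1 0x5db

# Reading the same bytes as Latin-1 (the wrong charset) yields two wrong
# characters, which is what "broken" input tends to look like.
wrong = raw.decode("latin-1")
print(len(wrong))  # 2
```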
Long answer next (I hope my ASCII-art table displays correctly):
This is your first character, Hebrew kaf (U+05DB):
0000 0101 1101 1011
Characters in the range U-00000080 to U-000007FF are encoded as two bytes
with this bit pattern: 110xxxxx 10xxxxxx
Here it is in a clever ASCII table, aligned with the bits above:
    0000 0101 1101 1011   05 DB
      110 101 11          D7
             10 01 1011   9B
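The same hand calculation can be written as a few bit operations. This is a minimal Python sketch that handles only the two-byte range described above. The function name utf8_two_byte is hypothetical. The result is cross-checked against Python's built-in UTF-8 encoder.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Hypothetical helper: encode U+0080..U+07FF as 110xxxxx 10xxxxxx."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte range U+0080..U+07FF is handled")
    high = 0b11000000 | (code_point >> 6)       # 110 + top 5 of the 11 bits
    low = 0b10000000 | (code_point & 0b111111)  # 10 + bottom 6 bits
    return bytes([high, low])


kaf = 0x05DB
encoded = utf8_two_byte(kaf)
print(encoded.hex(" ").upper())  # D7 9B

# Cross-check against the built-in encoder.
assert encoded == chr(kaf).encode("utf-8")
```

Shifting right by 6 isolates the upper 5 payload bits for the first byte. Masking with 0b111111 keeps the lower 6 for the second byte, matching the two rows of the table.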
I've checked this carefully by hand and it's right, so you have honest-to-goodness
utf-8. Either the browser doesn't report the charset in its request headers (I
don't really know whether it's supposed to), or CGI (or whatever you're using)
isn't picking up that information, which I'd expect it to do if it's there.
> Actually my immediate problem is solved, as I know I just have to encode
> to utf-8 every string I receive from the client. Still, I'd like to
> actually understand what I am doing :)
>
> Gabor
You're doing the Right Thing! Hope that helps :)
Zohar
More information about the Perl mailing list