[Israel.pm] utf-8 hebrew

Zohar Kelrich lumi at musicgenome.com
Sun May 2 09:24:04 PDT 2004


On Sun, 2 May 2004 18:13:25 -0200 (GMT+2)
Gabor Szabo <gabor at perl.org.il> wrote:

> 
> On Sun, 2 May 2004, Omer Zak wrote:
> >
> > Check how the charset encoding was specified.
> > The HTML standard allows you to specify the charset encoding used by a Web
> > page.
> 
> Both the HTTP header and the HTML header say utf-8 for the page the
> server sends to the browser. The question is why the browser sends
> back this (broken?) encoding and not something else?

Short answer first:
The string you got IS utf-8 encoded. Perl just doesn't know that, so it
treats each byte as a separate character. I guess decoding it with the Encode
module is the proper way to tell it.
(I looked here: http://www.cl.cam.ac.uk/~mgk25/unicode.html)
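
For what it's worth, here is a minimal sketch of what I mean, assuming the
raw bytes arrive in a plain scalar. The variable names are made up for
illustration, and the two bytes are the utf-8 form of Hebrew kaf:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed input: the two raw bytes received for Hebrew kaf (U+05DB).
my $octets = "\xD7\x9B";

# Before decoding, Perl sees two separate one-byte characters.
my $raw_length = length $octets;

# decode() interprets the bytes as UTF-8 and returns a character string.
my $text        = decode('UTF-8', $octets);
my $text_length = length $text;

printf "bytes: %d, characters: %d, code point: U+%04X\n",
    $raw_length, $text_length, ord $text;
```

After decoding, length and ord work on characters rather than bytes, which
is usually what you want for anything that came from a form.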

Long answer next (I hope my ASCII art table shows right):

This is your first character, Hebrew kaf (U+05DB), in binary:
0000 0101 1101 1011

Chars in the range U-00000080 - U-000007FF are mapped to this two-byte
sequence, with the x's filled in from the character's low 11 bits:
110xxxxx 10xxxxxx

Here it is in a clever ASCII table, aligned with the bits above (hex on the right):

0000 0101 1101 1011     05 DB
110   101 11            D7
10          01 1011     9B
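
The same bit shuffling can be sketched in Perl. This is only an illustration
for the two-byte range; utf8_two_bytes is a made-up helper name, not
something that comes with Encode:

```perl
use strict;
use warnings;

# Hypothetical helper: encode a code point in U+0080..U+07FF as two
# UTF-8 bytes, following the pattern 110xxxxx 10xxxxxx.
sub utf8_two_bytes {
    my ($cp) = @_;
    die "code point out of two-byte range\n" if $cp < 0x80 || $cp > 0x7FF;
    my $lead = 0xC0 | ($cp >> 6);      # 110 prefix + top five bits
    my $cont = 0x80 | ($cp & 0x3F);    # 10 prefix  + low six bits
    return ($lead, $cont);
}

my @bytes = utf8_two_bytes(0x05DB);    # Hebrew kaf
printf "%02X %02X\n", @bytes;          # prints "D7 9B"
```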

I've checked this carefully by hand and it's right, so you have honest to
goodness utf-8. Either the browser doesn't report the charset in its request
headers (I don't know whether it's really supposed to), or CGI (or whatever
you're using) isn't picking it up, which it ought to if it's there.


> Actually my immediate problem is solved as I know I just have to encode
> to utf-8 every string I receive from the client. Still, I'd like to
> actually understand what I am doing :)
> 
> Gabor

You're doing the Right Thing! Hope that helps :)

  Zohar