[Israel.pm] Upcoming meeting: Hebrew and Perl

Shlomo Yona shlomo at cs.haifa.ac.il
Wed Jun 16 00:16:33 PDT 2004


On July 1st, I'll give an 80-minute lecture about Hebrew and Perl.

I'd be happy to receive requests for topics or problems
that I can address during the lecture.

Here are some issues I'd like to cover:

*1* What is a "character set"?
*2* What is "encoding"?
*3* What exactly do the following mean, and what are the
differences among them?
	* cp1255
	* iso-8859-8
	* utf-8
*4* Logical Hebrew vs. visual Hebrew
*5* Processing Hebrew with niqqud:
	* how to generate?
	* how to tokenize?
	* display issues
	* editing issues
	* Unicode issues
*6* Transformations using the following pragma and module:
	* utf8
	* Encode
*7* How does the locale of the system influence processing?
*8* Tokenizing Hebrew text:
	* what is a sentence?
	* what is a word?
*9* When your HTML/CGI says one thing and the
web-server/web-client say otherwise:
	* writing Hebrew friendly HTML
	* why doesn't my browser recognize my Hebrew web page?
	* open issues...
*10* Editing code and files in Hebrew using vim
	* cp1255
	* utf8
	* how to tell Perl that my code contains Hebrew and stay alive?

What I think I'll probably do is this:

I'll definitely go over #1, #2, #3 and #4. I'll also briefly
explain #7, so that we have common ground for the rest of
the talk.

Going over #6 with examples will be useful. This will give
us a set of idioms that we can later copy&paste into our
code... :-)
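As an illustration of the kind of idiom meant for #6, here is
a minimal sketch (not necessarily what will be shown in the
lecture), assuming the core Encode module (Perl 5.8 or later).
The three sample bytes are an assumption chosen for the example:
0xE0, 0xE1 and 0xE2 are alef, bet and gimel in both cp1255 and
iso-8859-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Three raw bytes as they would appear in a cp1255 file:
# alef, bet and gimel (the same bytes in iso-8859-8).
my $bytes = "\xE0\xE1\xE2";

# decode(): bytes in a named encoding -> Perl character string.
my $chars = decode('cp1255', $bytes);
printf "%d characters, first is U+%04X\n", length($chars), ord($chars);

# encode(): Perl character string -> UTF-8 bytes.
# Each Hebrew letter takes two bytes in UTF-8.
my $utf8 = encode('UTF-8', $chars);
printf "%d bytes in UTF-8\n", length($utf8);
```

This prints "3 characters, first is U+05D0" and "6 bytes in
UTF-8". Converting in the other direction is the same pair of
calls with the encoding names swapped.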

I'd like to cover tokenization issues (#8). This is something
most people consider trivial, but in fact it isn't. The
amount of thought and craft put into the tokenization stage
greatly affects the success rate of many applications that
require tokenization as a preprocessing stage. I'll talk
about some of the difficulties, and then present home-made
code that tries to demonstrate how to do it.
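To show why this is not trivial, here is a minimal sketch
(not the home-made code from the lecture) comparing a naive
tokenizer with a slightly better one. The sample sentence and
both regular expressions are illustrative assumptions. Hebrew
acronyms such as צה"ל are commonly typed with an ASCII double
quote standing in for gershayim, so treating every non-letter
as a separator breaks them apart.

```perl
use strict;
use warnings;
use utf8;                          # the string literal below is Hebrew
binmode STDOUT, ':encoding(UTF-8)';

# Sample sentence: an acronym written with '"' as gershayim.
my $text = 'צה"ל הודיע.';

# Naive: every run of non-letters is a separator.
my @naive = grep { length } split /[^\p{L}]+/, $text;

# A little better: allow ' or " between two letters inside one
# word, as in Hebrew acronyms and abbreviations.
my @better = $text =~ /(\p{L}+(?:["']\p{L}+)*)/g;

print scalar(@naive),  ": @naive\n";    # the acronym is split in two
print scalar(@better), ": @better\n";   # the acronym stays whole
```

Even the second rule is only a heuristic: it cannot tell a
gershayim from a closing quotation mark that happens to be
followed directly by a letter.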

Now, I'm not sure how much interest, if any, there is in
going over #5. This is a heavy topic! It will probably be very
useful for those who need it, but at the same time very
boring for those who don't. Usually, we don't. So -- I'll
probably skip this and concentrate only on
undotted/unvoweled Hebrew script.

I wonder about #9: this is more related to web programming
and has to do with HTML, HTTP and how web servers and web
clients interpret them. Whether I dive into this depends on
the audience. What do you think?

I also wonder about #10: it is not really about Perl but
more about configuring your development environment -- so I
think this is really borderline. I won't go into this unless
there is real demand from a majority in the audience.

Any requests?
Any comments?

Shlomo Yona
shlomo at cs.haifa.ac.il
