[Israel.pm] character semantics

Anatoly Vorobey mellon at pobox.com
Mon Aug 30 06:12:00 PDT 2004


> Where are the actual characters being listed?
> I mean -- how can I know which characters make up
> Punctuation?

The definitive source is the Unicode standard.

The definitive file is
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt, the latest
version of the character database. Each line holds semicolon-separated
fields, and the third field is the general category the character
belongs to. So if you want Dash Punctuation, your category is 'Pd':
find all lines with 'Pd' in the third field and that's it. All
punctuation is every character whose category starts with P (Pc, Pd,
Ps, Pe, Pi, Pf, Po), and so on. perldoc perlunicode already lists the
category codes.
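
Here is a minimal sketch of that lookup. It is written in Python rather than Perl and assumes you have a local copy of the file. The SAMPLE lines only mimic the UnicodeData.txt format, and the helper names (parse_categories, with_category, punctuation) are hypothetical. The code keeps the third field of each line and filters on it.

```python
import unicodedata

# Lines in UnicodeData.txt format: 15 semicolon-separated fields per line,
# field index 2 (the third field) is the general category.
# Sample only; read the real file in practice.
SAMPLE = """\
0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;
002D;HYPHEN-MINUS;Pd;0;ES;;;;;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
2013;EN DASH;Pd;0;ON;;;;;N;;;;;
201C;LEFT DOUBLE QUOTATION MARK;Pi;0;ON;;;;;Y;DOUBLE TURNED COMMA QUOTATION MARK;;;;
"""

def parse_categories(text):
    """Map each code point to its general category (third field).

    Note: the real file stores large blocks (e.g. CJK ideographs) as
    '<..., First>' / '<..., Last>' pairs; this sketch does not expand them.
    """
    cats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        cats[int(fields[0], 16)] = fields[2]
    return cats

def with_category(cats, wanted):
    """Code points whose category equals `wanted` exactly, e.g. 'Pd'."""
    return sorted(cp for cp, cat in cats.items() if cat == wanted)

def punctuation(cats):
    """All punctuation: every P? category (Pc, Pd, Ps, Pe, Pi, Pf, Po)."""
    return sorted(cp for cp, cat in cats.items() if cat.startswith("P"))

if __name__ == "__main__":
    cats = parse_categories(SAMPLE)
    for cp in with_category(cats, "Pd"):
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

Matching on the category prefix rather than on the bare letter "P" is deliberate: the file itself never contains "P" alone, only the two-letter subcategories.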

If you want an in-depth look at the various categories, their meanings
and their aliases, start with http://www.unicode.org/Public/UNIDATA/UCD.html;
it refers to many other .txt files in the same directory.

Note that Perl's understanding of these classes depends on your Perl
version and may not exactly match the latest released Unicode standard:
it will match whichever version of the standard was current when that
Perl was released.
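
Other runtimes behave the same way, because they bake a fixed UCD version into their tables. As an analogy in Python, not Perl, unicodedata.unidata_version reports which version the interpreter uses. Perl's counterpart, as far as I know, is UnicodeVersion() from Unicode::UCD. The sketch below uses a hypothetical helper, mismatches, to compare a UnicodeData.txt copy against the runtime's own categories. Characters newer than the runtime's tables typically show up as 'Cn' (unassigned) on the runtime side.

```python
import unicodedata

def mismatches(ucd_text):
    """Return (code point, file category, runtime category) for every line
    of UnicodeData.txt-format text where the two disagree.

    Only the code points listed explicitly are checked; '<..., First>' /
    '<..., Last>' range entries are compared at their endpoints only.
    """
    diffs = []
    for line in ucd_text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        cp, file_cat = int(fields[0], 16), fields[2]
        runtime_cat = unicodedata.category(chr(cp))
        if runtime_cat != file_cat:
            diffs.append((cp, file_cat, runtime_cat))
    return diffs

if __name__ == "__main__":
    print("Runtime UCD version:", unicodedata.unidata_version)
    # With a real copy of the file (path is an assumption):
    # with open("UnicodeData.txt", encoding="utf-8") as f:
    #     print(mismatches(f.read()))
```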

-- 
avva
"There's nothing simply good, nor ill alone" -- John Donne



