[Israel.pm] character semantics
mellon at pobox.com
Mon Aug 30 06:12:00 PDT 2004
> Where are the actual characters being listed?
> I mean -- how can I know which characters make up
The definitive source is the Unicode standard.
The definitive file is
the latest version of the character database. The third field
(semicolon-separated) in every line is the
category the character belongs to. So if you want Dash Punctuation, your
category is 'Pd'; find all lines with 'Pd' in the third field and that's
it. All punctuation would be all characters with P or P? in the category
field, and so on. perldoc perlunicode already gives you the character
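
The filtering described above can be sketched in a few lines. This is only an illustration, written in Python rather than Perl, and it makes some assumptions: the sample lines are a small hand-copied excerpt in the semicolon-separated layout of UnicodeData.txt from the Unicode Character Database, and the helper name chars_in_category is made up for this example. For a real run you would read the lines from your local copy of the file instead of SAMPLE.

```python
# Illustrative excerpt in the semicolon-separated layout described above
# (code point; name; general category; ...). Not the full database.
SAMPLE = """\
002D;HYPHEN-MINUS;Pd;0;ES;;;;;N;;;;;
0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
2014;EM DASH;Pd;0;ON;;;;;N;;;;;
"""


def chars_in_category(lines, prefix):
    """Return (code point, name) pairs whose third field starts with prefix.

    Passing "Pd" selects Dash Punctuation only; passing "P" selects every
    punctuation category (Pd, Po, Ps, ...). Hypothetical helper, not part
    of any library.
    """
    found = []
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        if fields[2].startswith(prefix):
            found.append((int(fields[0], 16), fields[1]))
    return found


dashes = chars_in_category(SAMPLE.splitlines(), "Pd")
punct = chars_in_category(SAMPLE.splitlines(), "P")

for code_point, name in dashes:
    print(f"U+{code_point:04X} {name}")
```

Matching on the prefix rather than the exact value is what makes "P" cover all the two-letter punctuation categories at once.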
If you want an in-depth look at various categories, their meanings and
aliases, start with http://www.unicode.org/Public/UNIDATA/UCD.html ;
it refers to many .txt files in the same directory.
Note that your perl's understanding of those classes depends on the Perl
version and may not exactly match the latest released Unicode standard
(it will match *some* version of the standard, the one current at the time
that Perl version was released).
"There's nothing simply good, nor ill alone" -- John Donne