[Israel.pm] pdf2txt ps2txt

Offer Kaye offer.kaye at gmail.com
Thu Nov 4 01:39:18 PST 2004

On Thu, 4 Nov 2004 08:36:41 +0200 (IST), Shlomo Yona
<shlomo at cs.haifa.ac.il> wrote:
> I need to extract the Hebrew text (including the niqqud)
> from the PDF files, in order to further manipulate them.

No Perl solutions, I'm afraid, but:

1. Have you tried looking at the output of "strings"? Depending on
your locale and terminal capabilities, it might actually produce
something worth looking at :-)

2. There is this project:
It might not preserve the nikud, but since it converts to XML (or
HTML), it might work, at least partially.

3. ps2html might work:

4. Scribus:
Scribus is an Open Source desktop publishing system for Linux. I
included it in the list of possible tools because its site says that
"Other features include PDF Import, EPS import/export, Unicode text
including right to left scripts such as Arabic and Hebrew."
So it might be useful to you.
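A rough idea of what item 1 relies on: `strings` scans a file's raw bytes and prints runs of printable characters (GNU strings requires at least 4 in a row by default). The Python sketch below is a hypothetical variant, not part of any tool mentioned above. It decodes the bytes as UTF-8 and also keeps characters from the Unicode Hebrew block (U+0590 to U+05FF, which includes the niqqud), so Hebrew stored as plain UTF-8 would survive. That is an assumption: many PDFs compress their content streams or use font-specific glyph encodings, and then this finds little or nothing.

```python
import sys
from typing import List

MIN_LEN = 4  # same default minimum run length as GNU strings


def is_wanted(ch: str) -> bool:
    """Printable ASCII, or a character from the Unicode Hebrew block
    (U+0590..U+05FF), which includes the niqqud points."""
    return (" " <= ch <= "~") or ("\u0590" <= ch <= "\u05ff")


def unicode_strings(data: bytes, min_len: int = MIN_LEN) -> List[str]:
    """Return every run of wanted characters at least min_len long.

    Assumption: the text of interest is stored as uncompressed UTF-8.
    Undecodable bytes become U+FFFD, which is not wanted and so ends a run.
    """
    text = data.decode("utf-8", errors="replace")
    runs: List[str] = []
    current: List[str] = []
    for ch in text:
        if is_wanted(ch):
            current.append(ch)
        else:
            if len(current) >= min_len:
                runs.append("".join(current))
            current = []
    if len(current) >= min_len:  # flush a run that reaches end of input
        runs.append("".join(current))
    return runs


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for run in unicode_strings(f.read()):
            print(run)
```

Saved under a hypothetical name such as `unicode_strings.py`, it would be run as `python unicode_strings.py file.pdf`. Hebrew output still needs a UTF-8 capable terminal, which is the same locale caveat as item 1.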

Good luck :-)
Offer Kaye
