[Israel.pm] pdf2txt ps2txt

Shlomo Yona shlomo at cs.haifa.ac.il
Wed Nov 3 22:36:41 PST 2004


I have a few dozen PDF files containing Hebrew text
with niqqud and images. These are actually issues of
sha'ar lamatxil (see:

I need to extract the Hebrew text (including the niqqud)
from the PDF files so that I can process it further.

I've tried pdf2ps followed by ps2ascii (utilities I found
on my Mandrake 9.1). pdf2ps produced a valid PostScript file
that looks like the original PDF file, but the second step
was a complete failure: it produced a small file containing
only blanks and a few control characters.
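
To judge whether the output of any extraction tool kept the
Hebrew letters and the niqqud, one option is to count the
relevant Unicode code points in the extracted text. The
following is only a minimal sketch: count_hebrew is a
hypothetical helper (not part of pdf2ps or ps2ascii), it
assumes the output has already been decoded to Unicode, and
the set of code points treated as niqqud is an assumption.

```python
# Hypothetical helper for checking extraction output; not part of any tool
# mentioned above. Assumes the text has already been decoded to Unicode.

# Hebrew letters alef (U+05D0) through tav (U+05EA).
HEBREW_LETTERS = range(0x05D0, 0x05EB)

# Assumed niqqud set: points U+05B0..U+05BC (sheva through dagesh),
# the shin and sin dots (U+05C1, U+05C2), and qamats qatan (U+05C7).
NIQQUD = set(range(0x05B0, 0x05BD)) | {0x05C1, 0x05C2, 0x05C7}


def count_hebrew(text):
    """Return (number of Hebrew letters, number of niqqud marks) in text."""
    letters = sum(1 for ch in text if ord(ch) in HEBREW_LETTERS)
    points = sum(1 for ch in text if ord(ch) in NIQQUD)
    return letters, points


# The word "shalom" with niqqud, written with escapes to avoid encoding issues.
sample = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
letters, points = count_hebrew(sample)
print(letters, points)
```

An output like the one described above (blanks and control
characters only) would give zero for both counts, while a
result with letters but no points would mean the niqqud was
lost along the way.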

Can you suggest a method for extracting the text (with the
niqqud) from the PDF files?


Shlomo Yona
shlomo at cs.haifa.ac.il
