[Israel.pm] pdf2txt ps2txt
Shlomo Yona
shlomo at cs.haifa.ac.il
Wed Nov 3 22:36:41 PST 2004
Hello,
I have a few dozen PDF files containing Hebrew texts
with niqqud and images. These are actually the issues of
sha'ar lamatxil (see:
http://www.slamathil.co.il/defaultHeb.htm).
I need to extract the Hebrew text (including the niqqud)
from the PDF files, in order to process it further.
I've tried pdf2ps followed by ps2ascii (utilities I found
on my Mandrake 9.1). Although pdf2ps produced a valid
PostScript file that looks like the original PDF file,
the second step failed completely: it produced a small
file containing only blanks and a few control characters.
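For reference, the two steps above can be sketched roughly as follows. This is only an illustration, assuming Ghostscript's pdf2ps and ps2ascii are on the PATH; the helper names (pipeline_commands, extract, count_hebrew) and the file name are made up for the example. The count_hebrew check is just a quick way to see whether any Hebrew letters or points survived the conversion.

```python
import subprocess
import unicodedata
from pathlib import Path

# Unicode ranges (assumption: output is decoded as Unicode text).
HEBREW_LETTERS = range(0x05D0, 0x05EB)  # alef .. tav, including final forms
HEBREW_POINTS = range(0x05B0, 0x05C8)   # pointing marks, filtered by category below


def pipeline_commands(pdf_path):
    """Build the two command lines: PDF -> PostScript -> plain text."""
    pdf = Path(pdf_path)
    ps = pdf.with_suffix(".ps")
    return [
        ["pdf2ps", str(pdf), str(ps)],  # writes the .ps file next to the PDF
        ["ps2ascii", str(ps)],          # with one argument, writes text to stdout
    ]


def count_hebrew(text):
    """Return (letters, points): Hebrew letters and combining points found."""
    letters = sum(1 for ch in text if ord(ch) in HEBREW_LETTERS)
    points = sum(
        1
        for ch in text
        if ord(ch) in HEBREW_POINTS and unicodedata.category(ch) == "Mn"
    )
    return letters, points


def extract(pdf_path):
    """Run both steps and return whatever text ps2ascii printed."""
    to_ps, to_text = pipeline_commands(pdf_path)
    subprocess.run(to_ps, check=True)
    result = subprocess.run(to_text, check=True, capture_output=True)
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    text = extract("issue.pdf")  # hypothetical file name
    print(count_hebrew(text))
```

On my files, a check like count_hebrew would report no Hebrew at all for the ps2ascii output, which is the failure described above.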
Can you suggest a method for extracting the texts (with the
niqqud) from the PDF files?
Thanks.
--
Shlomo Yona
shlomo at cs.haifa.ac.il
http://cs.haifa.ac.il/~shlomo/