[Israel.pm] PDF handling
Yossi.Itzkovich at ecitele.com
Wed Dec 24 07:28:36 PST 2008
A colleague here asked me the following question:
I have a PDF document that contains tables.
Is there a Perl module that provides an API for reading a PDF document,
identifying the tables in it, and reading each table cell separately, even if
the text in a cell is "broken" across several rows (wrapping)?
I tried converting the PDF to text and working on the text, but the converted
text doesn't always behave as expected. For example, a cell with a "broken" line
is converted into several text lines that are NOT NECESSARILY ADJACENT: there
may be a blank row between the parts, which makes the analysis more difficult.
Can anyone help with this?
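For the text-conversion route described above, one rough workaround is sketched below in Perl. It rests on assumptions that may not hold for your document: the PDF was converted with a layout-preserving tool (for example pdftotext with its -layout option), the table's column start offsets are fixed and known in advance (the values in @COL_STARTS and the helper names split_cells/merge_rows are hypothetical), and a wrapped continuation line can be recognized because its first column is empty. Under those assumptions, blank lines are skipped and each continuation fragment is appended to the matching cell of the previous row, so fragments separated by a blank row are still joined. It does not detect tables automatically.

```perl
use strict;
use warnings;

# Hypothetical column start offsets for one table in the converted text.
# These must be measured from the real output; they are an assumption.
my @COL_STARTS = (0, 20, 40);

# Cut one text line into trimmed cells at the given column offsets.
sub split_cells {
    my ($line, @starts) = @_;
    my @cells;
    for my $i (0 .. $#starts) {
        my $start = $starts[$i];
        my $cell  = '';
        if ($start < length $line) {
            $cell = $i < $#starts
                  ? substr($line, $start, $starts[$i + 1] - $start)
                  : substr($line, $start);
        }
        $cell =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @cells, $cell;
    }
    return @cells;
}

# Merge wrapped cell text into logical rows. Assumption: a non-empty first
# column starts a new row; a line with an empty first column continues the
# previous row; blank lines (even between fragments) are ignored.
sub merge_rows {
    my ($lines, @starts) = @_;
    my @rows;
    for my $line (@$lines) {
        next if $line =~ /^\s*$/;
        my @cells = split_cells($line, @starts);
        if ($cells[0] ne '' || !@rows) {
            push @rows, [@cells];
            next;
        }
        for my $i (0 .. $#cells) {
            next if $cells[$i] eq '';
            my $cur = $rows[-1][$i];
            $rows[-1][$i] = $cur eq '' ? $cells[$i] : "$cur $cells[$i]";
        }
    }
    return @rows;
}

# Small demo: the "description" fragment is separated from its row by a
# blank line, as in the question, but still ends up in the Widget row.
my @demo = (
    sprintf('%-20s%-20s%s', 'Widget', 'A very long', '3'),
    '',
    sprintf('%-20s%-20s%s', '', 'description', ''),
    sprintf('%-20s%-20s%s', 'Gadget', 'Short', '5'),
);
print join(' | ', @$_), "\n" for merge_rows(\@demo, @COL_STARTS);
```

The fixed-offset approach is deliberately simple; if column positions vary between pages, they would have to be detected per page (for example from runs of whitespace shared by many lines), which this sketch does not attempt.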