[Israel.pm] PDF handling

Yossi Itzkovich Yossi.Itzkovich at ecitele.com
Wed Dec 24 07:28:36 PST 2008

A colleague here asked me the following question:

I have a PDF document that contains tables.
Is there a Perl module that provides an API for reading a PDF document,
identifying the tables in it, and reading each table cell separately, even if the text in the cell
is "broken" across several rows (wrapped)?

I tried converting the PDF to text and working on the text, but the converted text doesn't always
behave as expected. For example, a cell with a "broken" line is converted to several text lines that
are NOT NECESSARILY ADJACENT: there might be a blank row between the parts, which makes
the analysis more difficult.
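
To make the problem concrete, here is a minimal sketch of the kind of post-processing involved. It is written in Python purely for illustration, and it is not a real solution. It assumes things the original question does not state: that the text came from a layout-preserving conversion (for example `pdftotext -layout`), that the column start offsets are known in advance, and that a line with an empty first column continues the row above it. The names `split_columns`, `merge_wrapped_rows` and `SAMPLE` are made up for this example.

```python
# Hypothetical sample of converted text: the "Description" cell of row 1 is
# wrapped, and a blank line separates its two parts.
SAMPLE = [
    "ID        Description         Status",
    "1         Replace the old     open",
    "",
    "          power supply",
    "2         Update firmware     done",
]


def split_columns(line, starts):
    """Cut one text line into cells at the given column start offsets."""
    bounds = list(starts) + [None]
    return [line[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def merge_wrapped_rows(lines, starts):
    """Rebuild table rows, gluing wrapped cell fragments back together.

    Assumption: a line whose first column is empty continues the previous row.
    """
    rows = []
    for line in lines:
        if not line.strip():
            continue  # blank rows may sit between the parts of one cell
        cells = split_columns(line, starts)
        if cells[0] or not rows:
            rows.append(cells)  # text in the first column: a new row begins
        else:
            prev = rows[-1]
            for i, fragment in enumerate(cells):
                if fragment:
                    # append the fragment to the cell directly above it
                    prev[i] = (prev[i] + " " + fragment).strip()
    return rows


if __name__ == "__main__":
    for row in merge_wrapped_rows(SAMPLE, [0, 10, 30]):
        print(row)
```

This only works when the first column is never empty in a real row and the column offsets are fixed, which is exactly what cannot be relied on in general, hence the question about a module that understands the tables directly.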

Can someone help with this issue?


