#1 2014-11-03 17:59

jcren22
Member
Registered: 2014-11-03
Posts: 1

Search content of PDF file to locate and read dates.

I have a couple thousand receipts that were scanned into PDF files and need to rename them based on the following:
Store Name + Date + Total Sale.

I have been unable to find a way to search the PDF file content for a "/  /" pattern so that I can locate the date, extract it, and convert it to YYYY-MM-DD format.

I assume that searching the content for a "$" to locate and extract the total sale would work in a similar way.


Any guidance would be appreciated.


#2 2014-11-06 03:09

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,479

Re: Search content of PDF file to locate and read dates.

There is a script (for the PascalScript rule) that extracts various tags from PDF files:
http://www.den4b.com/wiki/ReNamer:Scripts:Xpdf

There is also a "pdftotext" tool in the same Xpdf package, which can extract plain text from PDF files.
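
As a rough illustration of the pdftotext route, here is a minimal Python sketch (not a ReNamer script). It assumes that pdftotext is on the PATH, that the dates are written month-first as MM/DD/YYYY or MM/DD/YY, and that the total sits on a line containing the word "total". The function names are hypothetical, and the store name is simply passed in by the caller because there is no general way to detect it. Scans that contain only images with no text layer will yield no text and would need OCR first.

```python
import re
import subprocess
from datetime import datetime

# Assumption: dates look like 11/03/2014 or 11/3/14 (month first).
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})\b")
# Assumption: the total is on a line with the word "total" (not "subtotal").
TOTAL_RE = re.compile(r"\btotal\b[^\d$\n]*\$?\s*(\d[\d,]*\.\d{2})", re.IGNORECASE)


def pdf_to_text(pdf_path):
    """Run pdftotext and return the text; "-" sends output to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_date(text):
    """Return the first date found as YYYY-MM-DD, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    month, day, year = match.groups()
    fmt = "%m/%d/%Y" if len(year) == 4 else "%m/%d/%y"
    try:
        parsed = datetime.strptime(f"{month}/{day}/{year}", fmt)
    except ValueError:
        return None  # e.g. 13/45/2014 is not a real date
    return parsed.strftime("%Y-%m-%d")


def find_total(text):
    """Return the last amount that follows the word "total", or None."""
    amounts = TOTAL_RE.findall(text)
    if not amounts:
        return None
    return amounts[-1].replace(",", "")


def build_name(store, text):
    """Build "Store Date Total.pdf", or None if a part is missing."""
    date = find_date(text)
    total = find_total(text)
    if date is None or total is None:
        return None
    return f"{store} {date} {total}.pdf"
```

Calling build_name("Acme", pdf_to_text("receipt.pdf")) would return a proposed file name, or None when no date or total was recognized. It would be sensible to print the proposed names and check them before renaming anything.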

