To help you with your homework...
Pascal Script documentation:
http://www.den4b.com/wiki/ReNamer:Pascal_Script
Function reference:
http://www.den4b.com/wiki/ReNamer:Pasca … :Functions
Example scripts:
http://www.den4b.com/wiki/ReNamer:Scripts
Offline
Thanks for the resources Dennis. Are there any examples of how to insert the content (the PDF tag, in this case) as a 'prefix' rather than a suffix? I thought maybe the "move_to" command from here http://www.den4b.com/wiki/ReNamer:Scrip … me_portion, but I don't know... This also makes me think of a feature idea, but I'll put that on a separate thread...
Offline
FileName := WideExtractBaseName(FileName) + ' ' + Matches[0] + WideExtractFileExt(FileName);
FileName → new filename
WideExtractBaseName(FileName) → base part of old filename
' ' → space
Matches[0] → metatag
WideExtractFileExt(FileName) → extension (last part) of old filename
This is just simple logic and common sense. All you need to do is move the metatag and space before the filename if you want to prepend rather than append it.
Of course if you're writing WideExtractBaseName(FileName) + WideExtractFileExt(FileName), then you might as well shorten it to simply FileName.
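For anyone who wants to see it spelled out, here is a minimal sketch of the prepend variant. It assumes pdfinfo.exe sits in ReNamer's folder and that the Title tag is the one wanted, as in the script discussed in this thread; only the assignment to FileName differs from the append version.

```pascal
// Sketch only: put the PDF title in front of the existing name.
// Assumes pdfinfo.exe is located in ReNamer's folder.
const
  EXE = 'pdfinfo.exe';
  TAG = 'Title\s*\:\s*(.*?)[\r\n]';
var
  Command, Output: String;
  Matches: TWideStringArray;
begin
  Command := EXE + ' "' + FilePath + '"';
  if ExecConsoleApp(Command, Output) = 0 then
  begin
    Matches := SubMatchesRegEx(Output, TAG, False);
    if Length(Matches) > 0 then
      // metatag + space + the complete old name (base part and extension)
      FileName := Matches[0] + ' ' + FileName;
  end;
end.
```

Because the old name is kept whole, there is no need to split it into base name and extension first.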
Offline
Hmm.. Yes, I guess that was a no-brainer! Thanks Andrew
EDIT:
Just an extra note here, in case anyone else finds this thread via search, as I did.
My whole purpose was to rename a large number of ebooks that are in PDF format and give them a consistent naming structure. It turns out that most of the PDFs are missing the metadata needed.
Also, the metadata tag is likely to have "invalid characters" that will get added to the file name.
I looked up which common characters are invalid for file/folder names in Windows. Here they are as a list that you can paste (and save) as a custom Transliteration list that changes them all to hyphens:
#=-
%=-
&=-
{=-
}=-
\=-
<=-
>=-
*=-
?=-
/=-
$=-
!=-
'=-
"=-
:=-
+=-
`=-
==-
It's interesting to note that the @ symbol and the space character " " were on the list. I use both of those in file and folder names though, so I left them off of this list.
EDIT AGAIN: Actually I'm going to post this list as a separate thread... It just occurred to me that maybe tilde "~" should be there...
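A rough sketch of doing the clean-up inside the script itself instead of with a separate Transliterate rule. CleanTag is a made-up helper name, not a ReNamer function, and the script assumes pdfinfo.exe is in ReNamer's folder. It only replaces the nine characters that Windows reserves in file names (\ / : * ? " < > |); add more WideReplaceStr lines for the other characters in the list above if you want those gone too.

```pascal
// Hypothetical helper (not part of ReNamer): replace the characters that
// Windows reserves in file names with hyphens.
function CleanTag(const S: WideString): WideString;
begin
  Result := S;
  Result := WideReplaceStr(Result, '\', '-');
  Result := WideReplaceStr(Result, '/', '-');
  Result := WideReplaceStr(Result, ':', '-');
  Result := WideReplaceStr(Result, '*', '-');
  Result := WideReplaceStr(Result, '?', '-');
  Result := WideReplaceStr(Result, '"', '-');
  Result := WideReplaceStr(Result, '<', '-');
  Result := WideReplaceStr(Result, '>', '-');
  Result := WideReplaceStr(Result, '|', '-');
end;

const
  EXE = 'pdfinfo.exe';
  TAG = 'Title\s*\:\s*(.*?)[\r\n]';
var
  Command, Output: String;
  Matches: TWideStringArray;
begin
  Command := EXE + ' "' + FilePath + '"';
  if ExecConsoleApp(Command, Output) = 0 then
  begin
    Matches := SubMatchesRegEx(Output, TAG, False);
    if Length(Matches) > 0 then
      // Clean only the tag text, before it becomes part of the name.
      FileName := CleanTag(Matches[0]) + WideExtractFileExt(FileName);
  end;
end.
```

The tag is cleaned before it is assigned to FileName, so a slash or colon in the title never gets a chance to be read as part of a path.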
Last edited by kunkel321 (2016-05-04 20:49)
Offline
Hi,
The default output encoding of pdfinfo.exe is Latin1, which doesn't support Chinese characters. If the metadata contains Chinese text, it will not be displayed.
It should display:
ReNamer_Pro_7.1>pdfinfo.exe 3.pdf
Title: 标题abc1
Subject: 主题abc
Keywords: 关键字abc2
But it actually displays:
ReNamer_Pro_7.1>pdfinfo.exe 3.pdf
Title: abc1
Subject: abc
Keywords: abc2
Try running this, it works well:
pdfinfo.exe -enc UTF-8 1.pdf
If you want to support Chinese characters, the Pascal script should be modified to:
const
  EXE = 'pdfinfo.exe -enc UTF-8';
  TAG = 'Title\s*\:\s*(.*?)[\r\n]';
var
  Command, Output: String;
  Matches: TWideStringArray;
begin
  Command := EXE+' "'+FilePath+'"';
  if ExecConsoleApp(Command, Output) = 0 then
  begin
    Matches := SubMatchesRegEx(Output, TAG, False);
    if Length(Matches) > 0 then
      FileName := Matches[0] + WideExtractFileExt(FileName);
  end;
end.
Last edited by jeffli (2019-09-07 02:30)
Jeff
Offline
jeffli wrote: Try running this, it works well:
pdfinfo.exe -enc UTF-8 1.pdf
Thanks for pointing this out. The script on the wiki has been updated accordingly.
Offline
Hi,
You have helped us a lot. I really appreciate it and will leave positive feedback.
Jeff
Offline
and there are no open source libraries for Delphi to parse PDFs
Oh, is ReNamer written in Delphi? In RAD Studio? Doesn't it encounter any utf issues?
Offline
I have now tested this myself; here are my explanations...
Extract meta data from PDF file.
You can use the [Insert Meta Tag]-button to insert metadata from files, like ":File_DateCreated:".
But there is no metadata extraction for PDFs by default.
Read den4b's post above: http://www.den4b.com/forum/viewtopic.php?id=349
- The problem is that extracting tags from PDF is no easy task,
you'll nearly need to write an entire PDF parser to get that information.
- Anyway, there is the possibility of using a third-party executable tool to extract the tags from PDF,
- and then, with the help of PascalScript, use them within ReNamer.
- - -
The package that we need to extract PDF meta data is called "Xpdf".
The Xpdf open source project includes a PDF viewer along with a collection
of command line tools which perform various functions on PDF files.
Xpdf was first released in 1995. It was written, and is still developed, by Derek Noonburg.
Download and extract it.
>> Browse to http://www.xpdfreader.com/ (((was before www.foolabs.com/xpdf)))
>> CLICK "Download the open source Xpdf tools"
>> CLICK "Download the Xpdf command line tools:" > Windows 32/64-bit: download
You will get a file named something like "xpdf-tools-win-4.02.zip"
Extract that ZIP file.
- - -
It has a command line tool called "pdfinfo.exe"
which we will use to print information from a PDF file.
Copy the 32-bit version of "pdfinfo.exe" and place it into ReNamer's folder.
--- ...\xpdf-tools-win-4.02\bin32\pdfinfo.exe
You may also want to read the documentation:
--- ...\xpdf-tools-win-4.02\doc\pdfinfo.txt
(((or read it online: http://www.xpdfreader.com/pdfinfo-man.html)))
- - -
Put the "pdfinfo.exe" in the same folder with the "ReNamer.exe".
Copy a sample PDF file into this folder too:
pdfinfo.exe
pdfinfo.txt
PDFTEST.pdf
ReNamer.exe
ReNamer.ini
Test it out:
Open a command prompt window. (((cmd.exe, Win+R, type cmd, press Enter)))
Navigate to the ReNamer folder.
In the command prompt window, enter the following command:
pdfinfo.exe PDFTEST.pdf
View the output in the command prompt.
You can also save that output to a text file:
pdfinfo.exe PDFTEST.pdf >PDFTESToutput.txt
Example Outputs (for reference):
Please note: not every tag has a value; some are simply empty.
The date format may depend on your system settings in Windows.
English date format:
Title: PDFTEST.pdf
Author: name removed
Creator: PScript5.dll Version 6.0.1
Producer: Acrobat Distiller 9.1.6 (Windows)
CreationDate: 04/06/17 19:46:57
ModDate: 04/06/17 19:46:57
Tagged: yes
Form: none
Pages: 3
Encrypted: no
Page size: 2384 x 3370 pts (A0)
File size: 17569259 bytes
Optimized: yes
PDF version: 1.6
German date format (note the missing leading zero on days less than 10):
Title: VBScript FileSystemObject
Author: name removed
Creator: PDFCreator Version 0.8.1
Producer: AFPL Ghostscript 8.51
CreationDate: Mon Oct 16 11:58:34 2006
ModDate: Mon Oct 9 10:23:45 2007
Tagged: no
Form: none
Pages: 57
Encrypted: no
Page size: 612 x 792 pts (letter) (rotated 0 degrees)
File size: 313778 bytes
Optimized: no
PDF version: 1.3
TIP:
Google for "Experts Exchange 6. Run the PDFinfo utility on the sample PDF file"
to see an example output of PDF metadata.
- - -
Next we use a PascalScript to execute "pdfinfo.exe" and read the output,
just as we did manually in the command window above.
Pseudo code:
-- myEXE = 'pdfinfo.exe -enc UTF-8';
(pdfinfo.exe default encoding is Latin1, which doesn't support Chinese characters.)
-- myCommand := myEXE+' "'+FilePath+'"';
-- ExecConsoleApp(myCommand, strOutput)
-- ShowMessage(strOutput); //should show something like the "Example Outputs:" above.
-- TAG = 'Title\s*\:\s*(.*?)[\r\n]';
-- Utilize Regular Expressions with 'TAG' to get the wanted line,
and next use PascalScript functions to process the found line to a nice format.
(see http://www.den4b.com/wiki/ReNamer:Pasca … :Functions )
Modify the TAG constant to specify which tag (line) you want to extract
and utilize PascalScript functions to process that finding to the wanted format (if not already).
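The first steps of the pseudo code can be tried on their own before any regular expression is involved. A minimal sketch, assuming pdfinfo.exe is in ReNamer's folder; it renames nothing and only shows what the tool returned for each file:

```pascal
// Sketch for testing: show the raw pdfinfo output for each file.
// Assumes pdfinfo.exe is located in ReNamer's folder; FileName is not changed.
const
  EXE = 'pdfinfo.exe -enc UTF-8';
var
  Command, Output: String;
begin
  Command := EXE + ' "' + FilePath + '"';
  if ExecConsoleApp(Command, Output) = 0 then
    ShowMessage(Output)
  else
    ShowMessage('pdfinfo.exe failed for: ' + FilePath);
end.
```

Test this with a single PDF in the file list, because a message box pops up for every file on each preview.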
### ### ### ### ### ### ### ###
Working code to get the "TITLE"-line from PDF meta data:
From den4b's post above: http://www.den4b.com/forum/viewtopic.php?id=349
//Author: Denis Kozlov. Date: 2013-04-01.
const
  EXE = 'pdfinfo.exe -enc UTF-8';
  //Find a line in the output, starting with "Title" and ending at the EOL sequence.
  TAG = 'Title\s*\:\s*(.*?)[\r\n]';
var
  Command, Output: String;
  Matches: TWideStringArray;
begin
  Command := EXE+' "'+FilePath+'"';
  if ExecConsoleApp(Command, Output) = 0 then
  begin
    Matches := SubMatchesRegEx(Output, TAG, False);
    if Length(Matches) > 0 then
      FileName := Matches[0] + WideExtractFileExt(FileName);
  end;
end.
The example script just replaces the current name, leaving only the original extension untouched.
To append the meta tag to the end of the file name, find the following line:
FileName := Matches[0] + WideExtractFileExt(FileName);
And replace it with:
FileName := WideExtractBaseName(FileName) + ' ' + Matches[0] + WideExtractFileExt(FileName);
The same script of den4b as just before, but with an "ELSE" if nothing is found:
//Author: Denis Kozlov. Date: 2013-04-01.
const
  EXE = 'pdfinfo.exe -enc UTF-8';
  TAG = 'Title\s*\:\s*(.*?)[\r\n]';
var
  Command, Output: String;
  Matches: TWideStringArray;
begin
  Command := EXE+' "'+FilePath+'"';
  if ExecConsoleApp(Command, Output) = 0 then
  begin
    Matches := SubMatchesRegEx(Output, TAG, False);
    if Length(Matches) > 0 then
      FileName := Matches[0] + WideExtractFileExt(FileName)
    else
      FileName := '__No_Matches__' + FileName;
  end;
end.
### ### ### ### ### ### ### ###
Working code to get the "CreationDate"-line from PDF meta data:
English date format; the actual format may depend on your system settings in Windows
EXAMPLE LINE English: CreationDate: 04/06/17 19:46:57
-------------------------------------------------------
TEST it with a "Regular Expressions" rule:
// Matches are counted from the left: (1) (2) (3) (4) (5) (6)
// Use those parts to compose the wanted NewName: $1 $2 $3...
Expression "CreationDate\s*\:\s*(\d\d).(\d\d).(\d\d).\s*(\d\d):(\d\d):(\d\d)"
Replace "20$3-$2-$1 $4$5$6" (skip extension)
Try that "Regular Expressions-rule" with the Analyze tool (Shift+A)
http://www.den4b.com/wiki/ReNamer:Analyze
Original:
CreationDate: 04/06/17 19:46:57
Replaced:
2017-06-04 194657
-------------------------------------------------------
Working code to get the CreationDate in English format:
//Working code to get the CreationDate in English format:
//Author: Denis Kozlov. Date: 2013-04-01. Stefan 2019-10-04
const
  EXE = 'pdfinfo.exe -enc UTF-8';
  //Find a line in the output, starting with "CreationDate".
  //EXAMPLE LINE English: CreationDate: 04/06/17 19:46:57
  //Stricter variant that only accepts "/" as the date separator:
  //TAG = 'CreationDate\s*\:\s*(\d\d)/(\d\d)/(\d\d)\s+(\d\d):(\d\d):(\d\d)[\r\n]';
  TAG = 'CreationDate\s*\:\s*(\d\d).(\d\d).(\d\d).\s*(\d\d):(\d\d):(\d\d)[\r\n]';
  // Matches are counted from the left: (0) (1) (2) (3) (4) (5)
  // Use those parts to compose the wanted NewName: Matches[0] Matches[1] Matches[2]...
var
  Command, Output: String;
  Matches: TWideStringArray;
begin
  Command := EXE+' "'+FilePath+'"';
  if ExecConsoleApp(Command, Output) = 0 then
  begin
    Matches := SubMatchesRegEx(Output, TAG, False);
    if Length(Matches) = 6 then
      //Assuming dd/mm/yy input: '20' + year + month + day, e.g. 20170604
      FileName := '20' + Matches[2] + Matches[1]
        + Matches[0] + WideExtractFileExt(FileName)
    else
      //pdfinfo ran, but no CreationDate line matched the pattern.
      FileName := '__NOTHING_FOUND___'+FileName;
  end
  else
    //pdfinfo.exe could not be run or returned an error.
    FileName := '__NOTHING_FOUND___'+FileName;
end.
- - -
German date format; the actual format may depend on your system settings in Windows
EXAMPLE LINE German: CreationDate: Mon Oct 16 11:58:34 2006
EXAMPLE LINE German: CreationDate: Sun Oct 4 18:11:21 2009 //missing leading zero '0'! on date less than 10
-------------------------------------------------------
TEST it with a "Regular Expressions" rule:
// Matches are counted from the left: (1) (2) (3) (4) (5) (6) (7)
// Use those parts to compose the wanted NewName: $1 $2 $3...
Expression "CreationDate\s*\:\s*(\w\w\w)\s*(\w\w\w)\s*(\d+)\s*(\d\d):(\d\d):(\d\d)\s*(\d\d\d\d)"
Replace "$7-$2-$3 $4$5$6" (skip extension)
Original:
CreationDate: Wed Dec 17 15:00:42 2008
CreationDate: Sun Oct 4 18:11:21 2009
Replaced:
2008-Dec-17 150042
2009-Oct-4 181121
-------------------------------------------------------
Working code to get the CreationDate in German format:
//Working code to get the CreationDate in German format:
//Author: Denis Kozlov. Date: 2013-04-01. Stefan 2019-10-04
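const
  EXE = 'pdfinfo.exe -enc UTF-8';
  //Find a line in the output, starting with "CreationDate".
  //EXAMPLE LINE German: CreationDate: Sun Oct 4 18:11:21 2009
  //EXAMPLE LINE German: CreationDate: WeekDay Month Day 18:11:21 Year
  TAG = 'CreationDate:\s+(\w\w\w)\s+(\w\w\w)\s+(\d+)\s+(\d\d):(\d\d):(\d\d)\s+(\d\d\d\d)';
  // Matches are counted from the left: (0) (1) (2) (3) (4) (5) (6)
  // Use those parts to compose the wanted NewName: Matches[0] Matches[1] Matches[2]...
var
  Command, Output: String;
  Matches: TWideStringArray;
begin
  Command := EXE+' "'+FilePath+'"';
  if ExecConsoleApp(Command, Output) = 0 then
  begin
    Matches := SubMatchesRegEx(Output, TAG, False);
    //showmessage(IntToStr(Length(Matches)));
    if Length(Matches) = 7 then
    begin
      //showmessage(Output);
      //replace MONTH word by month number:
      Matches[1] := WideReplaceStr(Matches[1], 'Jan', '01');
      Matches[1] := WideReplaceStr(Matches[1], 'Feb', '02');
      Matches[1] := WideReplaceStr(Matches[1], 'Mar', '03');
      Matches[1] := WideReplaceStr(Matches[1], 'Apr', '04');
      Matches[1] := WideReplaceStr(Matches[1], 'May', '05');
      Matches[1] := WideReplaceStr(Matches[1], 'Jun', '06');
      Matches[1] := WideReplaceStr(Matches[1], 'Jul', '07');
      Matches[1] := WideReplaceStr(Matches[1], 'Aug', '08');
      Matches[1] := WideReplaceStr(Matches[1], 'Sep', '09');
      Matches[1] := WideReplaceStr(Matches[1], 'Oct', '10');
      Matches[1] := WideReplaceStr(Matches[1], 'Nov', '11');
      Matches[1] := WideReplaceStr(Matches[1], 'Dec', '12');
      //pad DAY less than 10 with a zero:
      if Length(Matches[2]) < 2 then Matches[2] := '0'+Matches[2];
      FileName := Matches[6]+'-'+Matches[1]+'-'+Matches[2]
        +'_'+Matches[3]+Matches[4]+Matches[5]+'_'+FileName;
    end
    else
      //pdfinfo ran, but no CreationDate line matched the pattern.
      FileName := '__NOTHING_FOUND___'+FileName;
  end
  else
    //pdfinfo.exe could not be run or returned an error.
    FileName := '__NOTHING_FOUND___'+FileName;
end.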
HTH?
Offline
The pdfinfo tool has a command line option "-rawdates" which might simplify date parsing.
By default, I get the following date format:
CreationDate: Thu Jan 31 23:33:00 2019
But with the "-rawdates" option I get this:
CreationDate: D:20190131233300+01'00'
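A possible sketch of how the raw form could be used, so that no month names or missing leading zeros need handling. It assumes pdfinfo.exe is in ReNamer's folder and that the raw date always carries all 14 digits (year through seconds), as in the example above; a PDF with a shorter raw date will simply not match and the name stays unchanged.

```pascal
// Sketch only: parse the -rawdates form, e.g.
// CreationDate: D:20190131233300+01'00'
// Assumes the date has all 14 digits; the time zone part is ignored.
const
  EXE = 'pdfinfo.exe -rawdates';
  TAG = 'CreationDate:\s*D:(\d\d\d\d)(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)';
var
  Command, Output: String;
  Matches: TWideStringArray;
begin
  Command := EXE + ' "' + FilePath + '"';
  if ExecConsoleApp(Command, Output) = 0 then
  begin
    Matches := SubMatchesRegEx(Output, TAG, False);
    // Matches: (0) year (1) month (2) day (3) hour (4) minute (5) second
    if Length(Matches) = 6 then
      FileName := Matches[0] + '-' + Matches[1] + '-' + Matches[2] + '_'
        + Matches[3] + Matches[4] + Matches[5] + '_' + FileName;
  end;
end.
```

For the example line above this should give a name starting with 2019-01-31_233300_ followed by the old file name.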
Offline