Difference between revisions of "ReNamer:Pascal Script:Unicode String Handling Routines"

From den4b Wiki
m (remove some msword style quotes, formatted code)

Revision as of 21:21, 28 May 2009

This article needs to be cleaned up!

Unicode String Handling Routines or How to operate on words

What if we have MP3 files named in a fixed pattern, e.g. "author - title.mp3", and we want to rename them to "title - author.mp3"? We need to split the filename at a known place (on " - ") and then use the resulting parts to build a new filename. We can do that with the WideSplitString function, which takes a string to split (Input) and a Delimiter and returns an array of strings (TStringsArray). If the Input is "Queen - Bohemian Rhapsody" and the Delimiter is " - ", it produces the array ["Queen", "Bohemian Rhapsody"].

Note that TStringsArray arrays are zero-based, which means the index of the first element is 0. So we get array[0] = "Queen" and array[1] = "Bohemian Rhapsody". The whole operation can be done with the following piece of code.

To understand the code below you'll need basic knowledge of variable declarations, arrays and the if-then-else statement.

var
  SplittedFileName: TStringsArray;
begin
  SplittedFileName := WideSplitString(WideExtractBaseName(FileName), ' - ');
  if Length(SplittedFileName) = 2 then
    FileName := SplittedFileName[1] + ' - ' + SplittedFileName[0] + WideExtractFileExt(FileName);
end.


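The script will produce "Bohemian Rhapsody - Queen.mp3" from "Queen - Bohemian Rhapsody.mp3". Note that the delimiter is a plain hyphen surrounded by spaces; a name that uses a different dash character will not be split.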

We check the length of the SplittedFileName array so that we never read past the end of it. If the files table contains a file named in a different pattern, e.g. "Bohemian Rhapsody (Queen)", the name will not split into two parts, and reading SplittedFileName[1] would give us an error.

If we want to split the FileName into words (a word in this case is anything that lies between two spaces), the proper line of code looks like this:

SplittedFileName := WideSplitString(WideExtractBaseName(FileName), ' ');
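Once the name is split into words, the array can be walked with an ordinary for loop. The following is only a sketch, using the functions already shown above, that rebuilds the name with its words in reverse order. The variable names Words, I and NewName are made up for this illustration, and the script assumes FileName holds just the name, without a folder path.

```pascal
var
  Words: TStringsArray;
  I: Integer;
  NewName: WideString;
begin
  // Split the base name (without extension) on single spaces.
  Words := WideSplitString(WideExtractBaseName(FileName), ' ');
  NewName := '';
  // The array is zero-based, so the last valid index is Length(Words) - 1.
  for I := Length(Words) - 1 downto 0 do
  begin
    // Put a space before every word except the first one we append.
    if I < Length(Words) - 1 then
      NewName := NewName + ' ';
    NewName := NewName + Words[I];
  end;
  // Re-attach the original extension.
  FileName := NewName + WideExtractFileExt(FileName);
end.
```

With this sketch, a file named "one two three.txt" should become "three two one.txt". Because the loop bounds come from Length(Words), the script never indexes outside the array, whatever the number of words.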

Another useful function is WideReplaceStr. With its help we can, for example, replace every occurrence of the phrase 'your car' with 'my car'.

FileName := WideReplaceStr(FileName, 'your car', 'my car');

It will also change 'not your car' into 'not my car', and if we are really possessive and egoistic we might not like that...

To solve this problem we will need a few other string handling functions and procedures: WidePos, WideInsert and WideDelete. If you're sure you won't process any Unicode characters, you may use the Pos, Insert and Delete functions/procedures instead.

Before we describe them, note that strings in Pascal are represented as 1-based arrays of chars, which means that the first index of a string is 1 (so FileName[0] gives an 'out of bounds' error).

Now we can take a look at the description of functions/procedures that were mentioned above.

function WidePos(const SubStr, S: WideString): Integer;

WidePos finds a substring in the given string S and returns the position of its first char.

So WidePos('car', 'scar tissue') will return 2.

If the substring is not present in string S, the function will return 0.
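Since 0 means "not found", a common pattern is to compare the result of WidePos with 0 before acting. The sketch below is a made-up example, not part of the original tutorial: it prefixes the name with a '[Live] ' tag when the word 'live' appears anywhere in it.

```pascal
begin
  // WidePos returns 0 when the substring is absent, so "> 0" means "found".
  // The search runs on a lowercased copy; FileName itself is not changed by it.
  if WidePos('live', WideLowerCase(FileName)) > 0 then
    FileName := '[Live] ' + FileName;
end.
```

Keep in mind that WidePos matches substrings, not whole words: just as 'car' is found inside 'scar tissue', this check would also fire for a name containing 'Oliver'.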

procedure WideInsert(const Substr: WideString; var Dest: WideString; Index: Integer);

WideInsert inserts the given substring into the Dest string starting at Index. So if Dest holds 'it is my car', WideInsert('not ', Dest, 7) will change Dest into 'it is not my car'.

procedure WideDelete(var S: WideString; Index, Count: Integer);

WideDelete deletes Count chars from the S string starting at Index. So if S holds 'it is not my car', WideDelete(S, 7, 4) will change S back into 'it is my car'.
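Both procedures declare the string they modify as a var parameter, so in a real script it has to be a variable, not a string literal. A minimal sketch of the two calls described above, using a hypothetical local variable S:

```pascal
var
  S: WideString;
begin
  S := 'it is my car';
  // Index 7 is the position of the 'm' in 'my' (strings are 1-based).
  WideInsert('not ', S, 7);  // S is now 'it is not my car'
  // Remove the 4 chars 'not ' that were just inserted.
  WideDelete(S, 7, 4);       // S is back to 'it is my car'
end.
```

The script deliberately leaves FileName alone; it only demonstrates how the index and count arguments line up.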

Armed with that knowledge we can write a script that finds the phrase 'your car' and checks whether the word 'not' appears before it (no matter where exactly, as long as it is between the beginning of the filename and the phrase). Only if there is no such word will it replace 'your' with 'my'.

Unlike the WideReplaceStr function, this script handles only the first occurrence of the phrase. If we wanted to check all occurrences, we would have to put this code into a loop.

var
  Car, Not_Word : Integer;
begin
  Car := WidePos('your car', WideLowerCase(FileName));
  Not_Word := WidePos('not ', WideLowerCase(FileName));
  if Car > 0 then
    // Replace only when there is no 'not ' before the phrase.
    if (Not_Word = 0) or (Not_Word > Car) then
      begin
        WideDelete(FileName, Car, Length('your'));
        WideInsert('my', FileName, Car);
      end;
end.

You may be curious why we searched for the 'your car' and 'not ' phrases in a lowercased filename (WideLowerCase(FileName)). We did that because the WidePos function is case sensitive. Note that we didn't change the actual case of the filename; we just passed a lowercased copy of the filename string to WidePos. This ensures that any variant of case will be found, as all of them (e.g. 'Your Car', 'YoUR caR') are identical to 'your car' after lowercasing.
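The loop mentioned earlier can be sketched as follows. This is one possible approach, not the only one: it repeats the search after every replacement and stops as soon as nothing was replaced. The variable names Lower, NotWord and Replaced are invented for this illustration.

```pascal
var
  Lower: WideString;
  Car, NotWord: Integer;
  Replaced: Boolean;
begin
  repeat
    // Search in a lowercased copy so that the match is case-insensitive.
    Lower := WideLowerCase(FileName);
    Car := WidePos('your car', Lower);
    NotWord := WidePos('not ', Lower);
    // Replace only if the phrase exists and no 'not ' comes before it.
    Replaced := (Car > 0) and ((NotWord = 0) or (NotWord > Car));
    if Replaced then
    begin
      WideDelete(FileName, Car, Length('your'));
      WideInsert('my', FileName, Car);
    end;
  until not Replaced;
end.
```

Each pass removes one 'your car' from the name, so the loop always ends. It also stops at the first occurrence that has a 'not ' in front of it, because every later occurrence is then preceded by that same 'not ' as well.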

Last, but not least, this chapter presents the WideCopy function. Let's take a look at its declaration:

function WideCopy(const S: WideString; Index, Count: Integer): WideString;

WideCopy returns the substring of string S that starts at Index and contains the number of chars given by the Count parameter.

This means that WideCopy('sit down', 5, 4) will return 'down' (4 letters starting from index 5).


This function lets us capitalize only the first letter of the filename.

FileName := WideUpperCase(FileName[1]) + WideLowerCase(WideCopy(FileName, 2, Length(FileName)-1));

We build the FileName from two blocks: the first is the first letter of FileName changed to uppercase, and the second is the rest of the FileName made lowercase. We use the WideCopy(FileName, 2, Length(FileName)-1) statement to get everything from the second letter to the end of the filename.
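The one-line version lowercases the extension too, and FileName[1] is out of bounds if the name happens to be empty. A slightly more defensive variant, offered only as a sketch, works on the base name alone and re-attaches the extension unchanged. The variable Base is hypothetical, and the script assumes FileName holds just the name, without a folder path.

```pascal
var
  Base: WideString;
begin
  // Work on the name without its extension.
  Base := WideExtractBaseName(FileName);
  // Guard against an empty base name before taking its first char.
  if Length(Base) > 0 then
    FileName := WideUpperCase(WideCopy(Base, 1, 1)) +
      WideLowerCase(WideCopy(Base, 2, Length(Base) - 1)) +
      WideExtractFileExt(FileName);
end.
```

With this sketch, "bOHEMIAN rhapsody.MP3" should become "Bohemian rhapsody.MP3"; whether to leave the extension alone or lowercase it as well is a matter of preference.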