ReNamer:Pascal Script:Unicode String Handling Routines

From den4b Wiki
Revision as of 15:05, 8 February 2017 by Den4b (talk | contribs) (Text replacement - "</source>" to "</syntaxhighlight>")

Unicode String Handling Routines or How to operate on words

Swapping parts of the FileName

Suppose we have mp3 files named in a fixed format, e.g. "author - title.mp3", and we want to rename them to "title - author.mp3". We need to split the filename at a known separator (" - ") and then use the resulting parts to build a new filename. We can do that with the WideSplitString function, which takes Input (the string to split) and Delimiter parameters and returns an array of strings (TWideStringArray type). If Input is "Queen - Bohemian Rhapsody" and Delimiter is " - ", it produces the array ["Queen", "Bohemian Rhapsody"].

Note that arrays of the TWideStringArray type are zero-based, which means the index of the first element is 0. So we get array[0] = "Queen" and array[1] = "Bohemian Rhapsody".

The whole operation can be done with the short script below.

To follow it you need basic knowledge of variable declarations, arrays and the if-then-else statement.

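var
  SplittedFileName: TWideStringArray;
begin
  // Split the name (without extension) on the ' - ' separator.
  SplittedFileName := WideSplitString(WideExtractBaseName(FileName), ' - ');
  // Swap only when there are exactly two parts; other names stay untouched.
  if Length(SplittedFileName) = 2 then
    FileName := SplittedFileName[1] + ' - ' + SplittedFileName[0] + WideExtractFileExt(FileName);
end.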

The script will produce "Bohemian Rhapsody - Queen.mp3" from "Queen - Bohemian Rhapsody.mp3".

We check the length of the SplittedFileName array to make sure we never go outside the array bounds. That would happen if the files table contained a file in a different format, e.g. "Bohemian Rhapsody (Queen)", which does not contain " - " and therefore yields an array with a single element.


Splitting the FileName into words

If we want to split the FileName into words (a word in this case is anything that lies between two spaces), the line of code looks like this:

SplittedFileName := WideSplitString(WideExtractBaseName(FileName), ' ');
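Once the words are in an array they can be processed one by one. The following is only a sketch that uses the functions already shown on this page; the variable names Words, NewName and I are made up for the illustration. It rebuilds the name with the words in reverse order, so "one two three.txt" becomes "three two one.txt".

```pascal
var
  Words: TWideStringArray;
  NewName: WideString;
  I: Integer;
begin
  Words := WideSplitString(WideExtractBaseName(FileName), ' ');
  NewName := '';
  // The array is zero-based, so the last word has index Length(Words) - 1.
  for I := Length(Words) - 1 downto 0 do
  begin
    // Put a space between words, but not before the first one.
    if NewName <> '' then
      NewName := NewName + ' ';
    NewName := NewName + Words[I];
  end;
  FileName := NewName + WideExtractFileExt(FileName);
end.
```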


Replacing parts of the FileName

Another useful function is WideReplaceStr. With its help we can, for example, replace all occurrences of the phrase 'your car' with 'my car'.

FileName := WideReplaceStr(FileName, 'your car', 'my car');

It will also change 'not your car' into 'not my car', and if we are really possessive and egoistic we might not like that...
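The one-liner above also runs the replacement over the extension, because FileName includes it. A possible variant, assuming you want the extension left untouched, applies WideReplaceStr to the base name only and re-attaches the extension afterwards:

```pascal
begin
  // Replace in the base name only, then re-attach the unchanged extension.
  FileName := WideReplaceStr(WideExtractBaseName(FileName), 'your car', 'my car')
    + WideExtractFileExt(FileName);
end.
```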


WidePos, WideInsert and WideDelete functions

To solve the problem we will need a few other string handling functions and procedures: WidePos, WideInsert and WideDelete. If you’re sure you won’t process any Unicode characters, you may use the Pos, Insert and Delete functions/procedures instead.

Before we describe them, you need to know that strings in Pascal are represented as 1-based arrays of chars, which means that the first index of a string is 1 (so FileName[0] raises an out-of-bounds error).

Now we can take a look at the description of functions/procedures that were mentioned above.

function WidePos(const SubStr, S: WideString): Integer;

WidePos finds the substring SubStr in the given string S and returns the position of its first char.

So WidePos('car', 'scar tissue') will return 2.

If the substring is not present in S, the function returns 0.
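Because 0 means "not found", the result of WidePos can be used directly as a presence test. A minimal sketch; the 'Unknown - ' prefix and the searched word are chosen only for the illustration:

```pascal
begin
  // WidePos returns 0 when the substring is absent.
  // A lowercased copy is searched so that 'Queen' and 'QUEEN' are found too.
  if WidePos('queen', WideLowerCase(FileName)) = 0 then
    FileName := 'Unknown - ' + FileName;
end.
```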

procedure WideInsert(const Substr: WideString; var Dest: WideString; Index: Integer);

WideInsert inserts the given substring into the Dest string starting at Index. Dest is a var parameter, so it has to be a variable and not a string literal. If Dest holds 'it is my car', then WideInsert('not ', Dest, 7) changes it into 'it is not my car'.

procedure WideDelete(var S: WideString; Index, Count: Integer);

WideDelete deletes Count chars from the string S starting at Index. S is a var parameter as well. If S holds 'it is not my car', then WideDelete(S, 7, 4) changes it back into 'it is my car'.

Armed with that knowledge we can write a script that finds the phrase 'your car' and checks whether the word 'not' appears before it (no matter where exactly, as long as it is between the beginning of the filename and the phrase). Only if there is no such word does it replace 'your' with 'my'.
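Both procedures modify the variable passed to them instead of returning a new string. The short sketch below repeats the two examples on a local variable; the name S is used only for the illustration and the script leaves FileName unchanged.

```pascal
var
  S: WideString;
begin
  S := 'it is my car';
  // 'it is ' occupies positions 1..6, so 'my' starts at position 7.
  WideInsert('not ', S, 7);   // S is now 'it is not my car'
  // Remove the 4 chars ('not ') that were just inserted.
  WideDelete(S, 7, 4);        // S is 'it is my car' again
end.
```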


Full control over Find & Replace operation

Unlike the WideReplaceStr function, this script finds only the first occurrence of the searched phrase. To handle all occurrences, we would have to put this code into a loop.

var
  Car_Index, Not_Index: Integer;
begin
  // Search a lowercased copy so that the match is case-insensitive.
  Car_Index := WidePos('your car', WideLowerCase(FileName));
  Not_Index := WidePos('not ', WideLowerCase(FileName));
  if Car_Index > 0 then
    // Replace only when no 'not ' occurs before the phrase.
    if not ((Not_Index > 0) and (Not_Index < Car_Index)) then
    begin
      WideDelete(FileName, Car_Index, Length('your'));
      WideInsert('my', FileName, Car_Index);
    end;
end.

You may be curious why we searched for the 'your car' and 'not ' phrases in a lowercased FileName (WideLowerCase(FileName)). We did that because the WidePos function is case sensitive. Note that we didn’t change the actual case of the FileName; we only passed a lowercased copy of it to WidePos. This ensures that any variant of case will be found, as all of them (e.g. 'Your Car', 'YoUR caR') are identical to 'your car' after lowercasing.
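A loop that handles every occurrence could look roughly like the sketch below. It is only one possible approach and it makes a simplifying assumption that differs from the single-occurrence check: an occurrence counts as negated only when 'not ' stands immediately before it. The variables Lower, Rest, Offset, P, CarIndex and Negated are hypothetical names introduced for this example.

```pascal
var
  Lower, Rest: WideString;
  Offset, P, CarIndex: Integer;
  Negated: Boolean;
begin
  Offset := 0;  // number of chars at the start of FileName already handled
  repeat
    Lower := WideLowerCase(FileName);
    // Search only the part that has not been handled yet.
    Rest := WideCopy(Lower, Offset + 1, Length(Lower) - Offset);
    P := WidePos('your car', Rest);
    if P > 0 then
    begin
      CarIndex := Offset + P;  // position of the phrase in the full FileName
      // Assumption: negated means 'not ' directly precedes the phrase.
      Negated := False;
      if CarIndex > 4 then
        Negated := WideCopy(Lower, CarIndex - 4, 4) = 'not ';
      if Negated then
        // Leave this occurrence alone and continue after it.
        Offset := CarIndex + Length('your car') - 1
      else
      begin
        WideDelete(FileName, CarIndex, Length('your'));
        WideInsert('my', FileName, CarIndex);
        // Continue after the replaced phrase 'my car'.
        Offset := CarIndex + Length('my car') - 1;
      end;
    end;
  until P = 0;
end.
```

With this sketch, 'your car not your car your car' becomes 'my car not your car my car'.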


WideCopy function

Last, but not least, this chapter presents the WideCopy function. Let’s take a look at its declaration:

function WideCopy(const S: WideString; Index, Count: Integer): WideString;

WideCopy returns the substring of S that starts at Index and contains the number of chars given by the Count parameter.

This means that WideCopy('sit down', 5, 4) will return 'down' (4 letters starting from index 5).
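As an illustration, WideCopy can shorten long names. The sketch below keeps the first 10 chars of the base name; the limit of 10 and the variable BaseName are arbitrary choices for the example. It assumes that WideCopy, like the standard Pascal Copy, simply returns the chars up to the end of the string when Count reaches past it.

```pascal
var
  BaseName: WideString;
begin
  BaseName := WideExtractBaseName(FileName);
  // Keep at most the first 10 chars of the base name, then re-attach the extension.
  FileName := WideCopy(BaseName, 1, 10) + WideExtractFileExt(FileName);
end.
```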


Making first letter capital

WideCopy function will let us capitalize only the first letter of the filename.

FileName := WideUpperCase(FileName[1]) + WideLowerCase(WideCopy(FileName, 2, Length(FileName)-1));

We are building the FileName from two parts: first goes uppercased first letter of the FileName and then lowercased rest of the FileName. We use WideCopy(FileName, 2, Length(FileName) - 1) statement to get everything from the second letter till the end of the FileName.
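The one-liner reads FileName[1], which assumes the name is not empty, and it lowercases the extension together with the rest of the name. A slightly more defensive variant, given here only as a sketch with a hypothetical BaseName variable, works on the base name and leaves the extension as it is:

```pascal
var
  BaseName: WideString;
begin
  BaseName := WideExtractBaseName(FileName);
  // Guard against an empty base name before reading its first char.
  if Length(BaseName) > 0 then
    FileName := WideUpperCase(BaseName[1])
      + WideLowerCase(WideCopy(BaseName, 2, Length(BaseName) - 1))
      + WideExtractFileExt(FileName);
end.
```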