Text replacement using a dictionary file

m4dcow · 2017-05-11 23:49

(Continued from a different topic: Too many characters in replace text)

m4dcow wrote:

Every now and then I have to rename a bunch of files using an ever-increasing library of acronyms (lcd=liquid crystal display, led=light emitting diode, etc.). I store these in an Excel file that I can export into the replace format (*|* in between). When I have many acronyms to add, I generate the fields from the Excel file, but if I only have a few, it is quicker to type them directly into both the Excel file and the replace text fields.
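For reference, the `*|*` format mentioned above joins several values into a single Replace rule field. Below is a minimal Python sketch of generating the find and replace fields from a list of pairs, assuming `*|*` is the separator as described in the post; the pair list and `build_replace_fields` are hypothetical stand-ins for the Excel export.

```python
# Hypothetical acronym table, standing in for the Excel sheet.
PAIRS = [
    ("-ABS_", "-Abu Simbel_"),
    ("-CBL_", "-Ciudad Bolivar_"),
]

SEPARATOR = "*|*"  # separator between multiple values, as described in the post


def build_replace_fields(pairs):
    """Return (find_field, replace_field), each joined with SEPARATOR."""
    finds = SEPARATOR.join(find for find, _ in pairs)
    replaces = SEPARATOR.join(replace for _, replace in pairs)
    return finds, replaces


find_field, replace_field = build_replace_fields(PAIRS)
print(find_field)     # -ABS_*|*-CBL_
print(replace_field)  # -Abu Simbel_*|*-Ciudad Bolivar_
```

Both fields grow with every acronym added, which is presumably how they ended up hitting the length limit discussed in the earlier topic.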

den4b wrote:

If you provide a small sample of your acronyms (posted as plain text here) just to demonstrate the pattern, then we can help you out with a basic script.

I don't really want to post the actual data, but my pattern is like airport codes and the multi-word cities where they are located (except mine aren't all 3 characters):
ABS=Abu Simbel
GIC=Boiju Island
CBL=Ciudad Bolivar
DSK=Dera Ismail Khan
EDR=Edward River

Where these acronyms appear in the filenames, they are preceded by a - and followed by an underscore. That way, if an acronym happens to form a word that also appears in the filename, the word isn't replaced. So the actual before and after replace values look like this:
-ABS_=-Abu Simbel_
-GIC_=-Boiju Island_
-CBL_=-Ciudad Bolivar_
-DSK_=-Dera Ismail Khan_
-EDR_=-Edward River_
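To illustrate why the delimiters matter, here is a small Python sketch comparing a bare acronym with the delimited form. Python's `str.replace` stands in for a plain literal replacement, and the file name is made up.

```python
name = "ABSOLUTE-ABS_notes"  # hypothetical file name

# Bare acronym: also hits the "ABS" at the start of "ABSOLUTE".
bare = name.replace("ABS", "Abu Simbel")

# Delimited form: only the "-ABS_" token matches.
delimited = name.replace("-ABS_", "-Abu Simbel_")

print(bare)       # Abu SimbelOLUTE-Abu Simbel_notes
print(delimited)  # ABSOLUTE-Abu Simbel_notes
```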

I know there is a better way, but this is how I got it working, so I didn't mess with it.

den4b · 2017-05-12 14:25

Here is a script which will load a dictionary from a file specified in DictionaryFileName. Each line in a dictionary file is a set of find and replace values delimited by "=" (equal sign).
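The dictionary format can be sketched in Python as follows (a hypothetical `parse_dictionary` helper, not part of ReNamer). Like the script's `Pos`/`Copy` logic, it splits each line at the first `=`, so a replacement value may itself contain `=`, and lines with no delimiter or an empty find value are ignored.

```python
DELIMITER = "="


def parse_dictionary(text):
    """Split each line at the first DELIMITER into a (find, replace) pair.

    Lines without the delimiter, or with nothing before it, are skipped,
    matching the script, which leaves such entries empty and ignores them.
    """
    pairs = []
    for line in text.splitlines():
        find, sep, replace = line.partition(DELIMITER)  # first "=" only
        if sep and find:
            pairs.append((find, replace))
    return pairs


sample = "ABS=Abu Simbel\nCBL=Ciudad Bolivar\nnot a pair\nX=a=b\n=empty find"
print(parse_dictionary(sample))
# [('ABS', 'Abu Simbel'), ('CBL', 'Ciudad Bolivar'), ('X', 'a=b')]
```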

Replacement is performed as is, i.e. find => replace. No extra checks or modifications are performed, e.g. there is no word boundary check.

Note that this script does not handle Unicode dictionary data, only plain ANSI.

const
  DictionaryFileName = 'Dictionary.txt'; // dictionary file to load
  Delimiter = '=';                       // separates find and replace values

var
  FindList, ReplaceList: TAnsiStringArray;
  Initialized: Boolean; // set once the dictionary has been loaded

// Read the dictionary file and split each line into find/replace values.
procedure LoadDictionary;
var
  Line, DelimStr: String;
  Lines: TAnsiStringArray;
  I, DelimPos: Integer;
begin
  Lines := FileReadLines(DictionaryFileName);
  SetLength(FindList, Length(Lines));
  SetLength(ReplaceList, Length(Lines));
  DelimStr := Delimiter;
  for I := 0 to Length(Lines) - 1 do
  begin
    Line := Lines[I];
    DelimPos := Pos(DelimStr, Line); // first occurrence of the delimiter
    if DelimPos > 0 then
    begin
      FindList[I] := Copy(Line, 1, DelimPos - 1);
      ReplaceList[I] := Copy(Line, DelimPos + Length(DelimStr), Length(Line));
    end
    else
    begin
      // No delimiter: store an empty entry, which ApplyDictionary skips.
      FindList[I] := '';
      ReplaceList[I] := '';
    end;
  end;
end;

// Apply every find/replace pair in order (case-sensitive).
function ApplyDictionary(const Input: WideString): WideString;
var
  I: Integer;
  FindStr, ReplaceStr: WideString;
begin
  Result := Input;
  for I := 0 to Length(FindList) - 1 do
  begin
    FindStr := FindList[I];
    ReplaceStr := ReplaceList[I];
    if Length(FindStr) > 0 then
      Result := WideReplaceStr(Result, FindStr, ReplaceStr);
  end;
end;

begin
  // Load the dictionary only on the first file; Initialized keeps its value between files.
  if not Initialized then
  begin
    Initialized := True;
    LoadDictionary;
  end;
  // Apply the dictionary to the base name only, keeping the extension.
  FileName := ApplyDictionary(WideExtractBaseName(FileName)) +
    WideExtractFileExt(FileName);
end.

Last edited by den4b (2017-05-12 14:30)
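Outside ReNamer, the overall flow of the script can be approximated in Python: apply every pair in file order to the base name and re-attach the extension, mirroring the `WideExtractBaseName`/`WideExtractFileExt` split. This is a sketch that assumes `os.path.splitext` treats extensions the same way for your files; `apply_dictionary` and `rename` are hypothetical names. Because pairs are applied one after another, a later entry can match text produced by an earlier one.

```python
import os


def apply_dictionary(text, pairs):
    """Apply each (find, replace) pair in order, case-sensitively.

    Empty find values are skipped; later pairs see the output of earlier ones.
    """
    for find, replace in pairs:
        if find:
            text = text.replace(find, replace)
    return text


def rename(file_name, pairs):
    """Apply the dictionary to the base name only and keep the extension."""
    base, ext = os.path.splitext(file_name)
    return apply_dictionary(base, pairs) + ext


PAIRS = [("-ABS_", "-Abu Simbel_"), ("-EDR_", "-Edward River_")]
print(rename("report-ABS_2017.txt", PAIRS))  # report-Abu Simbel_2017.txt
```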

m4dcow · 2017-05-13 02:18

Awesome! I still have to figure out how to have it work on only stuff preceded by a "-" and followed by a "_" so I don't have to hard code that stuff in the values but this is a great help. Thank you so much!

den4b · 2017-05-17 20:59

m4dcow wrote:

I still have to figure out how to have it work on only stuff preceded by a "-" and followed by a "_" so I don't have to hard code that stuff in the values...

If you haven't figured it out yet, the key is in modifying the following line:

Result := WideReplaceStr(Result, FindStr, ReplaceStr);

You could change it to:

Result := WideReplaceStr(Result, '-' + FindStr + '_', ReplaceStr);

Or, you could add the prefix and suffix as constants at the top of script like so:

const
  FindPrefix = '-';
  FindSuffix = '_';

And then modify the replacement line to:

Result := WideReplaceStr(Result, FindPrefix + FindStr + FindSuffix, ReplaceStr);

The benefit of having them as constants is that it is easy to change them later, if you need to.
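A Python sketch of the prefix/suffix approach follows; `replace_token` and `keep_delimiters` are hypothetical names. One detail worth checking: the suggested line replaces `-ABS_` with just `Abu Simbel`, so the dash and underscore are consumed. If you want them kept, as in the `-ABS_=-Abu Simbel_` examples earlier in the thread, the replacement needs the same wrapping, which in the Pascal script would mean passing `FindPrefix + ReplaceStr + FindSuffix` as the third argument. Both variants are shown.

```python
FIND_PREFIX = "-"
FIND_SUFFIX = "_"


def replace_token(text, find, replace, keep_delimiters=True):
    """Replace FIND_PREFIX + find + FIND_SUFFIX in text.

    With keep_delimiters=False this mirrors the suggested line, which drops
    the prefix and suffix; with True the result keeps them.
    """
    token = FIND_PREFIX + find + FIND_SUFFIX
    if keep_delimiters:
        replace = FIND_PREFIX + replace + FIND_SUFFIX
    return text.replace(token, replace)


name = "trip-ABS_day1"
print(replace_token(name, "ABS", "Abu Simbel", keep_delimiters=False))  # tripAbu Simbelday1
print(replace_token(name, "ABS", "Abu Simbel"))                         # trip-Abu Simbel_day1
```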

zelos · 2018-07-19 09:29

Am I right that if I change the code snippet from

Result := WideReplaceStr(Result, FindStr, ReplaceStr);

into

Result := WideReplaceText(Result, FindStr, ReplaceStr);

the search will not be case-sensitive?

I am currently working with a huge MS Outlook export (*.msg files), and many mail addresses vary in upper and lower case.

Btw: Thanks for this really helpful program!

Stefan · 2018-07-19 10:32

Hi zelos,

FYI, there is documentation:
ReNamer:Pascal_Script:Functions#Unicode_String_Handling

I guess you are right; just test it on a small example file yourself to confirm.

If you have more question, just ask.

den4b · 2018-07-19 11:08

zelos, that's correct.

WideReplaceStr performs a case-sensitive replacement.

WideReplaceText performs a case-insensitive replacement.
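For anyone wanting to reproduce the difference outside ReNamer, here is a Python sketch of both behaviors; these helpers are analogues written for illustration, not ReNamer functions. The case-insensitive version escapes the find value so it is matched literally, and uses a function as the replacement so backslashes in the replacement are inserted verbatim. Only the matching is case-insensitive; the replacement text is inserted exactly as written.

```python
import re


def replace_str(text, find, replace):
    """Case-sensitive literal replacement (analogous to WideReplaceStr)."""
    if not find:
        return text
    return text.replace(find, replace)


def replace_text(text, find, replace):
    """Case-insensitive literal replacement (analogous to WideReplaceText)."""
    if not find:
        return text  # an empty pattern would match between every character
    pattern = re.compile(re.escape(find), re.IGNORECASE)
    # A function replacement inserts `replace` verbatim (no backslash escapes).
    return pattern.sub(lambda _match: replace, text)


name = "Mail from John.Doe@Example.com"
print(replace_str(name, "john.doe@example.com", "John Doe"))   # Mail from John.Doe@Example.com
print(replace_text(name, "john.doe@example.com", "John Doe"))  # Mail from John Doe
```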

den4b Forum