#1 2023-01-25 01:07

visusys
Member
Registered: 2021-10-16
Posts: 20

Lookaround TRegExpr Compile Errors

I am trying to utilize lookaround to separate digits with periods, but only in certain circumstances.

My exact goal is here (this is my post): https://stackoverflow.com/questions/752 … 2#75228472

The first response gave me the following solution:

Expression:

(?:(?<=\s)|(?<=\sv))(\d+)\s(?=\d)

Replace:

$1.

Unfortunately I am getting an error in Renamer.

TRegExpr compile: lookaround brackets must be at the very beginning/ending (pos 4)

mini_ReNamer_FU12Xkel8o.png

Can someone explain to me how I can get this expression to work?

Or what regex engine that ReNamer uses so I can ask for more specific help on stackoverflow?

Any help at all would be great.

Offline

#2 2023-01-25 15:54

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,440

Re: Lookaround TRegExpr Compile Errors

Regarding the error message:

TRegExpr compile: lookaround brackets must be at the very beginning/ending (pos 4)

This is a known limitation of the regex engine, as documented here:
https://regex.sorokin.engineer/en/lates … assertions

Brackets for lookahead must be at the very ending of expression, and brackets for lookbehind must be at the very beginning. So assertions between choices |, or inside groups, are not supported.

It should be possible to achieve the same result with multiple rules or pascal script.

Offline

#3 2023-01-29 02:43

visusys
Member
Registered: 2021-10-16
Posts: 20

Re: Lookaround TRegExpr Compile Errors

den4b wrote:

Regarding the error message:

It should be possible to achieve the same result with multiple rules or pascal script.

If you don't mind, this is the expression that I want to get working the most:

https://stackoverflow.com/questions/752 … 3_75228498

Expression:

(?:HoRNet|more|exceptions|here)(*SKIP)(*F)|(?<=[a-z])(?=[A-Z]\B)

Replace:

(A space)

Do you know how I can get this working through usage of multiple rules or PascalScript?

The expression separates CamelCase similar to the clean-up function, but allows for a "whitelist" that when encountered doesn't get modified.

Does PascalScript utilize a more robust RegEx engine? If so, can I invoke more advanced RegEx expressions through PascalScript?

Any help at all would be greatly appreciated.

Offline

#4 2023-01-31 00:48

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,440

Re: Lookaround TRegExpr Compile Errors

This script for the Pascal Script rule should do it.

List all your special cases in the Exceptions constant. You will need to use at least ReNamer 7.4.0.2 Beta.

const
  Exceptions = 'HoRNet|more|exceptions|here';
var
  Positions: TIntegerArray;
  Matches: TWideStringArray;
  I, NumMatches, PositionAnchor: Integer;
begin
  PositionAnchor := Length(FileName);
  NumMatches := FindRegEx(FileName, '(?<=[a-z])(?=[A-Z]\B)', True, Positions, Matches);
  for I := NumMatches - 1 downto 0 do
  begin
    if Positions[I] <= PositionAnchor then
    begin
      Insert(' ', FileName, Positions[I]);
      Matches := MatchesRegEx(Copy(FileName, 1, Positions[I]-1), '(' + Exceptions + ')\Z', True);
      if Length(Matches) > 0 then
        PositionAnchor := Positions[I] - Length(Matches[0]);
    end;
  end;
end.

Example input:

HoRNetCompExp_x64
HoRNetDynamicsControl_x64
HoRNetHarmonics_x64
HoRNetHTS9_x64
HoRNetMulticompPlusMK2_x64
HoRNetSongKeyMK3_x64
HoRNetTape_x64
HoRNetValvola_x64

Example output:

HoRNet Comp Exp_x64
HoRNet Dynamics Control_x64
HoRNet Harmonics_x64
HoRNet HTS9_x64
HoRNet Multicomp Plus MK2_x64
HoRNet Song Key MK3_x64
HoRNet Tape_x64
HoRNet Valvola_x64

Offline

#5 2023-01-31 04:18

visusys
Member
Registered: 2021-10-16
Posts: 20

Re: Lookaround TRegExpr Compile Errors

Awesome! Thank you so much! I'll be including that in my preset now, you're a lifesaver.

Offline

#6 2023-02-01 03:24

visusys
Member
Registered: 2021-10-16
Posts: 20

Re: Lookaround TRegExpr Compile Errors

den4b wrote:

This script for the Pascal Script rule should do it.

List all your special cases in the Exceptions constant. You will need to use at least ReNamer 7.4.0.2 Beta.

Hi again, I just got around to testing the script.

While it works if an exception occurs at the beginning of the filename with no spaces, it falls apart if the exception occurs anywhere else in the string. See this image:

dFg42T4.png


Input:

ExceptionTwoTest
ExceptionTwo Test
Test ExceptionTwo
Test ExceptionNumOne
ExceptionNumOneTest
ExceptionNumOne Test

Correct Output:

ExceptionTwo Test
ExceptionTwo Test
Test ExceptionTwo
Test ExceptionNumOne
ExceptionNumOne Test
ExceptionNumOne Test

Actual Output:

ExceptionTwo Test
Exception Two Test
Test Exception Two
Text Exception Num One
ExceptionNumOne Test
Exception Num One Test

I think it has to do with the regex. I'm not sure how to modify it though to fix the issue. I'll keep experimenting.

If you happen to know a fix that would be great!

Offline

#7 2023-02-06 03:23

jogiwer
Member
From: Germany
Registered: 2022-11-05
Posts: 66

Re: Lookaround TRegExpr Compile Errors

One advantage of lookahead and lookbehind is that you don't need to reinsert these parts on replacement.

In other words: if you check your surroundings without using '(?<=...)', '(?<!...)', '(?=...)', '(?!...)' you just have to use these subexpressions inside your replace string via $1..$9.

But! The next replacement would start after the catched string. So in your special situation where you to recognize exceptions by their first character it would be wise to stay with a positive lookahead (at the end).

Try this Expression:

(ExceptionNumOne|ExceptionTwo|[a-z])(?=[A-Z]\B)

with this Replace (containing a space at the end):

$1 

and make sure the option "Case-sensitive" is set.

Hope that helps!

Offline

#8 2023-02-06 05:39

visusys
Member
Registered: 2021-10-16
Posts: 20

Re: Lookaround TRegExpr Compile Errors

jogiwer wrote:

One advantage of lookahead and lookbehind is that you don't need to reinsert these parts on replacement.

In other words: if you check your surroundings without using '(?<=...)', '(?<!...)', '(?=...)', '(?!...)' you just have to use these subexpressions inside your replace string via $1..$9.

But! The next replacement would start after the catched string. So in your special situation where you to recognize exceptions by their first character it would be wise to stay with a positive lookahead (at the end).

Try this Expression:

(ExceptionNumOne|ExceptionTwo|[a-z])(?=[A-Z]\B)

with this Replace (containing a space at the end):

$1 

and make sure the option "Case-sensitive" is set.

Hope that helps!

Thanks much for the help. Unfortunately the expression you supplied isn't working:

mini_explorer_NHtAFS6OLV.png

I need the exceptions to remain as they are no matter if they are part of a word, or isolated by whitespace. Right now if the exception occurs alone followed by a space, the word is separated (Check image above).

Also I plan on having a large amount of exceptions (100+) so some kind of PascalScript implementation I think would be better so I can store my exceptions in a constant.

Really appreciate the help though!

Offline

#9 2023-02-06 09:33

jogiwer
Member
From: Germany
Registered: 2022-11-05
Posts: 66

Re: Lookaround TRegExpr Compile Errors

visusys wrote:

Thanks much for the help. Unfortunately the expression you supplied isn't working:

Ah you are right - if the exception matches but isn't followed by an upper case letter the regexp engine throws away the match and tries to find another match ...
Using the first part as atomic match would help here. As there won't be a second try you have to make sure that your exceptions are placed in the right order if one exception is the beginning of another longer one:

(?>ExceptionOnePlusMore|ExceptionOne|[a-z])(?=[A-Z]\B)

Another special case comes to my mind: what if the exception word is contained but with additional lower case letters? Like in this text:

ExceptionNumOnetimeTest.txt
visusys wrote:

Also I plan on having a large amount of exceptions (100+) so some kind of PascalScript implementation I think would be better so I can store my exceptions in a constant.

You could also use the shown expression inside a PascalScript block by calling ReplaceRegEx. Here you also could consider to write an initialization block (run only on the first call/file) where you read the exceptions from a file by FileReadContent.

Words could be placed in separate lines if you replace the line breaks by | (pipe or vertical bar) after reading.

Offline

#10 2023-02-08 03:05

visusys
Member
Registered: 2021-10-16
Posts: 20

Re: Lookaround TRegExpr Compile Errors

jogiwer wrote:

Another special case comes to my mind: what if the exception word is contained but with additional lower case letters?

This is a good insight that I hadn't considered entirely. In your example I'd expect ExceptionNumOnetimeTest.txt to evaluate to either "ExceptionNumOne timeTest.txt" (Always isolate the exception) or "ExceptionNumOnetime Test.txt" (Perform camel/pascal case separation exactly after the exception ends).

I think I prefer option one, and always isolate the exception string.

Also I tried your regex above:

(?>ExceptionOnePlusMore|ExceptionOne|[a-z])(?=[A-Z]\B)

With a replace string of

$1 

And it's just deleting the exceptions. Not sure I have the correct RegEx here.

If it's not too much to ask, can you possibly outline the RegEx as well as PascalScript you would use to read from a file that contains a list of exceptions? I don't care if I have to separate everything by a pipe in the file or restructure the content of the file to a pipe delimited string.

The biggest mystery to me right now is just the exact RegEx I need. I'm not very good at it.

Thanks for any help at all!

Offline

Board footer

Powered by FluxBB