#1 2014-05-28 15:52

Elektro
Senior Member
Registered: 2014-05-28
Posts: 76

"HTML Decode" as a new option for Clean-Up

I think this software needs an automated way to clean up HTML entities and URL-escaped characters. For example, when you download a file from many existing file-hosting sites, those entities are preserved in the file title, so at the moment we have to add a separate rule for each one, for example:

Rules:

%26 > &
%27 > '
%28 > (
%29 > )
&amp; > &
&plusmn; > ±
&#8217 > '
etc...

As you can see, setting a specific rule for each entity to cover the whole set is a very tedious and ugly job. There are actually more than a thousand of them, and the only automated support I can see in the program is for URL-encoded spaces (%20).

PS: I don't know which language this software is written in, but just for interest, .NET has helper methods for this task in the System.Web.HttpUtility class (HtmlDecode and UrlDecode). These could be used directly or taken as a good example for an implementation in other languages.
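For comparison, Python's standard library has counterparts to those .NET helpers: html.unescape() decodes HTML character references and urllib.parse.unquote() decodes %XX escapes. Below is a minimal sketch, assuming a title can contain both kinds and that URL escapes should be decoded first. The name decode_title is hypothetical and not part of ReNamer:

```python
import html
from urllib.parse import unquote


def decode_title(title: str) -> str:
    """Hypothetical helper: decode %XX escapes, then HTML entities."""
    # URL (percent) escapes: %26 -> &, %27 -> ', %28 -> (, %29 -> )
    title = unquote(title)
    # HTML character references: &amp; -> &, &plusmn; -> ±, &#8217; -> ’
    return html.unescape(title)


print(decode_title("Rock%20%26%20Roll%20%28Live%29"))  # Rock & Roll (Live)
print(decode_title("Tom &amp; Jerry"))                 # Tom & Jerry
```

Note that &#8217; decodes to the typographic apostrophe ’ (U+2019), not the plain ' used in the rule list above. A follow-up replace would still be needed if the ASCII apostrophe is wanted.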

PS2: Sorry for my English.

Elektro.


#2 2014-05-28 21:06

Stefan
Moderator
From: Germany, EU
Registered: 2007-10-23
Posts: 1,161

Re: "HTML Decode" as a new option for Clean-Up

Hi Elektro, welcome!

There are many things that could be added,
so instead of adding everything, you could use a Translit rule:

http://www.den4b.com/wiki/ReNamer:Rules:Translit
http://www.den4b.com/forum/viewtopic.php?id=667


For example:

(I don't remember right now whether I found a way to escape the space in %20. Using Alt+255 is not a good option.)

\ReNamer\Translits\URL-2-ANSI.txt

%20= 
%23=#
%24=$
%25=%
%26=&
%3B=;
%5B=[
%5D=]

\ReNamer\Translits\ANSI-2-URLEncoding.txt

 =%20
#=%23
$=%24
%=%25
&=%26
'=%27
,=%2C
;=%3B
[=%5B
]=%5D
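I am not sure whether the Translit rule applies its pairs one after another or in a single pass. If they were applied one after another in the order listed, ' =%20' followed by '%=%25' would turn a space into %2520. The hypothetical Python sketch below parses such FROM=TO lists and splits each line at the first '=', so a value that is a single space (as in '%20= ') survives. It then replaces in one left-to-right pass, so text produced by a replacement is never processed again. The names load_pairs and translit are illustrative only:

```python
import re


def load_pairs(text: str) -> dict:
    """Parse FROM=TO lines, splitting at the first '=' only.

    A value that is a single space (as in '%20= ') is kept as-is.
    """
    pairs = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        src, dst = line.split("=", 1)
        if src:
            pairs[src] = dst
    return pairs


def translit(name: str, pairs: dict) -> str:
    """Replace every FROM with its TO in one left-to-right pass.

    Output of a replacement is never scanned again.
    """
    if not pairs:
        return name
    # Longer keys first: Python's regex alternation takes the first
    # alternative that matches, so this makes the longest FROM win.
    keys = sorted(pairs, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: pairs[m.group(0)], name)


url_to_ansi = load_pairs("%20= \n%23=#\n%24=$\n%25=%\n%26=&\n%3B=;\n%5B=[\n%5D=]")
ansi_to_url = load_pairs(" =%20\n#=%23\n$=%24\n%=%25\n&=%26\n'=%27\n,=%2C\n;=%3B\n[=%5B\n]=%5D")

print(translit("A & B [50%]", ansi_to_url))                # A%20%26%20B%20%5B50%25%5D
print(translit("A%20%26%20B%20%5B50%25%5D", url_to_ansi))  # A & B [50%]
```

Because of the single pass, a double-encoded "%2520" decodes only one level, to "%20".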

You could create, test and provide such a list to the community.




Read the  *WIKI* for HELP + MANUAL + Tips&Tricks.
If ReNamer had helped you, please *DONATE* to Denis or buy a PRO license. (Read *Lite vs Pro*)


#3 2014-05-30 17:21

Elektro
Senior Member
Registered: 2014-05-28
Posts: 76

Re: "HTML Decode" as a new option for Clean-Up

Thank you. My "url-2-ansi" list is really too tiny to share, but thanks for the idea.

A related question: does this mean that I can "translate" all these rules into a single Translit rule and obtain the same results?

(Note the leading and trailing spaces around the "=" delimiter; that's my issue.)
[screenshot of the rules]

Something like:

 @ = at 
D.J. =Dj 
 aint = ain't 

etc...?

But what about the case-sensitivity of letters? Do I have to add all case variations manually? Sad.

Last edited by Elektro (2014-05-30 17:26)


#4 2014-05-30 18:47

Stefan
Moderator
From: Germany, EU
Registered: 2007-10-23
Posts: 1,161

Re: "HTML Decode" as a new option for Clean-Up

Elektro wrote:

does this mean that I can "translate" all these rules into a single Translit rule and obtain the same results?
Note the leading and trailing spaces around the "=" delimiter; that's my issue.
But what about the case-sensitivity of letters? Do I have to add all case variations manually? Sad.

I don't know yet. We have to check it out...


We can also use this "AdvancedTranslit" or "Dictionary" PascalScript:

(I think there should already be such a script somewhere in the forum.)


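var
  DicFile, DicLine, tmp: WideString;
  i, DiFiLines: Integer;
  DicArray: TStringsArray;

begin
  // Work on the base name only, so the extension is never touched
  tmp       := WideExtractBaseName(FilePath);
  // Dictionary file with one "FROM|TO" pair per line
  DicFile   := 'dict.txt';
  DiFiLines := FileCountLines(DicFile);
  for i := 1 to DiFiLines do
  begin
    DicLine  := FileReadLine(DicFile, i);
    DicArray := WideSplitString(DicLine, '|');
    // Skip empty or malformed lines that have no "|" separator
    if Length(DicArray) >= 2 then
      tmp := WideReplaceStr(tmp, DicArray[0], DicArray[1]);
  end;
  // New file name = processed base name + original extension
  FileName := tmp + WideExtractFileExt(FilePath);
end.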

(What to do with this? Read >> http://www.den4b.com/wiki/ReNamer:Rules:PascalScript)


This script needs a file called "Dict.txt" in the same folder as ReNamer.exe.
In "Dict.txt", put one FROM | TO pair on each line:

 @ | at
 D.J. | Dj 
 and | & 


The script above will:
-- loop over each line of "Dict.txt",
-- split the current line at the pipe "|" sign into a "search" and a "replace with" part,
-- use that pair to replace text in the file name (the extension is kept).

Note:
The script is case-sensitive because it uses WideReplaceStr().
Use WideReplaceText() instead to ignore case.
Also note the surrounding blanks, which prevent replacing inside other words.
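The same dictionary idea can be sketched in Python, which is handy for testing a Dict.txt outside ReNamer. This hypothetical sketch reads FROM|TO pairs, works on the base name only, re-appends the extension, and offers a case_sensitive switch that mimics the difference between WideReplaceStr() and WideReplaceText() described above. The function names are illustrative, not ReNamer APIs. One consequence of the surrounding blanks is that a word at the very start or end of the base name has no blank on one side, so it is not matched.

```python
import os
import re


def apply_dictionary(base: str, dict_text: str, case_sensitive: bool = True) -> str:
    """Apply FROM|TO pairs from dict_text to base, one line at a time."""
    for line in dict_text.splitlines():
        parts = line.split("|")
        if len(parts) < 2 or not parts[0]:
            continue  # skip blank lines and lines without a pipe
        src, dst = parts[0], parts[1]  # blanks around each part are kept
        if case_sensitive:
            # Exact-case match, like WideReplaceStr()
            base = base.replace(src, dst)
        else:
            # Case-insensitive match, like WideReplaceText();
            # a function replacement avoids regex escapes in dst
            base = re.sub(re.escape(src), lambda m: dst, base, flags=re.IGNORECASE)
    return base


def new_file_name(file_name: str, dict_text: str, case_sensitive: bool = True) -> str:
    """Process the base name only and keep the original extension."""
    base, ext = os.path.splitext(file_name)
    return apply_dictionary(base, dict_text, case_sensitive) + ext


dict_txt = " @ | at \n D.J. | Dj \n and | & "
print(new_file_name("Mix by D.J. Foo and Bar.mp3", dict_txt))                # Mix by Dj Foo & Bar.mp3
print(new_file_name("Mix by d.j. Foo.mp3", dict_txt, case_sensitive=False))  # Mix by Dj Foo.mp3
```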






#5 2014-06-04 17:19

den4b
Administrator
From: den4b.com
Registered: 2006-04-06
Posts: 3,479

Re: "HTML Decode" as a new option for Clean-Up

A very interesting suggestion! It's added to the list for future implementation.

By the way, you can already do URL decoding using a very short PascalScript rule:

begin
  FileName := URLDecode(FileName);
end.

