ReNamer:Rules:Translit: Difference between revisions

Latest revision as of 10:43, 6 January 2023


This rule transliterates one alphabet into another. Its main goal is to transliterate non-English characters from different languages into their English/Latin representation. For example, the German character ü can be transliterated to ue (the name Müller can also be written as Mueller).

This rule uses transliteration maps (explained below).

Transliteration maps

To transliterate, we create a pair of equivalent characters, like this: ü=ue

(Note that the right side of this equation has two characters. Any number of characters may be placed on either side of the equation.)

We need several such equivalent character pairs to convert one language into another. The entire set is called a transliteration map. (In effect, a map is a character-level find-and-replace list.)

ReNamer has several such built-in maps. Each map is named after a language (the second language in all maps is English).

Each map can be used in both directions (e.g. French-to-English or English-to-French).
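As a rough illustration of how a map works in both directions, here is a minimal Python sketch. It is not ReNamer's actual code: the function name `apply_map`, the three-entry map, and the use of plain sequential replacement are all assumptions made for the example, and case handling is left out.

```python
# Hypothetical three-entry map; ReNamer's built-in maps are larger.
GERMAN_MAP = [("ü", "ue"), ("ö", "oe"), ("ß", "ss")]

def apply_map(text, pairs, backward=False):
    """Replace each left-hand part with its right-hand part (or the reverse)."""
    for left, right in pairs:
        # In the backward direction the two sides of each pair swap roles.
        src, dst = (right, left) if backward else (left, right)
        text = text.replace(src, dst)
    return text

print(apply_map("müller", GERMAN_MAP))                  # forward direction
print(apply_map("mueller", GERMAN_MAP, backward=True))  # backward direction
```

Note that the backward direction can be lossy: every "ue" in a name is replaced, whether or not it originally came from "ü".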

When you start up the Translit Rule, its window does not show any maps. You are free to do any of the following:

Use any of the built-in maps, in either the forward or the reverse direction.
Create your own map and use it.
Edit a built-in map first, and then use it.

Let us see how to do this.

Automatic case conversion

The Translit rule performs automatic case conversion, using an algorithm adapted specifically for transliteration. The rule ignores the case of the entries in the map, i.e. "A=B" is the same as "a=b". The case of the output is decided by the case of the input fragment. Multi-character fragments are treated as parts of words, so their case also depends on the case of the letters around them.

The logic for the case conversion is as follows (ReNamer Beta from 23 Aug 2009):

set OUTPUT-PART to lower case
if first letter in INPUT-PART is upper case then
  if length of OUTPUT-PART bigger than 1 then
    if next letter in original name is upper case then
      convert whole OUTPUT-PART to upper case
    else
      convert only first letter in OUTPUT-PART to upper case
  else
    convert whole OUTPUT-PART to upper case
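
The pseudocode above can be transcribed into Python roughly as follows. This is a sketch only: the function name `adjust_case` and its parameters are made up for illustration, and "next letter in original name" is assumed to mean the character immediately following the matched fragment.

```python
def adjust_case(input_part, output_part, next_char):
    """Apply the case logic to one replaced fragment.

    input_part  -- fragment matched in the original name
    output_part -- replacement text taken from the map
    next_char   -- character after the fragment in the original name
                   (empty string at the end of the name); an assumption
    """
    out = output_part.lower()
    if input_part[:1].isupper():
        if len(out) > 1:
            if next_char.isupper():
                out = out.upper()                # inside an ALL-CAPS word
            else:
                out = out[:1].upper() + out[1:]  # capitalized word
        else:
            out = out.upper()                    # single-letter output
    return out

print(adjust_case("Ü", "ue", "L"))  # fragment from "MÜLLER"
print(adjust_case("Ü", "ue", "b"))  # fragment from "Über"
print(adjust_case("ü", "ue", "l"))  # fragment from "müller"
```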

Using a built-in transliteration map

To select any of the built-in maps, press the transliteration maps button. A list of available transliteration maps pops up.

Click on the desired transliteration map. As an example, let us click on the French (to English) transliteration map.

The Rules window changes immediately to show the French characters and their English equivalents.

You can edit any entry in this list, add new entries, or delete entries.

Note that such editing does not alter the saved version of the map. The map is edited just for a one-time use. If you select the same Translit map again, ReNamer will load the original version, not the edited version. You will see how to alter a transliteration map in a section below.

Next, select the rule's parameters as shown below:

Parameter	Details
forward	Transliterates in the left-to-right direction, as defined in the map.
backward	Transliterates in the right-to-left direction, as defined in the map.
skip extension	If this check box is selected, the rule leaves the file extension unchanged.
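
The effect of the skip extension option can be sketched in Python as below. The helper names, the single-entry map `x=ks`, and the use of `os.path.splitext` to find the extension are assumptions for illustration; ReNamer may determine the extension differently.

```python
import os

# Hypothetical one-entry map used only for this example.
PAIRS = [("x", "ks")]

def transliterate(text, pairs):
    """Plain sequential replacement, left side to right side."""
    for left, right in pairs:
        text = text.replace(left, right)
    return text

def rename(filename, pairs, skip_extension=True):
    """Transliterate a file name, optionally leaving the extension untouched."""
    if not skip_extension:
        return transliterate(filename, pairs)
    base, ext = os.path.splitext(filename)
    return transliterate(base, pairs) + ext

print(rename("max.xls", PAIRS))                        # extension kept as-is
print(rename("max.xls", PAIRS, skip_extension=False))  # extension is changed too
```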

Finally, press the Add Rule button to add the rule to the stack.

Making your own transliteration map

Click in the Translit Alphabet window, and start entering your custom alphabet.

A transliteration alphabet consists of pairs (couples) of equivalent parts, entered one pair per line, with the two parts separated by "=" (equal sign). The alphabet should not contain spaces, and the case of its entries is ignored (case is adjusted automatically). Also, make sure to put pairs that contain more characters at the top, so that they are processed first and are not partially matched by shorter entries. Below is a simple example:

щ=sh
ю=yu
я=ya
ь='
э=e
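
To see why the order of entries matters, consider the following Python sketch. It assumes entries are tried from top to bottom at each position in the name; that is one plausible reading of the advice above, not a description of ReNamer's internals. The function names and the three Latin-to-Cyrillic entries are hypothetical.

```python
def parse_alphabet(text):
    """Parse 'left=right' lines into a list of pairs, keeping their order."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        left, right = line.split("=", 1)
        if not left:
            continue  # an empty left side could never be matched sensibly
        pairs.append((left.lower(), right.lower()))
    return pairs

def transliterate(name, pairs):
    """At each position, use the first entry (top to bottom) that matches."""
    out, i = [], 0
    while i < len(name):
        for left, right in pairs:
            if name.startswith(left, i):
                out.append(right)
                i += len(left)
                break
        else:
            out.append(name[i])  # no entry matched; keep the character
            i += 1
    return "".join(out)

longest_first = parse_alphabet("sh=ш\ns=с\nh=х")
shortest_first = parse_alphabet("s=с\nh=х\nsh=ш")

print(transliterate("shum", longest_first))   # "sh" is matched as one unit
print(transliterate("shum", shortest_first))  # "s" and "h" are matched separately
```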

After entering all such transliterations, press the Add Rule button to add the rule to the rule stack.

Note that this rule is not saved yet (it was just composed for a one-time use). The following topic shows how to save a map.

Saving a transliteration map

To save a newly composed transliteration map:

1. Press the button.
2. A menu pops up.
3. Select the last option (Save Translit...).
4. The Save Translit window pops up.
5. Enter a new name for the map and press OK. The new map is saved.

The process of saving an edited Transliteration map is similar. The only difference is that the Save Translit window (see above) shows the current map's name. You can press OK to save the changes you've just made, or enter a new name to create a new translit map for the edited version of the current map.

The new map's name is added to the map list.

From now on, the new map is available in the list alongside the standard maps.

Unicode character forms

Have you encountered a case where some characters don't get converted, despite having a visually identical character defined in the Translit alphabet?

A Unicode character can be encoded either as a single exact character code (a precomposed character) or as a base character followed by combining characters. The two forms look identical when displayed, but their binary content is different. The conversion between these forms is covered by the Unicode Normalization standard.

Alphabets in the Translit rule are normally defined using exact character codes, so text that uses combining characters is not matched. You can put a piece of text through a Unicode analyzer to see exactly how each character is defined and to identify the use of combining characters.

To handle all possible forms of the same visual character, you can either define every form in the Translit alphabet or simply strip away the combining characters. The latter can be done with the "Strip unicode marks" option found in the Clean Up rule.
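
The difference between the two forms can be demonstrated with Python's standard `unicodedata` module. The `strip_marks` helper is a hypothetical stand-in: it shows one common way to remove combining marks, and it is only an assumption that the "Strip unicode marks" option behaves similarly.

```python
import unicodedata

precomposed = "\u00fc"  # "ü" as a single code point
combining = "u\u0308"   # "u" followed by COMBINING DIAERESIS

# The strings look the same on screen but do not compare equal.
print(precomposed == combining)
# Normalizing to the composed form (NFC) makes them equal.
print(unicodedata.normalize("NFC", combining) == precomposed)

def strip_marks(text):
    """Decompose the text (NFD), then drop non-spacing marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(strip_marks("M\u00fcller"))
```

Normalizing names to the composed form before transliteration would let an alphabet defined with exact character codes match both forms, while stripping the marks removes the accents altogether.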

@@ Line 1: / Line 1: @@
-[[Image:TranslitRule1.png|center]] This rule transliterates Non-English characters from different languages into their English/Latin representation. For example, the German character '''ü''' can be transliterated to '''ue''' (the name '''Müller '''can be also written as'''Mueller''').
+{{Go|up=ReNamer:Rules|prev=ReNamer:Rules:CleanUp|next=ReNamer:Rules:RegEx}}
+[[Image:TranslitRule.png|center]]
+This rule transliterates one alphabet into another. Its main goal is to transliterate Non-English characters from different languages into their English/Latin representation. For example, the German character '''ü''' can be transliterated to '''ue''' (the name '''Müller '''can be also written as '''Mueller''').
 This rule uses ''transliteration maps'' (explained below).
-=== Transliteration maps<br>  ===
+== Transliteration maps ==
 To transliterate, we create a pair of equivalent characters, like this: '''ü=ue'''
@@ Line 9: / Line 13: @@
 (Note that the right side of this equation has ''two'' characters. Any number of characters may be placed on both sides of the equation.)
-We need several such ''equivalent character pairs'' to convert one language into another. An entire set is called a ''transliteration map''. (This is really a character-level find-and-replace rule.)
+We need several such ''equivalent character pairs'' to convert one language into another. The entire set is called a ''transliteration map''. (This is really some kind of a find-and-replace rule.)
 ReNamer has several such built-in maps. Each map is named after a language (the second language in all maps is English).
@@ Line 23: / Line 27: @@
 Let us see how to do this.
-=== Using a built-in transliteration map  ===
+== Automatic case conversion ==
+Translit rule does automatic case conversion with an algorithm adopted specifically for transliteration. Translit rule discard the case on the input, i.e. "A=B" is same as "a=b". Case is decided upon case of the input fragment. Multiple character fragments are treated as part of words, with their case decided based on the case of letters around them.
+The logic for the case conversion is as follows (ReNamer Beta from 23 Aug 2009):
+<pre>
+set OUTPUT-PART to lower case
+if first letter in INPUT-PART is upper case then
+  if length of OUTPUT-PART bigger than 1 then
+    if next letter in original name is upper case then
+      convert whole OUTPUT-PART to upper case
+    else
+      convert only first letter in OUTPUT-PART to upper case
+  else
+    convert whole OUTPUT-PART to upper case
+</pre>
+== Using a built-in transliteration map ==
 To select any of the built-in maps, press the [[Image:TranslitMapsButton.png]] button. A list of available transliteration maps pops up:
@@ Line 29: / Line 50: @@
 Click on the desired transliteration map. As an example, let us click on the French (to English) transliteration map.
-The '''Rules''' window changes immediately to show the French characters and their English equivalent.
+The '''Rules''' window changes immediately to show the French characters and their English equivalents.
-<center>[[Image:TranslitRule2.png]]</center>
+<center>[[Image:TranslitRuleExample.png]]</center>
 You can edit any of the entry in this list, add new entries, or delete any of the entries.
-Note that such editing does not alter the saved version of the map. (The map is edited just for a one-time use. So, if you select the same Translit map again, ReNamer will load the ''original'' version, not the ''edited'' version.) We will see how to edit and save a map [[ReNamer:Rules:Translit#Saving_a_transliteration_map|later]].
+Note that such editing does not alter the saved version of the map. The map is edited just for a one-time use. If you select the same Translit map again, ReNamer will load the ''original'' version, not the ''edited'' version. You will see how to [[#Saving_a_transliteration_map|alter a transliteration map]] in a section below.
-Next, select the rule's parameters as shown below::
-<br>
+Next, select the rule's parameters as shown below:
-{| class="prettytable"
+{| class="wikitable"
 |-
-| <center>'''Parameter'''</center>
+! Parameter
-| <center>'''Details'''</center>
+! Details
 |-
 | forward
 | This is transliteration from-left-to-right direction, as defined in the map.
 |-
-| Backward
+| backward
 | This is transliteration from-right-to-left direction, as defined in the map.
 |-
-| Skip extension
+| skip extension
-| If this check box is unselected, the extension will be included in the rule.
+| If this check box is selected, the extension will be ignored by the rule.
 |}
 Finally, press the [[Image:AddRuleButton.png]] button to add the rule to the stack.
-=== Making your own transliteration map  ===
+== Making your own transliteration map ==
-Click in the '''Translit Alphabet '''window, and start entering the equivalent characters (one transliteration per line).
+Click in the '''Translit Alphabet''' window, and start entering your custom alphabet.
-For example,
+Transliteration alphabet consists of two equivalence parts (or a couple), which are entered one per line and two parts separated with "=" (equal sign). Alphabet should not contain spaces and should have case discarded ([[ReNamer:Rules:Translit#Automatic_case_conversion|case is adjusted automatically]]). Also, make sure to put couples which contain greater number of characters at the top, so they will get processed first and will not get processed partially by shorter representations. Below is a simple example:
-'''ü=ue'''
+{| align="center"
+|
-'''ö=oe'''
+<pre>
+щ=sh
-'''ß=ss'''
+ю=yu
+я=ya
+ь='
+э=e
+</pre>
+|}
 After entering all such transliterations, press the [[Image:AddRuleButton.png]] button to add the rule to the rule-stack.
 Note that this rule is not saved yet (it was just composed for a one-time use). The following topic shows how to save a map.
-=== Saving a transliteration map  ===
+== Saving a transliteration map ==
 To save a newly composed Transliteration rule,
@@ Line 84: / Line 108: @@
 #Enter a new name for the map and press '''OK'''. The new map is saved.
-The process to save an edited Transliteration map is similar. The only difference is that the '''Save Translit '''window (see above) shows the current map's name. You can press '''OK''' to save the changes you just made, or enter a new name to create a edited version of the current map.
+The process of saving an edited Transliteration map is similar. The only difference is that the '''Save Translit '''window (see above) shows the current map's name. You can press '''OK''' to save the changes you've just made, or enter a new name to create a new translit map for the edited version of the current map.
 The new map's name is added to the map list.
 From now on, the new map will also be available as "standard".
+== Unicode character forms ==
+Have you encounter a case where some characters don't get converted, despite having a visually identical character defined in the Translit alphabet?
+Unicode characters can be defined using exact character codes or using [https://en.wikipedia.org/wiki/Combining_character combining characters]. The displayed characters will look identical, but their binary content is completely different. The conversion process between these forms is covered by the [https://unicode.org/reports/tr15/ Unicode Normalization] standard.
+Alphabets in the Translit rule are normally defined using exact character codes, so the combining characters won't get affected. You can put a piece of text through a ''Unicode analyzer'' to see exactly how each character is defined and to identify the use of combining characters.
+To handle all possible forms of the same visual character in Translit alphabets, one could define all possible forms in an alphabet or one can simply strip away those combining characters, which can be accomplished by using the "Strip unicode marks" option found in the [[ReNamer:Rules:CleanUp|Clean Up rule]].
+[[Category:ReNamer]]
