问题
I want to transliterate Hindi text into English in following way
Hindi - "आपका स्वागत है"
to
English - "aapka swagat hain"
I don't want to use Google's Translate API or any other translate API. If I use them it will end up giving me the translated version of my hindi text which is "Welcome".
Is there any Transliterate library which I can use in my Android code?
I've heard of ICU but no luck in finding the procedure to use it in my code.
回答1:
One potential way to solve the problem is to break it into two solvable ones and marry the two.
There are Hindi readers which can read Hindi devanagari script.
There are also dictating engines that phonetically transcribe in English.
E.g. when someone leaves a message on my Vonage phone line in Gujarati, it records the audio, generates English text and emails me the wav file and the text. Mind you, when reading the text message it can occasionally be hilariously funny because Vonage assumes it is supposed to be in English, I am expecting the message to be in English, but after reading the message I realize it is Gujarati.
Google 'hindi reader for android' and 'phonetic transcription' for more info. If a Hindi reader could output a wav file which can be used as input to the transcription piece, then it may be a solution to your problem.
回答2:
One possible and probably simple solution that i have found is that mapping of characters. So, i need to work on transliteration of the hindi words to english character, exact same situation as yours, but you can also work with other transliterations by using this approach. you just need to change the unicodes for the particular language.
Theory :- So, Converting Hindi or any other language's characters into English characters and vice versa, you first need to know about the range of the Unicodes for every characters of your language that you are transforming into English Characters.
For Example : To Convert Hindi Unicodes into English ASCII values, the range of the unicode is 0900-097F and this range varies as language changes. So, by using this unicodes you can map "How the particular unicode(Hindi Character) sounds", like ह will map to h in English Alphabet. So, this was the theory part.
Practical Approach : I need to create an app to get the Hindi voice input from user and convert it into English Alphabets. So, i have used STT(Speech to Text) Library and got hindi unicodes and from that hindi unicodes i am mapping that into English Characters.
Code :-
MainActivity.java
package android.example.com.conversion;
import android.content.ActivityNotFoundException;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.support.v7.app.AppCompatActivity;
import android.support.v7.widget.AppCompatButton;
import android.view.View;
import android.widget.TextView;
import android.widget.Toast;
import java.util.ArrayList;
public class MainActivity extends AppCompatActivity implements View.OnClickListener {
// Record Button
AppCompatButton RecordBtn;
// TextView to show Original and recognized Text
TextView Original,result;
// Request Code for STT
private final int SST_REQUEST_CODE = 101;
// Conversion Table Object...
ConversionTable conversionTable;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
Original = findViewById(R.id.Original_Text);
RecordBtn = findViewById(R.id.RecordBtn);
result = findViewById(R.id.Recognized_Text);
RecordBtn.setOnClickListener(this);
}
@Override
public void onClick(View v) {
switch (v.getId()) {
case R.id.RecordBtn:
Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
// Use Off line Recognition Engine only...
intent.putExtra(RecognizerIntent.EXTRA_PREFER_OFFLINE, false);
// Use Hindi Speech Recognition Model...
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "hi-IN");
try {
startActivityForResult(intent, SST_REQUEST_CODE);
} catch (ActivityNotFoundException a) {
Toast.makeText(getApplicationContext(),
getString(R.string.error),
Toast.LENGTH_SHORT).show();
}
break;
}
}
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
super.onActivityResult(requestCode, resultCode, data);
switch (requestCode) {
case SST_REQUEST_CODE:
if (resultCode == RESULT_OK && null != data) {
ArrayList<String> getResult = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
Original.setText(getResult.get(0));
conversionTable = new ConversionTable();
String Transformed_String = conversionTable.transform(getResult.get(0));
result.setText(Transformed_String);
}
break;
}
}
}
My ConversationTable.java class: this will map the values into English Alphabets.
package android.example.com.conversion;
import android.util.Log;
import java.util.ArrayList;
import java.util.Hashtable;
public class ConversionTable
{
private String TAG = "Conversation Table";
private Hashtable<String,String> unicode;
private void populateHashTable()
{
unicode = new Hashtable<>();
// unicode
unicode.put("\u0901","rha"); // anunAsika - cchandra bindu, using ~ to // *
unicode.put("\u0902","n"); // anusvara
unicode.put("\u0903","ah"); // visarga
unicode.put("\u0940","ee");
unicode.put("\u0941","u");
unicode.put("\u0942","oo");
unicode.put("\u0943","rhi");
unicode.put("\u0944","rhee"); // * = Doubtful Case
unicode.put("\u0945","e");
unicode.put("\u0946","e");
unicode.put("\u0947","e");
unicode.put("\u0948","ai");
unicode.put("\u0949","o");
unicode.put("\u094a","o");
unicode.put("\u094b","o");
unicode.put("\u094c","au");
unicode.put("\u094d","");
unicode.put("\u0950","om");
unicode.put("\u0958","k");
unicode.put("\u0959","kh");
unicode.put("\u095a","gh");
unicode.put("\u095b","z");
unicode.put("\u095c","dh"); // *
unicode.put("\u095d","rh");
unicode.put("\u095e","f");
unicode.put("\u095f","y");
unicode.put("\u0960","ri");
unicode.put("\u0961","lri");
unicode.put("\u0962","lr"); // *
unicode.put("\u0963","lree"); // *
unicode.put("\u093E","aa");
unicode.put("\u093F","i");
// Vowels and Consonants...
unicode.put("\u0905","a");
unicode.put("\u0906","a");
unicode.put("\u0907","i");
unicode.put("\u0908","ee");
unicode.put("\u0909","u");
unicode.put("\u090a","oo");
unicode.put("\u090b","ri");
unicode.put("\u090c","lri"); // *
unicode.put("\u090d","e"); // *
unicode.put("\u090e","e"); // *
unicode.put("\u090f","e");
unicode.put("\u0910","ai");
unicode.put("\u0911","o");
unicode.put("\u0912","o");
unicode.put("\u0913","o");
unicode.put("\u0914","au");
unicode.put("\u0915","k");
unicode.put("\u0916","kh");
unicode.put("\u0917","g");
unicode.put("\u0918","gh");
unicode.put("\u0919","ng");
unicode.put("\u091a","ch");
unicode.put("\u091b","chh");
unicode.put("\u091c","j");
unicode.put("\u091d","jh");
unicode.put("\u091e","ny");
unicode.put("\u091f","t"); // Ta as in Tom
unicode.put("\u0920","th");
unicode.put("\u0921","d"); // Da as in David
unicode.put("\u0922","dh");
unicode.put("\u0923","n");
unicode.put("\u0924","t"); // ta as in tamasha
unicode.put("\u0925","th"); // tha as in thanks
unicode.put("\u0926","d"); // da as in darvaaza
unicode.put("\u0927","dh"); // dha as in dhanusha
unicode.put("\u0928","n");
unicode.put("\u0929","nn");
unicode.put("\u092a","p");
unicode.put("\u092b","ph");
unicode.put("\u092c","b");
unicode.put("\u092d","bh");
unicode.put("\u092e","m");
unicode.put("\u092f","y");
unicode.put("\u0930","r");
unicode.put("\u0931","rr");
unicode.put("\u0932","l");
unicode.put("\u0933","ll"); // the Marathi and Vedic 'L'
unicode.put("\u0934","lll"); // the Marathi and Vedic 'L'
unicode.put("\u0935","v");
unicode.put("\u0936","sh");
unicode.put("\u0937","ss");
unicode.put("\u0938","s");
unicode.put("\u0939","h");
// represent it\
// unicode.put("\u093c","'"); // avagraha using "'"
// unicode.put("\u093d","'"); // avagraha using "'"
unicode.put("\u0969","3"); // 3 equals to pluta
unicode.put("\u014F","Z");// Z equals to upadhamaniya
unicode.put("\u0CF1","V");// V equals to jihvamuliya....but what character have u settled for jihvamuliya
/* unicode.put("\u0950","Ω"); // aum
unicode.put("\u0958","κ"); // Urdu qaif
unicode.put("\u0959","Κ"); //Urdu qhe
unicode.put("\u095A","γ"); // Urdu gain
unicode.put("\u095B","ζ"); //Urdu zal, ze, zoe
unicode.put("\u095E","φ"); // Urdu f
unicode.put("\u095C","δ"); // Hindi 'dh' as in padh
unicode.put("\u095D","Δ"); // hindi dhh*/
unicode.put("\u0926\u093C","τ"); // Urdu dwad
unicode.put("\u0924\u093C","θ"); // Urdu toe
unicode.put("\u0938\u093C","σ"); // Urdu swad, se
}
ConversionTable()
{
populateHashTable();
}
public String transform(String s1)
{
StringBuilder transformed = new StringBuilder();
int strLen = s1.length();
ArrayList<String> shabda = new ArrayList<>();
String lastEntry = "";
for (int i = 0; i < strLen; i++)
{
char c = s1.charAt(i);
String varna = String.valueOf(c);
Log.d(TAG, "transform: " + varna + "\n");
String halant = "0x0951";
if (VowelUtil.isConsonant(varna))
{
Log.d(TAG, "transform: " + unicode.get(varna));
shabda.add(unicode.get(varna));
shabda.add(halant); //halant
lastEntry = halant;
}
else if (VowelUtil.isVowel(varna))
{
Log.d(TAG, "transform: " + "Vowel Detected...");
if (halant.equals(lastEntry))
{
if (varna.equals("a"))
{
shabda.set(shabda.size() - 1,"");
}
else
{
shabda.set(shabda.size() - 1, unicode.get(varna));
}
}
else
{
shabda.add(unicode.get(varna));
}
lastEntry = unicode.get(varna);
} // end of else if is-Vowel
else if (unicode.containsKey(varna))
{
shabda.add(unicode.get(varna));
lastEntry = unicode.get(varna);
}
else
{
shabda.add(varna);
lastEntry = varna;
}
} // end of for
for (String string: shabda)
{
transformed.append(string);
}
//Discard the shabda array
shabda = null;
return transformed.toString(); // return transformed;
}
}
My VowelUtil.class : This will check for vowels and consonants, although it's not required to check for that.
package android.example.com.conversion;
public class VowelUtil {
protected static boolean isVowel(String strVowel) {
// Log.logInfo("came in is_Vowel: Checking whether string is a Vowel");
return strVowel.equals("a") || strVowel.equals("aa") || strVowel.equals("i") || strVowel.equals("ee") ||
strVowel.equals("u") || strVowel.equals("oo") || strVowel.equals("ri") || strVowel.equals("lri") || strVowel.equals("e")
|| strVowel.equals("ai") || strVowel.equals("o") || strVowel.equals("au") || strVowel.equals("om");
}
protected static boolean isConsonant(String strConsonant) {
// Log.logInfo("came in is_consonant: Checking whether string is a
// consonant");
return strConsonant.equals("k") || strConsonant.equals("kh") || strConsonant.equals("g")
|| strConsonant.equals("gh") || strConsonant.equals("ng") || strConsonant.equals("ch") || strConsonant.equals("chh") || strConsonant.equals("j")
|| strConsonant.equals("jh") || strConsonant.equals("ny") || strConsonant.equals("t") || strConsonant.equals("th") ||
strConsonant.equals("d") || strConsonant.equals("dh") || strConsonant.equals("n") || strConsonant.equals("nn") || strConsonant.equals("p") ||
strConsonant.equals("ph") || strConsonant.equals("b") || strConsonant.equals("bh") || strConsonant.equals("m") || strConsonant.equals("y") ||
strConsonant.equals("r") || strConsonant.equals("rr") || strConsonant.equals("l") || strConsonant.equals("ll") || strConsonant.equals("lll") ||
strConsonant.equals("v") || strConsonant.equals("sh") || strConsonant.equals("ss") || strConsonant.equals("s") || strConsonant.equals("h") ||
strConsonant.equals("3") || strConsonant.equals("z") || strConsonant.equals("v") || strConsonant.equals("Ω") ||
strConsonant.equals("κ") || strConsonant.equals("K") || strConsonant.equals("γ") || strConsonant.equals("ζ") || strConsonant.equals("φ") ||
strConsonant.equals("δ") || strConsonant.equals("Δ") || strConsonant.equals("τ") || strConsonant.equals("θ") || strConsonant.equals("σ");
}
}
This is the simplest method by which you can do transliteration and this is general method, which you can apply to do any language translateration to any other language transliteration.You just need to supply perfect unicodes to map to other language's unicodes.
Results :
来源:https://stackoverflow.com/questions/19511512/transliteration-from-hindi-to-english-on-android-without-using-google-api