machine-translation

Statistical Machine Translation from Hindi to English using MOSES

点点圈 submitted on 2019-12-07 18:38:48
Question: I need to create a Hindi-to-English translation system using MOSES. I have a parallel corpus of about 10,000 Hindi sentences and their English translations. I followed the method described in the Baseline system creation page. But at the very first stage, when I tried to tokenise my Hindi corpus by executing ~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l hi < ~/corpus/training/hi-en.hi > ~/corpus/hi-en.tok.hi , the tokeniser gave me the following output:
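The Moses tokenizer ships no Hindi-specific rules, so its output on Devanagari text can be surprising. As an illustration only (not a replacement for a proper Indic tokenizer), a minimal regex-based fallback might split off punctuation, including the Devanagari danda (।), like this:

```python
import re

# Minimal whitespace-and-punctuation tokenizer, a fallback sketch for
# languages the Moses tokenizer.perl has no rules for. It keeps runs of
# non-space, non-punctuation characters as tokens and emits each
# punctuation mark (including the danda ।) as its own token.
def simple_tokenize(line):
    return re.findall(r"[^\s।,.!?]+|[।,.!?]", line)

print(simple_tokenize("hello, world."))  # ['hello', ',', 'world', '.']
```

A dedicated Indic tokenizer would handle more punctuation and script details; this sketch only shows the shape of the preprocessing step.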

How to get phrase tables from word alignments?

雨燕双飞 submitted on 2019-12-06 14:38:25
The output of my word alignment file looks like this: I wish to say with regard to the initiative of the Portuguese Presidency that we support the spirit and the political intention behind it . In bezug auf die Initiative der portugiesischen Präsidentschaft möchte ich zum Ausdruck bringen , daß wir den Geist und die politische Absicht , die dahinter stehen , unterstützen . 0-0 5-1 5-2 2-3 8-4 7-5 11-6 12-7 1-8 0-9 9-10 3-11 10-12 13-13 13-14 14-15 16-16 17-17 18-18 16-19 20-20 21-21 19-22 19-23 22-24 22-25 23-26 15-27 24-28 It may not be an ideal initiative in terms of its structure but we

TensorFlow: nr. of epochs vs. nr. of training steps

倾然丶 夕夏残阳落幕 submitted on 2019-12-06 13:14:10
Question: I have recently experimented with Google's seq2seq to set up a small NMT system. I managed to get everything working, but I am still wondering about the exact difference between the number of epochs and the number of training steps of a model. If I am not mistaken, an epoch consists of multiple training steps and is complete once your whole training data has been processed once. I do not understand, however, the difference between the two when I look at the documentation in Google's own
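The relationship the question asks about can be stated in two lines: a step processes one batch, and an epoch is one full pass over the training set. A small sketch of the arithmetic (the example numbers are illustrative, not from the question):

```python
import math

# One training step = one batch; one epoch = one full pass over the data.
def steps_per_epoch(num_examples, batch_size):
    return math.ceil(num_examples / batch_size)

def total_steps(num_examples, batch_size, num_epochs):
    return num_epochs * steps_per_epoch(num_examples, batch_size)

# e.g. 10,000 sentence pairs with batch size 32:
print(steps_per_epoch(10000, 32))   # 313 steps per epoch
print(total_steps(10000, 32, 5))    # 1565 steps over 5 epochs
```

Training frameworks that report "steps" rather than "epochs" are simply counting batches, so the two numbers are interconvertible once the dataset size and batch size are fixed.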

Python: Goslate translation request returns “503: Service Unavailable” [closed]

落花浮王杯 submitted on 2019-12-06 02:31:15
Question: Closed. This question is off-topic and is not currently accepting answers. Closed 4 years ago. A few months ago, I used Python's goslate package to translate a bunch of French text to English. When I tried to do so this morning, though, the service returned an error: import goslate gs = goslate.Goslate() print gs.translate('hello world', 'de') Traceback (most recent call last): File "<stdin>", line 1, in
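Since goslate scrapes an unofficial Google endpoint, a persistent 503 usually means Google is blocking automated traffic, and no client-side fix is guaranteed. For genuinely transient failures, though, a generic retry-with-backoff wrapper is a common mitigation; a minimal sketch (the wrapper and its parameters are illustrative, not part of goslate's API):

```python
import time

# Generic retry-with-exponential-backoff sketch around a flaky call.
# It only smooths transient failures; a sustained HTTP 503 from a
# blocked endpoint will still surface after the final attempt.
def with_retries(fn, attempts=3, base_delay=1.0):
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))
```

For a supported long-term path, the official (paid) Google Cloud Translation API avoids the scraping problem entirely.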

Word-level Seq2Seq with Keras

僤鯓⒐⒋嵵緔 submitted on 2019-12-06 02:29:47
Question: I was following the Keras Seq2Seq tutorial, and it works fine. However, this is a character-level model, and I would like to adapt it to a word-level model. The authors even include a paragraph with the required changes, but all my current attempts result in an error regarding wrong dimensions. If you follow the character-level model, the input data is of 3 dims: #sequences , #max_seq_len , #num_char since each character is one-hot encoded. When I plot the summary for the model as used in the
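The dimension errors typically come from the input-shape change the word-level switch implies: integer word indices plus an Embedding layer replace the one-hot axis, so the input drops from 3-D to 2-D. A shape-only sketch with placeholder sizes (the numbers are illustrative):

```python
import numpy as np

# Character level: one-hot vectors give a 3-D input tensor
# of shape (num_sequences, max_seq_len, num_chars).
num_seq, max_len, num_chars = 100, 20, 70
char_input = np.zeros((num_seq, max_len, num_chars), dtype="float32")

# Word level: integer word indices give a 2-D input tensor of shape
# (num_sequences, max_seq_len); an Embedding layer then maps each
# index to a dense vector inside the model, so the one-hot axis
# disappears from the input data itself.
word_input = np.zeros((num_seq, max_len), dtype="int64")

print(char_input.ndim, word_input.ndim)  # 3 2
```

Feeding 3-D one-hot data into a model whose first layer is an Embedding (which expects 2-D integer input) is a common source of exactly this kind of dimension mismatch.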

Automatic Translation tool for Android [closed]

亡梦爱人 submitted on 2019-12-05 17:20:00
Question: Closed. This question is off-topic and is not currently accepting answers. Closed 2 years ago. Do you know if there is any free automatic localization tool currently on the market? I would use it to translate the XML files of my Android project. The ones I have found all rely on the Google Translate API. Since this API became paid (in December 2011), all those tools are now obsolete. The ones I have

Word-level Seq2Seq with Keras

左心房为你撑大大i submitted on 2019-12-04 11:19:36
I was following the Keras Seq2Seq tutorial, and it works fine. However, this is a character-level model, and I would like to adapt it to a word-level model. The authors even include a paragraph with the required changes, but all my current attempts result in an error regarding wrong dimensions. If you follow the character-level model, the input data is of 3 dims: #sequences , #max_seq_len , #num_char since each character is one-hot encoded. When I plot the summary for the model as used in the tutorial, I get: Layer (type) Output Shape Param # Connected to ==========================================

How to save Python NLTK alignment models for later use?

匆匆过客 submitted on 2019-12-03 16:21:30
Question: In Python, I'm using NLTK's alignment module to create word alignments between parallel texts. Aligning bitexts can be a time-consuming process, especially over large corpora. It would be nice to do alignments in batch one day and use those alignments later on. from nltk import IBMModel1 as ibm biverses = [list of AlignedSent objects] model = ibm(biverses, 20) with open(path + "eng-taq_model.txt", 'w') as f: f.write(model.train(biverses, 20)) # makes an empty file Once I create

Phrase extraction algorithm for statistical machine translation

女生的网名这么多〃 submitted on 2019-12-03 09:53:53
Question: This question was migrated from Code Review Stack Exchange because it can be answered on Stack Overflow. Migrated 5 years ago. I have written the following code with the phrase extraction algorithm for SMT. GitHub # -*- coding: utf-8 -*- def phrase_extraction(srctext, trgtext, alignment): """ Phrase extraction algorithm. """ def extract(f_start, f_end, e_start, e_end): phrases = set() # return {} if f_end == 0 if f_end == 0: return # for all (e,f) ∈ A do for e,f in alignment: # return { }
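The heart of the phrase-extraction algorithm the excerpt implements is a consistency test: a candidate phrase pair is kept only if it contains at least one alignment point and no point links a word inside the pair to a word outside it. A self-contained sketch of that check (index ranges are inclusive; this is an illustration of the criterion, not the asker's full code):

```python
# Consistency test from the standard phrase-extraction algorithm for
# SMT: a phrase pair (e_start..e_end, f_start..f_end) is consistent
# with alignment A iff it covers at least one alignment point and no
# point crosses the phrase boundary in either direction.
def is_consistent(alignment, e_start, e_end, f_start, f_end):
    inside = [(e, f) for e, f in alignment
              if e_start <= e <= e_end and f_start <= f <= f_end]
    if not inside:
        return False  # no alignment point inside the candidate pair
    for e, f in alignment:
        # a point with exactly one end inside the pair crosses the boundary
        if (e_start <= e <= e_end) != (f_start <= f <= f_end):
            return False
    return True
```

The full algorithm enumerates all source spans, finds the minimal target span their alignment points cover, applies this test, and then extends the target span over unaligned words.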

How to save Python NLTK alignment models for later use?

不问归期 submitted on 2019-12-03 05:36:29
In Python, I'm using NLTK's alignment module to create word alignments between parallel texts. Aligning bitexts can be a time-consuming process, especially over large corpora. It would be nice to do alignments in batch one day and use those alignments later on. from nltk import IBMModel1 as ibm biverses = [list of AlignedSent objects] model = ibm(biverses, 20) with open(path + "eng-taq_model.txt", 'w') as f: f.write(model.train(biverses, 20)) # makes an empty file Once I create a model, how can I (1) save it to disk and (2) reuse it later? alvas: The immediate answer is to pickle
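The pickle round trip the answer suggests looks as follows. A stand-in dict is used here so the sketch is self-contained; the same dump/load calls apply to a trained NLTK alignment model object, provided everything it holds is picklable:

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; an NLTK IBM model instance would go here.
model = {"translation_table": {("house", "Haus"): 0.9}}

path = os.path.join(tempfile.gettempdir(), "eng-taq_model.pkl")

# Save: pickle requires a binary-mode file handle ("wb"/"rb").
with open(path, "wb") as f:
    pickle.dump(model, f)

# Restore later, in a new session, without retraining.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Note that the original code wrote the return value of model.train() to a text file, which is why the file came out empty: training mutates the model in place rather than returning a serialisable object, so the model itself is what needs to be pickled.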