Python Google Translate API error: how to translate a large amount of data

Submitted by 只谈情不闲聊 on 2020-04-13 05:48:47

Question


My problem

I would like to use a data-augmentation method for NLP that consists of back-translating a dataset.

Basically, I have a large dataset (SNLI) consisting of 1,100,000 English sentences. What I need to do is translate these sentences into another language, then translate them back to English.

I may have to do this for several languages, so I have a lot of translations to do.
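
The round trip described above can be sketched as follows. This is only an illustration of the idea, not any particular library's API: `translate` is a hypothetical callable `(text, src, dest) -> text` that would be backed by whichever translation service ends up being used, and the pivot language "fr" is just an example.

```python
from typing import Callable, Iterable, List

# Hypothetical signature: translate(text, src, dest) -> translated text.
TranslateFn = Callable[[str, str, str], str]


def back_translate(sentences: Iterable[str], translate: TranslateFn,
                   pivot: str = "fr", source: str = "en") -> List[str]:
    """Translate each sentence to the pivot language and back to the source."""
    augmented = []
    for sentence in sentences:
        pivoted = translate(sentence, source, pivot)         # en -> pivot
        augmented.append(translate(pivoted, pivot, source))  # pivot -> en
    return augmented


if __name__ == "__main__":
    # Stand-in translator for demonstration only: it just tags the text.
    def fake_translate(text: str, src: str, dest: str) -> str:
        return f"[{src}->{dest}] {text}"

    print(back_translate(["A man is walking."], fake_translate))
```

Keeping the translator behind a plain callable means the same loop works unchanged if the backend is swapped later.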

I need a free solution.


What I did so far

I tried several Python modules for translation, but due to recent changes in the Google Translate API, most of them do not work. googletrans seems to work if we apply this solution.

However, it does not work for big datasets. Google imposes a limit of 15K characters (as pointed out by this, this and this). The first link shows a supposed workaround.
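
One way to stay under a character limit per request is to group sentences into batches whose combined length fits a budget. The sketch below assumes the 15K figure mentioned above applies per request, which is not confirmed here; `batch_by_chars` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, List


def batch_by_chars(sentences: Iterable[str],
                   max_chars: int = 15_000) -> Iterator[List[str]]:
    """Yield lists of sentences whose total length is at most max_chars."""
    batch: List[str] = []
    size = 0
    for sentence in sentences:
        if len(sentence) > max_chars:
            raise ValueError("a single sentence exceeds the character budget")
        # Start a new batch when adding this sentence would overflow the budget.
        if batch and size + len(sentence) > max_chars:
            yield batch
            batch, size = [], 0
        batch.append(sentence)
        size += len(sentence)
    if batch:
        yield batch


if __name__ == "__main__":
    for group in batch_by_chars(["aaaa", "bbbb", "cc"], max_chars=8):
        print(group)
```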


Where I am blocked

Even if I apply the workaround (initializing the Translator at every iteration), it does not work, and I get the following error:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
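
This message means the JSON parser found no value at the very start of the response, for example because the body was empty or was not JSON at all. If the failures are intermittent, one common mitigation is to retry with an increasing delay. The sketch below is generic: `retry_with_backoff` and its parameters are hypothetical names, and `call` stands for whatever zero-argument function performs one translation request.

```python
import json
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(call: Callable[[], T], max_attempts: int = 5,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on JSONDecodeError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except json.JSONDecodeError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise ValueError("max_attempts must be at least 1")
```

Retrying does not help if the service is refusing every request, so it is worth capping `max_attempts` rather than looping forever.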

I tried using proxies and other Google Translate URLs:

from googletrans import Translator

URLS = ['translate.google.com', 'translate.google.co.kr', 'translate.google.ac', 'translate.google.ad', 'translate.google.ae', ...]

proxies = {'http': '1.243.64.63:48730', 'https': '59.11.98.253:42645'}

t = Translator(service_urls=URLS, proxies=proxies)

But it doesn't change anything.


Note

My problem might come from the fact that I am using multi-threading: 100 workers translating the whole dataset. If they work in parallel, maybe together they use more than 15K characters.

But I need multi-threading. If I don't use it, translating the whole dataset will take several weeks...
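
If the combined load of the workers is indeed the cause, a middle ground between 100 unthrottled threads and a single thread is a smaller pool whose threads share one rate limiter. The sketch below uses only the standard library; `MinIntervalLimiter`, `translate_all`, and `translate_one` are hypothetical names, and the defaults (8 workers, 0.1 s between requests) are arbitrary placeholders, not known-safe values.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


class MinIntervalLimiter:
    """Lets at most one call start every `interval` seconds across threads."""

    def __init__(self, interval: float) -> None:
        self._interval = interval
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._interval
        time.sleep(max(0.0, slot - now))


def translate_all(sentences: Sequence[str],
                  translate_one: Callable[[str], str],
                  workers: int = 8, interval: float = 0.1) -> List[str]:
    """Translate sentences in parallel, in order, with a shared rate limit."""
    limiter = MinIntervalLimiter(interval)

    def task(sentence: str) -> str:
        limiter.wait()
        return translate_one(sentence)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, sentences))  # map preserves input order
```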


My question

How do I fix this error so I can translate all the sentences?

If that's not possible, is there any free alternative for machine translation in Python (it doesn't have to be Google Translate) that can handle such a big dataset?


Answer 1:


More than one million sentences is a lot of text to translate.

Currently, Google Cloud Translation V3 offers a free tier quota that you may want to use (1-500,000 characters free per month). Since that doesn't seem to be enough for your use case, you would probably need to create more than one billing account or wait a month before translating more text.

Check this link to see how to perform a text translation with Python.
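
To get a feel for how far the free quota goes, here is a rough back-of-the-envelope calculation. The 500,000 characters per month comes from the figure above; the average sentence length of 50 characters is a made-up assumption for illustration (measure it on the real dataset), and the estimate assumes the translated text is about as long as the original.

```python
import math

FREE_CHARS_PER_MONTH = 500_000  # free-tier figure quoted in this answer


def months_on_free_tier(num_sentences: int, avg_chars: int,
                        languages: int = 1,
                        free_quota: int = FREE_CHARS_PER_MONTH) -> int:
    """Estimate months needed to back-translate a dataset on the free quota."""
    # Each sentence is sent twice per pivot language: once out, once back.
    total_chars = num_sentences * avg_chars * 2 * languages
    return math.ceil(total_chars / free_quota)


if __name__ == "__main__":
    # 1,100,000 sentences at an assumed 50 characters each, one pivot language.
    print(months_on_free_tier(1_100_000, 50))  # prints 220
```

Under these assumptions a single account would need many years, which is why the free tier alone is unlikely to cover this use case.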



Source: https://stackoverflow.com/questions/53075240/python-google-translate-api-error-how-to-translate-a-large-amount-of-data
