Question
I have a dataframe in which one column holds a list of strings in each row.
On average, each list has 150 words of about 6 characters each.
Each of the 700 rows of the dataframe corresponds to a document, and each string is a word of that document; in other words, I have tokenised the documents.
I want to detect the language of each document, and to do this I first try to detect the language of each word in the document.
To that end, I do the following:
from textblob import TextBlob

def lang_detect(document):
    lang_count = {}
    for word in document:
        if len(word) >= 4:
            word_textblob = TextBlob(word)
            lang_result = word_textblob.detect_language()
            response = lang_count.get(lang_result)
            if response is None:
                lang_count[f"{lang_result}"] = 1
            else:
                lang_count[f"{lang_result}"] += 1
    return lang_count

df_per_doc['languages_count'] = df_per_doc['complete_text'].apply(lambda x: lang_detect(x))
When I run this, I get the following error:
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
<ipython-input-42-772df3809bcb> in <module>
25
---> 27 df_per_doc['languages_count'] = df_per_doc['complete_text'].apply(lambda x: lang_detect(x))
28
29
.
.
.
647 class HTTPDefaultErrorHandler(BaseHandler):
648 def http_error_default(self, req, fp, code, msg, hdrs):
--> 649 raise HTTPError(req.full_url, code, msg, hdrs, fp)
650
651 class HTTPRedirectHandler(BaseHandler):
HTTPError: HTTP Error 429: Too Many Requests
The traceback is much longer; I have omitted the middle part of it.
Now I get the same error even if I try this for only two documents/rows.
Is there any way to get responses from TextBlob for more words and documents?
Answer 1:
I had the same issue when I was trying to translate tweets. Once I exceeded the rate limit, it started returning HTTP 429 Too Many Requests errors.
So, for anyone else who wants to work with TextBlob, it is worth checking the rate limits. Google provides information about its limits: https://cloud.google.com/translate/quotas?hl=en
If you exceed the rate limits, you have to wait until the quotas reset at midnight Pacific Time. It might take 24 hours to become effective again.
On the other hand, you can also introduce a delay between your requests so as not to overload the API server.
Example: translating the TextBlob sentences in a list.
import time
...
for sentence in list_of_sentences:
    sentence.translate()
    time.sleep(1)  # sleep for 1 second
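A fixed delay can be combined with retrying when a 429 does come back. The following is only a minimal sketch, assuming the error surfaces as `urllib.error.HTTPError` (as the traceback in the question suggests); `call_with_backoff` and its parameters are hypothetical names, not part of TextBlob.

```python
import time
from urllib.error import HTTPError


def call_with_backoff(func, *args, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(*args); on HTTP 429, wait and retry with a doubling delay.

    Hypothetical helper: the name and defaults are illustrative assumptions.
    """
    for attempt in range(max_retries + 1):
        try:
            return func(*args)
        except HTTPError as err:
            # Re-raise anything that is not a rate-limit error,
            # and give up once the retries are exhausted.
            if err.code != 429 or attempt == max_retries:
                raise
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
```

It could be used as `call_with_backoff(sentence.translate)` inside the loop above. Note that this only helps with short-term throttling; if the daily quota is exhausted, retrying will keep failing until the quota resets.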
Answer 2:
You can try Googletrans.
"Googletrans is a free and unlimited Python library that implemented Google Translate API. This uses the Google Translate Ajax API to make calls to such methods as detect and translate."
Similarly to TextBlob, Googletrans has features like language detection and translation. It worked pretty well for me when I was flagging the language of, and translating, a large number of emails.
(When using TextBlob I tried time.sleep(1), but I still ended up reaching the API limit...)
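Whichever library is used, the number of requests matters: detecting per word over 700 documents of roughly 150 words each means on the order of 100,000 calls, whereas detecting once per document means 700. The sketch below illustrates that idea under stated assumptions: `detect_per_document` is a hypothetical helper, `detect` is any callable that takes a string and returns a language code (for example a thin wrapper around the TextBlob or Googletrans detection call), and `max_chars` is an arbitrary cap on the text sent per request.

```python
def detect_per_document(documents, detect, max_chars=500):
    """Return one language label per document, using one detect() call each.

    documents: iterable of token lists (one list of words per document).
    detect:    callable taking a string and returning a language code
               (assumption: supplied by the caller, not defined here).
    """
    results = []
    for words in documents:
        # Rebuild a text sample from the tokens and cap its length.
        text = " ".join(words)[:max_chars]
        # Skip the request entirely for empty documents.
        results.append(detect(text) if text.strip() else None)
    return results
```

With the dataframe from the question, this could be called as `detect_per_document(df_per_doc['complete_text'], detect)`; the result is one label per row rather than a per-word count.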
Source: https://stackoverflow.com/questions/56189054/textblob-httperror-http-error-429-too-many-requests