I'm trying to hit the Google TTS URL, http://translate.google.com/translate_tts, with Japanese characters and phrases in Python using the requests library.
Here is an example:
However, when I use the Python requests library to download the mp3 that the endpoint returns, the resulting mp3 is blank. I have verified that I can hit this URL from requests with ASCII-only text (romaji) and get correct responses back.
Here is part of the code I am using to make the request:
    langs = {'japanese': 'ja', 'english': 'en'}

    def get_sound_file_for_text(text, download=False, lang='japanese'):
        r = StringIO()
        glang = langs[lang]
        text = text.replace('*', '')
        text = text.replace('/', '')
        text = text.replace('x', '')
        url = 'http://translate.google.com/translate_tts'
        if download:
            result = requests.get(url, params={'tl': glang, 'q': text})
            r.write(result.content)
            r.seek(0)
            return r
        else:
            return url
Also, if I print text or url within this snippet, the kana/kanji is rendered correctly in my console.
If I encode the Unicode text as UTF-8 and percent-quote it, as follows, I still get the same response:
    # -*- coding: utf-8 -*-
    from StringIO import StringIO
    import urllib
    import requests

    __author__ = 'jacob'

    langs = {'japanese': 'ja', 'english': 'en'}

    def get_sound_file_for_text(text, download=False, lang='japanese'):
        r = StringIO()
        glang = langs[lang]
        text = text.replace('*', '')
        text = text.replace('/', '')
        text = text.replace('x', '')
        text = urllib.quote(text.encode('utf-8'))
        url = 'http://translate.google.com/translate_tts?tl=%(glang)s&q=%(text)s' % locals()
        print url
        if download:
            result = requests.get(url)
            r.write(result.content)
            r.seek(0)
            return r
        else:
            return url
Which returns this:
Which seems like it should work, but doesn't.
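For reference, the encode-and-quote step can be checked in isolation, without touching the network. This is only a sketch: `build_tts_url` is a hypothetical helper, not part of my module, and the import fallback is there so it runs on both Python 2 and Python 3. It shows the exact bytes that end up in the query string for a two-character phrase.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3

BASE_URL = 'http://translate.google.com/translate_tts'


def build_tts_url(text, lang_code):
    """Hypothetical helper: percent-encode `text` as UTF-8 and build the URL."""
    # Encode to UTF-8 bytes first so quote() escapes each byte as %XX.
    quoted = quote(text.encode('utf-8'))
    return '%s?tl=%s&q=%s' % (BASE_URL, lang_code, quoted)


if __name__ == '__main__':
    # u'\u65e5\u672c' is the two kanji for "Japan".
    print(build_tts_url(u'\u65e5\u672c', 'ja'))
```

For the two kanji above this prints `http://translate.google.com/translate_tts?tl=ja&q=%E6%97%A5%E6%9C%AC`, i.e. three UTF-8 bytes per character, so the quoting itself looks fine.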
Edit 2:
If I attempt to use urllib/urllib2, I get a 403 error.
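One thing worth ruling in or out for the 403 is the request headers, since urllib2 identifies itself with its own default User-Agent rather than a browser's. The sketch below only builds a request object with an explicit header and does not send anything. Whether this endpoint actually checks User-Agent is an assumption on my part, and the `BROWSER_UA` value and `make_request` helper are made up for illustration.

```python
try:
    from urllib2 import Request            # Python 2
except ImportError:
    from urllib.request import Request     # Python 3

# Placeholder value (assumption): any browser-like string could go here.
BROWSER_UA = 'Mozilla/5.0'


def make_request(url, user_agent=BROWSER_UA):
    """Hypothetical helper: build (but do not send) a request with a User-Agent."""
    return Request(url, headers={'User-Agent': user_agent})


if __name__ == '__main__':
    req = make_request('http://translate.google.com/translate_tts?tl=ja&q=abc')
    # Request stores header names capitalized as 'User-agent'.
    print(req.get_header('User-agent'))
```

Passing the resulting object to `urlopen` would send it with that header instead of the default one.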
Edit 3:
So it seems that this behavior is limited to this endpoint. If I try the following URL, which is a different endpoint:
I get the same response from requests and from my browser (they match). Even if I send only ASCII characters to the server, as with this URL:
I get the same response as well (they match again). But if I send Unicode characters to this URL, I get a correct audio file in my browser but not from requests, which returns an audio file with no sound.
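One way to make the "they match" comparison concrete is to compare the size and a hash of the two payloads. A minimal sketch, with `describe` and `same_payload` as hypothetical helpers and stand-in byte strings in place of real downloads:

```python
import hashlib


def describe(payload):
    """Hypothetical helper: return (size in bytes, SHA-256 hex digest)."""
    return len(payload), hashlib.sha256(payload).hexdigest()


def same_payload(a, b):
    """True when two downloads are byte-for-byte identical."""
    return describe(a) == describe(b)


if __name__ == '__main__':
    # Stand-in byte strings; in practice these would be the contents of the
    # file saved from the browser and result.content from requests.
    print(same_payload(b'abc', b'abc'))
    print(same_payload(b'abc', b'abd'))
```

Comparing the sizes alone is also a quick way to see how the silent mp3 differs from the one the browser gets.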
So, it seems like this behavior is limited to the Google TTS URL?