I'm trying to hit the Google TTS URL, http://translate.google.com/translate_tts, with Japanese characters and phrases from Python using the requests library.
Here is an example:
http://translate.google.com/translate_tts?tl=ja&q=ひとつ
However, when I use the Python requests library to download the mp3 that the endpoint returns, the resulting mp3 is blank. I have verified that I can hit this URL from requests with ASCII-only text (via romaji) and get correct responses back.
Here is part of the code I am using to make the request:
    langs = {'japanese': 'ja', 'english': 'en'}

    def get_sound_file_for_text(text, download=False, lang='japanese'):
        r = StringIO()
        glang = langs[lang]
        text = text.replace('*', '')
        text = text.replace('/', '')
        text = text.replace('x', '')
        url = 'http://translate.google.com/translate_tts'
        if download:
            result = requests.get(url, params={'tl': glang, 'q': text})
            r.write(result.content)
            r.seek(0)
            return r
        else:
            return url
Also, if I print the text or url variables within this snippet, the kana/kanji are rendered correctly in my console.
Edit:
If I encode the Unicode text as UTF-8 and percent-quote it, as in the code here, I still get the same response.
    # -*- coding: utf-8 -*-
    from StringIO import StringIO
    import urllib
    import requests

    __author__ = 'jacob'

    langs = {'japanese': 'ja', 'english': 'en'}

    def get_sound_file_for_text(text, download=False, lang='japanese'):
        r = StringIO()
        glang = langs[lang]
        text = text.replace('*', '')
        text = text.replace('/', '')
        text = text.replace('x', '')
        text = urllib.quote(text.encode('utf-8'))
        url = 'http://translate.google.com/translate_tts?tl=%(glang)s&q=%(text)s' % locals()
        print url
        if download:
            result = requests.get(url)
            r.write(result.content)
            r.seek(0)
            return r
        else:
            return url
This prints the following URL:
http://translate.google.com/translate_tts?tl=ja&q=%E3%81%B2%E3%81%A8%E3%81%A4
This seems like it should work, but it doesn't.
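To rule out the encoding step itself, here is a minimal sketch that builds the URL without sending anything, so the percent-encoding can be checked offline. The helper name build_tts_url and the BASE_URL constant are hypothetical and not part of my real code; the import fallback is only there so the sketch runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

BASE_URL = 'http://translate.google.com/translate_tts'


def build_tts_url(text, lang_code):
    """Hypothetical helper: percent-encode text as UTF-8 and build the URL."""
    # Encode to UTF-8 bytes first, then percent-quote each byte.
    encoded = quote(text.encode('utf-8'))
    return '%s?tl=%s&q=%s' % (BASE_URL, lang_code, encoded)


url = build_tts_url(u'ひとつ', 'ja')
print(url)
```

This produces the same URL as the one printed above, so as far as I can tell the encoding of the query string is not what differs between my code and the browser.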
Edit 2:
If I attempt to use urllib/urllib2 instead, I get a 403 error.
Edit 3:
So, it seems that this behavior is limited to this endpoint. If I try the following URL, which is on a different endpoint,
http://www.kanjidamage.com/kanji/13-un-%E4%B8%8D
from within requests and my browser, I get the same response (they match). Even if I send only ASCII characters to the TTS server, as in this URL,
http://translate.google.com/translate_tts?tl=ja&q=sayonara
I get the same response as well (they match again). But if I send Unicode characters to this URL, I get a correct audio file in my browser but not from requests, which receives an audio file with no sound.
http://translate.google.com/translate_tts?tl=ja&q=%E3%81%B2%E3%81%A8%E3%81%A4
So, it seems like this behavior is limited to the Google TTS URL?
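For anyone who wants to reproduce the "they match" comparison from Edit 3, this is roughly how two downloads can be compared byte for byte. It is only a sketch: summarize_payload and payloads_match are hypothetical helper names, and the two byte strings are placeholders standing in for the file saved from the browser and for result.content from requests.

```python
import hashlib


def summarize_payload(data):
    """Return (length in bytes, SHA-256 hex digest) for a bytes payload."""
    return len(data), hashlib.sha256(data).hexdigest()


def payloads_match(first, second):
    """True when both payloads have the same length and digest."""
    return summarize_payload(first) == summarize_payload(second)


# Placeholder data (assumption): in practice these would be the bytes of the
# browser-saved mp3 and the bytes in result.content from requests.
browser_bytes = b'placeholder audio bytes'
requests_bytes = b'placeholder audio bytes'
print(payloads_match(browser_bytes, requests_bytes))
```

Comparing the length alone is already informative here, since a silent file and a spoken one are unlikely to be the same size.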