While trying to determine a solution to "How to use Web Speech API at chromium?", found that
var voices = window.speechSynthesis.getVoices();
Several possible workarounds were found which provide the ability to create audio from text: two require requesting an external resource; the third uses meSpeak.js by @masswerk.
The first uses the approach described at "Download the Audio Pronunciation of Words from Google", which suffers from not being able to predetermine which words actually exist as a file at the resource without writing a shell script or performing a HEAD request to check whether a network error occurs. For example, the word "do" is not available at the resource used below.
window.addEventListener("load", () => {
  const textarea = document.querySelector("textarea");
  const audio = document.createElement("audio");
  // The fetched files are MP3, so label the combined Blob accordingly
  const mimetype = "audio/mpeg";
  audio.controls = true;
  document.body.appendChild(audio);
  // Start playback as soon as enough data is available
  audio.addEventListener("canplay", () => {
    audio.play();
  });
  // match() returns null when the textarea contains no word characters
  const words = textarea.value.trim().match(/\w+/g) || [];
  const url = "https://ssl.gstatic.com/dictionary/static/sounds/de/0/";
  const mediatype = ".mp3";
  Promise.all(
    words.map(word => {
      // Request the file through YQL, which returns it as a data URI
      const yql = `select * from data.uri where url="${url}${word}${mediatype}"`;
      return fetch(`https://query.yahooapis.com/v1/public/yql?q=${encodeURIComponent(yql)}&format=json&callback=`)
        .then(response => response.json())
        .then(({query: {results: {url: dataURI}}}) =>
          // Convert the data URI into a Blob
          fetch(dataURI).then(response => response.blob())
        );
    })
  )
  .then(blobs => {
    // const a = document.createElement("a");
    // Concatenate the per-word clips into a single Blob
    audio.src = URL.createObjectURL(new Blob(blobs, {
      type: mimetype
    }));
    // a.download = words.join("-") + ".mp3";
    // a.click()
  })
  .catch(err => console.log(err));
});
<textarea>what it does my ninja?</textarea>
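The HEAD request check mentioned for the first approach could be sketched roughly as follows. The helper names `resourceExists` and `partitionWords`, and the injectable `fetchImpl` parameter, are hypothetical and not part of the original code. Note that in a browser a cross-origin HEAD request is subject to the same CORS restrictions as a GET, so this may only be usable from a server-side script or through a proxy.

```javascript
// Hypothetical helper: resolves to true when a HEAD request for `url` succeeds.
// `fetchImpl` is injectable so the function can be exercised without a network.
function resourceExists(url, fetchImpl = fetch) {
  return fetchImpl(url, { method: "HEAD" })
    .then(response => response.ok)
    // A rejected fetch (network error, blocked cross-origin request) counts as missing
    .catch(() => false);
}

// Hypothetical helper: split `words` into those with and without an audio file.
async function partitionWords(words, baseUrl, extension, fetchImpl = fetch) {
  const checks = await Promise.all(
    words.map(word => resourceExists(`${baseUrl}${word}${extension}`, fetchImpl))
  );
  return {
    available: words.filter((_, i) => checks[i]),
    missing: words.filter((_, i) => !checks[i])
  };
}
```

Only the words in `available` would then be passed on to the audio requests, and the words in `missing` could be reported to the user or routed to another approach.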
Resources at Wikimedia Commons Category:Public domain are not necessarily served from the same directory; see "How to retrieve Wiktionary word content?" and "wikionary API - meaning of words".
If the precise location of the resource is known, the audio can be requested, though the URL may include prefixes other than the word itself.
fetch("https://upload.wikimedia.org/wikipedia/commons/c/c5/En-uk-hello-1.ogg")
  .then(response => response.blob())
  .then(blob => new Audio(URL.createObjectURL(blob)).play());
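When only the file name is known and not its directory, one option that may help is the `Special:FilePath` page at Wikimedia Commons, which redirects to the stored location of a file. The following is a minimal sketch under that assumption; the helper name `commonsFileUrl` is hypothetical.

```javascript
// Hypothetical helper: build a Special:FilePath URL, which redirects to
// wherever the named file is actually stored.
function commonsFileUrl(fileName) {
  // MediaWiki titles use underscores in place of spaces
  const title = fileName.trim().replace(/ /g, "_");
  return "https://commons.wikimedia.org/wiki/Special:FilePath/" + encodeURIComponent(title);
}

// Browser usage (the file name is taken from the example above):
// new Audio(commonsFileUrl("En-uk-hello-1.ogg")).play();
```

This still requires knowing the exact file name, including any prefix or suffix such as `En-uk-` or `-1`.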
Not entirely sure how to use the Wikipedia API (see "How to get Wikipedia content using Wikipedia's API?" and "Is there a clean wikipedia API just for retrieve content summary?") to get only the audio file. The JSON response would need to be parsed for text ending in .ogg, then a second request would need to be made for the resource itself.
fetch("https://en.wiktionary.org/w/api.php?action=parse&format=json&prop=text&callback=?&page=hello")
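  .then(response => response.text())
  .then(data => {
    // Commons directories are hexadecimal (for example c/c5), so \d alone would miss some paths
    const match = data.match(/\/\/upload\.wikimedia\.org\/wikipedia\/commons\/[\da-f]\/[\da-f]{2}\/[\w.-]+\.ogg/);
    if (match) new Audio(location.protocol + match[0]).play();
  })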
// "//upload.wikimedia.org/wikipedia/commons/5/52/En-us-hello.ogg\"
which logs
Fetch API cannot load https://en.wiktionary.org/w/api.php?action=parse&format=json&prop=text&callback=?&page=hello. No 'Access-Control-Allow-Origin' header is present on the requested resource
when not requested from the same origin. We would need to try to use YQL again, though not certain how to formulate the query to avoid errors.
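A possible alternative to YQL: the MediaWiki API accepts an `origin=*` parameter for anonymous requests, which causes the response to include CORS headers. The sketch below assumes that the rendered page markup still contains a direct link to an .ogg file on upload.wikimedia.org, which may not hold for every word. The helper names `findOggUrl` and `fetchPronunciationUrl` are hypothetical.

```javascript
// Hypothetical helper: find the first protocol-relative Commons .ogg URL in a string.
function findOggUrl(html) {
  const match = html.match(
    /\/\/upload\.wikimedia\.org\/wikipedia\/commons\/[0-9a-f]\/[0-9a-f]{2}\/[^"'\\\s<>]+\.ogg/
  );
  return match ? "https:" + match[0] : null;
}

// Hypothetical helper: request the parsed page with CORS enabled via origin=*.
// `fetchImpl` is injectable so the function can be exercised without a network.
function fetchPronunciationUrl(word, fetchImpl = fetch) {
  const params = new URLSearchParams({
    action: "parse", format: "json", prop: "text", origin: "*", page: word
  });
  return fetchImpl("https://en.wiktionary.org/w/api.php?" + params)
    .then(response => response.json())
    // With the default JSON format the rendered HTML is expected under parse.text["*"]
    .then(data => (data.parse ? findOggUrl(data.parse.text["*"]) : null));
}

// Browser usage:
// fetchPronunciationUrl("hello").then(url => url && new Audio(url).play());
```

Because an audio element can play cross-origin media directly, the second request for the file itself would not need `fetch()` at all.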
The third approach uses a slightly modified version of meSpeak.js to generate the audio in the browser without requesting audio from an external service. The modification was to create a proper callback for the .loadConfig() method.
fetch("https://gist.githubusercontent.com/guest271314/f48ee0658bc9b948766c67126ba9104c/raw/958dd72d317a6087df6b7297d4fee91173e0844d/mespeak.js")
  .then(response => response.text())
  .then(text => {
    // Inject the library source so that the global meSpeak object is defined
    const script = document.createElement("script");
    script.textContent = text;
    document.body.appendChild(script);
    // Wait for both the configuration and the voice to finish loading
    return Promise.all([
      new Promise(resolve => {
        meSpeak.loadConfig("https://gist.githubusercontent.com/guest271314/8421b50dfa0e5e7e5012da132567776a/raw/501fece4fd1fbb4e73f3f0dc133b64be86dae068/mespeak_config.json", resolve)
      }),
      new Promise(resolve => {
        meSpeak.loadVoice("https://gist.githubusercontent.com/guest271314/fa0650d0e0159ac96b21beaf60766bcc/raw/82414d646a7a7ef11bb04ddffe4091f78ef121d3/en.json", resolve)
      })
    ])
  })
  .then(() => {
    // takes approximately 14 seconds to get here
    console.log(meSpeak.isConfigLoaded());
    meSpeak.speak("what it do my ninja", {
      amplitude: 100,
      pitch: 5,
      speed: 150,
      wordgap: 1,
      variant: "m7"
    });
  })
  .catch(err => console.log(err));
One caveat of the above approach is that it takes approximately 14 and a half seconds for the three files to load before the audio is played back. However, it avoids requesting audio from an external service.
It would be a positive to do either or both of the following: 1) create a FOSS, developer-maintained database or directory of sounds for both common and uncommon words; 2) perform further development of meSpeak.js to reduce the load time of the three necessary files, and use Promise-based approaches to provide notifications of the progress of the loading of the files and the readiness of the application.
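The Promise-based progress notification suggested above might look something like the following sketch. The helper `loadWithProgress` and the `{ label, start }` task shape are hypothetical, not part of meSpeak.js.

```javascript
// Hypothetical helper: run `tasks` in parallel and report each completion
// through `onProgress(done, total, label)`. Each task is an object with a
// `label` and a `start` function that returns a Promise.
function loadWithProgress(tasks, onProgress) {
  let done = 0;
  const total = tasks.length;
  return Promise.all(
    tasks.map(({ label, start }) =>
      start().then(result => {
        done += 1;
        onProgress(done, total, label);
        return result;
      })
    )
  );
}

// Browser usage with meSpeak (voiceUrl is a placeholder for the voice file URL):
// loadWithProgress(
//   [{ label: "voice", start: () => new Promise(resolve => meSpeak.loadVoice(voiceUrl, resolve)) }],
//   (done, total, label) => console.log(`${label} loaded (${done}/${total})`)
// ).then(() => console.log("ready"));
```

The returned Promise resolves once every task has finished, which gives the application a single point at which to signal readiness.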
In this user's estimation, it would be a useful resource if developers themselves created and contributed to an online database of files which responded with an audio file of the specific word. Not entirely sure whether GitHub is the appropriate venue to host audio files. Will have to consider the possible options if interest in such a project is shown.