As you probably know, implementing speech-to-text is pretty easy with the Android API: you call the API's intent and it returns the text for you. My case is a bit different. I have a prerecorded 3GPP sound file that I recorded from the user and saved on the SD card, and I want to know whether it's possible to transcribe it into text like any other speech recognition. Does the speech-to-text API allow uploading your own sound files to be processed, or is this impossible?
The API does not allow it, but see this blog post and its comments for a potential workaround. Also make sure that your file contains high-quality audio (at least 16-bit samples at a 16 kHz sample rate) to get a better transcription.
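As a rough way to check the 16-bit / 16 kHz recommendation, here is a minimal sketch that reads the sample rate and bit depth from a WAV header. It assumes the audio has already been converted to an uncompressed RIFF/WAVE file (the 3GPP file from the question would have to be decoded first), and the class and method names (`WavQualityCheck`, `readFormat`, `meetsMinimum`) are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class WavQualityCheck {

    /** Reads a 4-character chunk tag such as "RIFF" or "fmt " at the given offset. */
    private static String tag(byte[] data, int offset) {
        return new String(data, offset, 4, StandardCharsets.US_ASCII);
    }

    /** Returns {sampleRateHz, bitsPerSample} from the "fmt " chunk of a RIFF/WAVE file. */
    static int[] readFormat(byte[] wav) {
        if (wav.length < 12 || !tag(wav, 0).equals("RIFF") || !tag(wav, 8).equals("WAVE")) {
            throw new IllegalArgumentException("not a RIFF/WAVE file");
        }
        ByteBuffer buf = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 12; // first chunk starts right after the 12-byte RIFF header
        while (pos + 8 <= wav.length) {
            String id = tag(wav, pos);
            int size = buf.getInt(pos + 4);
            if (id.equals("fmt ") && size >= 16 && pos + 8 + 16 <= wav.length) {
                int sampleRate = buf.getInt(pos + 8 + 4);             // bytes 4..7 of the chunk body
                int bitsPerSample = buf.getShort(pos + 8 + 14) & 0xFFFF; // bytes 14..15
                return new int[] {sampleRate, bitsPerSample};
            }
            if (size < 0 || size > wav.length - pos - 8) {
                break; // chunk runs past the end of the buffer, stop scanning
            }
            pos += 8 + size + (size & 1); // chunk bodies are padded to an even length
        }
        throw new IllegalArgumentException("no usable fmt chunk found");
    }

    /** True if the format meets the 16 kHz / 16-bit minimum suggested above. */
    static boolean meetsMinimum(int[] format) {
        return format[0] >= 16000 && format[1] >= 16;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java WavQualityCheck <file.wav>");
            return;
        }
        int[] f = readFormat(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(f[0] + " Hz, " + f[1] + "-bit, good enough: " + meetsMinimum(f));
    }
}
```

This only inspects the header. It says nothing about how noisy the recording is, which matters just as much for the transcription.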
I got a solution that works well for getting speech to text from a sound file. Here is the link to a simple Android project I created to show the solution working. I also put some screenshots inside the project to illustrate the app.
I'll try to briefly explain the approach I used. I combined two features in that project: the Google Speech API and FLAC recording.
The Google Speech API is called over HTTP connections. Mike Pultz gives more details about the API:
"(...) the new [Google] API is a full-duplex streaming API. What this means, is that it actually uses two HTTP connections- one POST request to upload the content as a “live” chunked stream, and a second GET request to access the results, which makes much more sense for longer audio samples, or for streaming audio."
However, this API needs to receive a FLAC sound file to work properly, which brings us to the second part: FLAC recording.
I implemented FLAC recording in that project by extracting and adapting some code and libraries from an open source app called AudioBoo. AudioBoo uses native code to record and play the FLAC format.
Thus, it's possible to record a FLAC sound, send it to the Google Speech API, get the text, and play back the sound that was just recorded.
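To make the two-connection idea from the quote more concrete, here is a minimal plain-Java sketch of the pattern: one chunked POST that streams the FLAC bytes and one GET, opened at the same time, that waits for the result. This is not the code from my project. The base URL is a placeholder, and the query parameter names (`pair`, `key`) and the `audio/x-flac; rate=...` content type are assumptions, so check them against the blog post before relying on them. `DuplexSpeechSketch` and its methods are made-up names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class DuplexSpeechSketch {
    // Placeholder only: the real endpoint is not given in this post.
    static final String BASE_URL = "https://speech.example.invalid/full-duplex";

    /** Random token that ties the upload and download connections together (assumed scheme). */
    static String newPairToken() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    // The API key is assumed to be URL-safe, so it is not encoded here.
    static String upUrl(String pair, String apiKey) {
        return BASE_URL + "/up?pair=" + pair + "&key=" + apiKey;
    }

    static String downUrl(String pair) {
        return BASE_URL + "/down?pair=" + pair;
    }

    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    /** Connection 1: POST the FLAC audio as a chunked stream. */
    static void upload(String url, InputStream flac, int sampleRateHz) throws IOException {
        HttpURLConnection c = (HttpURLConnection) URI.create(url).toURL().openConnection();
        c.setRequestMethod("POST");
        c.setDoOutput(true);
        c.setChunkedStreamingMode(0); // 0 = use the default chunk size
        c.setRequestProperty("Content-Type", "audio/x-flac; rate=" + sampleRateHz);
        try (OutputStream out = c.getOutputStream()) {
            copy(flac, out);
        }
        c.getResponseCode(); // forces the request to complete
        c.disconnect();
    }

    /** Connection 2: GET the recognition result as raw text. */
    static String download(String url) throws IOException {
        HttpURLConnection c = (HttpURLConnection) URI.create(url).toURL().openConnection();
        c.setRequestMethod("GET");
        try (InputStream in = c.getInputStream()) {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            copy(in, body);
            return body.toString("UTF-8");
        } finally {
            c.disconnect();
        }
    }

    /** Opens the GET in the background, streams the audio, then waits for the result. */
    static String transcribe(InputStream flac, int sampleRateHz, String apiKey) throws Exception {
        String pair = newPairToken();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = pool.submit(() -> download(downUrl(pair)));
            upload(upUrl(pair, apiKey), flac, sampleRateHz);
            return result.get();
        } finally {
            pool.shutdown();
        }
    }
}
```

On Android, `transcribe` would have to run off the main thread, and the returned string would still need to be parsed to pull out the recognized text.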
The project I created covers the basic principles needed to make this work and can be improved for specific situations. To make it work in a different scenario, you need a Google Speech API key, which is obtained by joining the Google Chromium-dev group. I left one key in that project just to show it working, but I'll remove it eventually. If someone needs more information about it, let me know, because I'm not able to put more than 2 links in this post.
It is currently not possible to send your own audio file to Google for processing. Instead, you can play the audio file through the speaker of your Android device so that the microphone picks it up as input to Google voice recognition.
First you need an audio file, for example on your SD card. Then follow these steps:
1) Create a method with any name you wish.
2) Within that method, first write the code that starts Google speech recognition.
3) After that code, write the code that plays your audio file through the speaker, which then becomes the input to Google speech recognition.
// Requires imports: android.content.ActivityNotFoundException, android.content.Intent,
// android.media.MediaPlayer, android.speech.RecognizerIntent, android.widget.Toast, java.util.Locale

// code for Google voice recognition: launch the recognizer so it starts listening
Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault());
intent.putExtra(RecognizerIntent.EXTRA_PROMPT,
        getString(R.string.speech_prompt));
try {
    startActivityForResult(intent, REQ_CODE_SPEECH_INPUT);
} catch (ActivityNotFoundException a) {
    // no speech recognizer is available on this device
    Toast.makeText(getApplicationContext(),
            getString(R.string.speech_not_supported),
            Toast.LENGTH_SHORT).show();
}

// code for playing the audio file which you wish to give as an input
MediaPlayer mp = new MediaPlayer();
try {
    mp.setDataSource(file); // here file is the location of the audio file you wish to use as input
    mp.prepare();
    mp.start();
} catch (Exception e) {
    e.printStackTrace();
}
For reference, see my blog: https://sureshkumarask.wordpress.com/2017/03/19/how-to-give-our-own-audio-file-as-an-input-to-any-speech-recognizer/
I have included the link to the Java file in my blog.
Source: https://stackoverflow.com/questions/6989981/speech-to-text-from-own-sound-file