Here is an idea:
We have web applications with exposed RESTful APIs which accept JSON. Now how about using Google speech APIs to take user voice input, convert it to tex
This is called "intent analysis". There are libraries for it, for example RASA.
For example, if your input is "show me chinese restaurants", the output would be:
{
  "text": "show me chinese restaurants",
  "intent": "restaurant_search",
  "entities": [
    {
      "start": 8,
      "end": 15,
      "value": "chinese",
      "entity": "cuisine"
    }
  ]
}
Overall it is pretty advanced NLU.
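To tie this back to the original idea, here is a minimal sketch of how a result like the one above could be turned into a JSON request for a web application. The routing table, the `/restaurants/search` path and the `toApiRequest` helper are hypothetical names made up for illustration; they are not part of RASA or of any particular API.

```javascript
// NLU result in the shape shown above.
const nluResult = {
  text: "show me chinese restaurants",
  intent: "restaurant_search",
  entities: [{ start: 8, end: 15, value: "chinese", entity: "cuisine" }],
};

// Hypothetical routing table: intent name -> endpoint of your own REST API.
const routes = new Map([
  ["restaurant_search", { method: "POST", path: "/restaurants/search" }],
]);

// Hypothetical helper: builds a request description from an NLU result.
// Each entity becomes one field of the JSON body (entity name -> value).
function toApiRequest(nlu) {
  const route = routes.get(nlu.intent);
  if (!route) {
    throw new Error(`No route configured for intent: ${nlu.intent}`);
  }
  const body = {};
  for (const e of nlu.entities ?? []) {
    body[e.entity] = e.value;
  }
  return { method: route.method, path: route.path, body };
}

const request = toApiRequest(nluResult);
console.log(JSON.stringify(request));
```

The resulting object could then be handed to whatever HTTP client you already use. Keeping the intent-to-endpoint mapping in one table means a new voice command only needs a new entry there.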
According to the Google Speech API, the result set is already returned as JSON:
{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}
All you have to do is call JSON.parse on the response and then pick whatever fields you want out of the resulting object to put into your own JSON format.
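As a rough sketch of that step, assuming the response body arrives as a string in the shape shown above: the `pickBestTranscript` helper and the `{ text, confidence }` output format are made-up names for illustration, not part of the Google API.

```javascript
// Response body in the shape shown above, as it would arrive over HTTP.
const responseBody = `{
  "results": [
    {
      "alternatives": [
        { "transcript": "how old is the Brooklyn Bridge", "confidence": 0.98267895 }
      ]
    }
  ]
}`;

// Hypothetical helper: parses the response and returns the most confident
// alternative of the first result, or null if there is none.
function pickBestTranscript(jsonText) {
  const parsed = JSON.parse(jsonText);
  const alternatives = parsed.results?.[0]?.alternatives ?? [];
  if (alternatives.length === 0) {
    return null;
  }
  // Treat a missing confidence as 0 when comparing alternatives.
  const best = alternatives.reduce((a, b) =>
    (b.confidence ?? 0) > (a.confidence ?? 0) ? b : a
  );
  return { text: best.transcript, confidence: best.confidence };
}

const payload = pickBestTranscript(responseBody);
console.log(JSON.stringify(payload));
```

Returning null for an empty result keeps the "nothing was recognised" case explicit, so the caller can decide whether to ask the user to repeat the input.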
I would suggest reading through the Google Speech documentation.