Presently, neither Chromium's nor Firefox's implementation of the Web Speech API specification parses Speech Synthesis Markup Language (SSML) when SSML is set as the text property of a SpeechSynthesisUtterance instance that is passed to window.speechSynthesis.speak() (see: SSML parsing implementation at browsers; 5.2.3 SpeechSynthesisUtterance Attributes; How to set options of commands called by browser?).
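Here is a sketch of the browser-side call in question. `speakSsml` is a hypothetical wrapper. Its `synth` and `Utterance` parameters exist only so the function can also be exercised outside a browser with stand-ins.

```javascript
"use strict";

// Hypothetical wrapper reproducing the call described above: SSML assigned to
// an utterance's text property. synth and Utterance default to the browser
// globals; they are parameters only so stand-ins can be passed outside a browser.
function speakSsml(
  ssml,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  const utterance = new Utterance();
  utterance.text = ssml; // per the question, the markup is not parsed as SSML here
  synth.speak(utterance);
  return utterance;
}
```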
The Chromium source code that opens the Unix socket connection to speech-dispatcher appears to be in /src/chrome/browser/speech/tts_linux.cc:
{
  // spd_open has memory leaks which are hard to suppress.
  // http://crbug.com/317360
  ANNOTATE_SCOPED_MEMORY_LEAK;
  conn_ = libspeechd_loader_.spd_open(
      "chrome", "extension_api", NULL, SPD_MODE_THREADED);
}
This appears to be reflected in /run/user/1000/speech-dispatcher/log:
speechd: Updating client specific settings "linux:chrome:extension_api"
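To my understanding, libspeechd identifies each connection with the SSIP `SET SELF CLIENT_NAME` command, which takes a `user:application:component` triple. The log entry above has that shape. Its first field is presumably the login user, because Chromium passes `NULL` as the user name to `spd_open()`. The sketch below only builds the command string. `clientNameCommand` is a hypothetical helper, and the exact quoting libspeechd uses may differ.

```javascript
"use strict";

// Hypothetical helper: build the SSIP line that names a connection as
// user:application:component. SSIP commands are text lines ending in CRLF.
function clientNameCommand(user, application, component) {
  for (const part of [user, application, component]) {
    // A colon or line break inside a part would corrupt the triple or the command.
    if (/[:\r\n]/.test(part)) {
      throw new Error(`invalid client name part: ${JSON.stringify(part)}`);
    }
  }
  return `SET SELF CLIENT_NAME ${user}:${application}:${component}\r\n`;
}
```

Suppose `linux` is the user field. Then `clientNameCommand("linux", "chrome", "extension_api")` produces the client name seen in the log.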
The Chromium source file /src/third_party/speech-dispatcher/libspeechd.h appears to define the SSML data mode (SPD_DATA_SSML) described in the speech-dispatcher documentation.
The speech-dispatcher documentation states that the user configuration file can be used to set parameters for specific clients.
4.1.6 Parameter Settings Commands
The following parameter setting commands are available. For configuration and history clients there are also functions for setting the value for some other connection and for all connections. They are listed separately below.
C API function:
int spd_set_data_mode(SPDConnection *connection, SPDDataMode mode)
Set Speech Dispatcher data mode. Currently, plain text and SSML are supported. SSML is especially useful if you want to use index marks or include changes of voice parameters in the text. mode is the requested data mode: SPD_DATA_TEXT or SPD_DATA_SSML.
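As far as I can tell from libspeechd, `spd_set_data_mode()` amounts to a single SSIP command on the calling connection. It sends `SET SELF SSML_MODE on` for `SPD_DATA_SSML` and `SET SELF SSML_MODE off` for `SPD_DATA_TEXT`. The sketch below shows that mapping. The `DataMode` object only mirrors the C enum names and is not a real JavaScript API.

```javascript
"use strict";

// Mirrors the SPDDataMode names from libspeechd.h (illustrative only).
const DataMode = Object.freeze({ TEXT: "SPD_DATA_TEXT", SSML: "SPD_DATA_SSML" });

// Return the SSIP line that switches SSML parsing off or on for the
// connection that sends it ("SELF"), terminated by CRLF.
function dataModeCommand(mode) {
  switch (mode) {
    case DataMode.TEXT:
      return "SET SELF SSML_MODE off\r\n";
    case DataMode.SSML:
      return "SET SELF SSML_MODE on\r\n";
    default:
      throw new RangeError(`unknown data mode: ${String(mode)}`);
  }
}
```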
SPD_DATA_SSML is not turned on when Chromium establishes its SSIP connection to speech-dispatcher. Turning it on would look like the following, as demonstrated by @xmash in How to use Index Marks in "speech-dispatcher"?:
spd_execute_command_wo_mutex( m_connection, "SET SELF SSML_MODE on" );
Nor is it possible to pass options such as -m to espeak (the default speech synthesis module) or -x to spd-say.
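A separate client can send the same command over its own SSIP connection, for example from Node.js with the built-in `net` module. The sketch below makes several assumptions: the socket path from the question, `/run/user/1000/speech-dispatcher/speechd.sock`, no server greeting, and a hypothetical helper named `sendSsipCommands`. `SET SELF` applies to the connection that issues it. This would therefore turn SSML parsing on only for this client's own messages. It would not change the connection Chromium opened with `spd_open()`.

```javascript
"use strict";

const net = require("net");

// Hypothetical helper: connect to speech-dispatcher's Unix socket, send each
// command line in turn, and resolve with one raw reply string per command.
// NOTE: "SET SELF ..." affects only THIS connection, not Chromium's connection.
function sendSsipCommands(socketPath, commands) {
  return new Promise((resolve, reject) => {
    const socket = net.createConnection({ path: socketPath });
    const replies = [];
    let buffer = "";
    let index = 0;

    // Send the next command, or close the connection once all are answered.
    const sendNext = () => {
      if (index < commands.length) {
        socket.write(`${commands[index]}\r\n`);
      } else {
        socket.end();
        resolve(replies);
      }
    };

    socket.setEncoding("utf8");
    socket.on("connect", sendNext);
    socket.on("error", reject);
    socket.on("close", () => {
      if (index < commands.length) reject(new Error("connection closed early"));
    });
    socket.on("data", (chunk) => {
      buffer += chunk;
      // A reply is complete once a final "NNN text\r\n" line arrives;
      // "NNN-text\r\n" lines before it are continuation lines.
      let match;
      while ((match = /^(?:\d{3}-[^\r\n]*\r\n)*\d{3} [^\r\n]*\r\n/.exec(buffer))) {
        replies.push(match[0]);
        buffer = buffer.slice(match[0].length);
        index += 1;
        sendNext();
      }
    });
  });
}
```

A call could look like `sendSsipCommands("/run/user/1000/speech-dispatcher/speechd.sock", ["SET SELF SSML_MODE on", "QUIT"])`. In SSIP, a reply code starting with `2` should indicate success.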
With LogLevel set to 4 or 5, /run/user/1000/speech-dispatcher/log lists the communication between Chromium (client) and speech-dispatcher (server), for example:
speechd: Module set parameters
The same traffic can also be viewed on stdout by attaching strace to the PID stored in /run/user/1000/speech-dispatcher/pid; see Is there a way to intercept interprocess communication in Unix/Linux?
$ sudo strace -ewrite -p $PID
write(22, "216 OK OUTPUT MODULE SET\r\n", 26) = 26
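The strace output shows how SSIP frames its replies. Each line starts with a three-digit status code. A hyphen after the code marks a continuation line that carries data, and a space marks the final line. Every line ends in CRLF. Below is a small parser for that framing, offered as a sketch. `parseSsipReply` is a hypothetical name. Treating every `2xx` code as success is an assumption based on SSIP's reply code ranges.

```javascript
"use strict";

// Hypothetical helper: parse one complete SSIP reply. Continuation lines look
// like "NNN-data" and the final line looks like "NNN text"; lines end in CRLF.
function parseSsipReply(raw) {
  const lines = raw.split("\r\n").filter((line) => line.length > 0);
  const data = [];
  for (const [i, line] of lines.entries()) {
    const match = /^(\d{3})([ -])(.*)$/.exec(line);
    if (!match) {
      throw new Error(`malformed SSIP line: ${JSON.stringify(line)}`);
    }
    const [, code, separator, text] = match;
    if (separator === "-") {
      data.push(text); // continuation line carrying data
    } else if (i === lines.length - 1) {
      // Assumption: 2xx codes mean success, following SSIP's code ranges.
      return { code: Number(code), ok: code.startsWith("2"), data, message: text };
    } else {
      throw new Error("final reply line is followed by more lines");
    }
  }
  throw new Error("reply has no final line");
}
```

For the line above, `parseSsipReply("216 OK OUTPUT MODULE SET\r\n")` returns `{ code: 216, ok: true, data: [], message: "OK OUTPUT MODULE SET" }`.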
I ran the following command to create a user configuration. Afterwards, there does not appear to be an option in either speechd.conf or espeak.conf to turn SSML parsing on:
$ spd-conf -u
While attempting to parse SSML with JavaScript in SpeechSynthesisSSMLParser, I encountered a bug in Chromium when parsing the <break> element. It is also not clear whether spd-say is called, or whether the default output module (e.g., espeak) is run directly, when the browser handles window.speechSynthesis.speak(); see /src/out/Debug/gen/library_loaders/libspeechd.h.
As a workaround, I created an approach that uses PHP to call espeak via shell_exec(), which returns the expected result:
// JavaScript
// POST the SSML and the espeak options to speak.php and resolve with the WAV bytes.
async function SSMLStream({ssml = "", options = ""}) {
  const fd = new FormData();
  fd.append("ssml", ssml);
  fd.append("options", options);
  const request = await fetch("speak.php", {method: "POST", body: fd});
  if (!request.ok) throw new Error(`speak.php returned HTTP ${request.status}`);
  return request.arrayBuffer();
}
let ssml = `<speak version="1.0" xml:lang="en-US">
Here are <say-as interpret-as="characters">SSML</say-as> samples.
Hello universe, how are you today?
Try a date: <say-as interpret-as="date" format="dmy" detail="1">10-9-1960</say-as>
This is a <break time="2500ms" /> 2.5 second pause.
This is a <break /> sentence break <break />
<voice name="us-en+f3" rate="x-slow" pitch="0.25">espeak using</voice>
PHP and <voice name="en-us+f2"> <sub alias="JavaScript">JS</sub></voice>
</speak>`;
// Decode the returned WAV data and play it with the Web Audio API.
SSMLStream({ssml, options: "-v en-us+f1"})
  .then(async (data) => {
    const context = new AudioContext();
    const source = context.createBufferSource();
    source.buffer = await context.decodeAudioData(data);
    source.connect(context.destination);
    source.start();
  })
  .catch(console.error);
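// PHP
<?php
if (isset($_POST["ssml"])) {
  header("Content-Type: audio/x-wav");
  // Quote each option and the SSML separately, so that quotes or shell
  // metacharacters in the request cannot alter the command.
  $options = preg_split('/\s+/', trim($_POST["options"] ?? ""), -1, PREG_SPLIT_NO_EMPTY);
  $args = implode(" ", array_map("escapeshellarg", $options));
  // -m: interpret SSML markup; --stdout: write the WAV data to standard output.
  echo shell_exec("espeak -m --stdout " . $args . " " . escapeshellarg($_POST["ssml"]));
}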
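The same conversion can also run without a shell. This Node.js sketch calls espeak through `child_process.execFile` and passes the SSML as a single argument, so quotes in the markup cannot break the command line. It assumes `espeak` is on `PATH`. The helper names `espeakArgs` and `ssmlToWav` are hypothetical, and the options string is split on whitespace only.

```javascript
"use strict";

const { execFile } = require("child_process");

// Hypothetical helper: build espeak's argument vector. -m interprets SSML
// markup and --stdout writes WAV data to standard output. Options are split
// on whitespace only (no shell-style quoting is supported).
function espeakArgs(ssml, options = "") {
  const extra = options.split(/\s+/).filter(Boolean);
  return ["-m", "--stdout", ...extra, ssml];
}

// Hypothetical helper: run espeak without a shell and resolve with the WAV
// bytes as a Buffer (encoding "buffer" keeps stdout binary).
function ssmlToWav(ssml, options = "") {
  return new Promise((resolve, reject) => {
    execFile(
      "espeak",
      espeakArgs(ssml, options),
      { encoding: "buffer", maxBuffer: 64 * 1024 * 1024 },
      (error, stdout) => (error ? reject(error) : resolve(stdout))
    );
  });
}
```

For example, `ssmlToWav(ssml, "-v en-us+f1")` should resolve with WAV data equivalent to what the PHP endpoint returns. The browser would still need some transport to receive it.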
Requirement:
Parse the SSML set as the text property of SpeechSynthesisUtterance with the existing capabilities of the native program that the speech-dispatcher output module calls to convert text to speech, using only default browser capabilities.
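Outside the browser, a client that controls its own SSIP connection could meet the requirement partially. It would turn SSML mode on for that connection, then send the markup with `SPEAK`. The framing below follows my understanding of SSIP: the message body ends with a line holding a single `.`, and a leading dot on any line is doubled. The helper names are hypothetical. This approach does not change what `window.speechSynthesis.speak()` sends.

```javascript
"use strict";

// Hypothetical helper: frame a message body for the SSIP SPEAK command.
// The body ends with a line holding a single dot; lines starting with "."
// get an extra dot so they are not mistaken for that terminator.
function frameSpeakData(text) {
  const body = text
    .split(/\r?\n/)
    .map((line) => (line.startsWith(".") ? `.${line}` : line))
    .join("\r\n");
  return `${body}\r\n.\r\n`;
}

// Hypothetical helper: the writes for speaking SSML on one connection,
// assuming each one is sent only after the server's 2xx reply to the previous.
function ssmlSpeakSequence(ssml) {
  return ["SET SELF SSML_MODE on\r\n", "SPEAK\r\n", frameSpeakData(ssml)];
}
```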
Questions:
1) How can I programmatically listen for the PID when Chromium spawns speech-dispatcher --spawn-communication-method unix_socket --socket-path /run/user/1000/speech-dispatcher/speechd.sock? Then, acting as the client (Chromium) on the already established Unix socket connection, how can I call spd_execute_command_wo_mutex with "SET SELF SSML_MODE on" as the second parameter, to turn on SSML parsing for all calls to window.speechSynthesis.speak() in Chromium?
2) If 1) is not possible, what needs to be adjusted in the Chromium source code to turn on SSML parsing for the Unix socket connection, e.g., in tools/generate_library_loader/generate_library_loader.py?
3) If 1) and 2) are not viable options, how can the JavaScript and PHP code be converted into C++ code in the style used by Chromium, and how can Chromium be built with that patch? The goal is to expose a speak function that accepts parameters, passes them to a native speech synthesis application where the SSML is parsed, and returns the resulting audio to the JavaScript caller as an ArrayBuffer.
4) If options other than 1), 2), and 3) can meet the requirement, how can it be met programmatically, without having to start a local server manually at the terminal?
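For the first half of question 1, one way to notice the autospawned server is to poll for its PID file. This Node.js sketch assumes the `/run/user/<uid>/speech-dispatcher/pid` location shown above. The function names are hypothetical. As far as I can tell, knowing the PID does not give access to the socket connection Chromium already holds. A command sent afterwards would still travel on a new connection of its own.

```javascript
"use strict";

const fs = require("fs");

// Hypothetical helper: PID file location, following the
// /run/user/<uid>/speech-dispatcher/ layout seen in the question.
function pidFilePath(uid = process.getuid()) {
  return `/run/user/${uid}/speech-dispatcher/pid`;
}

// Parse PID file contents: a decimal number, possibly followed by a newline.
function parsePid(text) {
  const trimmed = text.trim();
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`unexpected PID file contents: ${JSON.stringify(text)}`);
  }
  return Number(trimmed);
}

// Hypothetical helper: poll every intervalMs until the PID file exists and
// names a live process, then resolve with that PID.
function waitForSpeechDispatcher(path = pidFilePath(), intervalMs = 500) {
  return new Promise((resolve) => {
    const timer = setInterval(() => {
      try {
        const pid = parsePid(fs.readFileSync(path, "utf8"));
        process.kill(pid, 0); // signal 0 only checks the process exists and can be signalled
        clearInterval(timer);
        resolve(pid);
      } catch (err) {
        // File missing, malformed, or stale PID: keep polling.
      }
    }, intervalMs);
  });
}
```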
Source: https://stackoverflow.com/questions/48219981/how-to-programmatically-send-a-unix-socket-command-to-a-system-server-autospawne