How do I use a lexicon with SpeechSynthesizer?

Question

I'm doing some text-to-speech and I'd like to specify some special pronunciations in a lexicon file. I have run MSDN's AddLexicon example verbatim. It speaks the sentence, but it does not use the given lexicon, so something appears to be broken.

Here's the provided example:

using System;
using Microsoft.Speech.Synthesis;

namespace SampleSynthesis
{
  class Program
  {
    static void Main(string[] args)
    {

      // Initialize a new instance of the SpeechSynthesizer.
      using (SpeechSynthesizer synth = new SpeechSynthesizer())
      {

        // Configure the audio output. 
        synth.SetOutputToDefaultAudioDevice();

        PromptBuilder builder = new PromptBuilder();
        builder.AppendText("Gimme the whatchamacallit.");

        // Append the lexicon file.
        synth.AddLexicon(new Uri("c:\\test\\whatchamacallit.pls"), "application/pls+xml");

        // Speak the prompt and play back the output file.
        synth.Speak(builder);
      }

      Console.WriteLine();
      Console.WriteLine("Press any key to exit...");
      Console.ReadKey();
    }
  }
}

and the lexicon file:

<lexicon version="1.0" 
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon 
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="x-microsoft-ups" xml:lang="en-US">


  <lexeme>
    <grapheme> whatchamacallit </grapheme>
    <phoneme> W S1 AX T CH AX M AX K S2 AA L IH T </phoneme>
  </lexeme>

</lexicon>

The console opens, the text is spoken, but the new pronunciation isn't used. I have of course saved the file to c:\test\whatchamacallit.pls as specified.

I've tried variations of the Uri and file location (e.g. @"C:\Temp\whatchamacallit.pls", @"file:///c:\test\whatchamacallit.pls"), absolute and relative paths, copying it into the build folder, etc.

I ran Process Monitor and the file is not accessed. If it were a directory or file permission problem (which it isn't), I would still see access-denied messages. Instead, the log shows no reference to the file at all, except the occasional one from my text editor. I do see the file being accessed when I try File.OpenRead.

Unfortunately there are no error messages when using a garbage Uri.
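
Since AddLexicon gives no feedback for a bad location, one way to at least rule out path mistakes is to validate the file before handing it over. The following is only a minimal sketch: LexiconPath.ToFileUri is a hypothetical helper name, not part of System.Speech or Microsoft.Speech, and it checks nothing beyond the file's existence.

```csharp
using System;
using System.IO;

static class LexiconPath
{
    // Hypothetical helper: resolves a path to an absolute file URI and
    // throws if the file is missing, because AddLexicon itself does not
    // report a bad location.
    public static Uri ToFileUri(string path)
    {
        string fullPath = Path.GetFullPath(path);
        if (!File.Exists(fullPath))
        {
            throw new FileNotFoundException("Lexicon file not found.", fullPath);
        }
        return new Uri(fullPath);
    }
}
```

It would be used as synth.AddLexicon(LexiconPath.ToFileUri(@"c:\test\whatchamacallit.pls"), "application/pls+xml"); a passing check only shows that the path is valid, not that the synthesizer will read the file.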

On further investigation I realized this example is from Microsoft.Speech.Synthesis, whereas I'm using System.Speech.Synthesis. However, from what I can tell, the two are identical except for some additional info and examples, and both point to the same specification. Could this still be the problem?

I verified the project is set to use the proper .NET Framework 4.

I compared the example from MSDN to the examples in the referenced spec, and also tried those examples directly, but it hasn't helped. Considering the file doesn't seem to be accessed, I'm not surprised.

(I am able to use PromptBuilder.AppendTextWithPronunciation just fine but it's a poor alternative for my use case.)

Is the example on MSDN broken? How do I use a lexicon with SpeechSynthesizer?

Answer 1:

After a lot of research and pitfalls, I can assure you that your assumption that the two namespaces are identical is wrong. For some reason, System.Speech.Synthesis.SpeechSynthesizer.AddLexicon() adds the lexicon to an internal list but doesn't use it at all. It seems nobody tried using it before, so this bug went unnoticed.

Microsoft.Speech.Synthesis.SpeechSynthesizer.AddLexicon() (which belongs to the Microsoft Speech SDK) on the other hand works as expected (it passes the lexicon on to the COM object which interprets it as advertised).

Please refer to this guide on how to install the SDK: http://msdn.microsoft.com/en-us/library/hh362873%28v=office.14%29.aspx

Notes:

- People have reported that the 64-bit version causes COM exceptions (because the library does not get installed correctly). I confirmed this on a 64-bit Windows 7 machine.
  - Using the x86 version circumvents the problem.
- Be sure to install the runtime before the SDK.
- Be sure to also install a runtime language (as advised on the linked page), as the SDK does not use the default system speech engine.

Answer 2:

I've been looking into this a little recently on Windows 10.

There are two things I've discovered with System.Speech.Synthesis.

First, the language of any voice you use must match the language declared in the lexicon file. Inside the lexicon you have the language:

<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="x-microsoft-ups" xml:lang="en-US">

I find that I can name my lexicon "blue.en-US.pls" and make a copy named "blue.en-GB.pls". The copy has xml:lang="en-GB" inside.

In the code you'd use:

string langFile = Path.Combine(_appPath, $"blue.{synth.Voice.Culture.IetfLanguageTag}.pls");
synth.AddLexicon(new Uri(langFile), "application/pls+xml");

The other thing I discovered is, it doesn't work with "Microsoft Zira Desktop - English (United States)" at all. I don't know why. This appears to be the default voice on Windows 10.

Access and change your default voice here: %windir%\system32\speech\SpeechUX\SAPI.cpl

Otherwise you should be able to set it via code:

var voices = synth.GetInstalledVoices();
// US: David, Zira. UK: Hazel. (First() requires "using System.Linq;".)
var voice = voices.First(v => v.VoiceInfo.Name.Contains("David"));
synth.SelectVoice(voice.VoiceInfo.Name);

I have David (United States) and Hazel (United Kingdom), and it works fine with either of those. This appears to be directly related to whether the voice token in the registry has the SpLexicon value. The Microsoft Zira Desktop voice does not have this registry value, while the Microsoft David Desktop voice has the following: Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_DAVID_11.0\Attributes\SpLexicon = {0655E396-25D0-11D3-9C26-00C04F8EF87C}
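
To check a voice programmatically instead of browsing the registry, something like the following might work. It is a rough sketch under assumptions: VoiceLexiconCheck is a hypothetical helper, the registry layout is simply copied from the David entry above, and the token id (the key name under Tokens) has to be supplied by the caller. It is Windows only, and a 32-bit process on 64-bit Windows may see a different registry view.

```csharp
using System;
using Microsoft.Win32;

static class VoiceLexiconCheck
{
    // Builds the Attributes key path for a voice token, following the
    // layout of the David entry quoted above (an assumption for other voices).
    public static string AttributesPath(string tokenId)
    {
        return @"SOFTWARE\Microsoft\Speech\Voices\Tokens\" + tokenId + @"\Attributes";
    }

    // Returns true if the voice token's Attributes key has an SpLexicon value.
    // Returns false if the key or the value is missing.
    public static bool HasSpLexicon(string tokenId)
    {
        using (RegistryKey key = Registry.LocalMachine.OpenSubKey(AttributesPath(tokenId)))
        {
            return key != null && key.GetValue("SpLexicon") != null;
        }
    }
}
```

For example, VoiceLexiconCheck.HasSpLexicon("TTS_MS_EN-US_DAVID_11.0") checks the David token named above.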

Answer 3:

You can use System.Speech.Synthesis.SpeechSynthesizer.SpeakSsml() instead of a lexicon.

This code changes the pronunciation of "blue" to "yellow" and "dog" to "fish".

// Requires: using System.Collections.Generic; using System.Speech.Synthesis;
SpeechSynthesizer synth = new SpeechSynthesizer();
string text = "This is a blue dog";
Dictionary<string, string> phonemeDictionary = new Dictionary<string, string> { { "blue", "jelow" }, { "dog", "fyʃ" } };
foreach (var element in phonemeDictionary)
{
   text = text.Replace(element.Key, "<phoneme ph=\"" + element.Value + "\">" + element.Key + "</phoneme>");
}
text = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">" + text + "</speak>";
synth.SpeakSsml(text);
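
One caveat with the snippet above is that string.Replace also matches inside longer words (for example "blue" in "bluebird") and does not escape characters such as & or < in the input. A possible variation that matches whole words only and escapes the text is sketched here. SsmlPronunciation.Build is a hypothetical helper, and the phoneme strings are taken as-is from the dictionary above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security;
using System.Text;
using System.Text.RegularExpressions;

static class SsmlPronunciation
{
    // Hypothetical helper: wraps each whole-word dictionary key found in
    // `text` in an SSML <phoneme> element and XML-escapes everything else.
    public static string Build(string text, IDictionary<string, string> phonemes, string lang)
    {
        var body = new StringBuilder();
        int last = 0;
        if (phonemes.Count > 0)
        {
            // One case-sensitive pattern matching any key as a whole word.
            string pattern = @"\b(" + string.Join("|", phonemes.Keys.Select(Regex.Escape)) + @")\b";
            foreach (Match m in Regex.Matches(text, pattern))
            {
                // Escaped plain text between the previous match and this one.
                body.Append(SecurityElement.Escape(text.Substring(last, m.Index - last)));
                body.Append("<phoneme ph=\"")
                    .Append(SecurityElement.Escape(phonemes[m.Value]))
                    .Append("\">")
                    .Append(SecurityElement.Escape(m.Value))
                    .Append("</phoneme>");
                last = m.Index + m.Length;
            }
        }
        // Escaped remainder after the last match (or the whole text).
        body.Append(SecurityElement.Escape(text.Substring(last)));
        return "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\""
            + lang + "\">" + body + "</speak>";
    }
}
```

The result can then be passed to synth.SpeakSsml(...) in the same way as before, for example synth.SpeakSsml(SsmlPronunciation.Build("This is a blue dog", phonemeDictionary, "en-US")).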

Source: https://stackoverflow.com/questions/11529164/how-do-i-use-a-lexicon-with-speechsynthesizer

Tags

text-to-speech