Technical Details About Speech Property Support in CLC-4-TTS

Implementing CSS speech properties for CLC-4-TTS poses several technical challenges, so support is only partial at the moment. Partial support is better than none, so I have gone ahead with this release because it already gives users several useful features, including the say-instead property, which can be used to ensure that text is spoken properly.

Some of the known issues are:

  1. Only embedded stylesheets work; external stylesheets do not.
  2. Pitch Range has no effect when SAPI 5 is the selected TTS engine.
  3. Speech properties given as absolute values sometimes sound slightly off when SAPI 5 is the selected TTS engine, even though properties given as percentages or keywords seem fine.
  4. Pitch, Pitch Range, Rate, and Volume do not always work when FreeTTS is the selected TTS engine.
  5. Not all speech properties are supported.

All of these issues are a direct result of the limitations of the components that I am working with. I am trying to find workarounds to improve support, but ultimately the fixes should come from Mozilla and the speech engine developers.

Limitations of Firefox

Firefox does not parse CSS2/CSS3 speech properties. This is a known Firefox bug; the Bugzilla report can be found here. As a result, I cannot simply call getPropertyValue to read the CSS speech properties; instead, I have to parse the stylesheet manually, which is slower and more error-prone. Also, because I am doing all the parsing myself, I have not had enough time to implement all of the speech properties. If a speech property isn't supported, it means I haven't added it to my parsing system yet. Eventually, all of this manual parsing should be replaced with a simple call to getPropertyValue once Firefox supports it; however, with the target milestone in the Bugzilla report set to "future," we may have quite a wait.
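To give a rough idea of what manual parsing involves, here is a minimal sketch in Python. It is not the code used in CLC-4-TTS; the function name parse_speech_rules and the list of properties it looks for are assumptions made for illustration. It only handles simple "selector { declarations }" rules of the kind found in an embedded stylesheet.

```python
import re

# Hypothetical subset of speech properties to look for; the properties that
# CLC-4-TTS actually supports are not listed in this document.
SPEECH_PROPERTIES = {"pitch", "pitch-range", "speech-rate", "volume", "say-instead"}

COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_speech_rules(css_text):
    """Return {selector: {property: value}}, keeping speech properties only."""
    rules = {}
    css_text = COMMENT.sub("", css_text)  # drop /* ... */ comments first
    for selectors, body in RULE.findall(css_text):
        declarations = {}
        # Simplification: a ';' inside a quoted value would split it wrongly.
        for declaration in body.split(";"):
            name, separator, value = declaration.partition(":")
            name = name.strip().lower()
            if separator and name in SPEECH_PROPERTIES:
                declarations[name] = value.strip()
        if declarations:
            for selector in selectors.split(","):
                rules.setdefault(selector.strip(), {}).update(declarations)
    return rules


if __name__ == "__main__":
    sample = 'h1, h2 { color: red; pitch: high }  .abbr { say-instead: "World Wide Web"; }'
    print(parse_speech_rules(sample))
```

Even this small sketch shows why the approach is fragile: it ignores the cascade, @-rules, and quoted strings that contain semicolons, all of which a real CSS parser would handle.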

Limitations of SAPI 5

SAPI 5 does not have a pitch range property. As a result, I cannot support pitch range in SAPI 5. Also, properties in SAPI 5 are set on a scale of -10 to 10; one cannot set an actual value. So I cannot specify a frequency (in Hz or kHz) for the pitch; all I can do is specify whether the pitch should be lower or higher on that -10 to 10 scale. To support the speech properties where an actual value is given, I approximate that value on the -10 to 10 scale, and I arrive at the approximation by listening and judging what sounds closest to correct. If you feel that certain properties sound a tad off when values are used, it is because those are approximations. This is a design limitation of the SAPI 5 engine; SAPI 5.3 is supposed to be much better, although it is not publicly available yet.
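As an illustration of what such an approximation can look like, here is a small Python sketch that maps a CSS frequency onto a -10 to 10 scale. The mapping in CLC-4-TTS was tuned by ear, so this is not it; the logarithmic formula and all three constants (BASELINE_HZ, OCTAVES_AT_LIMIT, SCALE_LIMIT) are assumptions chosen only to make the idea concrete.

```python
import math

# All three constants are assumptions for illustration. They are not
# documented SAPI 5 behavior and not the values used by CLC-4-TTS.
BASELINE_HZ = 120.0      # assumed pitch of the voice at setting 0
OCTAVES_AT_LIMIT = 1.0   # assumed pitch shift, in octaves, at +10 or -10
SCALE_LIMIT = 10


def parse_frequency(value):
    """Convert a CSS frequency such as '120Hz' or '0.2kHz' to hertz."""
    text = value.strip().lower()
    if text.endswith("khz"):
        return float(text[:-3]) * 1000.0
    if text.endswith("hz"):
        return float(text[:-2])
    raise ValueError("not a frequency: %r" % value)


def frequency_to_scale(value):
    """Approximate a frequency as an integer step between -10 and 10."""
    hertz = parse_frequency(value)
    if hertz <= 0:
        raise ValueError("frequency must be positive: %r" % value)
    octaves = math.log2(hertz / BASELINE_HZ)  # distance from the baseline
    step = round(octaves / OCTAVES_AT_LIMIT * SCALE_LIMIT)
    return max(-SCALE_LIMIT, min(SCALE_LIMIT, step))  # clamp to the scale


if __name__ == "__main__":
    for sample in ("120Hz", "0.2kHz", "1kHz"):
        print(sample, "->", frequency_to_scale(sample))
```

Whatever formula is used, the result is rounded to one of 21 steps and clamped at the ends, so two different requested frequencies can end up sounding identical. That is the kind of inaccuracy described above.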

Limitations of Java FreeTTS

Java FreeTTS does not actually support JSML markup. This lack of JSML support is a known limitation of the Java FreeTTS engine. Because JSML is not supported, I have to make a setProperties call manually before each Say call. Unfortunately, this creates a race condition, since properties can be set while something is being said. I have gotten around that via partial blocking of calls, so that pitch, pitch range, rate, and volume usually work (when they do not, speech reverts to normal with no properties set). This partial solution makes the properties work in most (but not all) cases; I have opted for partial blocking because it is more stable at keeping the extension consistently responsive for the end user. In an ideal universe, JSML would be supported very soon and I could simply use that and not have to do this extra work, but Java FreeTTS seems to be on hiatus, so we may have to live with this workaround for a while.
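The race described above can be illustrated with a short Python sketch. It shows full blocking, the stricter alternative to the partial blocking that CLC-4-TTS uses, and it is written against a stand-in engine rather than the real FreeTTS API: FakeEngine, SerializedSpeaker, set_properties, and say are all hypothetical names, and the stand-in say is synchronous, which a real engine may not be.

```python
import threading


class FakeEngine:
    """Stand-in for a TTS engine with separate set-properties and say calls."""

    def __init__(self):
        self.current = {}
        self.spoken = []

    def set_properties(self, props):
        self.current = dict(props)

    def say(self, text):
        # Record whichever properties are in effect when speech starts.
        self.spoken.append((text, dict(self.current)))


class SerializedSpeaker:
    """Pairs each set_properties call with its say call under one lock."""

    def __init__(self, engine):
        self._engine = engine
        self._lock = threading.Lock()

    def speak(self, text, props):
        # Without the lock, another thread could change the properties
        # between these two calls, which is the race condition.
        with self._lock:
            self._engine.set_properties(props)
            self._engine.say(text)


if __name__ == "__main__":
    engine = FakeEngine()
    speaker = SerializedSpeaker(engine)
    workers = [
        threading.Thread(target=speaker.speak, args=("item %d" % i, {"rate": i}))
        for i in range(5)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print(engine.spoken)
```

The cost of full blocking is that every caller waits for the one in front of it, which can make the interface feel sluggish. That trade-off is why I preferred partial blocking, even though it lets the properties slip occasionally.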