Should we keep the older WinMM-based SAPI4 implementation as an option? #17792

gexgd0419 · 2025-03-07T03:56:10Z

As it turns out, writing a WASAPI implementation for SAPI4 that works with every voice isn't as easy as I thought.

Currently there are some reports that the WASAPI implementation doesn't work with certain SAPI4 voices:

Fix SAPI4 WASAPI implementation #17762 (comment), WinTalker voice version 1.6
SAPI 4: TruVoice still doesn't quite work #17775, TruVoice
SAPI 4: Some strings are silent when L&H TTS3000 is used #17776, L&H TTS 3000

The problem is that the SAPI4 engine interacts with the audio object directly, and each engine may use the object in its own way, so each SAPI4 voice may have its own unique problem when working with WASAPI.

Because SAPI4's built-in, WinMM-based MMAudioDest is probably the most commonly used implementation, most SAPI4 engines were designed to work with it and might rely on some of its undocumented behavior. Since my WASAPI implementation is meant to replace MMAudioDest, it should replicate that behavior as well.

There is documentation for the IAudio and IAudioDest interfaces, which is what I based my implementation on. However, the documentation is missing some details. As a result, my implementation behaves differently from the built-in MMAudioDest in some respects, which breaks some voices.

Unfortunately, the speech.dll that MMAudioDest resides in is not open-sourced, and I can't find its PDB file to get its debug symbols, so it's basically a black box. Studying its behavior is, therefore, not easy.

So now I'm considering bringing the old, WinMM-based, proven-to-work implementation back, at least temporarily.

Not only would this let users opt out of the WASAPI implementation if it's not working for them, it would also let me log the behavior of the built-in implementation, so that I can ask users to try both implementations and compare their behavior in the logs.
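To illustrate the log-and-compare idea, here is a minimal sketch in plain Python. It is not the actual NVDA or SAPI4 code: `CallLogger`, `first_difference`, the two stub classes and their method names (`write`, `free_space`) are all hypothetical stand-ins, and the real audio object is a COM object rather than a Python class. The sketch only shows how wrapping both implementations in the same logger could make a behavioral difference visible.

```python
class CallLogger:
    """Wrap any object and record every method call made on it."""

    def __init__(self, target, label, log):
        self._target = target
        self._label = label
        self._log = log

    def __getattr__(self, name):
        # Only reached for names not found on the wrapper itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            result = attr(*args, **kwargs)
            # Record which implementation was called, with what, and what it returned.
            self._log.append((self._label, name, args, result))
            return result

        return wrapper


def first_difference(log_a, log_b):
    """Return the index of the first call that differs (ignoring the label), or None."""
    for i, (a, b) in enumerate(zip(log_a, log_b)):
        if a[1:] != b[1:]:
            return i
    if len(log_a) != len(log_b):
        return min(len(log_a), len(log_b))
    return None


# Hypothetical stand-ins for the two audio destinations.
class StubWinmmDest:
    def write(self, data):
        return len(data)

    def free_space(self):
        return 4096


class StubWasapiDest:
    def write(self, data):
        return len(data)

    def free_space(self):
        return 0  # differs from the WinMM stub on purpose


def drive(dest):
    """Simulate an engine making the same sequence of calls on a destination."""
    dest.write(b"abcd")
    dest.free_space()


winmm_log, wasapi_log = [], []
drive(CallLogger(StubWinmmDest(), "winmm", winmm_log))
drive(CallLogger(StubWasapiDest(), "wasapi", wasapi_log))
mismatch = first_difference(winmm_log, wasapi_log)
```

With logs like these from a user's machine, the first mismatching call would point at the behavior the replacement needs to replicate.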

What do you think of this? cc. @SaschaCowley @seanbudd

Also, here are some questions if we decide to go this way.

Should we let users opt in to or opt out of the new WASAPI implementation?
Should the switch be put in Speech settings or Advanced settings?
Should we warn users that the WinMM implementation may be removed in the future?

SaschaCowley · 2025-03-07T04:36:36Z

Maintainer

We think this is a good idea while we iron out the kinks with the WASAPI implementation. To address your specific questions:

Should we let users opt in to or opt out of the new WASAPI implementation?

The WASAPI output should be enabled by default so that it gets maximal testing on alpha/beta. If there are still major known problems when it comes to release time, we can always disable it for the release while we work on them.

Should the switch be put in Speech settings or Advanced settings?

The switch should be in Advanced settings. If it were in Speech settings, it would have to be a property of the SAPI4 synth driver. However, users who could not use SAPI4 because of the WASAPI output would then be unable to change the setting, since SAPI4 would need to be enabled to change it. Since most users will not understand or need this setting, we believe it should go in Advanced settings. It should probably also be a feature flag.

Should we warn users that the WinMM implementation may be removed in the future?

At this stage, we don't think this is necessary.
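
The behavior described in this reply (WASAPI on by default, with a switch that lives outside the SAPI4 driver's own settings) could be sketched roughly as follows. This is only an illustration: the key name `useWasapiForSapi4`, the `DEFAULTS` dictionary and the function `choose_sapi4_audio_backend` are hypothetical and do not reflect NVDA's real configuration schema or feature-flag API.

```python
# Hypothetical default: WASAPI is enabled unless the user turns it off.
DEFAULTS = {"useWasapiForSapi4": True}


def choose_sapi4_audio_backend(advanced_settings):
    """Pick the audio backend from a settings mapping kept outside the synth driver.

    Because the mapping is read before the SAPI4 driver is loaded, a user whose
    voice is silent under WASAPI can still flip the switch from another synth.
    """
    use_wasapi = advanced_settings.get(
        "useWasapiForSapi4", DEFAULTS["useWasapiForSapi4"]
    )
    return "wasapi" if use_wasapi else "winmm"


default_backend = choose_sapi4_audio_backend({})
fallback_backend = choose_sapi4_audio_backend({"useWasapiForSapi4": False})
```

Keeping the lookup independent of the driver is the point of the design: the setting stays reachable even when SAPI4 itself cannot be used.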
