QTS browser / WASM runtime
This document records feasibility findings and the intended runtime shape for local Qwen3 TTS in the browser (wasm32-unknown-unknown).
Feasibility spikes (repo state)
GGML talker (xlai-sys-ggml + vendored GGML)
- cargo check -p xlai-sys-ggml --target wasm32-unknown-unknown still fails during CMake configuration: the default C toolchain cannot compile a trivial program for wasm32-unknown-unknown (no compatible clang target / CMake "unknown" platform).
- The current integration links static GGML libraries produced by CMake for host targets (see crates/sys/xlai-sys-ggml/build.rs). That path assumes native linkers, pthread/OpenMP-style deps on Linux, frameworks on Apple, etc.
- The webgpu Cargo feature on xlai-sys-ggml forwards to GGML_WEBGPU in CMake. That is a native GGML backend build flag today, not a guarantee of a working browser WebGPU path from Rust wasm32-unknown-unknown.
Conclusion: End-to-end local GGML talker in the browser requires a separate build and integration story (for example Emscripten-based GGML, a prebuilt browser GGML WASM module, or a different inference stack). The existing xlai-sys-ggml CMake pipeline is not sufficient for wasm32-unknown-unknown without additional toolchain work.
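A build script could surface this limitation before CMake runs by inspecting the target triple. The sketch below is only an illustration: `GgmlBuildPlan` and `plan_ggml_build` are hypothetical names, not part of xlai-sys-ggml. The one real detail it relies on is that Cargo passes the target triple to build scripts in the `TARGET` environment variable.

```rust
/// Hypothetical outcome of inspecting the target triple in a build script.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GgmlBuildPlan {
    /// Host-style target: run the existing CMake static-library build.
    NativeCmake,
    /// No supported toolchain path today; carries a human-readable reason.
    Unsupported(&'static str),
}

/// Decide how to build GGML for `target`, the triple Cargo exposes to
/// build scripts through the `TARGET` environment variable.
pub fn plan_ggml_build(target: &str) -> GgmlBuildPlan {
    // Only the bare wasm triple is rejected here. Other wasm triples
    // (for example an Emscripten one) are left to the normal path,
    // because this sketch makes no claim about them.
    if target == "wasm32-unknown-unknown" {
        GgmlBuildPlan::Unsupported(
            "the CMake pipeline has no C toolchain for wasm32-unknown-unknown",
        )
    } else {
        GgmlBuildPlan::NativeCmake
    }
}
```

In a real build.rs, the triple would come from `std::env::var("TARGET")`, and an `Unsupported` plan would typically end in a panic with the reason, so the failure is reported up front instead of deep inside CMake configuration.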
Vocoder (ort / ONNX Runtime)
- cargo check -p ort --target wasm32-unknown-unknown fails: ort-sys does not ship prebuilt ONNX Runtime binaries for wasm32-unknown-unknown with the default feature set.
- Browser deployments typically use onnxruntime-web (WASM + optional WebGPU EP) or a custom ORT build linked explicitly.
Conclusion: The current xlai-qts-core vocoder path (native ort sessions from filesystem paths) does not port directly. A browser vocoder needs either a custom ORT WASM build wired into ort, or a dedicated browser execution adapter.
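The gap between path-based native sessions and a browser adapter can be sketched as a check on where the model comes from. Everything named here (`VocoderModelSource`, `Platform`, `validate_source`) is hypothetical and is not an API of xlai-qts-core or ort. It assumes a browser adapter would receive model bytes that have already been fetched.

```rust
use std::path::PathBuf;

/// Where vocoder weights come from (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum VocoderModelSource {
    /// A file on disk, as the native path uses today.
    Path(PathBuf),
    /// Bytes already held in memory, e.g. fetched and cached by the page.
    Bytes(Vec<u8>),
}

/// Coarse execution environment (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Platform {
    Native,
    Browser,
}

/// Reject model sources that cannot work on the given platform.
pub fn validate_source(
    platform: Platform,
    source: &VocoderModelSource,
) -> Result<(), String> {
    match (platform, source) {
        // A browser build has no POSIX filesystem to open paths from.
        (Platform::Browser, VocoderModelSource::Path(p)) => Err(format!(
            "browser builds cannot open {}; pass model bytes instead",
            p.display()
        )),
        // Empty buffers are never a valid model on any platform.
        (_, VocoderModelSource::Bytes(b)) if b.is_empty() => {
            Err("model bytes are empty".to_string())
        }
        _ => Ok(()),
    }
}
```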
Design direction
- Capability-based degradation: Expose structured capability objects from WASM (see qtsBrowserTtsCapabilities) so UIs can branch on CPU vs WebGPU without relying on native env vars.
- Asset manifest: Use a versioned manifest (see Browser model manifest) for fetch, cache, and integrity checks instead of a POSIX model_dir on disk.
- Threading: Avoid std::thread worker loops in browser builds; prefer single-threaded async or explicit Web Workers with message passing once an engine exists.
- Next implementation steps (when engines exist): Plug a real Qwen3TtsEngine (or split talker/vocoder adapters) behind the same TtsModel / WASM entrypoints added for the stub phase.
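Capability-based degradation from the list above could look like the following minimal sketch. This document does not show the actual shape of qtsBrowserTtsCapabilities, so the struct fields, `ExecutionMode`, and `select_mode` are all assumptions made for illustration.

```rust
/// Hypothetical capability report a WASM module might expose to the UI.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BrowserTtsCapabilities {
    /// An inference engine is actually linked in (false in the stub phase).
    pub engine_ready: bool,
    /// The page obtained a WebGPU adapter.
    pub webgpu: bool,
    /// A Web Worker is available to host the engine off the main thread.
    pub worker: bool,
}

/// How the caller should run synthesis, from most to least capable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExecutionMode {
    WebGpu,
    CpuWorker,
    CpuSingleThreaded,
    Unavailable,
}

/// Pick the best mode the reported capabilities allow.
pub fn select_mode(caps: BrowserTtsCapabilities) -> ExecutionMode {
    if !caps.engine_ready {
        // Nothing to run yet, regardless of what the browser supports.
        return ExecutionMode::Unavailable;
    }
    if caps.webgpu {
        ExecutionMode::WebGpu
    } else if caps.worker {
        ExecutionMode::CpuWorker
    } else {
        ExecutionMode::CpuSingleThreaded
    }
}
```

Keeping the decision in one pure function means the UI branches on reported data only, with no environment variables or thread spawning involved.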
Browser matrix (all major browsers)
| Browser | Talker (GGML) | Vocoder (ORT) | Notes |
|---|---|---|---|
| Chromium | Pending | Pending | WebGPU most capable when engines land |
| Firefox | Pending | Pending | WebGPU / EP support may lag Chromium |
| Safari | Pending | Pending | Test WebGPU + WASM stack explicitly |
Until engines are integrated, the shipped WASM API returns a stable unsupported error with details.code = qts_wasm_engine_pending so tests and UIs can detect the phase.
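A test or UI layer might detect the stub phase along these lines. Only the code string `qts_wasm_engine_pending` and the `details.code` field come from this document. The `TtsError`, `ErrorDetails`, and `ErrorKind` types are hypothetical stand-ins for whatever error shape the WASM API actually returns.

```rust
/// Error code the stub-phase API reports (taken from this document).
pub const ENGINE_PENDING_CODE: &str = "qts_wasm_engine_pending";

/// Hypothetical error category.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    Unsupported,
    Other,
}

/// Hypothetical structured details attached to an error.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ErrorDetails {
    pub code: Option<String>,
}

/// Hypothetical error value mirroring "unsupported + details.code".
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TtsError {
    pub kind: ErrorKind,
    pub details: ErrorDetails,
}

/// True only for the stable "engine not integrated yet" error.
pub fn is_engine_pending(err: &TtsError) -> bool {
    err.kind == ErrorKind::Unsupported
        && err.details.code.as_deref() == Some(ENGINE_PENDING_CODE)
}
```

Matching on the code instead of the message text keeps callers stable if the wording of the error changes.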