XLAI workspace architecture
This page summarizes crate boundaries and data flow. The stable public contract for domain types and traits is xlai-core; other crates adapt or orchestrate around it.
TIP
The same content lives in the repository as ARCHITECTURE.md for browsing on GitHub without the docs site.
Crate roles
| Crate | Responsibility |
|---|---|
xlai-core | Shared types (ChatRequest, ToolCall, …), XlaiError, and async traits (ChatModel, TtsModel, …). |
xlai-runtime | RuntimeBuilder / XlaiRuntime, Chat and Agent sessions, tool execution, streaming, embedded prompts/skills. Includes xlai_runtime::local_common (prompt templates, tool JSON envelope, PreparedLocalChatRequest) for xlai-backend-llama-cpp / xlai-backend-transformersjs. |
xlai-backend-openai | OpenAI-compatible HTTP client (chat, transcription, TTS). |
xlai-backend-openrouter | OpenRouter Responses API chat backend. |
xlai-backend-gemini | Google Gemini HTTP backend. |
xlai-backend-llama-cpp | Local GGUF inference via llama.cpp; maps xlai-core requests into local prompt/tool formats. |
xlai-backend-transformersjs | Browser-side chat via a JS adapter (WASM only). |
xlai-qts-manifest | Browser QTS manifest / capability serde types (no GGML/ORT). Re-exported from xlai_qts_core::browser; xlai-wasm (feature qts) depends on it directly. |
xlai-qts-core | Qwen3 TTS engine (GGUF + GGML talker, ONNX vocoder, optional ONNX reference-codec encoder for ICL ref_code, tokenizer, streaming) and native TtsModel bridge (QtsTtsModel). |
scripts/qts (root pyproject.toml) | Python export: uv run export-model-artifacts, uv run xlai-qts-hf-release — see Export and Hugging Face. |
xlai-build-native | Internal: shared CMake / OpenBLAS / Vulkan helpers for native build.rs scripts. |
xlai-sys-llama | Vendored llama.cpp build (CMake + bindgen) for the llama backend (vendor/native/llama.cpp). |
xlai-sys-ggml | Vendored standalone ggml build (CMake + bindgen) for QTS (vendor/native/ggml). |
xlai-qts-cli | synthesize / profile / tui binary (xlai-qts) for local TTS workflows. |
xlai-facade | Internal (not on crates.io): native-only integration layer for xlai-native (optional llama / qts, Gemini + HTTP/local backends). xlai-wasm does not use this crate. |
xlai-native | Native Rust entrypoint: explicit re-exports and gemini submodule; uses xlai-facade for feature wiring. Enable optional qts for QtsTtsModel. |
xlai-wasm | wasm-bindgen entry points and JS-facing session factories. Re-exports from xlai-core / xlai-runtime / backends (no xlai-facade). Default feature qts enables local QTS WASM surface (stub TtsModel, shared browser manifest types, qtsBrowserTts*; see Browser / WASM runtime). |
xlai-ffi | C ABI facade for future native interop. |
Request flow (chat)
- Application uses
xlai-nativeorxlai-wasmto build aRuntimeBuilderwith aChatBackend. Chat/Agentinxlai-runtimemaintains history and tool registry, then callsChatModel::generateor streaming APIs on the configured backend.- Backends translate
xlai_core::ChatRequestinto provider-specific payloads and map responses (and errors) back toxlai_coretypes.
Runtime execution hints (game / interactive workloads)
xlai-coredefines advisoryChatExecutionOverrides/ChatExecutionConfig,TtsExecutionOverrides/TtsExecutionConfig, and a sharedCancellationSignal.ChatRequest/TtsRequestcarry optionalexecutionplus in-process-onlycancellation(not serialized on the wire).- Merge order for chat:
RuntimeBuilder::with_chat_execution_defaults→Chat::with_chat_execution_overrides/Agent::with_chat_execution_overrides→ optional per-call layer onbegin_stream/stream_with_options. SessionSome(_)fields override runtime defaults per field. xlai-runtimemerges into everyChat::build_request, exposesChatExecutionHandle(next_event,cancel_on_drop) alongside existing streams, and checkscancellationinXlaiRuntime::stream_chat.with_default_max_tool_round_tripsseeds newAgentsessions unless overridden withwith_max_tool_round_trips.- Local backends:
PreparedLocalChatRequestcarriesexecution+cancellationfor llama.cpp / transformers.js prep. llama.cpp honors cooperative cancel in the token loop, optionalmax_tokens_per_secondpacing, andChatModel::warmup()preloads weights. QTS checks cancellation on unary/stream paths and in streaming PCM chunks;TtsModel::warmup()ensures the worker is spawned. - WASM session options accept camelCase
chatExecution,runtimeChatExecutionDefaults,runtimeTtsExecutionDefaults,defaultMaxToolRoundTrips, andttsExecutionon TTS calls (seecrates/platform/xlai-wasm/src/types.rs). Remote OpenAI-style backends ignore unsupported hints.
Errors
xlai_core::XlaiError carries a kind, human-readable message, and optional structured fields (http_status, request_id, provider_code, retryable, details) for observability. Backends should attach HTTP context when available (see xlai-backend-openai::provider_response).
Tests
- Unit tests live next to code (
src/tests/submodules where files grew large). - Ignored e2e tests use real credentials or local model paths; see CI and testing and
.github/workflows/e2e.ymlon GitHub.
Related
- Crate taxonomy — published vs internal crates
- QTS overview — TTS stack and browser direction