Skip to content

XLAI workspace architecture

This page summarizes crate boundaries and data flow. The stable public contract for domain types and traits is xlai-core; other crates adapt or orchestrate around it.

TIP

The same content lives in the repository as ARCHITECTURE.md for browsing on GitHub without the docs site.

Crate roles

CrateResponsibility
xlai-coreShared types (ChatRequest, ToolCall, …), XlaiError, and async traits (ChatModel, TtsModel, …).
xlai-runtimeRuntimeBuilder / XlaiRuntime, Chat and Agent sessions, tool execution, streaming, embedded prompts/skills. Includes xlai_runtime::local_common (prompt templates, tool JSON envelope, PreparedLocalChatRequest) for xlai-backend-llama-cpp / xlai-backend-transformersjs.
xlai-backend-openaiOpenAI-compatible HTTP client (chat, transcription, TTS).
xlai-backend-openrouterOpenRouter Responses API chat backend.
xlai-backend-geminiGoogle Gemini HTTP backend.
xlai-backend-llama-cppLocal GGUF inference via llama.cpp; maps xlai-core requests into local prompt/tool formats.
xlai-backend-transformersjsBrowser-side chat via a JS adapter (WASM only).
xlai-qts-manifestBrowser QTS manifest / capability serde types (no GGML/ORT). Re-exported from xlai_qts_core::browser; xlai-wasm (feature qts) depends on it directly.
xlai-qts-coreQwen3 TTS engine (GGUF + GGML talker, ONNX vocoder, optional ONNX reference-codec encoder for ICL ref_code, tokenizer, streaming) and native TtsModel bridge (QtsTtsModel).
scripts/qts (root pyproject.toml)Python export: uv run export-model-artifacts, uv run xlai-qts-hf-release — see Export and Hugging Face.
xlai-build-nativeInternal: shared CMake / OpenBLAS / Vulkan helpers for native build.rs scripts.
xlai-sys-llamaVendored llama.cpp build (CMake + bindgen) for the llama backend (vendor/native/llama.cpp).
xlai-sys-ggmlVendored standalone ggml build (CMake + bindgen) for QTS (vendor/native/ggml).
xlai-qts-clisynthesize / profile / tui binary (xlai-qts) for local TTS workflows.
xlai-facadeInternal (not on crates.io): native-only integration layer for xlai-native (optional llama / qts, Gemini + HTTP/local backends). xlai-wasm does not use this crate.
xlai-nativeNative Rust entrypoint: explicit re-exports and gemini submodule; uses xlai-facade for feature wiring. Enable optional qts for QtsTtsModel.
xlai-wasmwasm-bindgen entry points and JS-facing session factories. Re-exports from xlai-core / xlai-runtime / backends (no xlai-facade). Default feature qts enables local QTS WASM surface (stub TtsModel, shared browser manifest types, qtsBrowserTts*; see Browser / WASM runtime).
xlai-ffiC ABI facade for future native interop.

Request flow (chat)

  1. Application uses xlai-native or xlai-wasm to build a RuntimeBuilder with a ChatBackend.
  2. Chat / Agent in xlai-runtime maintains history and tool registry, then calls ChatModel::generate or streaming APIs on the configured backend.
  3. Backends translate xlai_core::ChatRequest into provider-specific payloads and map responses (and errors) back to xlai_core types.

Runtime execution hints (game / interactive workloads)

  • xlai-core defines advisory ChatExecutionOverrides / ChatExecutionConfig, TtsExecutionOverrides / TtsExecutionConfig, and a shared CancellationSignal. ChatRequest / TtsRequest carry optional execution plus in-process-only cancellation (not serialized on the wire).
  • Merge order for chat: RuntimeBuilder::with_chat_execution_defaultsChat::with_chat_execution_overrides / Agent::with_chat_execution_overrides → optional per-call layer on begin_stream / stream_with_options. Session Some(_) fields override runtime defaults per field.
  • xlai-runtime merges into every Chat::build_request, exposes ChatExecutionHandle (next_event, cancel_on_drop) alongside existing streams, and checks cancellation in XlaiRuntime::stream_chat. with_default_max_tool_round_trips seeds new Agent sessions unless overridden with with_max_tool_round_trips.
  • Local backends: PreparedLocalChatRequest carries execution + cancellation for llama.cpp / transformers.js prep. llama.cpp honors cooperative cancel in the token loop, optional max_tokens_per_second pacing, and ChatModel::warmup() preloads weights. QTS checks cancellation on unary/stream paths and in streaming PCM chunks; TtsModel::warmup() ensures the worker is spawned.
  • WASM session options accept camelCase chatExecution, runtimeChatExecutionDefaults, runtimeTtsExecutionDefaults, defaultMaxToolRoundTrips, and ttsExecution on TTS calls (see crates/platform/xlai-wasm/src/types.rs). Remote OpenAI-style backends ignore unsupported hints.

Errors

xlai_core::XlaiError carries a kind, human-readable message, and optional structured fields (http_status, request_id, provider_code, retryable, details) for observability. Backends should attach HTTP context when available (see xlai-backend-openai::provider_response).

Tests

  • Unit tests live next to code (src/tests/ submodules where files grew large).
  • Ignored e2e tests use real credentials or local model paths; see CI and testing and .github/workflows/e2e.yml on GitHub.

Released under the Apache License 2.0.