In plain English
This page preserves research summaries and source notes. Summaries distinguish direct findings from Cognivirus.com interpretation.
- Why this matters: AI risk can come from the whole arrangement, not one obvious model.
- What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
- Technical version below: the expert terminology remains available and is linked through the glossary.
Zero-Dependency Browser LLM Architecture
The uploaded zero-dependency Rust report treats an in-browser LLM as a full local ecology, not as a single .wasm file. The report emphasizes WebAssembly SIMD, hierarchical quantization, adapter payloads, deterministic tokenization and sampling, paged/radix KV caches, speculative decoding rollback, native image/audio preprocessing, custom allocation, worker/thread boundaries, zero-copy memory bridges, and direct diagnostics.
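To make "deterministic sampling" concrete, here is a minimal sketch of a seeded temperature and Top-K sampler using only the Rust standard library. The names `XorShift64`, `SamplerConfig`, and `sample_top_k` are hypothetical and do not come from the report. The xorshift64* generator is an assumed stand-in for whatever PRNG a real runtime uses, and Top-P is left out for brevity.

```rust
/// Minimal xorshift64* PRNG. Hypothetical stand-in for the runtime's generator.
pub struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // A zero state would stay zero forever, so substitute a fixed non-zero constant.
        Self { state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed } }
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.state = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Uniform f32 in [0, 1) built from the top 24 bits of the next output.
    pub fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u32 << 24) as f32
    }
}

/// Illustrative sampler settings; a real manifest would also record Top-P.
pub struct SamplerConfig {
    pub temperature: f32,
    pub top_k: usize,
    pub seed: u64,
}

/// Draws one token id from `logits` using temperature scaling and Top-K filtering.
pub fn sample_top_k(logits: &[f32], cfg: &SamplerConfig, rng: &mut XorShift64) -> usize {
    assert!(!logits.is_empty() && cfg.top_k > 0 && cfg.temperature > 0.0);
    // Sort ids by logit, highest first; ties break on the lower id so the order is stable.
    let mut ids: Vec<usize> = (0..logits.len()).collect();
    ids.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]).then(a.cmp(&b)));
    ids.truncate(cfg.top_k.min(logits.len()));
    // Unnormalized softmax weights over the kept candidates, shifted by the max logit.
    let max = logits[ids[0]];
    let weights: Vec<f32> = ids
        .iter()
        .map(|&i| ((logits[i] - max) / cfg.temperature).exp())
        .collect();
    let total: f32 = weights.iter().sum();
    // Walk the cumulative weights until the random threshold is covered.
    let mut threshold = rng.next_f32() * total;
    for (&id, &w) in ids.iter().zip(weights.iter()) {
        if threshold < w {
            return id;
        }
        threshold -= w;
    }
    *ids.last().unwrap()
}
```

Two generators created from the same seed yield the same token sequence for the same logits and config. Bit-exact agreement across machines additionally assumes that floating-point operations such as `exp` behave identically on each build, which this sketch does not guarantee.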
Direct answer
A browser-side LLM can improve privacy and latency, but it also moves the trust boundary onto the client. The deployable unit becomes a composition of model weights, adapters, tokenizer tables, sampler settings, KV-cache state, local storage, worker memory, multimodal decoders, and telemetry sidecars.
This page does not claim that a browser LLM is inherently unsafe. It maps which local runtime surfaces must be documented so a local model ecology can be reproduced, evaluated, reset, and rolled back.
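One way to document such a composition is a small manifest record with a fingerprint that changes whenever any recorded component changes. The sketch below is illustrative only: `CompositionManifest`, `AdapterEntry`, and every field name are hypothetical, and FNV-1a is an assumed non-cryptographic hash chosen because it needs no dependencies. It detects accidental drift and is not a substitute for signed provenance.

```rust
/// Hypothetical adapter record; field names are illustrative, not from the report.
#[derive(Debug, Clone, PartialEq)]
pub struct AdapterEntry {
    pub name: String,
    pub base_model_id: String,
    pub checksum: String,
}

/// Hypothetical manifest of one runtime composition.
#[derive(Debug, Clone, PartialEq)]
pub struct CompositionManifest {
    pub base_model_id: String,
    pub quant_format: String,
    pub tokenizer_id: String,
    pub adapters: Vec<AdapterEntry>, // kept in load order
    pub temperature: f32,
    pub top_k: u32,
    pub top_p: f32,
    pub seed: u64,
}

pub const FNV_OFFSET: u64 = 0xCBF2_9CE4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01B3;

/// FNV-1a 64-bit hash. Non-cryptographic: it detects drift, not tampering.
pub fn fnv1a(hash: u64, bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(hash, |h, &b| (h ^ (b as u64)).wrapping_mul(FNV_PRIME))
}

impl CompositionManifest {
    /// Names of adapters whose recorded base model differs from this manifest's base.
    pub fn mismatched_adapters(&self) -> Vec<&str> {
        self.adapters
            .iter()
            .filter(|a| a.base_model_id != self.base_model_id)
            .map(|a| a.name.as_str())
            .collect()
    }

    /// Canonical text form. Floats are written as raw bits so formatting cannot drift.
    pub fn canonical(&self) -> String {
        let mut out = format!(
            "base={}\nquant={}\ntokenizer={}\n",
            self.base_model_id, self.quant_format, self.tokenizer_id
        );
        for a in &self.adapters {
            out.push_str(&format!(
                "adapter={}|{}|{}\n",
                a.name, a.base_model_id, a.checksum
            ));
        }
        out.push_str(&format!(
            "temperature={:08x}\ntop_k={}\ntop_p={:08x}\nseed={}\n",
            self.temperature.to_bits(),
            self.top_k,
            self.top_p.to_bits(),
            self.seed
        ));
        out
    }

    /// Fingerprint of the canonical form; adapter load order is part of the identity.
    pub fn fingerprint(&self) -> u64 {
        fnv1a(FNV_OFFSET, self.canonical().as_bytes())
    }
}
```

Because adapters are serialized in load order, swapping two adapters yields a different canonical form even when the set of adapters is unchanged. An evaluation, incident report, or rollback record could store the fingerprint alongside the full manifest.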
Architecture-to-control map
| Runtime surface | Report-derived concern | Control consequence |
|---|---|---|
| WebAssembly SIMD matvec kernels | Scalar and SIMD paths can differ in performance and edge behavior. | Test the exact compiled runtime, not only the reference math path. |
| K-Quant / block quantization | The deployed artifact is the quantized artifact, not the FP16 source checkpoint. | Record quant format, decoder version, block size, and dequantization test cases. |
| Adapter payloads | Hot-swappable low-rank or sparse deltas can alter behavior without replacing the base model. | Bind adapters to base-model identity, tensor layout checksum, load order, and signed provenance. |
| Tokenizer and sampler | Deterministic reproduction requires tokenizer tables, temperature, Top-K, Top-P, and seed. | Store sampler config and tokenizer identity in the composition manifest. |
| PagedAttention / RadixAttention KV cache | Prefix sharing and copy-on-write can preserve context across branches. | Track cache ownership, reference counts, eviction, reset ecology, and rollback boundaries. |
| Speculative decoding | Rejected draft branches require clean rollback. | Log target/draft model identities, acceptance rate, and rollback actions. |
| Native QOI / FFT preprocessing | Multimodal input parsers become part of the model behavior boundary. | Version decoders and constrain accepted formats. |
| Custom bump allocator | Arena reset is a control boundary. | Record arena high-water marks and reset points; fail closed on unexpected memory growth. |
| Shared workers and zero-copy bridges | SharedArrayBuffer and direct memory pointers can bypass ordinary serialization boundaries. | Treat worker identity, atomic counters, and buffer ownership as security-relevant runtime metadata. |
| Diagnostics sidecar | Without direct telemetry, local inference becomes hard to reproduce. | Emit fixed-format diagnostics with token counts, latency, memory use, sampler config, and checksums. |
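The custom bump allocator row can be illustrated with a small arena that records its high-water mark, counts resets, and refuses any allocation that would exceed a fixed budget. This is a sketch under stated assumptions, not the report's allocator: `BumpArena`, `ArenaError`, and the byte-budget policy are hypothetical, and the arena hands out offset ranges into a `Vec<u8>` rather than raw pointers so the example stays in safe Rust.

```rust
use std::ops::Range;

/// Hypothetical error type: the arena fails closed instead of growing.
#[derive(Debug, PartialEq)]
pub enum ArenaError {
    BudgetExceeded { requested: usize, available: usize },
}

/// Fixed-budget bump arena that tracks diagnostics for the reset boundary.
pub struct BumpArena {
    buf: Vec<u8>,
    offset: usize,
    high_water: usize,
    resets: u64,
}

impl BumpArena {
    pub fn with_budget(bytes: usize) -> Self {
        Self { buf: vec![0; bytes], offset: 0, high_water: 0, resets: 0 }
    }

    /// Reserves `len` bytes at an offset aligned to `align` (a power of two).
    /// Alignment is relative to the start of the buffer, not to an absolute address.
    pub fn alloc(&mut self, len: usize, align: usize) -> Result<Range<usize>, ArenaError> {
        assert!(align.is_power_of_two());
        let available = self.buf.len() - self.offset;
        let start = self.offset.checked_add(align - 1).map(|v| v & !(align - 1));
        let end = start.and_then(|s| s.checked_add(len));
        match (start, end) {
            (Some(s), Some(e)) if e <= self.buf.len() => {
                self.offset = e;
                self.high_water = self.high_water.max(e);
                Ok(s..e)
            }
            // Fail closed: no reallocation and no partial grant.
            _ => Err(ArenaError::BudgetExceeded { requested: len, available }),
        }
    }

    /// Mutable view of a previously reserved range.
    pub fn bytes_mut(&mut self, range: Range<usize>) -> &mut [u8] {
        &mut self.buf[range]
    }

    /// Releases every allocation at once; the high-water mark is kept for diagnostics.
    pub fn reset(&mut self) {
        self.offset = 0;
        self.resets += 1;
    }

    pub fn used(&self) -> usize {
        self.offset
    }

    pub fn high_water(&self) -> usize {
        self.high_water
    }

    pub fn resets(&self) -> u64 {
        self.resets
    }
}
```

A caller might reset the arena at the end of each request and copy `high_water()` and `resets()` into its diagnostics record, treating a `BudgetExceeded` error as a reason to stop the request rather than to enlarge the buffer.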
Source leads from the uploaded report
These links are carried forward as report-derived source leads. They are useful starting points for reviewers; they are not presented here as independently re-verified endorsements.
- Rust <code>core::arch::wasm32</code>
- V8 WebAssembly SIMD overview
- Hugging Face GGUF documentation
- vLLM automatic prefix caching design
- Rust and WebAssembly code-size guide
How Cognivirus uses this report
The report strengthens existing pages about local model ecologies by making the local runtime itself visible. The practical lesson is simple: if a system can run locally, it still needs lineage, composition manifests, reset boundaries, reset ecology, and rollback packets.