Anatomy · Strong architectural inference · v1.21.5
In plain English
This page explains where an AI behavior can live. It may be in a model, but it may also be in a prompt, memory record, adapter, dataset, tool setting, evaluator rule, or human workflow.
- Why this matters: AI risk can come from the whole arrangement, not one obvious model.
- What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
- Technical version below: the expert terminology remains available and is linked through the glossary.
Zero-Dependency Browser Runtime Carriers
Evidence level: Strong architectural inference. Technical label: Architectural inference.
A zero-dependency browser LLM reduces third-party code paths, but it does not reduce the number of state carriers to one. The carrier list shifts from cloud services and Python packages to local binary artifacts, buffers, caches, adapters, workers, and browser storage.
Carrier map
| Carrier | How behavior can persist | Review control |
|---|---|---|
| .wasm binary | The exact kernel implementation determines dequantization, sampling, cache handling, and diagnostics. | Hash the binary; pin compiler profile; record panic=abort, LTO, and optimization mode. |
| Model container | Quantized blocks, tokenizer data, architecture metadata, and embedded manifests define the executable model. | Verify model hash, tokenizer hash, format version, and quantization profile. |
| Adapter payload | Low-rank and sparse deltas can modify behavior cheaply. | Sign adapter identity; record target tensors, rank, density, load order, and compatible base hash. |
| KV cache pages | Prefixes and generated states can affect continuation behavior. | Clear on reset; version cache layout; separate user sessions; audit copy-on-write sharing. |
| Speculative branches | Draft tokens may be generated and discarded before final output. | Do not write rejected branches to memory, analytics, training data, or prompt examples. |
| Local storage | IndexedDB, Cache Storage, files, and service-worker caches can preserve old components. | Provide a reset ecology action; enumerate and clear all local stores. |
| Worker threads | Shared memory and thread-local offsets can change execution and leak state across tasks if mishandled. | Scope workers per session, pin thread count, and validate SharedArrayBuffer boundaries. |
| Diagnostics | Checksums and counters can become the only later proof of what ran. | Store deterministic eval reports with UTC timestamps and artifact hashes. |
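The hash and version controls in the table can be read as one comparison: what was pinned at review time against what was observed at load time. The following is a minimal sketch of that comparison, not a published manifest format. The `RuntimeManifest` and `AdapterRecord` shapes, their field names, and the `diffManifest` function are all hypothetical, and the sketch assumes the hex digests were computed elsewhere (in a browser, `crypto.subtle.digest("SHA-256", bytes)` is one way to do that).

```typescript
// Hypothetical record for one adapter payload; field names are illustrative.
interface AdapterRecord {
  id: string;
  hash: string;               // hex digest of the adapter payload
  compatibleBaseHash: string; // model hash the adapter was built against
}

// Hypothetical manifest covering a few of the carriers from the table.
interface RuntimeManifest {
  wasmHash: string;
  modelHash: string;
  tokenizerHash: string;
  cacheLayoutVersion: number;
  adapters: AdapterRecord[];  // listed in load order
}

// Compare the pinned manifest with the observed one.
// Returns human-readable findings; an empty array means no drift was detected.
function diffManifest(pinned: RuntimeManifest, observed: RuntimeManifest): string[] {
  const findings: string[] = [];
  const scalarKeys = ["wasmHash", "modelHash", "tokenizerHash", "cacheLayoutVersion"] as const;
  for (const key of scalarKeys) {
    if (pinned[key] !== observed[key]) {
      findings.push(`${key} changed: ${pinned[key]} -> ${observed[key]}`);
    }
  }
  // Load order matters, so adapters are compared position by position.
  const slots = Math.max(pinned.adapters.length, observed.adapters.length);
  for (let i = 0; i < slots; i++) {
    const want = pinned.adapters[i];
    const got = observed.adapters[i];
    if (!want || !got) {
      findings.push(`adapter slot ${i}: ${want ? "missing" : "unexpected"} adapter`);
      continue;
    }
    if (want.id !== got.id || want.hash !== got.hash) {
      findings.push(`adapter slot ${i}: identity or hash mismatch (${want.id} vs ${got.id})`);
    }
    // An adapter is only meaningful against the base model it was built for.
    if (got.compatibleBaseHash !== observed.modelHash) {
      findings.push(`adapter ${got.id} targets a different base model`);
    }
  }
  return findings;
}
```

A caller might refuse to start inference when the returned array is non-empty and store the findings alongside the diagnostics report. Comparing adapters by position rather than as a set is a deliberate choice here, because the table lists load order as part of adapter identity.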
Boundary rule
A browser model is not just local weights. It is a local transition graph. Review must cover the graph that can load, adapt, cache, speculate, decode, write, and reset.
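One way to make the boundary rule checkable is to write the graph down as an allow-list and test recorded operation traces against it. The sketch below is illustrative only. The operation names come from the verbs in the rule, but the specific edges in `allowedNext` and the `firstIllegalStep` helper are assumptions made for this example, not a description of any particular runtime.

```typescript
// Operation names taken from the verbs in the boundary rule.
type Op = "load" | "adapt" | "cache" | "speculate" | "decode" | "write" | "reset";

// Hypothetical allow-list: each operation maps to the operations that may follow it.
// A real deployment would derive these edges from its own review.
const allowedNext: Record<Op, Op[]> = {
  load: ["adapt", "cache", "decode", "reset"],
  adapt: ["adapt", "cache", "decode", "reset"],
  cache: ["speculate", "decode", "reset"],
  speculate: ["speculate", "decode", "reset"], // no direct edge to "write"
  decode: ["cache", "speculate", "decode", "write", "reset"],
  write: ["cache", "decode", "reset"],
  reset: ["load"],
};

// Returns the index of the first step that may not follow its predecessor,
// or -1 if the whole trace stays inside the graph.
function firstIllegalStep(trace: Op[]): number {
  let prev: Op | undefined;
  let index = 0;
  for (const op of trace) {
    if (prev !== undefined && allowedNext[prev].indexOf(op) === -1) {
      return index;
    }
    prev = op;
    index++;
  }
  return -1;
}
```

In this sketch the missing edge from `speculate` to `write` encodes the carrier-map control that rejected draft branches should not reach memory, analytics, or training data: a speculative step has to pass through `decode` before anything is written.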