Anatomy · Strong architectural inference · v1.21.5

In plain English

This page explains where an AI behavior can live. It may be in a model, but it may also be in a prompt, memory record, adapter, dataset, tool setting, evaluator rule, or human workflow.

  • Why this matters: AI risk can come from the whole arrangement, not one obvious model.
  • What to look for: data, memory, routes, adapters, tools, evaluators, updates, and rollback paths.
  • Technical version below: the expert terminology remains available and is linked through the glossary.

Zero-Dependency Browser Runtime Carriers

Evidence level: Strong architectural inference (technical label: Architectural inference)

A zero-dependency browser LLM reduces third-party code paths, but it does not reduce the number of state carriers to one. The carrier list shifts from cloud services and Python packages to local binary artifacts, buffers, caches, adapters, workers, and browser storage.

Carrier map

| Carrier | How behavior can persist | Review control |
|---|---|---|
| .wasm binary | The exact kernel implementation determines dequantization, sampling, cache handling, and diagnostics. | Hash the binary; pin compiler profile; record panic=abort, LTO, and optimization mode. |
| Model container | Quantized blocks, tokenizer data, architecture metadata, and embedded manifests define the executable model. | Verify model hash, tokenizer hash, format version, and quantization profile. |
| Adapter payload | Low-rank and sparse deltas can modify behavior cheaply. | Sign adapter identity; record target tensors, rank, density, load order, and compatible base hash. |
| KV cache pages | Prefixes and generated states can affect continuation behavior. | Clear on reset; version cache layout; separate user sessions; audit copy-on-write sharing. |
| Speculative branches | Draft tokens may be generated and discarded before final output. | Do not write rejected branches to memory, analytics, training data, or prompt examples. |
| Local storage | IndexedDB, Cache Storage, files, and service-worker caches can preserve old components. | Provide a reset ecology action; enumerate and clear all local stores. |
| Worker threads | Shared memory and thread-local offsets can change execution and leak state across tasks if mishandled. | Scope workers per session, pin thread count, and validate SharedArrayBuffer boundaries. |
| Diagnostics | Checksums and counters can become the only later proof of what ran. | Store deterministic eval reports with UTC timestamps and artifact hashes. |
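
The adapter controls in the carrier map (compatible base hash, load order, target tensors) can be checked mechanically before anything is loaded. The following is a minimal sketch, not a definitive implementation: the `AdapterManifest` and `LoadPlan` shapes, their field names, and `reviewLoadPlan` are all hypothetical, and signature verification is deliberately left out.

```typescript
// Hypothetical manifest shapes; field names are illustrative assumptions,
// not taken from any real model or adapter format.
interface AdapterManifest {
  id: string;
  compatibleBaseHash: string; // hash of the base model the adapter was built against
  targetTensors: string[];    // tensors the adapter is allowed to modify
  rank: number;
  loadOrder: number;          // recorded position in the load sequence, starting at 0
}

interface LoadPlan {
  baseModelHash: string;
  adapters: AdapterManifest[]; // in the order they will actually be applied
}

// Returns human-readable problems; an empty array means the plan passed these checks.
function reviewLoadPlan(plan: LoadPlan): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();

  plan.adapters.forEach((adapter, index) => {
    if (adapter.compatibleBaseHash !== plan.baseModelHash) {
      problems.push(`${adapter.id}: built for a different base model`);
    }
    if (adapter.loadOrder !== index) {
      problems.push(
        `${adapter.id}: recorded load order ${adapter.loadOrder}, actual position ${index}`
      );
    }
    if (adapter.targetTensors.length === 0) {
      problems.push(`${adapter.id}: no target tensors recorded`);
    }
    if (seen.has(adapter.id)) {
      problems.push(`${adapter.id}: listed more than once`);
    }
    seen.add(adapter.id);
  });

  return problems;
}
```

Returning a list of problems rather than a boolean keeps the result usable as a review record: the same strings can be stored next to the artifact hashes in a diagnostics report.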

Boundary rule

A browser model is not just local weights. It is a local map of how an AI system is allowed to change over time. Review must cover the graph that can load, adapt, cache, speculate, decode, write, and reset.
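
One way to make the reset part of this rule reviewable is to register every state carrier behind a common interface and have the reset action report what it actually cleared. The sketch below is an illustration under stated assumptions: `StateCarrier`, `ResetReport`, `resetAll`, and `inMemoryCarrier` are hypothetical names, and the in-memory stand-in replaces real browser stores, which are asynchronous and would need a promise-based version.

```typescript
// Hypothetical wrapper around one local carrier
// (an IndexedDB database, a Cache Storage entry, a KV cache, and so on).
interface StateCarrier {
  name: string;
  entryCount(): number;
  clear(): void;
}

interface ResetReport {
  cleared: string[];        // carriers that are empty after reset
  stillPopulated: string[]; // carriers that still hold entries and need attention
}

// Clears every registered carrier, then verifies each one instead of trusting clear().
function resetAll(carriers: StateCarrier[]): ResetReport {
  const report: ResetReport = { cleared: [], stillPopulated: [] };
  for (const carrier of carriers) {
    carrier.clear();
    if (carrier.entryCount() === 0) {
      report.cleared.push(carrier.name);
    } else {
      report.stillPopulated.push(carrier.name);
    }
  }
  return report;
}

// In-memory stand-in used only for illustration and testing.
function inMemoryCarrier(name: string, entries: string[]): StateCarrier {
  const data = new Set(entries);
  return {
    name,
    entryCount: () => data.size,
    clear: () => data.clear(),
  };
}

// Example usage with two illustrative carriers.
const exampleReport = resetAll([
  inMemoryCarrier("kv-cache", ["page-0", "page-1"]),
  inMemoryCarrier("adapter-store", ["tone-adapter"]),
]);
console.log(exampleReport);
```

Checking `entryCount()` after `clear()` is the design point: a carrier that is missing from the registry, or whose clear step silently fails, is exactly the kind of state that survives a reset unnoticed.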