Building adapterOS: Trying to Measure What Everyone Said Couldn't Be Known

adapterOS did not begin as a company. It did not begin as a product, a governance framework, or an attempt to compete with foundation model providers.

It began as a question.

What if the internal behavior of AI systems could be inspected? What if learning could occur incrementally, without full retraining? What if we could understand what these systems were actually doing, even when prevailing wisdom claimed such understanding was either impossible or meaningless?

That question led to years of work. This article describes that work and the principles encoded into it.


The Original Question

The dominant narrative around large language models is that they are black boxes by nature. That their complexity exceeds human comprehension. That interpretability is a research curiosity, not a practical requirement.

I was not satisfied with this framing.

Not because I believed models were simple, but because I believed that measurement was still possible. That even if we could not fully explain every parameter, we could create systems where behavior was repeatable, comparable, and verifiable.

The question was not: "Can we understand everything a model does?"

The question was: "Can we build systems where learning can be observed over time?"


Why Engineering Drawings Mattered

The early intuition came from a specific domain: engineering drawings.

Engineering drawings are structured data with clear ground truth. Errors are subtle. Consequences are real. A missing dimension on a CAD schematic is not a stylistic choice. It is a defect that propagates downstream.

This domain mattered because it imposed constraints that most AI applications do not:

  • Precision over novelty. The goal is not to generate creative content, but to produce accurate representations that match physical reality.
  • Incremental learning over retraining. Engineers accumulate domain knowledge over years. A useful AI assistant should do the same.
  • Verifiable correctness. There is a ground truth against which outputs can be compared.

Working with engineering drawings taught me that AI systems needed to support adaptation without catastrophic forgetting, and that adaptation needed to be measurable.


Why Determinism Came First

If behavior cannot be repeated, it cannot be studied.

This is the foundational insight that shaped adapterOS. Not determinism as ideology, but determinism as prerequisite for measurement.

Modern AI systems are often non-deterministic by default. Temperature-based sampling introduces randomness. Batching changes the order of floating-point operations, which changes results. Infrastructure variance means the same request may run on different hardware or kernel paths. The same prompt can produce different outputs on different runs.

This makes learning impossible to isolate. If an adaptation produces different behavior, is it the adaptation? Or is it noise?

So the work began with determinism at the system level:

  • The same input should produce the same output.
  • The execution path should be reproducible.
  • Adaptations should be isolatable and comparable.

This was not about control. It was about creating conditions where learning could be observed.
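
As a concrete illustration of determinism as a prerequisite for measurement, here is a minimal sketch in Python. The `generate` function and configuration fields are hypothetical stand-ins, not adapterOS code; the point is that when all variation flows from the configuration, repeated runs can be checked mechanically.

```python
import hashlib
import json
import random

# Hypothetical stand-in for an inference call. In a real system this would be
# greedy decoding with pinned seeds, fixed batching, and a fixed kernel path.
def generate(prompt: str, config: dict) -> str:
    rng = random.Random(config["seed"])            # all randomness flows from the config
    return f"{prompt} :: {rng.randint(0, 9999)}"

def run_fingerprint(prompt: str, config: dict, output: str) -> str:
    # Canonical serialization, so identical runs always hash to the same value.
    payload = json.dumps({"prompt": prompt, "config": config, "output": output},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

config = {"seed": 0, "temperature": 0.0, "adapter": None}
first = generate("check this drawing for missing dimensions", config)
second = generate("check this drawing for missing dimensions", config)

# Determinism means repeated runs produce identical fingerprints;
# a differing fingerprint isolates a real change rather than noise.
assert run_fingerprint("check this drawing for missing dimensions", config, first) == \
       run_fingerprint("check this drawing for missing dimensions", config, second)
```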


Low-Rank Adaptation as a Serving Primitive

The next insight came from thinking about how models learn.

Full retraining is expensive, slow, and effectively irreversible. Once a model is retrained, the previous version's behavior is gone, and comparison becomes impractical.

Low-rank adaptation offered a different approach. Instead of modifying the entire model, small adapter matrices could be trained to specialize behavior for specific tasks or domains. These adapters are:

  • Composable. Multiple adapters can be combined or swapped.
  • Isolatable. Each adapter represents a specific learning event.
  • Cheap to produce. Training an adapter requires a fraction of the compute of full retraining.

This led to the idea of serving units: low-rank adapted model instances to which requests could be routed based on task requirements. Instead of one monolithic model, a constellation of specialized adapters, each measurable, each comparable.
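
A rough sketch of the shape of this idea, using NumPy and invented adapter names rather than anything from adapterOS: the base weights stay frozen, each adapter is a pair of small matrices, and routing is simply choosing which pair to apply.

```python
import numpy as np

d, r = 512, 8                                  # hidden size and adapter rank (illustrative)
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))                # frozen base weight, shared by every serving unit

def trained_adapter(seed: int):
    # Stand-in for a trained low-rank adapter: the effective weight is W + B @ A,
    # and W itself is never overwritten.
    g = np.random.default_rng(seed)
    A = g.standard_normal((r, d)) * 0.01       # down-projection (r x d)
    B = g.standard_normal((d, r)) * 0.01       # up-projection (d x r)
    return A, B

adapters = {
    "cad_dimensions": trained_adapter(1),      # hypothetical task-specific adapters
    "title_blocks": trained_adapter(2),
}

def forward(x: np.ndarray, task: str) -> np.ndarray:
    A, B = adapters[task]                      # "routing": pick the adapter for this task
    return x @ (W + B @ A).T                   # low-rank update; the base model is untouched

x = rng.standard_normal((1, d))
y_dims = forward(x, "cad_dimensions")          # same base model, specialized behavior
y_titles = forward(x, "title_blocks")          # swap adapters without retraining anything
```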


UMA and Locality

The next problem was bandwidth.

Moving large models around is expensive. Moving tokens is cheap. Yet the dominant architecture of AI inference is centralized: send your data to the cloud, receive your result.

UMA, Unified Memory Architecture, suggested a different approach. What if computation happened locally, in the user's memory space? What if models were cached and adapters were streamed, rather than the reverse?

This reframing had several benefits:

  • Reduced latency. Inference happens where the data lives.
  • Reduced energy cost. No round-trip to a data center.
  • Increased privacy. Data never leaves the local environment.

The vision was not anti-cloud. It was pro-locality. Move tokens, not models. Reduce the gravitational pull of centralized infrastructure.
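
A back-of-envelope comparison makes the asymmetry concrete. The figures below are illustrative assumptions, not measurements: a 7-billion-parameter base model in 16-bit precision, a low-rank adapter of roughly five million parameters, and a 1,000-token request.

```python
# Approximate bytes that move over the network in each scheme (decimal units).
GB, MB, KB = 1e9, 1e6, 1e3

base_model_bytes = 7e9 * 2    # 7B parameters x 2 bytes (fp16): streamed once, then cached locally
adapter_bytes = 5e6 * 2       # ~5M adapter parameters x 2 bytes: streamed per specialization
request_bytes = 1000 * 4      # 1,000 tokens as 4-byte ids: streamed per request

print(f"move the model:   {base_model_bytes / GB:.0f} GB")
print(f"move an adapter:  {adapter_bytes / MB:.0f} MB")
print(f"move the tokens:  {request_bytes / KB:.0f} KB")
```

Cache the large, slow-changing artifact; stream the small, fast-changing ones. That is the bandwidth argument in three lines.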


Cryptographic Token Accounting

Once computation became local and deterministic, a new question emerged: how do we know what actually happened?

Trust requires evidence. If a system claims to have executed a certain computation, how do we verify that claim?

Cryptographic token accounting was the answer. Every execution produces a record:

  • What tokens were processed.
  • What adapters were active.
  • What resources were consumed.
  • When the computation occurred.

These records are hashed and signed. They can be compared across runs. They can be audited by third parties. They transform inference from an opaque service into a verifiable transaction.

This is not about cryptocurrency or speculation. It is about accountability. Understanding what computation actually occurred, where, and how often.
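
As a sketch of what such a record could look like, the snippet below uses only the Python standard library: a canonical JSON encoding that is hashed and signed with an HMAC. The field names and key handling are placeholders, not the adapterOS scheme; a production system would likely use asymmetric signatures so third parties can verify records without holding the signing key.

```python
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"placeholder-key"   # placeholder: in practice, a per-node key or asymmetric keypair

def token_record(tokens_in: int, tokens_out: int, adapters: list[str], joules: float) -> dict:
    record = {
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,                # what tokens were processed
        "adapters": sorted(adapters),            # what adapters were active
        "energy_joules": joules,                 # what resources were consumed
        "timestamp": time.time(),                # when the computation occurred
    }
    canonical = json.dumps(record, sort_keys=True).encode()
    record["digest"] = hashlib.sha256(canonical).hexdigest()
    record["signature"] = hmac.new(SIGNING_KEY, canonical, hashlib.sha256).hexdigest()
    return record

def verify(record: dict) -> bool:
    # Re-derive the signature from the record body and compare.
    body = {k: v for k, v in record.items() if k not in ("digest", "signature")}
    canonical = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, canonical, hashlib.sha256).hexdigest()
    return hmac.compare_digest(record["signature"], expected)

r = token_record(tokens_in=512, tokens_out=128, adapters=["cad_dimensions"], joules=3.2)
assert verify(r)   # an auditor holding the key can re-check the claim
```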


Deterministic Execution Receipts

The culmination of this work is the execution receipt.

A receipt is not a log. Logs are internal, verbose, and designed for debugging. A receipt is external, structured, and designed for verification.

Each receipt contains:

  • The input that was processed.
  • The configuration that was active.
  • The execution path that was taken.
  • The output that was produced.
  • A cryptographic signature tying these together.

Receipts are reproducible. Given the same input and configuration, the same receipt should be produced. This enables:

  • Comparable runs. Two executions can be compared to detect drift.
  • Auditable learning. Adaptations can be traced through their effect on receipts.
  • Evidence over time. A history of receipts documents the evolution of system behavior.
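
To make the structure concrete, here is one way a receipt could be represented and compared, with invented field names rather than the actual receipt format: the content hash covers everything the signature attests to, and drift is defined as identical conditions producing different behavior.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Receipt:
    input_digest: str      # hash of the input that was processed
    config: dict           # the configuration that was active (model, adapters, decoding)
    trace_digest: str      # hash summarizing the execution path that was taken
    output_digest: str     # hash of the output that was produced
    signature: str = ""    # cryptographic signature tying the fields together

    def content_hash(self) -> str:
        body = {k: v for k, v in asdict(self).items() if k != "signature"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def drifted(a: Receipt, b: Receipt) -> bool:
    # Same input and configuration but a different path or output means behavior has drifted.
    same_conditions = a.input_digest == b.input_digest and a.config == b.config
    same_behavior = a.trace_digest == b.trace_digest and a.output_digest == b.output_digest
    return same_conditions and not same_behavior

before = Receipt("in:abc", {"adapter": "cad_dimensions", "seed": 0}, "path:1", "out:x")
after = Receipt("in:abc", {"adapter": "cad_dimensions", "seed": 0}, "path:1", "out:y")
assert drifted(before, after)   # same conditions, different output: investigate
```
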

I-Corps and the Market Reality

In 2024, I joined the NSF I-Corps program to test whether this work was ready for commercialization.

The answer was clear: it was not.

Not because the technology was flawed, but because the market was not ready to buy it. Either the need was not yet widely acknowledged, or organizations were not prepared to admit they lacked insight into their own AI systems.

The conversations were instructive. Enterprises were interested in AI, but their questions were about deployment, not inspection. About capability, not accountability. The discourse had not yet caught up to the risks.

This was a pivotal realization. The work was early. The insight was uncomfortable. Selling auditable AI infrastructure to organizations that did not yet believe they needed it was not a viable business model.


Reframing as a Research Group

I-Corps forced a reframing.

adapterOS was not a startup in the traditional sense. It was not building a product for an existing market. It was building infrastructure ahead of the curve, for a market that did not yet exist.

This is not a failure. It is a different kind of work.

The Kansas ACCEL grant validated this framing. The funding supported exploration rather than premature commercialization. It allowed the work to continue as research, with the expectation that the market would eventually arrive.

The current status is honest: adapterOS is a research group building instruments for asking better questions about AI systems. It is not a platform. It is not a product. It is an attempt to create the conditions under which AI can be understood.


The Long View

The future of AI efficiency is not larger data centers.

It is smaller, composable, low-rank adapted units. Routed deterministically. Sharing computation. Producing verifiable receipts.

The goal is not to win benchmarks. The goal is to reduce the physical and energetic footprint of AI. To make computation legible. To enable oversight without centralization.

This is an unfinished instrument. The questions it asks are not yet answered. The infrastructure it builds is not yet deployed at scale.

But the thesis remains:

That measurement is possible even when understanding is incomplete. That learning can be observed over time. That AI systems can produce evidence of what they do and why.

adapterOS started as a way to ask questions the industry said were unanswerable.

That is still what it is.