The Deployability Gap in Defense AI Architecture
By Andrew Park | 2026-04-28
The Department’s AI investment is optimized for connected environments. The fight that matters most isn’t.
The Department’s move to establish the Maven Smart System as a formal program of record is proof that AI-enabled operations have entered the acquisition mainstream.[24] What’s missing is an equal commitment to military-specific Small Language Models (SLMs) for the use cases where frontier LLMs aren’t just overkill but operationally unviable. Frontier deployments require persistent connectivity to remote compute and data center-scale infrastructure, neither of which exists at the tactical edge. The environment those systems will have to operate in looks nothing like the one where Maven has been validated.
I’m a believer in LLMs. We’ve been incorporating both LLMs and SLMs into our products since 2024, and the pattern holds consistently: a well-designed SLM trained on the right data for a specific use case can match or exceed larger-model output quality on that function, at a fraction of the latency and compute cost. The right model for the job is the one calibrated to the task, the environment, and the constraints.
The Department’s AI-enabled operations in connected, permissive environments have produced real results. That success has created a structural tension in how the Department is investing. Secretary Hegseth’s January 2026 AI strategy rightly emphasizes operational AI delivery beyond centralized environments, including edge-relevant use cases. But the same strategy mandates parity with the latest commercial frontier models: deploying them within 30 days of public release as a primary procurement criterion.[1] Those two directives pull in opposite directions. The institutional pull toward frontier model parity is stronger than the pull toward edge deployment requirements, and acquisition timelines are reflecting that. In a recent piece, Oceans and Orbits,[2] I examined what a conflict with China in the Indo-Pacific actually demands. That environment looks nothing like the one where the current architecture is being validated. China has specifically invested to exploit that difference, targeting the data links our AI depends on. An architecture optimized for connected environments is the wrong foundation for a contested Indo-Pacific fight.
The Department of the Air Force’s April 2026 AI Strategy explicitly directs a hybrid cloud-to-edge model and prioritizes edge AI for Disconnected, Degraded, Intermittent, and Limited-bandwidth (DDIL) environments.[3] That strategic intent is correct. The gap is between intent and acquisition posture. The strategy also directs governance to shift from risk management to aggressive barrier removal. For logistics and maintenance, that’s the right posture. For AI that surfaces targeting options in a lethal kill chain, some barriers are load-bearing.

The scale alone redefines every requirement. That environment has one overriding constraint that shapes every technology decision made within it: SWaP (Size, Weight, and Power).
SMALL LANGUAGE MODELS DELIVER SUPERIOR SPEED
The kill chain is a race. The side that closes it faster than the adversary can disrupt it wins. Future wars will be decided by two things: the speed at which a force can sense, decide, and act, and its ability to deny that same speed to the enemy.[4]
The calculus at the edge isn't about mass. A formation won’t need more people to win; it will need faster decisions. Losing some data transport is manageable. But losing the compute that runs AI-assisted targeting is a mission failure. Speed is the irreplaceable variable.
Consider what that speed differential looks like in practice. An adversary with AI-assisted targeting running on purpose-built edge hardware could close its kill chain in milliseconds: sensor data ingested, threat characterized, response option generated, human decision prompted. A force whose AI depends on a cloud connection, or has impaired connectivity, measures that same cycle in seconds, minutes, or not at all. In a high-tempo engagement, that gap isn't a performance disadvantage. It's a targeting solution the adversary completes before ours has started.

China is designing systems to get inside our decision loop, sever our data links, and blind our networks. An AI capability that depends on persistent cloud connectivity isn't a resilient kill chain asset. It's a single point of failure that a peer adversary has already mapped and targeted.[4]
SLMs keep working when the network goes down. They live on the hardware the formation carries. A frontier LLM dependent on persistent connectivity to remote compute isn't degraded when that link is severed. It's gone. A military-specific SLM on edge hardware is a kill chain asset.
SMALL LANGUAGE MODELS DELIVER SUPERIOR SIZE, WEIGHT, AND POWER
Size, Weight, and Power (SWaP) determines what can actually be deployed and sustained at the tactical edge. It ranks above features. It ranks above benchmark performance. A system that can't fit in the available space, can't be carried by the available personnel, or can't run on the available power doesn't get deployed, regardless of what it can do in a demonstration. SLM-based edge systems win on every SWaP dimension. A Micro/Embedded SLM fits in a cargo pocket and draws under 10 watts. A Tactical Edge SLM fits in a backpack and draws under 60 watts. A frontier LLM generally depends on data-center-scale infrastructure and persistent high-bandwidth access to remote compute. These aren't tradeoffs. They're different categories of system.
I have seen this play out across multiple product cycles. The capability that wins in the field is the most capable option that fits within the SWaP envelope, not the most powerful one. Everything else stays behind.
SLMs run on the hardware a decentralized team carries into a contested environment, without a cloud connection, on the power available to them. Frontier LLMs don't.
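The selection rule in this section, the most capable option that fits within the SWaP envelope, can be sketched in a few lines of Python. The 10-watt and 60-watt ceilings for the two edge tiers come from this article. The tier names, the capability ranking, and the 10,000-watt data-center figure are illustrative assumptions, not platform specifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str               # hypothetical label for the tier
    capability_rank: int    # higher = more capable (illustrative ordering)
    power_watts: float      # power the tier needs in order to run
    needs_network: bool     # True if the tier depends on remote compute


# Edge power ceilings (10 W, 60 W) are the article's figures.
# The data-center value is a placeholder assumption.
TIERS = [
    Tier("micro_embedded_slm", 1, 10.0, False),
    Tier("tactical_edge_slm", 2, 60.0, False),
    Tier("frontier_llm_datacenter", 3, 10_000.0, True),
]


def select_tier(power_budget_watts: float, network_available: bool) -> Optional[Tier]:
    """Return the most capable tier that fits the power budget and link state."""
    feasible = [
        t for t in TIERS
        if t.power_watts <= power_budget_watts
        and (network_available or not t.needs_network)
    ]
    # No feasible tier means no AI capability at that point of employment.
    return max(feasible, key=lambda t: t.capability_rank, default=None)


# A backpack-class power budget with the link severed:
print(select_tier(60.0, network_available=False).name)  # tactical_edge_slm
```

Under these assumptions, adding power alone does not bring the largest tier back once the link is gone: with unlimited power and no network, the function still returns the tactical edge tier.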
THE DEPLOYABILITY CLIFF
Figure 3 maps the full model tier spectrum against the constraints that determine what actually reaches the field.

Full technical specifications in Appendix B. Sources: [5][6][7][8]
The two deployable tiers are the only ones a decentralized team can carry into the field. Everything below that line requires infrastructure, logistics, and power a tactical unit can't sustain.
An adversary that can deny communications to a formation running cloud-dependent AI has disabled their AI capability entirely.[4] The adversary doesn't need to destroy the system. They only need to sever the link it depends on. Appendix A documents the three primary vectors China uses to do exactly that.
THE TRAINING DATA PROBLEM
Frontier LLMs are trained on the internet: consumer content, social media, entertainment, and every other category of publicly indexed data. For tactical defense use cases, that breadth is dead weight. The relevant knowledge base for a warfighter is narrow and specialized, and virtually none of it appears in any public training corpus. A model trained on curated, mission-relevant government data will therefore be smaller, faster, and more accurate within its domain than any general-purpose LLM asked to operate near the boundary of its relevant knowledge. Quantizing a frontier model to run on smaller hardware is not the answer. A compressed model is smaller but carries the same domain mismatch. The irrelevant knowledge is still baked into the weights. A military-specific SLM adapted around curated government doctrine and mission data is a different thing entirely. Appendix C is a technical deep dive for readers who want to go further.
Army Futures Command's Research and Analysis Center documented this precisely in October 2024. Their TRACLM project, successive generations of domain-fine-tuned models built on Army-specific data, improved on Army-specific tasks across successive iterations. They built a new benchmark, MilBench, because existing evaluations couldn't capture the performance gap that domain mismatch creates.[9] The Army has already proven the argument for military-specific models. The question is whether the procurement posture follows the evidence.
THE AGENTIC WORKFLOW PROBLEM
The defense AI community is moving, correctly, toward multi-agent workflows, where discrete AI systems each handle one step in a decision cycle and pass results to the next agent in the chain. The challenge is that every agent in that chain carries its own SWaP cost. Chain four or five frontier LLMs in sequence and power draw, processing load, and response time compound at each step. A chain of that size, operating without cloud connectivity on edge hardware, collapses under its own weight before it delivers the speed advantage it was designed to create.
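A rough way to see how chain cost accumulates is the Python sketch below. It assumes a strictly sequential chain, so each agent waits for the one before it. The 0.5-second and 60-watt figures are the upper bounds this article cites for a purpose-built SLM on Jetson-class hardware. The 5-second step for a larger model is an illustrative placeholder, not a measurement.

```python
def chain_latency_seconds(step_latencies_s):
    """Sequential agents run one after another, so their latencies add up."""
    return sum(step_latencies_s)


def chain_energy_joules(step_latencies_s, step_power_w):
    """Energy per pass: each step draws its power for as long as it runs."""
    if len(step_latencies_s) != len(step_power_w):
        raise ValueError("need one power figure per step")
    return sum(t * p for t, p in zip(step_latencies_s, step_power_w))


# Five-agent chain, every step a purpose-built SLM (article's upper bounds).
slm_chain = [0.5] * 5
# Same chain with a larger model at each step (placeholder latency).
larger_chain = [5.0] * 5

print(chain_latency_seconds(slm_chain))               # 2.5
print(chain_latency_seconds(larger_chain))            # 25.0
print(chain_energy_joules(slm_chain, [60.0] * 5))     # 150.0
```

Every agent added to the chain adds its own latency and energy to each pass, which is why the per-step cost of the model tier dominates the design.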
The model architecture choice determines whether the chain works at all. Purpose-built SLMs, trained on narrow domains and designed for specific functions within a workflow, exhibit narrower and more controllable behavior than general-purpose LLMs asked to operate across the full complexity of a battle management problem. That's not just a SWaP argument. It is a reliability argument. Appendix D goes deeper on why frontier LLM-based agent chains are mismatched to the needs of the tactical edge for readers who want the full technical picture.
This argument has an internal critic worth acknowledging. Writing in Military Review in August 2025, an Army officer argued that smaller models lack the reasoning capacity for complex operational problems.[10] He's right for connected garrison and cyber operations use cases. His argument and this one address different tiers. Where connectivity is reliable and compute is available, frontier models are today's best option. Military-specific models at that scale don't yet exist. When they do, the calculus will shift. Where connectivity isn't reliable and compute is constrained, frontier models aren't viable at all. The architecture needs both tiers.
That reliability gap isn't separable from the accountability question. Speeding up information flow isn't the same thing as delegating judgment. An agent chain that synthesizes intelligence, routes it, and surfaces a course of action faster than any human staff process can match is a powerful tool. It's not a decision-maker. AI can improve decision superiority. It can also introduce unacceptable risk. Which one dominates depends entirely on whether the human acting on its output can meaningfully evaluate what the system did and why. The humans who act on its outputs are still accountable for what happens next.
THE MISSION FUNCTION PROBLEM
Not all AI functions carry the same consequence when they’re wrong. A model that misclassifies a maintenance priority generates a rework order. A model that misclassifies a target generates a deadly strike. Architecture decisions about how AI output reaches a human decision-maker should be calibrated to that distinction.
The systems being fielded today were validated in permissive, connected environments. They don’t prove performance against an adversary designed to exploit AI dependencies: one that generates synthetic signatures, feeds false data into targeting chains, or severs connectivity at the moment AI-assisted decisions are made. The validation environment and the warfighting environment aren’t the same, and programs that treat them as equivalent carry significant risk.
Military-specific SLMs trained on narrow, mission-relevant data produce more interpretable outputs than frontier models. The human acting on a purpose-built SLM has something they can evaluate. The human acting on a frontier model recommendation in a high-tempo engagement has to trust or override it with limited ability to verify the reasoning in the time available. For recoverable decisions, that’s an acceptable tradeoff. For irreversible ones, it’s the question acquisition programs need to answer before the architecture hardens.
THE RIGHT MODEL FOR THE RIGHT ENVIRONMENT
The right model for each environment isn't a compromise. It's the superior choice for the functions it's designed to serve.
Small language models are the right choice wherever response time and SWaP are the binding constraints. A military-specific, purpose-built SLM on Jetson-class edge hardware responds in under 500 milliseconds and draws under 60 watts. For threat classification, sensor data interpretation, targeting decision support, and tactical agent workflows, a purpose-built SLM is faster than any frontier LLM, and when trained on mission-relevant data, more accurate too. A model trained on a narrow, mission-relevant dataset doesn't carry irrelevant knowledge and doesn't waste compute retrieving it. The fly swatter isn't an inferior substitute for the sledgehammer when you are swatting flies. It's the superior tool. SLMs deliver better results for these functions than larger models would, even if power and connectivity were unlimited.
Medium language models, the 13B to 70B parameter range at the forward base tier, are the right choice where more analytical depth is needed: course of action generation, intelligence synthesis across multiple documents, and operational planning support. Response times of 1 to 5 seconds are acceptable for deliberate planning functions. Where sub-second response is required in the same environment, SLMs handle those functions. The two tiers coexist at the forward base level, each doing what it does best.
Large language models belong in connected, high-compute environments where SWaP isn't the constraint and analytical breadth is the requirement. Campaign planning, theater-scale logistics optimization, and training and simulation all demand the reasoning depth and cross-domain synthesis that only a frontier model provides. The synthetic data LLMs generate at garrison can also feed the training of smaller models operating in the field. The tiers aren't in competition. They're complementary layers of the same architecture. Appendix E lists the specific open-weight models available at each tier for teams ready to evaluate options.
The critical design constraint across all three tiers follows directly from Appendix A. In a conflict with China in the Indo-Pacific, the larger tiers can't be assumed available. Forward bases within the first and second island chains are within range of precision strike. Cloud connectivity is a primary denial target. Proliferated LEO constellations improve resilience but don’t eliminate the denial problem. A degraded link is better than none. An architecture that fails when the link is severed entirely is still the wrong architecture. The architecture must be engineered so that every function critical to the kill chain can be executed at the smallest viable tier. SLMs aren't a fallback when larger infrastructure is unavailable. They're the foundation the entire stack has to be built on, because they're the only tier whose availability can be counted on in the fight that actually matters.
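The design rule above, that every kill chain function must be executable at the smallest viable tier, lends itself to a simple architecture check. The Python sketch below is a minimal illustration. The function names are borrowed from the SLM use cases listed earlier, while the tier labels and the mapping of functions to tiers are hypothetical and do not describe any real program.

```python
def functions_lost(function_tiers, available_tiers):
    """Return the critical functions that have no implementation on any
    tier that is still available."""
    available = set(available_tiers)
    return sorted(
        name for name, tiers in function_tiers.items()
        if available.isdisjoint(tiers)
    )


# Hypothetical design: which tiers can execute each critical function.
design = {
    "threat_classification": {"edge_slm", "cloud_llm"},
    "sensor_data_interpretation": {"edge_slm"},
    "targeting_decision_support": {"cloud_llm"},  # cloud-only by assumption
}

# Link severed and forward base lost: only the carried edge tier remains.
print(functions_lost(design, {"edge_slm"}))
# ['targeting_decision_support']
```

In this sketch, any function the check reports when only the edge tier is available is one the design has tied to infrastructure that can't be assumed to survive.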
WHY THE MARKET WON'T SOLVE THIS
Brose's central warning in The Kill Chain is that the U.S. risks losing a future war because it builds the wrong technology for the wrong environment.[4] The commercial market has no incentive to self-correct.
OpenAI, Anthropic, Google, Meta, and every other major AI developer are competing on the same variables: benchmark performance, context window size, multimodal capability, and inference speed on cloud infrastructure. Those are the metrics their customers pay for, and those customers are overwhelmingly enterprises, developers, and consumers operating in connected environments with reliable power.
None of that investment produces a model that fits in a backpack, runs on 30 watts, operates without a network connection, and is trained on government doctrine rather than the public internet. That model does not exist in any commercial lab's product roadmap, because no commercial customer is asking for it and no commercial business case justifies building it.
The investment pattern reflects it. The Department has documented the requirement at every level: in Army domain research, in CJADC2 operational experiments, in Secretary Hegseth’s own January 2026 AI strategy memo.[1] Project ARIA, launched in March 2026, explicitly targets tactical edge AI deployment in denied environments.[11] But ARIA is a rapid prototype partnership, not a funded program of record. The difference matters: a prototype demonstrates feasibility. A program of record commits the architecture.

The Department has faced this before, with satellite communications, with drone autonomy, with hardened electronics. The answer in every case was the same: the Department had to create the demand signal itself, fund the early development itself, and pull industry toward the capability it required rather than waiting for industry to arrive there on its own schedule.
WHAT THE DEPARTMENT MUST DO
Project ARIA demonstrates that the Department already knows what needs to be built.[11] The capability has been prototyped. The operational requirement has been articulated. What’s missing is a joint program of record that commits the architecture across services, with hard SWaP requirements as non-negotiable specifications rather than aspirational preferences, before the current generation of programs hardens around the wrong assumptions. That also means resolving the structural contradiction in the current AI strategy: the 30-day frontier model parity mandate and the edge deployment directive cannot both be the primary procurement criterion. One optimizes for connected demonstrations. The other is required for contested warfighting. Those two directives pull in opposite directions, and programs being designed today are inheriting that contradiction. Every program that encodes persistent cloud connectivity as a design assumption rather than a variable is building the wrong architecture for the wrong fight. That assumption works in permissive environments. It fails precisely when and where it matters most, in a contested theater where China has specifically invested to sever the links those programs depend on.
The big AI labs’ success is pointed somewhere else. Waiting for OpenAI or Google to build a military-specific, mission-purpose-built, air-gap-capable small language model isn’t a strategy. It’s a decision to go without. The path is clear: hard SWaP specifications, government training data available to qualifying developers, and a demand signal to the industrial base that this capability has a buyer. What’s missing is a joint program of record that makes the commitment durable across services and budget cycles.
That signal matters as much as the funding. Industry will only build what it believes the Department will actually procure. Significant investment in military-specific SLM development, structured around hard SWaP requirements and mission function specificity, would signal to a generation of smaller, faster-moving companies that there's a market for practical, fieldable AI. It would tell them that the path to a defense contract doesn't run through a hyperscale data center. The funding is the lever that reorients those incentives.
The proof of concept exists. At least four startups, EdgeRunner AI, Smack Technologies, TurbineOne, and JARVIS Defense, are already building domain-specific, air-gapped AI for the tactical edge, funded by venture capital. EdgeRunner, Smack, and TurbineOne have documented contracts or service engagements across multiple branches.[12][13][14][15][16] The Anthropic supply chain designation in March 2026, a designation Anthropic is contesting in court, accelerated interest in them dramatically. That acceleration is reactive, not strategic. Venture capital can seed a concept. It can't build a defense architecture. Only the Department can do that, and only if it decides to.
Questions Acquisition Leaders Should Ask
Every program integrating AI should be able to answer these questions before an architecture decision hardens.
Which AI functions must remain operational without reachback to cloud infrastructure?
Does the selected model meet kill chain latency requirements at the actual deployment tier?
What SWaP envelope applies at the actual point of employment, not the demonstration environment?
Is the model tier for each mission function documented as a hard requirement?
What mission-specific data trains, fine-tunes, or evaluates the model?
What capability remains when SATCOM, terrestrial fiber, and forward base compute are degraded or denied simultaneously?
Who is accountable for acting on the model’s output, and can they meaningfully evaluate what the system did and why?
For each AI function in the kill chain, what’s the consequence if it’s wrong, and has that assessment been independently reviewed by someone without a stake in the program’s success?
A program that can’t answer these questions has encoded assumptions rather than requirements. The architecture decisions being made today will be very difficult to unwind after contract award.
I have seen the same pattern play out across nearly three decades of delivering software under real SWaP constraints. We've built ML into government applications and AI into our own products, and the pattern holds in both: systems engineered around the actual operating environment outperform systems engineered around the best available general-purpose commercial capability. The Department is making architecture decisions right now for systems that will have to operate across the Indo-Pacific, a vast theater of contested ocean and airspace, under conditions of degraded connectivity, intermittent bandwidth, and extended operational isolation.[2] Those decisions are hardening into constraints. A logistics system, a battle management node, or an intelligence workflow built on cloud-dependent AI encodes an assumption that the Indo-Pacific will not reliably support. The time to specify SLM-capable architectures is while programs are still being designed, not after they have been fielded and integrated. The organizations that create the demand signal now will shape what the industrial base builds. Programs that wait will inherit an architecture optimized for the demonstration environment rather than the warfighting one.
Speed matters enormously. A force that closes its kill chain faster than an adversary can disrupt it holds a decisive advantage. But speed is a means, not a governing principle. AI that is fast but untrustworthy, fast but unaccountable, or fast but wrong at consequential moments is a liability. If the Department keeps treating cloud-dependent frontier LLMs as the default answer, it will optimize for connected demonstrations instead of contested warfighting realities. The governing principle: AI should be a servant, not a master. Automate the mundane. Compress the mechanical. Extend the reach of expert humans. Preserve clear accountability for decisions whose consequences can't be walked back.
The author is a technology leader with nearly three decades of experience delivering mission-critical software to government customers, including tactical teams operating under real SWaP constraints. He works with defense acquisition organizations on software engineering discipline, long-term software sustainability, and transition risk. He has led the delivery of ML and AI capabilities into production systems since 2016.
Appendix A: Connectivity Denial and the AI Dependency Problem
Why Connectivity Is the Decisive Vulnerability
Cloud-dependent AI systems require persistent, high-bandwidth connectivity to function. In a permissive environment, that connectivity is assumed. In a conflict with China in the Indo-Pacific, it is the primary target. China has built a multi-domain denial architecture specifically designed to sever or degrade the data links that US military systems depend on. This appendix documents the three main vectors of that denial capability: electronic warfare and jamming, anti-satellite and space-based communications attacks, and undersea cable sabotage. Together, they describe a threat environment in which AI systems that can't operate without cloud connectivity will be rendered ineffective at the moment they are needed most.
Electronic Warfare and Jamming
China has invested heavily in electronic warfare capabilities designed to blind adversary sensors, disrupt targeting, and degrade command-and-control. The People's Liberation Army (PLA) fields long-range ground-based jammers, airborne electronic attack platforms including the J-16D electronic warfare variant, maritime electronic warfare systems, and GPS spoofing capabilities. These systems are specifically designed to operate within China's Anti-Access/Area Denial (A2/AD) architecture, targeting the communications and data links that US forces depend on across the electromagnetic spectrum. The 2025 DoD China Military Power Report documents China's extensive electronic warfare and A2/AD capabilities in detail.[23] An AI-enabled battle management system that routes its inference through cloud infrastructure is directly vulnerable to this electromagnetic attack surface. When the data link is jammed or spoofed, the AI capability goes offline. The kill chain it was supposed to accelerate stops.
Anti-Satellite Capabilities and Space-Based Communications
Satellite communications are the backbone of U.S. military connectivity across the Indo-Pacific. China has developed a suite of counterspace capabilities designed to disrupt, degrade, or exploit U.S. and allied use of space, including systems that threaten satellite communications and space-enabled targeting. The U.S. Space Force’s 2025 Space Threat Fact Sheet documents China’s expanding counterspace architecture and its relevance to U.S. and allied operations.[25] The implication for AI-dependent systems is direct: satellite-relayed connectivity is not a safe fallback when the adversary has invested specifically in denying or exploiting the space layer.
Undersea Cable Sabotage
Undersea fiber-optic cables carry the overwhelming majority of international internet traffic, commonly cited at roughly 99 percent. In the Indo-Pacific, they are also critical military infrastructure. China has developed capabilities relevant to undersea infrastructure disruption, and Chinese-linked or Chinese-crewed vessels have been implicated, investigated, or prosecuted in multiple cable incidents near Taiwan. Public reporting does not prove every incident was deliberate state-directed sabotage, but the pattern is enough to make undersea cable dependency a serious operational risk. In early 2025, Chinese-linked vessels were implicated in multiple cable cuts near Taiwan. In November 2024, a Chinese bulk carrier severed two fiber-optic cables in the Baltic Sea connecting Sweden-Lithuania and Germany-Finland. CSIS has warned that China’s subsea cable-cutting capability creates a serious international security concern for critical digital infrastructure.[26] US officials have also expressed concern that Chinese cable repair ships operating in the Pacific may be conducting reconnaissance on US military communications links. Strategic chokepoints are particularly vulnerable: Guam, the cornerstone of US Indo-Pacific military operations, is served by over a dozen undersea cables that represent high-value targets in any conflict scenario.

The AI Dependency Implication
Traditional command-and-control systems were designed with degraded communications as a baseline assumption. Doctrine, training, and equipment all account for the possibility of severed links. Cloud-dependent AI systems have not been designed with that assumption. They require persistent connectivity to function at all. An AI-enabled kill chain that routes inference through a cloud data center is not a resilient capability. It is a single point of failure that an adversary with China's denial capabilities has already mapped, targeted, and demonstrated the willingness to attack in gray-zone operations today. A system that works in a connected environment and fails in a denied one has not been designed for the warfighting problem. It has been designed for the demonstration.
Long-Range Precision Strike and Forward Base Viability
The connectivity denial problem is compounded by a basing problem. The Department's AI architecture assumes access to forward infrastructure: hardened shelters, generator power, server racks, and logistics support at forward operating bases. In a conflict with China in the Indo-Pacific, that infrastructure is a target. China's People's Liberation Army Rocket Force (PLARF) has developed and deployed long-range precision strike systems specifically designed to hold US forward bases at risk across the first and second island chains.
The DF-21D medium-range anti-ship and land-attack ballistic missile, with a range exceeding 1,500 km, covers US bases in Okinawa, the Philippines, and throughout the first island chain. The DF-26 intermediate-range ballistic missile, with a range exceeding 4,000 km and referred to by defense analysts as the "Guam Killer," puts Andersen Air Force Base, Naval Base Guam, and Marine Corps Base Camp Blaz within range of precision conventional or nuclear strike. The DoD China Military Power Report documents that the PLARF fields approximately 250 IRBM launchers, sufficient to hold US forward bases across the first and second island chains at risk.[23] These are not aspirational capabilities. They are deployed, road-mobile, and exercised regularly in scenarios that explicitly simulate strikes on US forward bases and carrier strike groups.
The implication for AI architecture is direct. A forward base tier AI system, the shelter-class hardware described in Appendix B requiring server racks, active cooling, and generator power, depends on the physical survivability of that base. In a conflict scenario, US bases throughout the first island chain and as far as Guam are within range of precision strike. The US Navy is already adjusting carrier strike group doctrine to keep carriers beyond 1,000 km from the Taiwan Strait specifically because of DF-26 range. Fixed computational infrastructure is even less mobile than a carrier. A Proceedings article from December 2024 drew an explicit parallel to the fall of US bases in the Philippines in 1941, arguing that the US must anticipate where existing bases are vulnerable and design for the capability to operate from expeditionary advanced bases further from the threat. The lesson applies directly to AI-enabled systems: any capability that requires fixed forward infrastructure to function has encoded a survivability assumption that China's strike capabilities are specifically designed to destroy.
The design conclusion is the same as the connectivity denial argument: the AI capability stack must be engineered to function at the smallest viable tier, because larger tiers can't be assumed available. A garrison-level AI system that requires a dedicated facility is a planning asset in peacetime and a liability in conflict. A tactical edge SLM that runs on hardware a warfighter carries is available in every operational scenario, regardless of what happens to the forward base network. The architecture has to be designed for the conditions of the fight, not the conditions of the demonstration environment where it was validated.
Appendix B: Full Technical Spectrum Tables
The table below provides the full hardware specifications behind the leadership summary in the article. Two terms in the Response Time column need brief explanation. TTFT (Time to First Token) is the delay between submitting a question and receiving the first word of the response, the pause before the model starts answering. Tokens per second measures how fast the model produces output after that; as a rule of thumb, one token is roughly three-quarters of an English word. A response time of 5 to 30 seconds in the table means the operator waits up to half a minute for an answer to begin. At tactical decision speeds, that is too slow to matter. All figures in the tables below are representative planning ranges, not certified platform specifications. Actual latency and throughput vary by quantization level, context length, batch size, prompt size, runtime, accelerator, and thermal limits.
Sources: NVIDIA DGX H100 official specifications[5]; NVIDIA Jetson AGX Orin benchmarks[6]; Arya and Simmhan, arXiv:2506.09554[7]; MLPerf Inference v5.0[8]. Parameter counts for closed frontier models are not publicly disclosed by their developers and should be treated as illustrative estimates only.
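The two Response Time terms defined above combine into a single wait time for a complete answer. The sketch below shows that arithmetic. The function name and the sample values are illustrative assumptions, not figures taken from the tables or from any certified platform.

```python
def response_time_s(ttft_s: float, output_tokens: int, tokens_per_s: float) -> float:
    """Total wait for a complete answer.

    ttft_s        delay before the first token appears (Time to First Token)
    output_tokens length of the answer, in tokens
    tokens_per_s  generation speed once output has started
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s


# Hypothetical planning values for a 150-token answer (assumptions, not measurements).
edge_wait_s = response_time_s(ttft_s=0.5, output_tokens=150, tokens_per_s=30.0)
shelter_wait_s = response_time_s(ttft_s=5.0, output_tokens=150, tokens_per_s=15.0)

print(f"edge-class example:    {edge_wait_s:.1f} s")
print(f"shelter-class example: {shelter_wait_s:.1f} s")
```

The point of separating the two terms is that a short TTFT does not guarantee a short total wait: a long answer generated slowly can still dominate the operator's timeline.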
Appendix C: Model Size, Training Data, and Deployment Requirements
What Parameters Actually Are
A language model is, at its core, a very large collection of numerical values called parameters or weights. These values encode everything the model has learned: how language works, what facts it knows, how to reason through problems. The more parameters a model has, the more knowledge it can hold and the more complex the problems it can handle. But there is a direct physical cost: every parameter requires memory to store and computing power to process. At standard 16-bit precision, each parameter occupies 2 bytes, so a 7 billion parameter model (written 7B) requires roughly 14 gigabytes of working memory, similar in size to a large video game, except this must be loaded entirely into the processor's active memory, not just stored on a drive. A 70B model needs roughly 140 gigabytes. A 1 trillion parameter frontier model requires infrastructure that exists only in large data centers. As model size grows, the hardware required to run it grows with it, which is exactly why large models cannot be carried into the field.
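The memory figures above follow from one multiplication: parameter count times bytes per parameter. The sketch below reproduces them, assuming 16-bit weights (2 bytes each) and counting weights only. The function name is illustrative, and real deployments need additional memory for activations and the runtime itself.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory needed to hold the weights alone, in gigabytes.

    Assumes 1 GB = 1e9 bytes and ignores activations, caches, and runtime overhead.
    """
    return params_billions * 1e9 * bytes_per_param / 1e9


for size_b in (7, 70, 1000):
    print(f"{size_b}B parameters -> about {weight_memory_gb(size_b):,.0f} GB at 16-bit")
```

Under these assumptions the 7B and 70B cases come out at 14 GB and 140 GB, matching the figures in the text, and the 1 trillion parameter case lands at about 2,000 GB.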
Why LLMs Carry Irrelevant Training Data
Frontier LLMs are trained by feeding them enormous collections of text from the public internet: web pages, books, social media, forums, code repositories, and more. This training process is what gives these models their broad capability. They have, in effect, read a large portion of everything publicly written. But that breadth comes at a cost. A model trained on the entire internet carries the entire internet's worth of knowledge encoded in its parameters. For a defense analyst asking about rules of engagement, most of that encoded knowledge is irrelevant. The model cannot discard it. It is baked into the model's structure and must be carried, and paid for in size, memory, and power, with every query, whether or not it helps answer the question.
How SLMs Are Different
A small language model built for a specific defense use case is trained on a carefully selected, narrow dataset: relevant military doctrine, operational data, and domain-specific examples, not the public internet. The result is a model that is much smaller in parameter count, fits on hardware a warfighter can carry, responds faster, and produces more accurate answers within its intended domain. It does not attempt to answer questions outside that domain. That is not a limitation in a tactical context. It is a feature. A targeting support model that knows targeting doctrine precisely and nothing else is more valuable at the edge than a model that knows everything approximately. Precision in a small package beats breadth in a package that cannot be deployed.
Quantization and Edge Optimization
One additional technique that reduces model size is quantization, a compression process that stores the model's numerical values at lower precision. Think of it like reducing the resolution of a photograph: the image is smaller and loads faster, with only a modest loss in quality. Quantization can reduce a model's memory requirements by 50 to 75 percent: moving from 16-bit to 8-bit values halves the footprint, and moving to 4-bit values cuts it by three-quarters. A 7B model that normally requires 14 gigabytes of memory can be compressed to approximately 4 gigabytes, making it runnable on hardware a warfighter can carry. This technique applies to SLMs and smaller open-weight models. It does not solve the fundamental SWaP (size, weight, and power) problem for frontier LLMs. Frontier LLMs are estimated at one trillion parameters or more, and even at 75 percent compression a model of that size requires hundreds of gigabytes of memory. Heavily compressed frontier models remain too large, too slow, and too power-hungry for the tactical edge.
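To make the compression idea concrete, the sketch below does two things: it computes weight memory at different bit widths, and it applies a simple symmetric 8-bit quantization to a handful of made-up weight values. This is one basic scheme chosen for illustration; production toolchains use more elaborate methods, and every name and value here is an assumption.

```python
from typing import List, Tuple


def quantized_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Weight memory in GB at a given precision (weights only, 1 GB = 1e9 bytes)."""
    return params_billions * bits_per_param / 8


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric 8-bit quantization: integers in [-127, 127] plus one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Illustrative weights, not taken from any real model.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("7B at 16-bit:", quantized_memory_gb(7, 16), "GB")
print("7B at 4-bit: ", quantized_memory_gb(7, 4), "GB")
print("largest rounding error in the toy example:", max_error)
```

The rounding error is the "modest loss in quality" from the photograph analogy: each stored value is off by at most half of one quantization step.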
Why Internet-Trained LLMs Underperform on Defense Tasks
The core mechanism is this: a model encodes what it was trained on. A model trained on the public internet encodes the public internet. Military doctrine, rules of engagement, operational planning terminology, sensor data interpretation, and classified context are largely absent from public training corpora. When a general-purpose LLM is asked to operate in those domains, it is drawing on a knowledge base that is almost entirely irrelevant to the task. The model does not know what it does not know, and it cannot discard the irrelevant knowledge, because that knowledge is baked into the same weights that hold everything else the model has learned. The result is a model that is larger, slower, and more power-hungry than the task requires, while simultaneously being less accurate within the specific domain where accuracy matters most.
Army Futures Command researchers at The Research and Analysis Center (TRAC) documented this gap directly in a 2024 technical study.[9] The TRAC team found that the current generation of general-purpose LLMs demonstrates sub-optimal performance on Army use cases specifically because of the prevalence of military domain vocabulary, jargon, and operational concepts that are absent from internet-sourced training data. To address this, they built a training corpus of over 4,300 Army doctrine documents and conducted iterative fine-tuning experiments. Each successive generation of their domain-trained model, called TRACLM, showed measurable improvement over the general-purpose baseline on Army-specific tasks. The study also developed MilBench, the first publicly documented Army-domain evaluation benchmark, specifically because standard AI benchmarks cannot measure the performance gap that domain mismatch creates. Standard benchmarks test general knowledge. Military operators need domain-specific accuracy.
The commercial defense AI industry has reached the same conclusion independently. Scale AI, building defense-specific models on the Scale Donovan platform for use in classified government environments, fine-tuned Meta's Llama 3 specifically for defense use cases and created the first DoD-domain evaluation benchmarks to measure performance.[17][18] The explicit reason, stated in their product documentation, is that general-purpose LLMs lack DoD domain-specific knowledge and writing style. They built new benchmarks because existing ones could not capture the gap. The Army Deputy Assistant Secretary for data made the same point more bluntly: using an LLM trained on the public internet for military tasks risks both operational inaccuracy and sensitive data exposure, because the model's training distribution was built for an entirely different user population.[19]
The implication for SLM development is direct. A model trained on curated, mission-relevant government data (military doctrine, operational planning records, domain-specific sensor interpretation guides, rules of engagement) will be smaller in parameter count, faster at inference, and more accurate on the tasks that actually matter than any general-purpose model given the same task. Smaller means it fits on the hardware a warfighter carries. Faster means it meets the decision timelines commanders require. More accurate means fewer errors where errors carry consequences. The dead weight argument is not about what LLMs cannot do in general. It is about what they carry unnecessarily when deployed in a domain where nearly all of their training data is irrelevant to the task at hand.
Appendix D: Agent Chain Performance at the Tactical Edge
Three terms appear throughout this appendix that are worth defining upfront. An agent is an AI model assigned to one specific task, not a general assistant but a specialist. An agent chain is a sequence of agents working in order, where each one takes the previous agent’s output and acts on it. Inference is what happens when an agent processes a question or input and produces an output. Think of it as the AI equivalent of a person reading a document and writing a response. Every agent in a chain performs inference once per step, and inference takes time and consumes power. Understanding those three terms is enough to follow the argument in this appendix.
How Multi-Agent Workflows Function
In a multi-agent AI system, multiple AI models work together as a chain. Each model handles one specific task and passes its result to the next model in the sequence, similar to an assembly line where each station performs one function. A targeting workflow, for example, might chain 4 models in sequence: one that synthesizes incoming intelligence, one that classifies the threat, one that generates course of action options, and one that formats the output for a human decision-maker. Each step in the chain takes time and consumes computing power. The total time to complete the full chain is the sum of every individual model's response time, plus the time to pass data between steps.
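The assembly-line structure described above can be sketched as a list of functions, each consuming the previous one's output. The four agents below are placeholders that only wrap their input in a label; in a real system each would call a model. All names are hypothetical.

```python
import time
from typing import Callable, List, Tuple

Agent = Callable[[str], str]


def run_chain(agents: List[Tuple[str, Agent]], initial_input: str) -> Tuple[str, List[Tuple[str, float]]]:
    """Run agents in order, feeding each one the previous output, and time every step."""
    data = initial_input
    timings: List[Tuple[str, float]] = []
    for name, agent in agents:
        start = time.perf_counter()
        data = agent(data)
        timings.append((name, time.perf_counter() - start))
    return data, timings


# Placeholder agents standing in for model calls (assumptions for illustration).
def synthesize(text: str) -> str:
    return f"summary({text})"


def classify(text: str) -> str:
    return f"threat_class({text})"


def generate_coas(text: str) -> str:
    return f"coa_options({text})"


def format_for_operator(text: str) -> str:
    return f"brief({text})"


chain = [
    ("synthesize", synthesize),
    ("classify", classify),
    ("generate_coas", generate_coas),
    ("format", format_for_operator),
]

result, timings = run_chain(chain, "raw_intel")
total_s = sum(seconds for _, seconds in timings)
print(result)
print(f"total chain time: {total_s:.6f} s across {len(timings)} steps")
```

Because each step waits on the one before it, the total is the sum of the per-step times, which is why slow individual models compound across the chain.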
The Compounding Problem with LLMs
A single frontier LLM at the Forward Base tier takes 1 to 5 seconds to begin responding and draws 10 to 20 kilowatts of power. Chain 4 of those models in sequence and the workflow spends 4 to 20 seconds just waiting for each model to begin responding, before accounting for the time to generate each full output, transfer data between steps, or absorb delays from competing requests. In a disconnected field environment, where those models must run on local hardware rather than in the cloud, the physical infrastructure required to run 4 large models simultaneously does not fit in any deployable form below a fixed, connected facility with dedicated generator power. The chain that looks powerful on paper becomes inoperable in the environment where it is actually needed.
Why SLMs Solve the Compounding Problem
A purpose-built SLM at the Tactical Edge tier responds in under 500 milliseconds (under half a second) and draws 30 to 60 watts. A 4-model chain of SLMs completes the full workflow in under 2 seconds, running on hardware a single operator can carry in a pack and drawing about as much power as a laptop. The multi-agent architecture remains fully intact. What changes is that each model in the chain is small, fast, and built specifically for its one function. That precision also improves accuracy: a model designed exclusively to classify threats from a specific sensor type will outperform a general-purpose model assigned the same task as one step among many.
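The comparison between the two chains is simple multiplication, sketched below with the per-model response ranges quoted in this appendix (1 to 5 seconds for the large model, under 0.5 seconds for the SLM). The function name and the optional handoff delay are assumptions added for illustration.

```python
from typing import Tuple


def chain_latency_range_s(
    per_model_range_s: Tuple[float, float],
    n_models: int,
    handoff_s: float = 0.0,
) -> Tuple[float, float]:
    """Best-case and worst-case time for a sequential chain of identical models.

    handoff_s is a hypothetical fixed delay for passing data between adjacent steps.
    """
    low, high = per_model_range_s
    handoffs = max(n_models - 1, 0) * handoff_s
    return (n_models * low + handoffs, n_models * high + handoffs)


llm_chain = chain_latency_range_s((1.0, 5.0), n_models=4)
slm_chain = chain_latency_range_s((0.0, 0.5), n_models=4)
llm_chain_with_handoff = chain_latency_range_s((1.0, 5.0), n_models=4, handoff_s=0.1)

print("4 large models:", llm_chain, "seconds")
print("4 SLMs:        ", slm_chain, "seconds")
```

With zero handoff delay the large-model chain spans 4 to 20 seconds and the SLM chain stays at or under 2 seconds, matching the figures in the text. Any real handoff delay only widens the gap in absolute terms.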
The Design Implication
The choice between large-model and small-model agent chains is not a choice between capability and deployability. It is a choice between a system that works in a connected garrison environment and a system that works everywhere, including the places where the outcome of a conflict is actually decided. Technical leaders designing agent-based AI systems for defense use should calculate the total response time and total power draw of the full chain at the target deployment tier before selecting model sizes. A system that performs well in a connected test environment but fails its size, weight, and power budget in the field has not been designed for the mission. It has been designed for the demonstration.
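The pre-selection check recommended above can be written as a small budget test. The sketch below is one way to frame it, not a validated planning tool. The class names, the budget numbers, and the assumption that agents run one at a time on a single device (so peak power is the hungriest step, not the sum) are all illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AgentSpec:
    name: str
    worst_case_latency_s: float  # slowest expected response for this step
    power_w: float               # draw while this agent is running


@dataclass(frozen=True)
class TierBudget:
    name: str
    max_chain_latency_s: float
    max_power_w: float


def check_chain(chain: List[AgentSpec], budget: TierBudget, handoff_s: float = 0.0) -> Dict[str, object]:
    """Compare a sequential chain's total latency and peak power with a tier budget."""
    total_latency_s = sum(a.worst_case_latency_s for a in chain)
    total_latency_s += handoff_s * max(len(chain) - 1, 0)
    # Assumption: agents share one device and run in turn, so power peaks at the largest step.
    peak_power_w = max((a.power_w for a in chain), default=0.0)
    fits = total_latency_s <= budget.max_chain_latency_s and peak_power_w <= budget.max_power_w
    return {"total_latency_s": total_latency_s, "peak_power_w": peak_power_w, "fits": fits}


steps = ["synthesize", "classify", "generate_coas", "format"]
slm_chain = [AgentSpec(s, worst_case_latency_s=0.5, power_w=60.0) for s in steps]
llm_chain = [AgentSpec(s, worst_case_latency_s=5.0, power_w=20_000.0) for s in steps]

# Hypothetical budget loosely based on the Tactical Edge figures in this appendix.
edge_budget = TierBudget("tactical_edge", max_chain_latency_s=2.0, max_power_w=60.0)

slm_report = check_chain(slm_chain, edge_budget)
llm_report = check_chain(llm_chain, edge_budget)
print("SLM chain:", slm_report)
print("LLM chain:", llm_report)
```

Running the check against the target tier, instead of the test environment, is the design choice that matters: the same chain definition can pass in garrison and fail at the edge.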
Appendix E: Model Reference by Deployment Tier
This appendix lists representative AI models available for each deployment tier as of early 2026. Each entry notes the developer, the license type (which governs how the model can be used and modified), and any supply chain concerns. Tier parameter ranges are approximate deployment classes. Some models slightly outside the nominal range may fit when quantized and validated against target hardware. Two categories of models are excluded from this reference entirely: models developed by entities with documented ties to the Chinese Communist Party, and models designated adversary-linked by U.S. government reporting. DeepSeek and Qwen (Alibaba) fall into these categories and are addressed in the supply chain section at the end of this appendix. Their exclusion applies even when the model weights are distributed as open-source downloads. Making weights publicly available does not eliminate the underlying risk, because the training process, embedded behaviors, and developer relationships remain intact regardless of where the files are hosted.
The models listed below are from U.S. developers, European developers (primarily Mistral AI, a French company), or other sources for which no widely reported adversary-linked concerns were identified in public sources. Operational deployment still requires normal supply-chain, licensing, security, and model-risk review. All have publicly available weights, meaning the model files can be downloaded, installed on local hardware, and operated without a cloud connection or ongoing relationship with the original developer. This is an important property for defense deployments: it means the model can run in a fully air-gapped environment with no internet access. Each entry also notes the license type; licensing terms should be reviewed with legal counsel before operational deployment, as some (particularly Meta's Llama license) impose restrictions at commercial scale.
Tier 1: Micro / Embedded (0.5B - 3B parameters)
These models run on the processors found in rugged tablets, handheld devices, and embedded systems, with no separate graphics card required. They respond in under 100 milliseconds (faster than a human blink) and require less than 5 watts of power. Best suited for single, well-defined tasks: identifying a threat category from a sensor feed, parsing a voice command, summarizing a short report, or extracting key terms from incoming data. They are not designed for complex multi-step reasoning.
Microsoft Phi-3 Mini (3.8B parameters). Open license, free to use and modify. Released April 2024. Built specifically for edge and mobile deployment. Performs well on reasoning tasks for its size. Runs without a dedicated graphics processor. A strong starting point for teams that want to train a custom version on government-specific data.
Microsoft Phi-4 Mini (3.8B parameters). Open license, free to use and modify. Released February 2025. Updated version of Phi-3 Mini with better ability to follow complex instructions. Runs on devices with 4 to 6 gigabytes of working memory, equivalent to a mid-range laptop. Well suited for tasks that require structured, formatted output such as reports or checklists.
Google Gemma 3 1B (1 billion parameters). Open license, commercial use permitted. Released March 2025. Designed specifically to run on a single device without cloud access. Text-only at this size. Extremely small memory footprint. Can run on hardware with as little as 2 gigabytes of working memory. The lightest viable option for the most constrained hardware environments.
Google Gemma 3 4B (4 billion parameters). Open license. Released March 2025. Handles both text and images, which is useful for applications that need to process photographs, maps, or sensor imagery alongside written input. Runs on smartphone-class hardware. Strong multilingual performance across major languages.
Meta Llama 3.2 1B and 3B (1 and 3 billion parameters). Community license; review terms before large-scale deployment. Released September 2024. The smallest members of Meta's widely-used Llama model family. Capable of basic reasoning and instruction following. Widely used by defense and research organizations as a baseline for building custom, domain-trained versions.
Mistral Ministral 3B (3 billion parameters). Open license, free to use and modify. Released October 2024. Developed by Mistral AI, a French company for which no adversary-linked concerns have been widely reported. Designed specifically for edge deployment. Performs competitively against comparably sized models on standard public benchmarks. Responds in under half a second on standard consumer hardware.
Tier 2: Tactical Edge (3B - 8B parameters)
These models run on a small, dedicated AI processor roughly the size of a hardback book, weighing under 15 pounds and drawing 30 to 60 watts of power. The NVIDIA Jetson AGX Orin is the current benchmark device in this category; ruggedized military-grade equivalents are available. These models respond in under half a second and can handle more complex reasoning than Tier 1 models: multi-step analysis, synthesizing information from multiple sources, supporting targeting decisions, and running as part of automated agent workflows. All of this operates without any network connection.
Meta Llama 3.1 8B (8 billion parameters). Community license. Released July 2024. Strong all-around performance. One of the most widely customized models in the defense and research community, with numerous domain-specific versions already developed. Runs well on Jetson-class edge hardware. A practical starting point for teams building a custom model trained on government-specific data. The instruction-tuned version is recommended for automated agent workflows. Supports a 128k-token context window.
Mistral 7B (7 billion parameters). Open license, free to use and modify. Released September 2023. One of the most widely deployed open-weight models in the world. Strong performance for its size. Uses an efficient architectural design that reduces memory requirements compared to models of similar capability. Fully permissive license with no commercial restrictions.
Mistral Ministral 8B (8 billion parameters). Open license. Released October 2024. Designed specifically for edge deployment. Performs competitively with Meta's Llama 3.1 8B on several public benchmarks. Includes built-in support for tool use, meaning the ability to invoke external functions, query databases, or call other software systems as part of its reasoning, which is essential for automated agent workflows. Recommended for multi-agent tactical applications.
Microsoft Phi-4 (14 billion parameters). Open license. Released December 2024. Outperforms many models twice its size on reasoning and structured tasks. Fits within the memory limits of Jetson-class edge hardware when compressed using quantization (see Appendix C). Well suited for tasks that require careful, step-by-step reasoning under field conditions.
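Google Gemma 3 12B (12 billion parameters). Open license. Released March 2025. Handles both text and images. Performs comparably to Llama 3.1 8B across most standard evaluations. Sits above the nominal range for this tier and should be quantized and validated against target hardware (see Appendix C). Designed for efficient local operation without cloud access.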
Tier 3: Vehicle / Mounted (8B - 13B parameters)
These models run on a ruggedized compute module mounted in a vehicle bay, drawing 100 to 400 watts from the platform’s onboard power supply. Suitable for crew-served ground vehicles, rotary and fixed-wing aircraft, surface vessels, and larger UAS platforms. They respond in under 1 second and handle more complex reasoning than backpack-class hardware: multi-source sensor fusion, targeting support, and autonomous system coordination. All operate without any network connection. China’s denial strategy targets data links and fixed infrastructure; a vehicle-mounted compute node operates independently of both.
Microsoft Phi-4 (14 billion parameters). Open license. Exceptional reasoning for its size. Runs on a single high-end edge module or compact GPU node. Well suited for structured analytical tasks including targeting decision support and sensor data interpretation under field conditions.
Meta Llama 3.1 8B (8 billion parameters). Community license. Strong all-around performance with wide adoption across defense and research organizations. Runs well on Jetson-class hardware in a ruggedized vehicle enclosure. A practical starting point for domain-trained vehicle-mounted applications.
Mistral Ministral 8B (8 billion parameters). Open license. Designed for edge deployment with built-in tool use support, enabling the model to invoke external functions and query onboard systems as part of its reasoning. Recommended for vehicle-mounted multi-agent applications.
Tier 4: Forward Base / Shelter (13B - 70B parameters)
These models require 2 or more high-end server processors (graphics processing units, the specialized chips that power AI workloads) along with active cooling systems. The full hardware setup weighs 300 to 600 pounds and requires generator power, placing it in a hardened shelter or vehicle at a forward operating base. Not field-portable. Response times of 1 to 5 seconds. Suited for more complex analysis at the forward edge: synthesizing intelligence from multiple documents, generating course of action options, processing large volumes of incoming data, and supporting planning functions that need more depth than the tactical tier models can provide.
Meta Llama 3.1 70B (70 billion parameters). Community license. Released July 2024. Strong all-around performance. Can process approximately 100,000 words of input at once, enabling analysis of large documents or extended operational logs. Requires 2 high-end server processors at full precision; can be compressed to run on a single processor. One of the most widely customized models in use across government and research, with numerous domain-specific versions available.
Meta Llama 3.3 70B (70 billion parameters). Community license. Released December 2024. Improved performance over Llama 3.1 70B at similar hardware cost. Recommended over the 3.1 version for new deployments.
Mistral Large 3 (675 billion parameters total; 41 billion active per query). Open license. Released December 2025. Uses a Mixture of Experts architecture: rather than activating all 675 billion parameters for every query, the model routes each request to the most relevant subset (approximately 41 billion parameters), reducing the computing cost significantly. Think of it as a team of specialists where only the relevant experts are called in for each task, rather than the entire team attending every meeting. Supports over 80 languages. Recommended for multilingual intelligence analysis. Fully permissive license with no commercial restrictions.
Mistral Small 4 (119 billion parameters total; approximately 6 billion active per query). Open license. Released March 2026. Uses the same Mixture of Experts routing approach as Mistral Large 3, but with even fewer active parameters per query, making the actual computing cost similar to running a 6 billion parameter model despite the much larger total size. Handles text, images, and complex reasoning in a single model. Supports a 256k-token context window. Excellent capability relative to its operating cost, making it one of the best options for Forward Base deployments where hardware is constrained but analytical depth is needed.
Microsoft Phi-4 (14 billion parameters). Open license. Released December 2024. The smallest model in this tier and the most deployable. Can run on a single server processor and, when compressed, even on high-end consumer hardware. Exceptional reasoning for its size. A strong option for smaller forward positions that have generator power but limited hardware capacity.
Cohere Command A (111 billion parameters). Non-commercial open license; commercial deployment requires a separate agreement with Cohere. Released March 2025. Optimized for searching and reasoning across large document collections, specifically the kind of task where an operator needs to query a library of doctrine, reports, or intelligence records and receive synthesized answers. Supports a 256k-token context window. Runs on 2 high-end server processors.
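For the Mixture of Experts entries in this tier, two different numbers matter when sizing hardware. Active parameters drive compute per query, while in the usual deployment all experts stay loaded, so total parameters drive memory. The sketch below separates the two using the parameter counts quoted in the entries above. The function name, the 16-bit default, and the weights-only accounting are assumptions.

```python
from typing import Dict


def moe_footprint(total_params_b: float, active_params_b: float, bytes_per_param: float = 2.0) -> Dict[str, float]:
    """Separate memory cost (total parameters) from compute cost (active parameters)."""
    if not 0 < active_params_b <= total_params_b:
        raise ValueError("active parameters must be positive and no larger than total")
    return {
        # Assumes every expert is resident in memory; offloading schemes can change this.
        "weights_memory_gb": total_params_b * bytes_per_param,
        # Compute per token scales roughly with the active subset.
        "dense_equivalent_compute_b": active_params_b,
        "active_fraction": active_params_b / total_params_b,
    }


large_3 = moe_footprint(total_params_b=675, active_params_b=41)
small_4 = moe_footprint(total_params_b=119, active_params_b=6)

for label, report in (("Mistral Large 3", large_3), ("Mistral Small 4", small_4)):
    print(
        f"{label}: ~{report['weights_memory_gb']:,.0f} GB of weights at 16-bit, "
        f"compute comparable to a {report['dense_equivalent_compute_b']:.0f}B dense model, "
        f"{report['active_fraction']:.1%} of parameters active"
    )
```

Under these assumptions a model can be cheap to run per query and still demanding to host, so both figures should be checked against the target tier before selection.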
Tier 5: Garrison / FOB Datacenter (200B - 700B parameters)
These models require 8 to 16 high-end server processors, dedicated data center facilities, and active cooling systems. The hardware weighs over 1,000 pounds before racks, networking, and power infrastructure are accounted for. Not deployable to the field under any practical scenario. These models are the right tool for the most demanding analytical tasks in connected garrison environments: campaign planning, theater-level logistics optimization, large-scale intelligence fusion, and wargaming support where reasoning depth and breadth matter more than speed or portability.
Meta Llama 3.1 405B (405 billion parameters). Community license. Released July 2024. The largest of Meta's Llama 3.1 family. Strong performance on complex, multi-step reasoning tasks. Requires 8 high-end server processors to run at full capability. Widely used across government and research environments and available through several approved cloud platforms.
Meta Llama 4 Scout (109 billion parameters total; 17 billion active per query). Community license. Released April 2025. Uses Mixture of Experts routing, so the active computing cost is much lower than the total parameter count. Supports a 10M-token context window, making it suited for analyzing entire document libraries, extended operational logs, or very long planning sequences in a single query. Full-context inference imposes significant KV-cache memory demands and should be validated against available hardware. Recommended for long-document analysis and extended reasoning at the garrison level.
Meta Llama 4 Maverick (400 billion parameters total, MoE). Community license. Released April 2025. Meta's highest-performance open-weight model. Uses Mixture of Experts routing to reduce active compute per query despite the large total size. Suited for the most demanding garrison-level analytical workloads where output quality is the primary requirement.
Mistral Large 3 (675 billion parameters total; 41 billion active). Open license. Also viable at this tier for multilingual and multi-domain analytical tasks. Activates far fewer parameters per query than the dense Llama 3.1 405B above, which lowers compute cost, although all 675 billion parameters must still be held in memory. A practical choice when Tier 4 forward shelter hardware is insufficient for the task and compute per query must stay low.
xAI Grok-1 (314 billion parameters total; approximately 25 percent active per query). Open license with no commercial restrictions. Released March 2024. Developed by xAI, a U.S. company. Performance is competitive with other models of similar active parameter count.
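The KV-cache caution in the Llama 4 Scout entry above can be estimated with a standard formula: two cached tensors (keys and values) per layer, per attention head, per token. The sketch below applies it to a hypothetical architecture. The layer count, head count, and head dimension are assumptions chosen for illustration and are not published specifications for any model listed here; architectures that use local or chunked attention can need far less.

```python
def kv_cache_gb(
    tokens: int,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    bytes_per_value: float = 2.0,
) -> float:
    """Approximate KV-cache size in GB for one sequence under full attention.

    The factor of 2 covers keys and values; 1 GB = 1e9 bytes.
    """
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token_bytes * tokens / 1e9


# Hypothetical architecture (assumption): 48 layers, 8 KV heads, head dimension 128, 16-bit cache.
cache_128k_gb = kv_cache_gb(128_000, n_layers=48, n_kv_heads=8, head_dim=128)
cache_10m_gb = kv_cache_gb(10_000_000, n_layers=48, n_kv_heads=8, head_dim=128)

print(f"128k-token context: about {cache_128k_gb:.1f} GB of cache")
print(f"10M-token context:  about {cache_10m_gb:,.0f} GB of cache")
```

Cache size grows linearly with context length, so a context window that is nominally supported can still exceed the memory of the hardware it is meant to run on.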
Tier 6: Frontier LLM / Cloud-Only
These models run exclusively in large commercial data centers and are accessed over the internet. They cannot be downloaded, installed locally, or operated without a persistent network connection. They are not viable in contested or disconnected environments. Their value lies in connected garrison and national-level settings where the priority is the highest possible analytical quality: breadth of knowledge, depth of reasoning, and handling of complex, open-ended tasks, rather than portability, speed, or power efficiency.
OpenAI GPT-4 class models (GPT-4o, GPT-4.5, GPT-5 series). Proprietary, API access only. Not open-weight. U.S. origin. Appropriate for connected garrison environments, strategic analysis, and policy work.
Anthropic Claude (Sonnet, Opus series). Proprietary, API access only. Not open-weight. U.S. origin. Strong on long-context reasoning, document analysis, and code. Appropriate for the same connected garrison use cases.
Google Gemini (Pro, Ultra series). Proprietary, API access only. Not open-weight. U.S. origin. Strong multimodal capability. Appropriate for connected analytical and planning environments.
Microsoft Copilot / Azure OpenAI Service. Proprietary. U.S. origin. Enterprise integration with Microsoft productivity suite. Appropriate for garrison staff work and acquisition functions.
Supply Chain Risk: Models to Exclude
The following models should not be used on Department systems or in Department-connected applications, regardless of how they are distributed. A critical point for decision-makers: the fact that a model's files are freely downloadable and can be run on local hardware does not eliminate the supply chain risk. The risk is not primarily in how the model is hosted. It is in how the model was built. A model's training process, the data it was trained on, the behavioral restrictions embedded in it by its developers, and the developer's relationships with foreign governments are all properties of the model itself, not properties of the server it runs on. Downloading the files and running them locally does not change any of those properties.
DeepSeek (all versions, including R1, V2, V3, and derivatives). Developed by High-Flyer Capital Management, Hangzhou, China. Banned from government devices by the Pentagon, Navy, NASA, Department of Commerce, Department of Energy, and multiple federal agencies [19]. Congressional Select Committee on the CCP designated DeepSeek a profound threat to U.S. national security [20]. Bipartisan legislation introduced to codify the ban. Contains obfuscated code with documented connections to China Mobile infrastructure. Excluded from use on Department-connected systems absent explicit authorization.
Qwen (all versions, including Qwen 2.5, Qwen 3, and derivatives). Developed by Alibaba Cloud. Pentagon briefly listed Alibaba on its list of companies with alleged links to the Chinese military. NIST/Commerce CAISI report (October 2025) designated Qwen adversary AI [21]. Financial Times reporting (November 2025) alleged Alibaba provides technology support for Chinese military operations [22]. Data infrastructure subject to Chinese national security law requiring disclosure to government authorities on request. For Department-connected systems, Qwen should be treated as a presumptive exclusion candidate unless explicitly authorized through formal risk review.
Any model developed by ByteDance (owner of TikTok), Tencent, Baidu, Huawei, or other entities designated as Chinese military-linked should be treated with the same exclusion standard. The No Adversarial AI Act, introduced in Congress in June 2025, would formalize restrictions on AI tools developed by China, Russia, Iran, and North Korea for federal use. Pending formal regulatory guidance, the standard should be applied proactively: if the developer has documented ties to an adversary government, the model is excluded regardless of its technical performance.
Note for technical teams: self-hosting the open-weight files of an excluded model does not resolve the supply chain concern. Training data, developer-built behavioral guardrails, and any vulnerabilities introduced during training are properties of the weights, and they travel with the files wherever they are hosted. The weights are the risk, not the server.
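One way to encode this rule in an intake process is to screen on who built the weights, including the base model behind any derivative, and to ignore where the files are hosted. The sketch below is a minimal illustration. The record fields, the developer identifiers, and the example entries are hypothetical, and a real screen would rest on authoritative designations and formal risk review.

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative identifiers drawn from the developers named in this section (assumption).
EXCLUDED_DEVELOPERS = frozenset({"deepseek", "alibaba", "bytedance", "tencent", "baidu", "huawei"})


@dataclass(frozen=True)
class ModelRecord:
    name: str
    developer_lineage: Tuple[str, ...]  # original developer first, then any fine-tuners
    hosted_on: str                      # recorded for audit; deliberately not used in the decision


def is_excluded(model: ModelRecord) -> bool:
    """Exclude a model if any developer in its lineage is on the exclusion list."""
    return any(dev.strip().lower() in EXCLUDED_DEVELOPERS for dev in model.developer_lineage)


candidates = [
    ModelRecord("llama-3.1-8b-domain-tuned", ("Meta", "Hypothetical Integrator"), "air-gapped enclave"),
    ModelRecord("qwen-derivative-tuned", ("Alibaba", "Hypothetical Integrator"), "government server"),
]

for model in candidates:
    status = "EXCLUDED" if is_excluded(model) else "eligible for further review"
    print(f"{model.name} (hosted on {model.hosted_on}): {status}")
```

The hosting field never enters the decision, which mirrors the point above: moving the files to a trusted server changes nothing about how the weights were produced.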
Appendix F: Defense AI Startup Landscape
This appendix documents the emerging landscape of startups building domain-specific, field-deployable AI for defense as of April 2026. It is not exhaustive. The space is moving quickly and many relevant companies operate below public visibility. Entries are limited to companies with publicly documented products, contracts, or funding that directly address the architecture arguments in this article. The list is organized by relevance to tactical edge deployment specifically.
Tier 1: Tactical Edge and Air-Gapped Focused
These companies are building specifically for denied, disconnected, and constrained environments. Their products are designed to run air-gapped on organic hardware, trained on domain-specific military data, and operate without persistent cloud connectivity. They are building exactly what this article argues the Department needs. None has a funded program of record at scale as of April 2026.
EdgeRunner AI (Seattle, WA)
Veteran-founded. Domain-specific, air-gapped, on-device AI agents trained on military doctrine and real combat scenarios by former operators. Product: WarClaw, an agentic AI assistant that runs fully air-gapped on organic hardware with no internet connection required. MOS-specific assistants across all devices and assets. Public beta launched July 2025. Contracts with Army Special Forces groups and Space Force as of early 2026. Following the Anthropic supply chain designation in March 2026, the Navy accelerated engagement significantly. The company explicitly trains on curated military data rather than general internet corpora.
Smack Technologies (El Segundo, CA)
Founded 2024 by two MARSOC veterans (Andrew Markoff and Clint Alanis). $32 million Series A closed March 2026, led by Geodesic Capital and Costanoa Ventures. Self-described as the first frontier AI lab for national security. Product: Omega stack, using deep reinforcement learning trained on combat-relevant datasets to convert commander intent into executable plans. Reduces operational planning from months to 15 minutes. Contracts with Marine Corps, Navy, Air Force, and SOCOM. Marine Corps prototype delivered October 2025; production timeline accelerated by more than a year following the Anthropic situation. 19 employees as of March 2026.
TurbineOne
Founded 2022. Purpose-built for national security from day one. $54 million total funding across three rounds ($36 million Series B, May 2025, led by Bessemer Venture Partners). Product: Frontline Perception System, which brings AI to the sensor at the edge — running on everyday military gear including heads-up displays and autonomous drones without cloud access. Designed for speed, simplicity, and reliability in combat without technical training requirements. AFWERX SBIR recipient. DIU partnership. Former Army Acquisition General on advisory board. 68 employees as of early 2026.
JARVIS Defense (jarvisdefense.ai)
Multi-agent AI platform for DDIL military operations. Fully air-gapped, no cloud dependency. Describes its product as real-time actionable intelligence at the tactical edge without latency or compromise. Stated relationships with Army, Navy, and Air Force. Limited public funding information available as of April 2026. Included here for completeness given its explicit air-gapped, denied-environment focus.
Tier 2: Defense AI Broader Context
These companies are building defense AI capabilities that are relevant to this article’s context but are not primarily focused on the tactical edge SLM tier specifically. They are included because they represent the broader ecosystem the Department is drawing from.
NODA AI
$25 million Series A, February 2026, led by Bessemer Venture Partners. AI orchestration platform for autonomous defense solutions. More enterprise and connected-environment focused than pure tactical edge.
Rebellion Defense
$224 million total funding. AI and machine learning solutions for US and UK military and civil service. Primarily enterprise and signals intelligence focused rather than tactical edge SLM deployment.
What This Landscape Tells the Acquisition Community
Three observations for PAEs and acquisition leaders reading this appendix.
First, venture capital is building what the Department hasn’t committed to fund. Every company in Tier 1 of this appendix exists because operators and engineers identified the gap and built toward it without a Department demand signal strong enough to sustain them. Venture capital can seed a concept. It cannot build a defense architecture or sustain a company through a multi-year program of record. These companies need a buyer signal, not just contract experiments.
Second, the Anthropic situation accelerated interest but didn’t create demand. The surge in contract activity and investor interest following the Pentagon’s March 2026 supply chain designation of Anthropic brought these companies into rooms they had been waiting to enter for more than a year. That acceleration is real but reactive. Procurement driven by a supply chain crisis is not the same as procurement driven by an architecture requirement. The Department needs to make the latter the basis for its decisions.
Third, the industrial base is thinner than it looks. A handful of small companies with fewer than 100 employees each cannot build a joint defense AI architecture. The Department cannot contract its way to capability at this scale using only the companies in this appendix. It needs to create the demand signal that pulls a generation of new entrants into this space — and the acquisition structures that allow nontraditional vendors to compete for it.
Appendix G: Glossary
Key technical terms used in this article and its appendices, listed alphabetically.
A2/AD (Anti-Access/Area Denial) - China's strategy of denying US forces entry into and freedom of movement within the Indo-Pacific theater, using long-range missiles, electronic warfare, and other systems specifically designed to sever the communications infrastructure AI-enabled operations depend on.
Agent / AI Agent - A software system that uses an AI model to take autonomous actions on behalf of a user, executing multi-step tasks rather than just answering questions.
Agentic AI / Multi-Agent Workflow - An architecture in which multiple AI agents work in a chain, each handling one specialized task and passing its output to the next.
Air-Gapped - A system completely disconnected from external networks, including the internet and cloud services, capable of operating without any connectivity.
Apache 2.0 / MIT License - Open-source licenses that permit free use, modification, and deployment of a model, including on government systems, without ongoing licensing costs or vendor dependency.
Context Window - The maximum amount of text a model can process in a single query; larger windows allow synthesis of more information but require more compute.
DDIL (Disconnected, Degraded, Intermittent, and Limited-bandwidth) - Operating environments where network connectivity can’t be assumed. DDIL conditions are the baseline planning assumption for tactical edge AI deployment in contested theaters.
Fine-Tuning - The process of continuing a model's training on a curated domain-specific dataset so it performs more accurately on targeted tasks — the mechanism by which a general-purpose model becomes a purpose-built SLM.
GPU (Graphics Processing Unit) - The specialized processor that powers AI workloads; GPU count is a proxy for hardware scale: one GPU means edge hardware, eight or more means data center infrastructure.
Inference - The process of using a trained AI model to generate a response; the compute cost, power draw, and response time figures in this article all refer to inference, not training.
Kill Chain - The military decision cycle: Find, Fix, Track, Target, Engage, Assess (F2T2EA); AI-assisted kill chain operations use models to accelerate analysis and targeting steps.
Large Language Model (LLM) - A general-purpose AI model trained on large internet-sourced datasets, capable of broad tasks in connected environments but too large, power-hungry, and cloud-dependent for the tactical edge.
Mixture of Experts (MoE) - An architecture that activates only the most relevant subset of a model's parameters for each token it processes, making actual compute cost significantly lower than total parameter count suggests.
Model Weights / Open-Weight Model - The numerical values that define what a model knows; open-weight models release these files publicly, enabling air-gapped deployment, customization on classified data, and operation without vendor dependency.
NPU (Neural Processing Unit) - A low-power processor optimized for AI inference, enabling small models to run on handheld hardware at under 10 watts.
Parameters - The numerical values that make up a model and define its capabilities; parameter count is the standard measure of model size, and every SWaP dimension scales with it.
Quantization - A compression technique that stores model weights at lower numerical precision, reducing memory requirements by 50 to 75 percent with a modest accuracy tradeoff; useful for fitting SLMs onto edge hardware but insufficient to bring frontier-scale models within tactical SWaP constraints.
Small Language Model (SLM) - A purpose-built AI model trained on a narrow, domain-specific dataset to run on constrained hardware without cloud connectivity, trading breadth for speed, deployability, and precision within its domain.
Supply Chain Risk - Threats originating in how a model was built rather than how it is deployed, including adversary-influenced training data, embedded behavioral restrictions, and developer obligations under foreign government law.
SWaP (Size, Weight, and Power) - The defining hardware constraint at the tactical edge, which ranks above features and benchmark performance as a deployment criterion.
Tokens / Token Rate - A token is the unit of text a model reads and writes, roughly three-quarters of an English word on average; token rate measures how fast a model generates output, with higher rates indicating faster, more tactically viable response times.
Training - The computationally intensive process of building a model by exposing it to large datasets and adjusting its parameters; distinct from inference and typically done once or periodically to build or update a model.
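Worked example for the Quantization entry: the sketch below shows one way the 50 to 75 percent range can arise. It assumes a 16-bit baseline and counts weight storage only, ignoring the working memory a model needs during inference and any per-block overhead a real quantization format adds. The 3-billion and 400-billion parameter sizes and the function names are hypothetical, chosen for illustration rather than taken from any model or measurement discussed in this article.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate storage for model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def reduction_vs_16bit(bits_per_weight: int) -> float:
    """Fractional memory saving relative to an assumed 16-bit baseline."""
    return 1 - bits_per_weight / 16


# Hypothetical model sizes, for illustration only.
models = [("3B-parameter SLM", 3), ("400B-parameter frontier-scale model", 400)]

for name, params_billions in models:
    for bits in (16, 8, 4):
        size_gb = weight_memory_gb(params_billions, bits)
        saving = reduction_vs_16bit(bits)
        print(f"{name}, {bits}-bit: {size_gb:.1f} GB ({saving:.0%} smaller than 16-bit)")
```

Under these assumptions the small model shrinks from 6 GB to 1.5 GB at 4-bit, while the frontier-scale model still needs about 200 GB for weights alone, which is consistent with the glossary's point that quantization helps SLMs fit edge hardware without closing the gap for frontier-scale models.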
REFERENCES
1. Secretary Pete Hegseth. "Artificial Intelligence Strategy for the Department of War." U.S. Department of War memorandum, January 9, 2026.
2. Andrew Park. "Oceans and Orbits: The Real Future of U.S. Defense Modernization." LinkedIn, March 3, 2026.
3. U.S. Department of the Air Force. "Department of the Air Force Artificial Intelligence Strategy." Chief Data and AI Office, April 17, 2026.
4. Christian Brose. The Kill Chain: Defending America in the Future of High-Tech Warfare. Hachette Books, 2020.
5. NVIDIA Corporation. "DGX H100 System Specifications." NVIDIA Documentation, 2023.
6. NVIDIA Corporation. "Jetson AGX Orin: Benchmarks and Technical Specifications." NVIDIA Developer, 2024.
7. Mayank Arya and Yogesh Simmhan. "Understanding the Performance and Power of LLM Inferencing on Edge Accelerators." arXiv:2506.09554, June 2025.
8. MLCommons. "MLPerf Inference v5.0: LLM Benchmark Design for Large-Scale and Low-Latency Workloads." MLCommons, April 2025.
9. Daniel Ruiz et al. (Army Futures Command / The Research and Analysis Center). "Fine-Tuning and Evaluating Open-Source Large Language Models for the Army Domain." arXiv:2410.20297, October 2024.
10. Capt. Zachary Szewczyk, U.S. Army. "The Military Needs Frontier Models." Military Review Online Exclusive, August 2025.
11. U.S. Army Public Affairs. "Harnessing AI for the Future: Army Unveils Project ARIA." U.S. Army, March 5, 2026.
12. Patrick Tucker. "Startup Debuts Agentic AI Assistant for War." Defense One, April 1, 2026.
13. CNBC. "Anthropic Officially Told by DoD That It's a Supply Chain Risk." March 5, 2026.
14. Smack Technologies. "Smack Technologies Announces $32M in Funding to Build First Frontier AI Lab for National Security." Business Wire, March 2, 2026.
15. TurbineOne. "TurbineOne Raises $36M Series B to Deploy AI at the Tactical Edge." Business Wire, May 14, 2025.
16. Patrick Tucker. "Meet the Startups Trying to Build Military-Specific AI." Defense One, March 8, 2026.
17. Scale AI. "Defense LLM: Fine-tuned large language models for national security missions." Scale Donovan product documentation, 2024.
18. Lauren C. Williams. "Army set to issue new policy guidance on use of large language models." DefenseScoop, May 9, 2024.
19. Inside Government Contracts. "U.S. Federal and State Governments Moving Quickly to Restrict Use of DeepSeek." February 17, 2025.
20. Foundation for Defense of Democracies. "Defending Against DeepSeek: Congress' Federal Firewall." July 18, 2025.
21. National Institute of Standards and Technology / Center for AI Standards and Innovation (CAISI). Adversary AI Assessment Report. Department of Commerce, October 2025.
22. Taipei Times. "US Cannot Win by Targeting Alibaba." November 25, 2025, citing Financial Times reporting on White House memo regarding Alibaba and Chinese military technology support.
23. U.S. Department of Defense. "Annual Report to Congress: Military and Security Developments Involving the People's Republic of China 2025." December 23, 2025.
24. Reuters. "Pentagon to adopt Palantir AI as core US military system, memo says." March 20, 2026.
25. U.S. Space Force. "Space Threat Fact Sheet." May 2025.
26. Erin L. Murphy and Matt Pearl. "China's Underwater Power Play: The PRC's New Subsea Cable-Cutting Ship Spooks International Security Experts." Center for Strategic and International Studies, April 4, 2025.
