Small Language Models and Composable Agents

Artificial intelligence has entered a new era. For the past few years, the race has been about making models bigger: GPT-5, Gemini, Claude, Llama-3, and others have billions (or even trillions) of parameters. These large language models (LLMs) are incredible, but they aren’t always the most efficient or practical choice.

The next frontier is different: small language models (SLMs) and composable agents. Instead of one monolithic system doing everything, we are moving toward many smaller, specialized, cooperating intelligences.

Why Small Language Models Matter

Large models showcase raw capability, but they face practical barriers:

  • High compute costs — training and inference require massive GPUs/TPUs.

  • Latency — not ideal for real-time systems (like wearables or robots).

  • Energy consumption — too high for edge or mobile deployment.

  • Data privacy — sensitive data often must be sent to the cloud.

  • Complex fine-tuning — domain-specific adaptations are expensive.

By contrast, small language models (hundreds of millions to a few billion parameters):

  • Run locally (on a laptop, phone, or IoT device).

  • Are energy efficient and low-latency.

  • Can be fine-tuned cheaply for niche use cases.

  • Enable privacy-preserving AI (data never leaves the device).

In short, SLMs are not weaker versions of LLMs; they are fit-for-purpose specialists.
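
To make the local-deployment point concrete, here is a minimal sketch using the Hugging Face transformers library. The model name is only an example of a sub-billion-parameter instruct model; any comparably small SLM would do.

```python
# A minimal sketch of running an SLM locally with Hugging Face transformers.
# The model name is only an example; swap in any small instruct model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # example ~0.5B-parameter model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # small enough for CPU

prompt = "Summarize: the patient reports mild headaches for three days."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```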

Enter Composable Agents

A language model is just a prediction engine. An agent goes further: it can perceive context, act, and interact with other systems.

An agent typically includes:

  • A model (SLM or LLM)

  • A memory/context store

  • Access to tools or APIs

  • Rules or policies for decision-making

When agents are composable, they can be assembled like Lego blocks into larger systems.

Instead of one giant model trying to do everything, you can plug together specialist agents to solve complex workflows.
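
As a rough illustration of that anatomy, here is a hypothetical sketch in Python; none of these class or field names come from a real agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable

# Illustrative only: an agent bundles the four ingredients listed above —
# a model, a memory/context store, named tools, and a decision policy.

@dataclass
class Agent:
    model: Callable[[str], str]                            # SLM/LLM: prompt -> text
    memory: list = field(default_factory=list)             # context store
    tools: dict = field(default_factory=dict)              # name -> callable tool/API
    policy: Callable[[str], str] = lambda task: "respond"  # decision rule

    def run(self, task: str) -> str:
        self.memory.append(task)               # perceive and remember
        action = self.policy(task)             # decide what to do
        if action in self.tools:
            return self.tools[action](task)    # act through a tool
        context = "\n".join(self.memory[-5:])  # keep only recent context
        return self.model(f"{context}\n\nTask: {task}")

# Composing agents: plug a specialist in like a Lego block.
echo = Agent(model=lambda prompt: f"[model output for]\n{prompt}")
print(echo.run("Summarize today's visit notes"))
```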

Communication Between Agents

Composable agents need to talk to each other. Two common paradigms:

  • MCP (Model Context Protocol): Agents talk to tools (databases, APIs, sensors).

  • A2A (Agent-to-Agent): Agents talk directly to other agents.
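
The sketch below contrasts the two call patterns. The classes are hypothetical stand-ins, not the real MCP SDK or any particular A2A implementation.

```python
# Hypothetical sketch of the two paradigms; illustrative stand-ins only.

class ToolServer:
    """MCP-style: an agent calls tools (databases, APIs, sensors)."""
    def __init__(self):
        self.tools = {"lookup_patient": lambda pid: {"id": pid, "allergies": []}}

    def call(self, tool: str, **kwargs):
        return self.tools[tool](**kwargs)

class PeerAgent:
    """A2A-style: an agent sends messages directly to another agent."""
    def __init__(self, name: str, handler):
        self.name, self.handler = name, handler

    def send(self, message: dict) -> dict:
        return self.handler(message)  # the peer replies with its own result

# MCP-style: invoke a tool exposed by a server.
server = ToolServer()
record = server.call("lookup_patient", pid="123")

# A2A-style: delegate a subtask to a peer agent.
summarizer = PeerAgent("summarizer", lambda msg: {"summary": msg["text"][:80]})
reply = summarizer.send({"text": "Patient reports mild headaches for three days."})
print(record, reply)
```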

The Power of Combining SLMs and Composable Agents

Together, SLMs and composable agents unlock new possibilities:

  • Scale-out Intelligence — Many small cooperating agents can outperform one giant model.

  • Resilience — If one agent fails, others continue.

  • Efficiency — Each agent runs only where needed (cloud, edge, sensor).

  • Privacy & Security — Data can be processed locally before sharing.

Example Workflow in Healthcare

  1. Speech agent → transcribes doctor-patient conversation (SLM on device).

  2. Summarization agent → condenses key symptoms (local/cloud hybrid).

  3. Clinical reasoning agent → checks symptoms against medical database.

  4. Workflow agent → updates patient record automatically.
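
Here is a hypothetical sketch of that pipeline, with each agent stubbed as a plain function; in a real system each step would wrap its own SLM and tools, as in the earlier sketches.

```python
# Hypothetical sketch of the four-step workflow above. Every agent is a
# stub; real agents would wrap their own models, tools, and policies.

def speech_agent(audio: bytes) -> str:
    return "Patient reports mild headaches for three days."  # stub transcript

def summarization_agent(transcript: str) -> str:
    return f"Key symptoms: {transcript}"  # stub summary

def clinical_reasoning_agent(summary: str) -> dict:
    return {"summary": summary, "matches": ["tension headache"]}  # stub DB check

def workflow_agent(assessment: dict) -> None:
    print("Updating patient record:", assessment)  # stub record update

# The composed workflow: each agent's output feeds the next.
workflow_agent(clinical_reasoning_agent(summarization_agent(speech_agent(b""))))
```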

From Monolithic AI to Modular Intelligence

For decades, software has evolved from monoliths to microservices.
AI is undergoing the same transition.

  • Monolithic AI (LLM only):
    One giant model does everything; it is inefficient and hard to control.

  • Modular AI (SLMs + Agents):
    Many specialized models, each handling a part of the task, communicating through protocols.

Closing Thoughts

The age of trillion-parameter LLMs isn’t ending, but it’s becoming clear that the future of AI will be distributed, modular, and composable.

  • SLMs bring efficiency and privacy.

  • Composable agents bring modularity and scalability.

Instead of asking “How big can we make AI models?” we should ask:

“How can we make smaller, smarter, cooperating AI systems that solve problems together?”

The future isn’t just large intelligence; it’s modular intelligence.
