Your Excel Spreadsheet Is Fine

When to Use LLMs vs. Traditional Rules Engines

There is a sentence more technology teams should be able to say without embarrassment:

You do not need AI for this.

And there is a second sentence that matters almost as much:

You definitely do not need an LLM for this.

That may sound unfashionable in the current market, but it is one of the healthiest instincts a company can develop.

Because a lot of organizations are making the same mistake in slightly different forms: they assume that once a workflow touches language, documents, ambiguity, or internal decision-making, the next logical move is to wrap it in an LLM.

Sometimes that is the right call.

A lot of the time, it is not.

Sometimes the best solution is a spreadsheet with ownership. Sometimes it is a workflow engine. Sometimes it is a rules engine with gloriously boring logic. Sometimes it is a hybrid system where an LLM helps interpret messy inputs while deterministic logic governs the actual decision.

Those distinctions matter because the real question is not whether a model can do something. The real question is whether using a model is the most reliable, economical, controllable, and strategically sensible way to solve the problem.

That is a harder question. It is also the one that separates mature systems design from trend-chasing.

The trap: treating LLMs like the default upgrade path

One of the odd distortions of the current moment is that many teams now treat LLMs as a kind of modern software frosting. If there is a workflow problem, the instinct is to spread a model over the top of it.

It sounds progressive. It feels current. It creates the impression of sophistication.

But “AI-enabled” is not a design principle.

A workflow does not become better merely because a language model has been inserted into it. In fact, LLMs often make systems worse when they are used in places where consistency, explicitness, auditability, and control matter more than flexibility.

That is why one of the most useful questions a team can ask before building anything is a slightly unglamorous one:

Is this actually a language problem, or is it a logic problem?

If it is mostly a logic problem, deterministic systems deserve much more respect than they usually get.

Rules engines are underrated because they are boring

Rules engines do not photograph well.

There is no magic feeling when someone demonstrates them. No one gasps because a rules engine correctly routed an invoice for approval based on threshold, department, and vendor type. No one writes breathless thought-leadership posts about explicit validation logic. No executive leans back in wonder because a system applied policy consistently and left a clean audit trail.

And yet a shocking amount of operational value is produced by exactly those kinds of systems.

Rules engines are powerful because they do something LLMs often cannot do as cleanly: they make stable decisions in a fully explicit way.

They are strong when:

the input is structured or can be normalized
the business logic is known
the cost of inconsistency is high
auditability matters
exception conditions can be defined
outcomes should be predictable

In those conditions, rules engines are not primitive. They are appropriate.

Sometimes, they are superior.

LLMs are powerful because work is often messier than the rules

Of course, not all workflows are nicely structured.

Many are full of ambiguity, linguistic variation, hidden intent, incomplete context, poorly formatted material, exceptions that humans handle intuitively, and judgments that are difficult to encode cleanly in advance.

That is where LLMs become genuinely useful.

They are strong when the work requires:

reading and interpreting natural language
comparing nuanced wording
summarizing long material
extracting signals from messy inputs
handling a wide range of user phrasing
classifying soft categories
supporting decisions where ambiguity is unavoidable

The trap is not using LLMs.

The trap is using them for jobs that were never meaningfully ambiguous in the first place. When teams do that, they often trade clarity for fashion.

The most useful dividing line: judgment vs. consistency

If there is one framing worth keeping in your head, it is this:

Rules engines are usually for consistency. LLMs are usually for judgment support.

That is not a perfect binary, but it is a very useful starting point.

If the workflow requires the same answer under the same conditions every time, you are often in rules territory.

If the workflow involves interpreting language, weighing subtle differences, or handling messy variation that resists formalization, you may be in LLM territory.

That distinction alone clarifies a surprising number of design decisions.

Before choosing the tool, map the real process

This is where many organizations move too quickly.

They compare tools before they understand the workflow. They debate LLMs versus rules engines before they have properly mapped how the work is actually being done. And that usually leads to bad architecture, because the abstract version of a process is rarely the real one.

Do not trust the slide-deck version. Talk to the people doing the work. Learn the exceptions, workarounds, hidden dependencies, judgment calls, missing information, and informal handoffs.

A workflow that looks deterministic from a distance may turn out to depend on unspoken human interpretation. A process that seems like a good LLM candidate may actually reveal a stable decision structure once the noise is stripped away. And sometimes the exercise surfaces a third possibility entirely: the problem is not tool choice yet — it is that the process itself has never been clearly defined.

This is one of the most valuable things organizations can do before adopting AI: understand the real shape of the work.

Because only then can you answer questions like:

Should the system retrieve, summarize, classify, draft, route, recommend, act, or escalate?
Which parts require interpretation?
Which parts need strict control?
Where should humans stay in the loop?
What would users need to trust the system?
How would success actually be measured?

That is not bureaucracy. That is design.

A more honest framework for choosing between them

The usual comparison between rules engines and LLMs is too shallow. People tend to compare sophistication of output or speed of prototyping. That misses the real tradeoffs.

A better framework looks at seven dimensions.

1. Input variability

How messy are the inputs?

If the inputs are highly structured, deterministic systems move up the list. If the inputs are unstructured and varied, LLMs become more attractive.

2. Logic clarity

Can the decision logic be stated explicitly?

If the answer is yes, then an LLM may be adding unnecessary uncertainty. If the answer is no, then forcing the workflow into brittle rules may create more maintenance pain than it solves.

3. Cost of inconsistency

How bad is it if the same case gets treated slightly differently from one day to the next?

If inconsistency is costly, rules engines become much more appealing. If the workflow tolerates interpretation or is inherently fuzzy, LLMs may still make sense.

4. Auditability and control

Do you need to explain exactly why a decision was made? Do you need clear enforcement of policy? Will someone need to reconstruct the logic later?

If yes, deterministic systems often have a structural advantage.

5. Preparation cost

This is where the conversation gets more interesting.

Rules engines often demand more formalization up front. You need to know the rules, thresholds, exceptions, field mappings, and ownership model. That can feel like friction.

LLMs often feel faster to start. You can put messy material in front of them and see useful behavior almost immediately.

But that convenience can be deceptive. What looks easier at the beginning may create a larger burden later in evaluation, prompt tuning, error analysis, observability, human review, and trust-building.

So one useful way to think about the tradeoff is this:

rules engine = more onramp effort, less ambiguity later
LLM = faster onramp, more operational ambiguity later

That does not make one better in the abstract. It simply means the costs arrive at different stages.

6. Immediate need vs. long-term operating model

Sometimes you need something useful quickly. Sometimes you need something that will scale cleanly for years. Those are not always the same design choice.

An LLM might get you helpful triage or extraction behavior fast, especially in messy environments. A rules engine might take longer to define, but create a much more stable backbone once the logic is clear.

7. Trust requirements

What kind of confidence do users, managers, auditors, or customers need from the system?

Some workflows can tolerate fuzziness. Others cannot. Some need transparent rationale, explicit controls, review paths, and clean audit behavior. The stronger the trust requirement, the more important it becomes to design boundaries carefully, and in many cases that favors deterministic control around probabilistic reasoning. Design the trust layer as carefully as the intelligence layer.

The simplest version of the framework

Use a rules engine when:

the logic is stable
the data is structured
outcomes need to be highly predictable
you need clean auditability
the workflow is repetitive and formalizable

Use an LLM when:

the inputs are messy or language-heavy
the task requires interpretation
summaries, extraction, classification, or judgment support matter
user phrasing or document variation creates too many edge cases for deterministic logic alone

Use both together when:

you want deterministic control around probabilistic reasoning
the workflow needs interpretation first and enforcement second
trust, review, or compliance requires explicit boundaries

That third category is often the real answer.

Case 1: invoice processing is usually a rules-engine problem

Let’s take a common example.

An invoice arrives. The organization needs to:

extract standard fields
validate totals
check whether the amount exceeds a threshold
route it for approval
apply vendor or department-specific logic
log what happened

This is classic rules-engine territory.

Why?

Because the business logic is largely explicit:

if amount is above threshold, require approval
if vendor is new, flag for review
if PO is missing, hold
if department is X, route to Y
if invoice date is outside acceptable range, reject

You may still use OCR or AI-assisted extraction upstream to structure messy documents. But once the data is normalized, the decision layer should usually be deterministic.
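
To make that concrete, here is a minimal sketch of the decision layer in Python. It assumes the invoice has already been normalized into fields. The threshold, the 90-day date window, the department routes, and the "auto_approve" default are all hypothetical values chosen for illustration, not recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Every constant below is a hypothetical placeholder for real policy values.
APPROVAL_THRESHOLD = 10_000.00
MAX_INVOICE_AGE = timedelta(days=90)
DEPARTMENT_ROUTES = {"engineering": "eng-finance", "marketing": "mkt-finance"}


@dataclass
class Invoice:
    vendor_id: str
    department: str
    amount: float
    invoice_date: date
    po_number: Optional[str]
    vendor_is_new: bool


def decide(invoice: Invoice, today: date) -> dict:
    """Apply explicit rules in a fixed order and record which ones fired."""
    fired = []  # audit trail: every rule that matched, in evaluation order

    # Reject invoices dated in the future or older than the allowed window.
    age = today - invoice.invoice_date
    if age > MAX_INVOICE_AGE or age < timedelta(0):
        fired.append("invoice_date_out_of_range")
        return {"action": "reject", "route": None, "rules": fired}

    # Hold anything without a purchase order.
    if not invoice.po_number:
        fired.append("missing_po")
        return {"action": "hold", "route": None, "rules": fired}

    action = "auto_approve"  # assumed default when no rule objects
    if invoice.vendor_is_new:
        fired.append("new_vendor")
        action = "review"
    if invoice.amount > APPROVAL_THRESHOLD:
        fired.append("amount_above_threshold")
        action = "require_approval"

    # Department-specific routing with an explicit fallback.
    route = DEPARTMENT_ROUTES.get(invoice.department, "general-finance")
    fired.append(f"route:{route}")
    return {"action": action, "route": route, "rules": fired}


example = Invoice("V-17", "engineering", 12_500.00, date(2024, 5, 1), "PO-881", False)
print(decide(example, today=date(2024, 5, 20)))
```

The same invoice produces the same action every time, and the "rules" list doubles as the audit trail: anyone can see later which conditions fired and in what order.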

Why hand stable approval logic to a probabilistic system if you do not need to?

That is not sophistication. That is avoidable risk.

Case 2: contract review is often a better LLM problem

Now compare that with contract review.

A contract is not simply a set of clean fields. It contains negotiation language, nonstandard clause structures, implied obligations, unusual phrasing, deviations from preferred wording, and subtle differences that carry real legal or commercial consequences.

This is where LLMs become genuinely useful.

They can:

identify relevant clauses
summarize obligations
compare language against a preferred template
flag deviations
generate a first-pass risk memo
surface missing or ambiguous terms

Why do LLMs work better here?

Because the burden is interpretive. The problem is not merely applying static rules. The problem is reading and reasoning over language.

That does not mean the system should run unbounded. A serious contract workflow still benefits from:

clause libraries
risk taxonomies
approval thresholds
escalation rules
human legal review

Which is exactly why hybrid architectures are often the real answer.

The underappreciated middle ground: LLMs as an onramp to formalization

There is another pattern worth acknowledging because it is common in real organizations.

Sometimes the business does not yet understand its own rules well enough to build a robust rules engine. The workflow is real, but the process is half-documented, exceptions live in people’s heads, and decision logic is scattered across email threads, tribal knowledge, and habitual judgment.

In that environment, insisting on a fully deterministic design too early can stall progress.

An LLM can sometimes serve as a practical bridge. It can help:

classify messy incoming cases
surface recurring exception patterns
expose hidden workflow variation
reveal where language or categorization is unstable
support teams while the workflow is being formalized

In other words, LLMs can sometimes be useful not because they are the final architecture, but because they help the organization understand what the final architecture should be.
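
One way this bridge role might look in practice: let a model label messy cases for a while, then count which labels keep recurring. The sketch below assumes the labels already exist (the model call itself is not shown), and the sample data, the `rule_candidates` helper, and the 20% cutoff are all hypothetical.

```python
from collections import Counter

# Hypothetical output of an LLM triage step: one label per incoming case.
labeled_cases = [
    {"case_id": 1, "label": "Missing PO"},
    {"case_id": 2, "label": "missing po"},
    {"case_id": 3, "label": "duplicate invoice"},
    {"case_id": 4, "label": "missing PO "},
    {"case_id": 5, "label": "currency mismatch"},
    {"case_id": 6, "label": "duplicate invoice"},
]


def rule_candidates(cases, min_share=0.2):
    """Return labels that recur often enough to be worth formalizing as rules."""
    # Normalize casing and whitespace so near-identical labels are counted together.
    counts = Counter(case["label"].strip().lower() for case in cases)
    total = sum(counts.values())
    return [(label, n) for label, n in counts.most_common() if n / total >= min_share]


print(rule_candidates(labeled_cases))
```

Labels that clear the cutoff are candidates for explicit rules. Labels that stay rare or keep shifting are a hint that the category itself is not yet stable.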

That is a much more mature role than “the model will solve everything.”

Why hybrid designs usually win in the real world

The most mature systems often combine both approaches.

They use LLMs where interpretation is genuinely valuable, and deterministic logic where control actually matters.

That might look like:

LLM extracts and summarizes
rules engine validates and routes
humans review exceptions

Or:

LLM compares contract language
policy rules assign risk levels or required approvals
legal signs off before action

Or:

LLM classifies inbound support requests
workflow rules decide ownership, SLA path, and escalation
humans handle sensitive or uncertain cases

This is usually where operational sanity lives.

Not in ideology. Not in “everything should be AI.” Not in “rules are outdated.”

But in choosing the right level of flexibility for the right layer of the workflow.
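
The support-request pattern above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference design: the categories, owners, and confidence cutoff are invented, and `fake_classifier` is a stand-in for a real model call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical categories, owners, and cutoff, chosen for illustration only.
OWNERS = {"billing": "finance-support", "outage": "sre-oncall", "account": "customer-success"}
SENSITIVE = {"legal", "security"}
MIN_CONFIDENCE = 0.8


@dataclass
class Classification:
    category: str
    confidence: float


def route(text: str, classify: Callable[[str], Classification]) -> dict:
    """Interpret with the model first, then enforce routing with explicit rules."""
    result = classify(text)  # probabilistic step: the only place a model is involved

    # Deterministic step: the same classification always yields the same routing.
    if result.category in SENSITIVE:
        return {"owner": "human-review", "reason": "sensitive_category"}
    if result.confidence < MIN_CONFIDENCE:
        return {"owner": "human-review", "reason": "low_confidence"}
    if result.category not in OWNERS:
        return {"owner": "human-review", "reason": "unknown_category"}
    return {"owner": OWNERS[result.category], "reason": "rule_match"}


def fake_classifier(text: str) -> Classification:
    """Stand-in for a real model call; it only recognizes the word 'refund'."""
    if "refund" in text.lower():
        return Classification("billing", 0.93)
    return Classification("other", 0.40)


print(route("I was charged twice and need a refund", fake_classifier))
print(route("Something feels off with my setup", fake_classifier))
```

The model only proposes a category and a confidence. Ownership and escalation are decided by rules that can be read, tested, and audited on their own.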

A practical scoring model

If you want a simple working method, score a workflow on these dimensions from low to high:

input variability
interpretation required
cost of inconsistency
need for auditability
clarity of business rules
urgency of deployment
expected maintenance burden

Then look at the shape.

If interpretation is high, inputs are messy, and rules are unclear, LLMs become more attractive. If the business rules are clear, inconsistency is costly, and auditability matters, deterministic systems move to the front. If both sets of conditions matter, a hybrid design is probably the right answer.

This is not a mathematically perfect framework. It is something more useful: a better conversation.
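
For teams that want something to argue over, the scoring idea can be sketched as a small function. Everything numeric here is an assumption: the 1-to-5 scale, the equal weighting, and the 3.5 cutoff are placeholders. Urgency and maintenance burden are deliberately left out of the arithmetic and kept for discussion.

```python
def suggest(scores: dict, cutoff: float = 3.5) -> str:
    """Turn 1-5 scores into a rough starting point for the design conversation."""
    # Pull toward an LLM: messy inputs, heavy interpretation, unclear rules.
    llm_pull = (
        scores["input_variability"]
        + scores["interpretation_required"]
        + (6 - scores["clarity_of_business_rules"])  # invert: unclear rules score high
    ) / 3

    # Pull toward deterministic logic: clear rules, costly inconsistency, audit needs.
    rules_pull = (
        scores["clarity_of_business_rules"]
        + scores["cost_of_inconsistency"]
        + scores["need_for_auditability"]
    ) / 3

    if llm_pull >= cutoff and rules_pull >= cutoff:
        return "hybrid"
    if llm_pull >= cutoff:
        return "llm"
    if rules_pull >= cutoff:
        return "rules engine"
    return "unclear: map the process further"


invoice_routing = {
    "input_variability": 2,
    "interpretation_required": 1,
    "clarity_of_business_rules": 5,
    "cost_of_inconsistency": 5,
    "need_for_auditability": 5,
}
print(suggest(invoice_routing))
```

The output matters less than the disagreement it provokes. If two people score the same workflow very differently, that gap is the thing to investigate.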

A shift in mindset is part of the transition

Every meaningful technological shift produces a period of overuse, misuse, imitation, experimentation, correction, and eventual maturity.

That is not evidence the technology is empty. It is usually evidence that something real has arrived and that organizations are still learning how to absorb it properly.

AI is no exception.

So this conversation should not be framed as a scolding exercise. The goal is not to mock companies for experimenting, nor to suggest that caution means hesitation forever. The goal is to help organizations distinguish between different kinds of work, different kinds of decisions, and different kinds of systems — so they can adopt AI in ways that create lasting value rather than temporary theater.

Some workflows genuinely need language models. Some absolutely do not. Many need structure first. And many of the best systems will combine old and new tools more elegantly than the market currently admits.

AI is here to stay. The question is not whether to use it, but how to use it well.

And in a transition like this, it is far better to do it right than to do it fast.

The bottom line

Not every workflow deserves AI. Not every AI-shaped workflow deserves an LLM. And not every “manual” process is immature simply because it is not model-driven yet.

Sometimes the best solution is a spreadsheet. Sometimes it is a workflow engine. Sometimes it is a rules engine with beautifully boring logic. Sometimes it is an LLM inside carefully designed boundaries. And often, the most effective systems combine all of the above in the right places.

Good judgment in this space is not about choosing the most advanced tool.

It is about choosing the right one for the shape of the work.