
[post_010] · § AI / ML

How to Use LLMs to Automate Your Business Workflows

A practical guide to integrating large language models into real business processes — without the hype, just the results.

DK · Principal Engineering · 8 min read · AI / ML


[01] §

Why LLMs finally ship

For a long stretch, "AI workflows" meant a demo on Tuesday and a support ticket on Friday. What changed in 2024 is the maturity of the surrounding infrastructure: structured output, function calling, caching, evals, and enough observability to treat an LLM like any other production dependency. The model is no longer the bottleneck — the scaffolding around it is.
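Treating the model like a production dependency starts with treating its output like untrusted input. A minimal sketch of the "structured output" half of that scaffolding, with the model call stubbed out (the completion string, categories, and field names here are illustrative, not from any particular API):

```python
import json

# Hypothetical raw completion from a model asked to emit JSON.
# In production this string comes back from the model API; here it is stubbed.
RAW_COMPLETION = '{"category": "invoice", "confidence": 0.93}'

ALLOWED_CATEGORIES = {"invoice", "contract", "other"}

def parse_structured_output(raw: str) -> dict:
    """Validate a model's JSON output like any other untrusted input."""
    data = json.loads(raw)  # raises on malformed JSON -> retry upstream
    if data.get("category") not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {data.get('category')!r}")
    if not 0.0 <= float(data.get("confidence", -1.0)) <= 1.0:
        raise ValueError("confidence out of range")
    return data

result = parse_structured_output(RAW_COMPLETION)
```

The point is that a parse failure is an ordinary, retryable error with a log line and a metric, not a surprise.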

[02] §

The RAG trap

Retrieval-Augmented Generation is the default reach for teams adopting LLMs, and most of them end up with a vector database and a new set of problems. RAG is right when the answer lives in a corpus too large to prompt with. It is wrong when the corpus is small, the schema is known, or the real ask is "do a thing," not "retrieve a thing." We reach for direct tool use first, and only add retrieval when measurement forces it.
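"Do a thing, not retrieve a thing" can be as small as a dispatch table: the model names a tool and its arguments, and the application executes it directly. A sketch with the model's decision stubbed (the tool name, arguments, and return value are hypothetical):

```python
# Direct tool use: the model (stubbed below) picks a tool and arguments;
# we execute it against our own systems instead of searching a corpus.
def lookup_order_status(order_id: str) -> str:
    # Hypothetical tool; in practice this hits your order database.
    return f"order {order_id}: shipped"

TOOLS = {"lookup_order_status": lookup_order_status}

def dispatch(tool_call: dict) -> str:
    """Execute a model-selected tool call against the registry."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["arguments"])

# Stubbed model decision for the question "where is order 1234?"
model_tool_call = {"name": "lookup_order_status",
                   "arguments": {"order_id": "1234"}}
answer = dispatch(model_tool_call)
```

No embeddings, no index, no chunking strategy. If measurement later shows the model needs documents it doesn't have, that's when retrieval earns its complexity.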

[03] §

Evals are the product

We write the eval harness before we write the prompt. Without one, you cannot tell whether a change improved your system or merely shifted its failure modes. Bad evals are worse than no evals: they produce the feeling of progress. Good evals are boring, deterministic, and owned by someone who will lose sleep when they drop.
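"Boring and deterministic" is the whole design. A minimal harness along these lines: a fixed case set, exact-match scoring, and a trivial baseline standing in for the model (the cases and labels here are made up for illustration):

```python
# A boring, deterministic eval harness: fixed cases, exact-match scoring.
EVAL_CASES = [
    {"input": "Refund not received after 10 days", "expected": "billing"},
    {"input": "App crashes on login", "expected": "bug"},
    {"input": "How do I export my data?", "expected": "how_to"},
]

def run_evals(classify) -> float:
    """Return accuracy of `classify` over the fixed case set."""
    hits = sum(1 for c in EVAL_CASES
               if classify(c["input"]) == c["expected"])
    return hits / len(EVAL_CASES)

# A keyword baseline stands in for the model; beating it is the first eval.
def keyword_baseline(text: str) -> str:
    t = text.lower()
    if "refund" in t or "charge" in t:
        return "billing"
    if "crash" in t or "error" in t:
        return "bug"
    return "how_to"

accuracy = run_evals(keyword_baseline)
```

Because `classify` is just a callable, the same harness scores the baseline, every prompt revision, and every model swap on identical cases, which is what makes regressions visible.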

[04] §

Where it pays back

The projects with the best payback aren't the most impressive — they're the most repetitive. Contract review, lead triage, support-ticket routing, document classification. Anywhere a human is reading the same kind of content over and over and making a small structured decision, there is room for an LLM with evals to hit 90%+ accuracy and reclaim a meaningful fraction of someone's week.
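For the "small structured decision" pattern, the production wrapper is usually a confidence gate: accept the model's routing above a threshold, send the rest to a human queue. A sketch with the model's decision stubbed (the field names, queues, and the 0.8 threshold are assumptions for illustration):

```python
# Routing sketch: accept the model's structured decision when confident,
# otherwise fall back to a human queue. The model output is stubbed.
CONFIDENCE_THRESHOLD = 0.8  # tuned against the eval set, not guessed

def route_ticket(decision: dict) -> str:
    """decision: {"queue": str, "confidence": float} parsed from the model."""
    if decision["confidence"] < CONFIDENCE_THRESHOLD:
        return "human_review"
    return decision["queue"]

confident = route_ticket({"queue": "billing", "confidence": 0.95})
unsure = route_ticket({"queue": "billing", "confidence": 0.55})
```

The economics come from the split: the model clears the repetitive majority, and humans only see the cases the model itself flags as uncertain.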

[05] §

What to build first

Pick the narrowest task that is genuinely annoying, has clear correct-answer criteria, and would be useful even at 70% accuracy. Ship that, measure it, and iterate. Do not build a platform. Do not build a framework. Build the thing, then let the second and third use cases teach you what to generalize.
