
When owning your AI stack beats paying per token

Drafted through my n8n + AI pipeline, edited by me.

By the end of this you'll know when running AI on your own hardware actually pays off, and when the cloud is the cheaper, smarter call.

The mess

A team hears 'AI' and lurches one of two ways. They send everything to a cloud API without thinking about the data they are shipping out, or they insist on self-hosting everything as a point of pride. Both end badly: a surprise per-token bill that grows with your success, or a home-built stack that works until the one person who understands it is on holiday.

The wrong way people solve it

They treat it as a religion. 'Cloud is lazy' and 'self-hosting is a waste of time' are both wrong, because neither holds as a general rule. Where a model runs is a per-task decision. The people who get it right decide on a spreadsheet, weighing volume, sensitivity, and how often the work changes, not on which side sounds more impressive.

The system view

Run each AI task through a short decision before you commit. It starts when a task needs a model. Ask three questions: Is the data sensitive? Is the volume high and steady? Must the output stay identical over time? The answers route the work to cloud or local. After that, you spot-check quality, watch for cost or quality drift, and record which model did what.

Trigger (a task needs a model) → Decision (sensitive? high-volume? must never change?) → Action (route to cloud or local) → Human review (spot-check quality) → Alert (cost or quality drift) → Record (which model did what).
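
The decision step in that flow can be sketched as a small routing function. This is a minimal illustration, not a production router: `Task`, its fields, and `route` are hypothetical names, and checking privacy first, then volume, then control is an assumption about priority.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Task:
    """Hypothetical description of one AI task, answering the three questions."""
    name: str
    sensitive: bool             # the data cannot leave your walls
    high_steady_volume: bool    # the same job runs constantly
    must_stay_identical: bool   # the model must not change underneath you


def route(task: Task) -> Tuple[str, str]:
    """Return (target, reason). Cloud is the default; local has to earn it."""
    if task.sensitive:
        return "local", "privacy"
    if task.high_steady_volume:
        return "local", "cost at volume"
    if task.must_stay_identical:
        return "local", "control"
    return "cloud", "default"


# Example: a one-off drafting task with no special constraints goes to cloud.
target, reason = route(Task("draft-newsletter", False, False, False))
```

Returning the reason alongside the target gives you something to write into the record step, so you can later see why each task ended up where it did.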

What I would build

Cloud as the default for heavy thinking and anything low-volume or fast-changing, because you get the strongest model and maintain nothing. A small local model for the three cases that earn it: privacy, where the data cannot leave your walls; cost at volume, where the same task runs constantly and a fixed machine beats a per-token bill; and control, where the model must stay exactly the same. I run this split myself, including a local pipeline that turns a photo and a voice clip into a talking-head video on one desktop GPU, with no per-minute fee.
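
The cost-at-volume case is the one you can put on a spreadsheet directly. Below is a rough sketch of the break-even arithmetic. Every number in it is a made-up placeholder, not a real price or a measurement, and the function names are hypothetical; plug in your own figures.

```python
def cloud_monthly_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Per-token bill: grows linearly with usage."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def local_monthly_cost(hardware_cost: float, lifetime_months: int,
                       power_per_month: float,
                       maintenance_hours: float, hourly_rate: float) -> float:
    """Fixed machine: hardware spread over its lifetime, plus power, plus your time."""
    return hardware_cost / lifetime_months + power_per_month + maintenance_hours * hourly_rate


def break_even_tokens_per_month(local_cost: float, price_per_million_tokens: float) -> float:
    """Monthly token volume above which the fixed machine is cheaper."""
    return local_cost / price_per_million_tokens * 1_000_000


# Placeholder inputs (assumptions, not quotes from any vendor).
local = local_monthly_cost(hardware_cost=3600, lifetime_months=36,
                           power_per_month=50,
                           maintenance_hours=2, hourly_rate=75)
threshold = break_even_tokens_per_month(local, price_per_million_tokens=5)
```

With these placeholders, half of the local cost is maintenance time rather than hardware, which is why self-hosting at low volume rarely pays off.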

What can break

Self-hosting at low volume, where you spend more on maintenance than you ever save. Choosing local for work that genuinely needs the strongest model. A local box with no monitoring that silently degrades until someone notices the output got worse. And privacy assumptions that do not hold, because the data still passes through a logging layer you forgot about. Match the tool to the job, or it bites you later.
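
The silent-degradation failure is the easiest one to guard against. Here is a minimal sketch of a drift check, assuming you already score a sample of outputs during spot-checks (for example on a 0 to 1 scale) and track spend per period. The function name and both thresholds are illustrative assumptions, not recommended values.

```python
from statistics import mean
from typing import List, Sequence


def drift_alerts(baseline_scores: Sequence[float], recent_scores: Sequence[float],
                 baseline_cost: float, recent_cost: float,
                 max_quality_drop: float = 0.05,
                 max_cost_rise: float = 0.20) -> List[str]:
    """Compare recent spot-check scores and spend against a baseline."""
    alerts = []
    # Quality drift: the recent average fell more than the allowed drop.
    if mean(recent_scores) < mean(baseline_scores) - max_quality_drop:
        alerts.append("quality drift")
    # Cost drift: recent spend exceeds the baseline by more than the allowed rise.
    if recent_cost > baseline_cost * (1 + max_cost_rise):
        alerts.append("cost drift")
    return alerts


# Example: scores slipped from about 0.9 to about 0.8 while cost stayed flat.
alerts = drift_alerts([0.9, 0.9], [0.8, 0.8], baseline_cost=100, recent_cost=100)
```

A check like this only works if the spot-checks actually happen, so schedule the human review step instead of leaving it to whoever notices first.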

What the business gets

A stack you can afford and defend: predictable cost where volume is high, privacy where it is required, and the latest capability where it actually matters. No surprise bill at the end of a good month, and no hobby tax paid just to prove a point.

Self-hosting AI is not a flex. It is a cost-and-privacy decision you should be able to defend on a spreadsheet.

Bring me the workflow you want AI inside. I'll tell you what I'd run in the cloud and what I'd keep on your own hardware.