We built an AI blog writer with Claude — here's what we learned about prompt engineering
The secret isn't the prompt. It's skills, typed JSON, and the assumption that the AI will mess up one run in four.
The first prompt we sent Claude to generate a post was a single sentence: "write a post about WebMCP, professional tone, 1500 words." What came back was a high-school essay on digital transformation with five emojis and two "synergies." We deleted everything and started over.
What we learned over the next few weeks wasn't about prompts. It was about architecture. These are the four patterns that turned the writer from "okay" into "publishable as-is."
1. Skills are markdown files — not strings inside the code
Our first instinct was to put the prompt in a TypeScript constant. It worked for two days. The moment we wanted to change the tone for a specific author, we had to open the code, edit the constant, commit, redeploy. Ridiculous overhead for what is, in the end, plain text.
The fix is trivial but powerful: every "skill" is a markdown file.
skills/
├── blog-writer.md
├── linkedin-writer.md
├── x-writer.md
└── brand-reviewer.md
The runtime loads them as text and injects them into the system prompt. Each file describes in plain language how the model should behave: tone, structure, length, what to do, what to avoid.
Instructions are data, not code. Changing the writer's behaviour for a given blog doesn't need a deploy — it needs a markdown edit.
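A minimal loading sketch (the function name and layout here are illustrative, not the actual runtime):

import { readFile } from "node:fs/promises";

// A skill is just a markdown file: read it as text, inject it into the system prompt later.
async function loadSkill(name) {
  return readFile(`skills/${name}.md`, "utf8");
}

const blogWriterSkill = await loadSkill("blog-writer");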
2. JSON Schema on the output, always
Asking an LLM "return an object with title, subtitle, body" with nothing else all but guarantees malformed output: in our runs, about 5% of generations came back broken. Markdown where you expected HTML, smart quotes in a key, a code block that bled into the title.
Claude's API has a specific tool for this: tool use with input_schema. You define a JSON Schema and the model commits to respecting it.
const tools = [{
  name: "publish_post",
  description: "Generate a complete blog post ready to publish.",
  input_schema: {
    type: "object",
    properties: {
      title: { type: "string", maxLength: 70 },
      subtitle: { type: "string", maxLength: 150 },
      meta_description: { type: "string", maxLength: 160 },
      slug: { type: "string", pattern: "^[a-z0-9-]+$" },
      body_html: { type: "string" },
      tags: { type: "array", items: { type: "string" }, maxItems: 5 },
    },
    required: ["title", "subtitle", "body_html", "slug", "tags"],
  },
}];
With this, malformed outputs went from 5% to literally zero in weeks of use. maxLength on the title isn't just an SEO constraint — it's a hint to the model that the output is going to be parsed and shouldn't sprawl "just because."
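For reference, here's roughly what the call looks like with the Anthropic TypeScript SDK. The model name is a placeholder; tool_choice is what forces the model to answer through the tool instead of free text:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514", // placeholder, use whatever you run
  max_tokens: 4096,
  tools,
  // Force publish_post so the reply is schema-shaped input, not prose.
  tool_choice: { type: "tool", name: "publish_post" },
  messages: [{ role: "user", content: "Write a post about WebMCP." }],
});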
3. Separate tone from form
The subtlest mistake at the start: putting tone ("speak like a senior engineer") and form ("structure with H2s and bullets") in the same prompt. The model prioritizes one and neglects the other unpredictably.
The pattern that works is splitting them into two distinct system-prompt layers:
- BRAND layer: tone, banned vocabulary, author voice. Loaded from the blog's BRAND.md.
- SKILL layer: output shape, length, HTML structure. Loaded from skills/blog-writer.md.
The user message contains only the topic. The two system layers stay stable while the topic rotates each generation. With this split, a generation that comes out "well-written but unhinged" gets debugged in BRAND. A "reasonable but badly structured" one gets debugged in SKILL. No more guessing which sentence in the prompt to move.
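In code, the split is just a two-block system array; the Messages API accepts system as a list of text blocks, which keeps the layers literally separate (blogId and topic are stand-ins for your own values):

const system = [
  // BRAND layer: who is speaking.
  { type: "text", text: await readFile(`blogs/${blogId}/BRAND.md`, "utf8") },
  // SKILL layer: what shape the output takes.
  { type: "text", text: await readFile("skills/blog-writer.md", "utf8") },
];

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514", // placeholder
  max_tokens: 4096,
  system,
  tools,
  tool_choice: { type: "tool", name: "publish_post" },
  messages: [{ role: "user", content: topic }], // only the topic rotates
});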
4. Defensive parsing by default
Even with tool use enforced, one in a hundred generations comes back weird — the model decides to reply in plain text because it misread the conversation, or the endpoint returns a transient 529. If your code assumes the response always has the expected shape, you'll crash in prod at the worst possible moment.
Every access to the response goes through optional chaining and nullish coalescing:
// The tool_use block isn't always content[0]: the model may emit text first.
const toolUse = response?.content?.find((block) => block.type === "tool_use");
const post = toolUse?.input;
const title = post?.title ?? "Untitled draft";
const body = post?.body_html ?? "<p>Generation failed. Try again.</p>";
It's not paranoia — it's a realistic deal with a probabilistic system. Code for the happy 99% but catch the unhappy 1% without crashing.
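The transient 529s get the same defensive treatment: a small retry wrapper with backoff around the call. A sketch, not the production wrapper; the status check assumes the SDK surfaces the HTTP status on the error object, which it does for API errors:

async function withRetry(fn, attempts = 3) {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      // Retry only transient overloads; give up after the last attempt.
      if (err?.status !== 529 || i >= attempts - 1) throw err;
      await new Promise((r) => setTimeout(r, 1000 * 2 ** i)); // 1s, 2s, 4s...
    }
  }
}

const response = await withRetry(() => client.messages.create(params));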
The result, in numbers
With these four patterns, the writer produces posts in under 30 seconds, with valid HTML, ready-to-embed schema.org, correct slug, and a meta description hugging Google's exact limit. The "publishable as-is" rate went from an initial 30% to a current 85%. The remaining 15% is the author's cosmetic touch — because voice, in the end, is still theirs.
The secret, again, isn't the prompt. It's the architecture around the prompt: modular skills, strict JSON Schema, separated layers, defensive parsing. What we call "prompt engineering" is 20% prompt and 80% system.
The skills we use live in github.com/agentikas/agentikas-skills, permissive license. The runtime that loads them is @agentikas/ai, part of the monorepo. Fork, copy, improve.