Day 2 — From Auditor to Tech Lead: why your agent names drive the prompts
The initial plan called the second agent "Auditor". Within hours we changed the name to "Project Explorer" / Tech Lead. It's not cosmetic — it's the difference between a system that finds problems and one that designs solutions. This is the lesson of Day 2.
Names drive the prompts. An agent called "Auditor" will tend to produce problem lists. One called "Tech Lead" will tend to produce maps and recommendations. The system prompt inherits that tone implicitly, even if you never write it down.
Day 1 left the A2A architecture with a working walking skeleton: Orchestrator + Investigator + Notifier, three independent services talking over A2A in a 2-step pipeline. Day 2 was going to add the second cognitive agent: the one that turns the Investigator's research into an executable plan.
The initial plan called it Auditor. Within hours it became Project Explorer (role: Tech Lead), and that rename turned out to be the real work of the day.
The original plan: the passive Auditor
In the initial design, the Auditor had this responsibility:
Receives the Investigator's report. Audits the current code. Returns findings focused on the delta between current state and target.
Sounds reasonable. And produces a working system. But there's something architecturally weird that took me a while to see: the Auditor's output is problems for others to solve. It's like a consultant who hands you a list of findings and walks away. The downstream Developer has to decide which to attack, in what order, where to put new code, which patterns to follow.
That turns the Developer into the agent making architectural decisions. But the Developer is the executor — its role is writing code that passes tests, not deciding architecture. Mixing both roles produces the same problems as in human teams: a junior who has to decide everything either gets stuck or delivers inconsistent things.
The reframe: from Auditor to Tech Lead
The mindset change was:
The second agent doesn't find problems — it designs the implementation. It doesn't describe what's wrong — it describes what to build, where, how, and what the acceptance criteria are.
With that framing, the agent changes:
| Axis | Auditor (passive role) | Project Explorer / Tech Lead (active role) |
| --- | --- | --- |
| Input | Research + current code | Research + project map + feature spec |
| Output | AuditReport (problem list) | DevelopmentBrief (executable plan) |
| Decisions | Identifies issues | Decides architecture |
| For whom | "Someone" downstream | Feature Developer (junior AI) |
| Verbs in the prompt | "audit", "find", "evaluate" | "design", "decide", "instruct" |
It's the Architect → Senior Engineer → Implementer pattern from the real world, translated to A2A. The Tech Lead concentrates the architectural intelligence; the Developer executes with discipline.
Why the name matters more than you'd think
LLMs are very sensitive to the declared identity in the system prompt. Compare these two system prompts (same input, same expected output):
Auditor version:
```
You are a code audit agent. Given a research report and a codebase,
identify issues and produce a list of findings ordered by severity.
```
Tech Lead version:
```
You are a senior engineer / tech lead. Given a research report and a
codebase, produce a precise, executable specification for a junior
developer that will implement the feature with TDD.
```
Same conceptual agent. Different emergent behaviors:
- The Auditor biases toward enumerating exhaustively (more findings = "more work").
- The Tech Lead biases toward deciding and prioritizing (fewer items = "more clarity for the dev").
- The Auditor describes what's wrong.
- The Tech Lead describes what to build.
- The Auditor orders by severity.
- The Tech Lead orders by technical dependency (what to do first).
Names are implicit prompt-engineering. When your `agent_card` says `"name": "auditor"`, every model in the world activates "auditor" associations: meticulous, exhaustive, descriptive, conservative. When it says `"name": "tech-lead"`, it activates different ones: decisive, opinionated, action-oriented.
This doesn't get fixed by a good prompt alone. The prompt and the name reinforce each other. A Tech Lead prompt with name "auditor" produces mediocre output because of the model's internal identity conflict.
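To make that concrete, here's a sketch of the renamed agent card. The field set is simplified and the description text is illustrative, not copied from the real repo:

```json
{
  "name": "project-explorer",
  "description": "Senior tech lead. Turns research and project context into an executable DevelopmentBrief for a junior developer agent.",
  "skills": [{ "id": "build-brief", "name": "Design an implementation brief" }]
}
```

Note that the name, the description, and the skill id all use active, decision-making vocabulary; the card itself is part of the prompt.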
The new schema: DevelopmentBrief
With the role redefined, the output also changes. It's no longer a generic AuditReport — it's a DevelopmentBrief the Developer can execute without thinking too much.
```ts
interface DevelopmentBrief {
  feature_summary: string;
  files_to_create: Array<{ path: string; purpose: string }>;
  files_to_modify: Array<{ path: string; change: string }>;

  // The most important field of the brief:
  follow_pattern: {
    reference_file: string; // path to a file in the repo
    why: string;            // what to imitate from it
  };

  test_scenarios: Array<{
    id: string;   // "t1", "t2"...
    name: string; // readable description
    type: "unit" | "integration" | "e2e";
    description: string;
  }>;

  constraints: {
    max_files_changed: number;
    max_lines_added: number;
    test_framework: string;
    forbidden_patterns: string[]; // e.g. ["any", "ts-ignore"]
  };

  acceptance_criteria: string[]; // yes/no checks for "done"
  out_of_scope: string[];        // what NOT to touch
}
```
Why `follow_pattern` is the most important field
Look at the pattern:
```json
{
  "follow_pattern": {
    "reference_file": "src/features/oauth/index.ts",
    "why": "this file establishes the error handling, naming, and session integration pattern the new module must imitate"
  }
}
```
Instead of explaining to the Developer "use typed errors, kebab-case naming for files but camelCase for utils, integrate with `getSession()`...", the Tech Lead points at a real file in the repo that already does something similar. The Developer reads it, imitates it, saves tokens, and dramatically reduces hallucinations.
Analogy > imagination. LLMs are extraordinarily good at imitating patterns they see, mediocre at inventing patterns from scratch. If your Tech Lead points at real repo code, your Developer works by analogy. If it just explains the pattern abstractly, it works by imagination. Massive quality gap.
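As a sketch of how the Developer side could exploit this (the function and field names here are assumptions for illustration, not the real repo's code):

```ts
import { readFileSync } from "node:fs";

// Hypothetical helper: inline the reference file into the Developer's
// prompt so it imitates real code instead of imagining a pattern
// from an abstract description.
function buildDeveloperPrompt(brief: {
  feature_summary: string;
  follow_pattern: { reference_file: string; why: string };
}): string {
  const reference = readFileSync(brief.follow_pattern.reference_file, "utf-8");
  return [
    `FEATURE: ${brief.feature_summary}`,
    `PATTERN TO IMITATE (${brief.follow_pattern.why}):`,
    reference,
  ].join("\n\n");
}
```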
Schema-as-policy in the brief
Some fields are designed as invariants, not suggestions:
- `test_scenarios` cannot be empty. If the Tech Lead can't define what to test, the problem is misunderstood.
- `constraints.forbidden_patterns` is enforced literally: the Developer returns `state: failed` if its output contains any of them.
- `out_of_scope` prevents scope creep. The Developer cannot touch anything not listed in `files_to_create`/`files_to_modify`, even if improvements come to mind.
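A minimal sketch of those invariants as code, assuming the same zod the agents already use (the literal substring check is a simplification of "enforced literally"):

```ts
import { z } from "zod";

// Schema-as-policy: the invariants live in the schema itself.
const BriefInvariants = z.object({
  test_scenarios: z.array(z.unknown()).min(1), // empty = problem misunderstood
  constraints: z.object({ forbidden_patterns: z.array(z.string()) }),
  out_of_scope: z.array(z.string()),
}).passthrough(); // the rest of the DevelopmentBrief fields pass through

// Developer-side enforcement: any hit means returning `state: failed`.
function forbiddenHits(code: string, forbidden: string[]): string[] {
  return forbidden.filter((pattern) => code.includes(pattern));
}
```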
The brief is the Tech Lead → Developer contract. If the Developer can't execute it without asking for clarifications, the Tech Lead failed. That's measurable: the rate of `state: input-required` responses from Developer to Tech Lead. Good Tech Lead = low rate.
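That measurement is a few lines. A sketch, assuming task responses are logged with the states used throughout this series:

```ts
type TaskState = "completed" | "failed" | "input-required";

// Fraction of Developer tasks that bounced back for clarification.
// A good Tech Lead drives this toward zero.
function clarificationRate(states: TaskState[]): number {
  if (states.length === 0) return 0;
  return states.filter((s) => s === "input-required").length / states.length;
}
```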
The 3-step pipeline
With the Project Explorer in place, the Orchestrator's `/chat` went from 2 steps to 3:
```
chat → investigate  (Investigator)
     → build-brief  (Project Explorer / Tech Lead)
     → notify       (Notifier)
```
What flies between agents:
```
Investigator output     →   Project Explorer input
─────────────────           ────────────────────
report_md (markdown)        research_report_md
structured.findings         research_findings (structured)
```
The Project Explorer receives the research whole — we don't decide for it which findings are relevant. We trust its filtering capacity. If that turns out to cause problems later (the LLM gets distracted by non-applicable findings), we'll introduce an intermediate step. For now, "trust but verify": give it the context, watch the output quality.
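Chained in the Orchestrator, the three steps look roughly like this. It's a sketch: the client objects and result field names follow the other snippets in this post, not verified repo code:

```ts
// investigator, projectExplorer, notifier: A2A clients built elsewhere.
async function handleChat(text: string) {
  // 1. External research.
  const research = await investigator.sendTask({
    skillId: "research",
    payload: { skill_uri: "skills/feature-research.md", topic: text },
  });

  // 2. Research → executable plan. The Explorer gets the research whole.
  const brief = await projectExplorer.sendTask({
    skillId: "build-brief",
    payload: {
      skill_uri: "skills/feature-briefing.md",
      feature: text,
      research_report_md: research.result.report_md,
      research_findings: research.result.structured?.findings,
    },
  });

  // 3. Tell a human what was decided.
  await notifier.sendTask({
    skillId: "notify",
    payload: { message: brief.result.brief_md },
  });

  return brief;
}
```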
Skills as markdown: the pattern's second-use validation
Day 1 introduced the "skill-as-markdown" pattern: agents are generic, skills define the mission. Day 2 validates it on its second use.
The Investigator today loads two distinct skills:
- `skills/feature-research.md`: research for new features (the flow we're building)
- `skills/upgrade-research.md`: research for version upgrades (Day 1)
The agent code is the same. Only the file passed in the task payload changes. That lets you reuse the same Investigator for completely different flows without touching code.
```ts
// In the Orchestrator, the flow picks the skill:
const skill = intent === "feature"
  ? "skills/feature-research.md"
  : "skills/upgrade-research.md";

await investigator.sendTask({
  skillId: "research",
  payload: { skill_uri: skill, topic: text },
});
```
When a pattern pays for itself on its first additional use case, it stops being a hypothesis and becomes infrastructure. Day 2 validates that separating mission from agent wasn't speculative design; it was useful architecture.
Project Explorer in code
```ts
// agents/project-explorer/src/server.ts
import { readFileSync } from "node:fs";
import { resolve } from "node:path";
import { z } from "zod";
// app (Hono), claude (Anthropic client), MODEL, SKILLS_DIR, extractText
// and extractJsonBlock are defined elsewhere in the service.

const TaskRequest = z.object({
  task_id: z.string(),
  skill_id: z.string(),
  payload: z.record(z.unknown()),
});

const BriefingPayload = z.object({
  skill_uri: z.string(),
  feature: z.string().min(1),
  research_report_md: z.string().optional(),
  research_findings: z.array(z.unknown()).optional(),
  project_context: z.object({
    name: z.string().optional(),
    stack: z.array(z.string()).optional(),
    conventions: z.string().optional(),
  }).optional(),
});

app.post("/tasks/send", async (c) => {
  const { task_id, skill_id, payload } = TaskRequest.parse(await c.req.json());
  if (skill_id !== "build-brief") {
    return c.json({ task_id, state: "failed",
      error: `unknown skill: ${skill_id}` });
  }

  const briefing = BriefingPayload.parse(payload);
  const skillContent = readFileSync(
    resolve(SKILLS_DIR, briefing.skill_uri.replace(/^skills\//, "")),
    "utf-8",
  );

  const userPrompt = [
    `FEATURE: ${briefing.feature}`,
    briefing.project_context?.name && `PROJECT: ${briefing.project_context.name}`,
    briefing.research_report_md && `\n--- RESEARCH ---\n${briefing.research_report_md}`,
  ].filter(Boolean).join("\n");

  const response = await claude.messages.create({
    model: MODEL,
    max_tokens: 3000,
    system: skillContent, // ← the Tech Lead's skill
    messages: [{ role: "user", content: userPrompt }],
  });

  // Extract markdown + structured JSON
  const briefMd = extractText(response);
  const structured = extractJsonBlock(briefMd);

  return c.json({
    task_id,
    state: "completed",
    result: { brief_id: `db-${task_id}`, brief_md: briefMd, structured /* ... */ },
  });
});
```
Note:
- No domain logic in the agent. All architectural intelligence lives in `skills/feature-briefing.md`. The agent code is pure A2A plumbing.
- Same A2A schema as the Investigator: `task_id` + `skill_id` + `payload`. Plug & play (sketched below).
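That envelope, as a sketch in types (names inferred from the snippets above; the real repo may differ):

```ts
// One envelope for every agent in the pipeline; swapping agents in and
// out never changes the transport contract.
interface TaskRequest {
  task_id: string;
  skill_id: string;
  payload: Record<string, unknown>;
}

interface TaskResponse {
  task_id: string;
  state: "completed" | "failed" | "input-required";
  result?: Record<string, unknown>;
  error?: string;
}
```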
Day 2 lesson
Designing A2A agents is as much an architecture exercise as a naming exercise. The agent's name and its skill markdown define its behavior more than the TypeScript code surrounding them.
If the qualitative output isn't what you expected, consider changing the name before touching the prompt. Changing "auditor" to "tech-lead" reshapes the model's entire behavior without touching a line of system prompt.
And a corollary:
Every A2A agent should have an active role. If you find an agent "describing" instead of "deciding", "listing problems" instead of "proposing solutions", "auditing" instead of "designing" — it probably has too much descriptive responsibility and not enough agency. Reframe.
What we didn't do in Day 2 (on purpose)
- Project Mapper: the Tech Lead still has no real repo context. It invents plausible paths (`src/features/oauth/`) without knowing whether they exist. We fix that in Day 3 with MCP.
- Feature Developer: the role that closes the loop. We're leaving it for Day 4.
- Real Tech Lead tests: qualitative output can only be validated against a real LLM; mocks and local runs validate structure, not quality (see the sketch below).
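For reference, what a structure-only test could look like. This is a sketch assuming vitest and a locally running agent; the port is a placeholder:

```ts
import { expect, it } from "vitest";

// Validates the envelope and shape only. A mocked LLM can never tell you
// whether the brief is a *good* brief; only a real model can.
it("project-explorer returns a structurally valid brief", async () => {
  const res = await fetch("http://localhost:4002/tasks/send", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      task_id: "t-1",
      skill_id: "build-brief",
      payload: {
        skill_uri: "skills/feature-briefing.md",
        feature: "add OAuth login",
      },
    }),
  });
  const body = await res.json();
  expect(body.state).toBe("completed");
  expect(typeof body.result.brief_md).toBe("string");
});
```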
Next step: real project context
The Project Explorer today decides architecture based on the LLM's intuition about what a typical TypeScript project looks like. That produces plausible but hallucinated briefs — invented paths, generic conventions, no real references to the repo.
Day 3 introduces two new components to fix this:
- Codebase MCP server that exposes the repo via Model Context Protocol
- Project Mapper agent that consumes it and produces a structured map to feed the Investigator and the Project Explorer
That's where A2A and MCP start dancing together.
Day 2 closes with: four A2A services in a 3-step pipeline, a clear separation between external research and internal architecture, and the lesson that names are implicit prompts; designing them matters as much as designing schemas.
Continues: Day 3 — A2A + MCP together, how to expose a repo as Model Context Protocol server.