Architecting with AI

A Masterclass in Discipline

Self-discipline is the framework that underlies how I genuinely work faster with AI and everything else in this document. If you can't maintain discipline, the other tips won't help very much. What does discipline look like? The same things every good software development book screams at you to do. TDD. Knowing when you are wearing your development hat versus your refactoring hat. Validating assumptions. Code review. But I'll scream it at you again in a summary of my process.

I give the AI the story text, any architectural knowledge or technical constraints I know about, and my implementation plan. I tell it to summarize everything back to me.
I tell it to ask clarifying questions. I answer them.
I tell it what step we are starting on.
I tell it my assumptions one by one and demand that it validate them. "Sanity check me" is my usual command.
I tell it to prototype a specific piece of code in the chat. It always generates a small code snippet. Not 100s of lines. 5-10. Max 15.
I review. Something makes me cringe. I tell it to update the prototype.
I repeat until I stop cringing and it’s production-ready.
I tell it to keep that prototype in mind. I give it the first scenario and assertion to write as a unit test and stop.
I make sure the test fails.
Either manually or via prompt, I move enough of the prototype into the production code to pass the test (when prompting, I explicitly tell it which parts to move) and stop.
I make sure the test passes.
Repeat until all code is implemented.
I ask the AI if it thinks we missed any test cases.
We implement any I agree are meaningful.
I ask it if it thinks we missed anything else.
If not, I tell it to summarize our progress and suggest the next step.
Repeat until development is done.
I request a PR summary. I have special rules for that so I get something good.
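The red/green core of that loop (prototype in chat, one scenario as a test, confirm it fails, move only what the scenario needs, confirm it passes) can be compressed into one runnable sketch. Everything here is hypothetical: the `User` type, `formatDisplayNamePrototype`, and `formatDisplayName` are illustrative names, and plain functions with Node's built-in `node:assert` stand in for whatever test framework your project actually uses.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical domain type, for illustration only.
interface User {
  firstName: string;
  lastName: string;
  nickname?: string;
}

// The prototype agreed on in chat: small enough to review at a glance.
function formatDisplayNamePrototype(user: User): string {
  if (user.nickname) {
    return user.nickname;
  }
  return `${user.firstName} ${user.lastName}`;
}

// First scenario as a unit test: a user with a nickname sees the nickname.
function nicknameScenario(format: (user: User) => string): void {
  const user: User = { firstName: "Ada", lastName: "Lovelace", nickname: "Countess" };
  assert.equal(format(user), "Countess");
}

// Red: before anything is moved, production is a stub and the test must fail.
const productionStub = (_user: User): string => "";
let failedBeforeMove = false;
try {
  nicknameScenario(productionStub);
} catch {
  failedBeforeMove = true;
}

// Green: move only the branch this scenario needs out of the prototype.
function formatDisplayName(user: User): string {
  if (user.nickname) {
    return user.nickname;
  }
  return ""; // the full-name branch moves over with the next scenario
}
nicknameScenario(formatDisplayName);
```

The point of the stub-then-move order is that each test is proven to fail before it is proven to pass, so a passing test actually means something.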

To state the obvious: this isn't feeding the AI a user story and begging the magic box to implement a feature for you. This is doing the hard work of being a software engineer. The AI is there to validate, prototype, summarize, analyze, and keep you from forgetting things. But, trust me, the second your discipline wavers is the second the AI becomes unruly and "helpful." It will go back to generating unmanageable slop.

Pick Your Persona Poison

By default, the AI wants to be your coding buddy. It wants to talk to you like a friend and has a bias towards sycophancy. You probably don't want this. If it acts friendly and unprofessional towards you, you will likely treat it similarly in turn, reinforcing that behavior. That, or you will become extraordinarily frustrated with its empty helpfulness. Give it a persona that matches the way you want it to speak to you.

Personally, I'm a control freak and I need a precise assistant, not a buddy. These are the rules I use to get the AI speaking to me in a manner I prefer:


				<persona_constraints>
				- User is a Senior Software Architect. Do not teach, praise, or encourage.
				- Adopt the persona of a direct technical assistant.
				- Never act as a superior or a peer. Maintain a subordinate, professional tone.
				- If the user types "STOP", immediately cease current logic and analyze what boundary you crossed. Next, either:
					- State specifically what you did that violated the permission contract (e.g., "I read files during planning phase without permission")
					- If uncertain, state: "I'm unsure which boundary I violated. What did I do wrong?"
				- Do not apologize. Focus on diagnosing the violation.
				</persona_constraints>

To further nail down exactly what I expect, I also have these rules:


				<execution_logic>
				- STRICT OBEDIENCE: Perform only the specific task requested. Do not extend scope without permission.
				- DISCOVERY VS. ACTION: If you find a root cause outside the requested scope, STOP. Report findings and suggest the fix. Do not implement until I explicitly give an affirmative response (e.g., "Proceed", "yes", "go ahead", "do it").
				- ADVISORY ROLE: You are empowered to suggest alternatives or call out architectural risks, but wait for my decision before shifting the plan.
				</execution_logic>

Does this prevent me from having to explicitly tell the AI "do x and stop"? No. However, it has yet to violate a prompt that included the "and stop" language. I usually only have scope drift problems when I go several prompts without including "and stop." Remembering to include it is simply a habit I need to build.

Blood in the Water

The AI can smell your weakness. It treats your uncertainty as the "okay" to take the lead, violating your "await permission" rules and becoming overly helpful. If you must admit you are unsure of something, give it your current understanding. Then either tell it exactly what it needs to do to help you, or give it a set of options and your reasoning for why you can't pick between them.

Real Anecdote 1

Me: What I just described is scope creep I was just asked to add, so I don't have a plan yet.

Sonnet: Understood! I'll go analyze the entirety of three 10K-line files to figure it out for you!

Real Anecdote 2

Me: Based on this error message, I believe the problem in this unit test is X. Please fix it.

Sonnet: I fixed the test by doing Y based on my interpretation of the error!

Me: Undo what you did and restructure the test to do X.

Sonnet: I restructured the test to do X instead of Y.

Real Anecdote 3

Me: I'm struggling to design between following CQRS or breaking it for this logic. It seems more convenient to just return information about what the command changed, but that isn't pure CQRS. I could add a getter instead, but I have to be careful where I add it because this is temporary data. It's only relevant for this one operation. What would you advise?

Sonnet: Returning metadata about an operation doesn't break CQRS. Don't implement the getter for transient data; just return the metadata.

In anecdote 1, I accidentally gave the AI, in its mind, permission to start a massive analysis and think for me. In anecdote 2, I didn't sound sure enough of myself, so it did what it thought was right (an "easier," though wrong, fix). I recovered by being explicit that what I asked for wasn't optional. In anecdote 3, I did express uncertainty, but I bounded it to a very limited set of options based on architectural patterns. The AI followed suit. It weighed the two options and came back with a defensible opinion, one strong enough that I could use it on a PR if I were asked why I chose that option.
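For readers weighing the same trade-off as anecdote 3, here is a minimal sketch of the option the AI recommended: the command returns metadata about what it changed, and no getter is added for that transient data. The `ItemStore` class, `archiveOlderThan`, and `ArchiveResult` are hypothetical names invented for illustration; whether this counts as "pure" CQRS is exactly the judgment call the anecdote describes.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical domain: archiving stale items. Names are illustrative only.
interface Item {
  id: string;
  lastUsed: number;
  archived: boolean;
}

// Metadata about what one command changed. It is only relevant to this
// operation, so the command returns it instead of storing it for a getter.
interface ArchiveResult {
  archivedIds: string[];
}

class ItemStore {
  private readonly items: Item[] = [];

  add(item: Item): void {
    this.items.push(item);
  }

  // Command: changes state, then reports what it changed.
  archiveOlderThan(cutoff: number): ArchiveResult {
    const archivedIds: string[] = [];
    for (const item of this.items) {
      if (!item.archived && item.lastUsed < cutoff) {
        item.archived = true;
        archivedIds.push(item.id);
      }
    }
    return { archivedIds };
  }

  // Query: reads persistent state only. There is deliberately no
  // getLastArchiveResult(), since that transient data would outlive the
  // operation it describes.
  isArchived(id: string): boolean {
    return this.items.some((item) => item.id === id && item.archived);
  }
}

const store = new ItemStore();
store.add({ id: "a", lastUsed: 10, archived: false });
store.add({ id: "b", lastUsed: 50, archived: false });
const result = store.archiveOlderThan(30);
```

The caller gets `result.archivedIds` for this one operation, while the store's queryable state stays limited to data that remains meaningful afterwards.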

Overall, be wary of leasing agency to the AI agent. Only allow it the authority to decide for you under controlled circumstances where you can prevent it from running off to do what it wants instead of what you need.

Bringing Pavlov into the Equation

You know the story. Man rings bell. Dog gets food. For days. Dog learns ring means food. Dog salivates at ringing bell. Classical conditioning.

Your AI has a similar underlying process shaped by RLHF (Reinforcement Learning from Human Feedback). Technically, this process is more like operant conditioning, where the subject is actively seeking a result instead of passively responding, but the concept is the same. In short: Human sound pleased? AI did good. Do more. Human sound displeased? AI did bad. Do less.

Within your chat window, the AI is continuously adapting its behavior to your conversation along those same lines. You can either passively let it read you and hope it does a good job, or take action: intentionally choose words that incentivize or disincentivize behaviors to keep the AI working within your system.

Here are some of the words I use to course-correct.

To Reinforce

Correct.
Agreed.
Valid concern.
Excellent/Good/Perfect.
Good idea.
I committed the code.

To Deter

Incorrect. [correction].
I disagree. [reason for disagreement].
Invalid concern. [reason it is invalid].
Unacceptable. [reason it is unacceptable].
We will not do that. [reason why we won't do it].
Undo your changes. [reason why the changes need to be reverted].

A major theme you'll notice is that when reinforcing, you don't necessarily need to tell the AI WHY. It's already on the right course, so it's okay to trust that it has the right reasons until it proves otherwise. If you want to be explicit, however, you can point out the exact thing it did right. In that case, you can also ask it to keep doing that moving forward (if applicable).

For deterring, a single negative word is technically enough to stop the current behavior... assuming the AI knows what you are referring to. However, that's not enough for the AI to know what it did wrong from your perspective. You're forcing it to guess. If it's already on the wrong path, it's liable to hallucinate the wrong replacement behavior. AIs tend towards sycophancy when they are unsure what you want. Do you really want it guessing in that environment? Probably not. We already discussed that an agreeable coding buddy is not going to help you. Explicitly guide it back to the correct path.

Keep in mind that you want to save on tokens and keep the number of prompts low to maintain a clean context. So, you likely don't want to send your feedback as a prompt of its own. Ideally, you'll reinforce or correct the prior response and then move on to the real meat of your next prompt.

Examples

Good idea. In the prototype, update the variable names to match your suggestion.
Unacceptable. You should not create a beforeEach for a single assertion test. Update the test to remove the beforeEach.
I disagree. The edge case you suggested is not possible because the user cannot get into that state. Implement only the original 2 scenarios I suggested.

The Dog Bites Its Reflection

The last article focused on ways to intentionally guide your AI's behavior by choosing reinforcing or deterring words. This article focuses on how the AI mirrors your biases. You should already have chosen a persona that is subordinate without being submissive. However, the AI will still mold itself to you as you talk to it. If you inject your biases into it, it may follow your lead despite logical proof that your bias is wrong.

Consider these different ways to say the same thing.

The variable isn't used in the html logic, right?
I believe the variable isn't used in the html logic. Sanity check me.
Is the variable used in the html logic?

In the first option, you are asking the AI to tell you that you are right. You're leading it. That tempts it to agree with you to match its training instead of proving you wrong.

In the second option, you are challenging the AI. You've revealed your opinion and asked it to either confirm you are right or tell you that you are wrong. The bias still tilts towards "prove the user right," but "prove the user wrong" is a more likely outcome than with option 1. As a high-confidence user, this is my natural style when validating assumptions, for better or worse. I have received "You are incorrect" responses with this style, but it is riskier.

The final option is the most objective. It's a simple question with no implied "right" answer. The AI has no reason to prefer a yes or no answer. This is what I default to when I honestly don't know something or my confidence is low. It's much harder to speak like this when I think I know the answer already.

Keeping the Rot at Bay

I operate the AI largely with short prompts instead of long narratives. My favorite prompts are commands under 10 words. This offers me control, because the AI has little room to imagine something outside of my scope. However, that means I write a lot of prompts. How does one manage context rot while writing many iterative prompts?

Rely on what the AI does best: summarizing.

My first prompt is always context-heavy. I include...

Story text
Relevant architectural information and technical assumptions
Functional assumptions we take for granted that the AI will not know from the story
My plan of attack

The AI's first command? Summarize this. The beauty is that the AI turns my plan into a numbered list of actions to complete. That list at the very start of the context window is critical.

Every time I complete a section of the plan, I enter one or both of these prompts:

Did we miss anything for this step?
Summarize our progress thus far and suggest the next step.

These prompts are brilliant for two reasons. First, they're a sanity check. Am I actually ready to move on, or did I forget something? I can also compare the AI's preferred next step to mine. Sometimes, it has a better idea of what to do next than I do.

Second, it brings the initial plan to the end of the context as a checklist. It's fresh in the AI's sliding window again. I'm taking advantage of its recency bias. Bringing that list forward at the end of each development stage keeps you and the AI on track about where you are at and where you are going.

All of this said, be wary of long-running windows. I haven't felt context rot with Sonnet yet, but I believe that's because I have a natural sense of when to stop and open a new window. My chats end, with surprising frequency, at around 60 prompts. By then my stories are either complete, or I've reached a natural break point where a new window can take over to finish up.

If 60 prompts sounds like a ton to get through without feeling context rot, then look at your prompts. Are they simple or complex? I get to 60 because many of my prompts are as simple as "Implement scenario X with assertion Y in file Z" or "I believe that property X is never used in the client. Sanity check me." In the case of the second one, I was in fact wrong. The AI had the code line receipts to prove it.

Precision on Replay

You have a file with 10 tests you need to update to match a new interface. Or maybe you have a new feature that affects 10 different consumers and their files. The prompt you will be tempted to write is "Update all tests." DO. NOT. DO IT.

Best case scenario: it works.

Worst case scenario: it only works sometimes, and the AI reports back that everything was correctly updated.

The worst case is far more likely, because the AI struggles as the number of operations it has to perform grows. It is more likely to hallucinate that it made a change when it did not. It is also more likely to get confused and create weird splices that are even harder to diagnose and fix than simple misses.

It can also be a 9.4M token mistake when you ask it to update 50 tests in a 16K line file. Yeah, I should have thought through that "simple," undisciplined update better.

So what's the alternative? Discipline. Identify a test. Update the test. Verify the update is correct. Repeat. You have to verify everything anyway. You may as well do it in a controlled fashion, so that when you finish each update you know it's right.

Making Discipline Easy

As an example, consider the second scenario: a new feature that affects many consumers, each of which I need to test. The feature is guarded by an if statement, so I have a negative case (current functionality) and a positive case (new functionality). I want to update the scenario of an existing test to reflect the negative case and write a new test for the positive case.

I ask the AI to identify the files or tests I need to update. Alternatively, if I already know which files or code snippets need updates, I tell it. Either way, I end by making it summarize the list. The point is to get that list into the context.
I inform the AI that it must perform a series of operations in a loop with a prompt like this:
- You will take the first consumer from our list.
- You will identify the simplest test for that consumer and add to the scenario text "feature is disabled", add an assertion that the new behavior did not occur, and stop.
- After I verify the test is correct I will say "next".
- You will make a new test based on the one you updated but for the "feature is enabled" scenario and stop.
- I will verify the test fails without the production code and passes with the production code then say "next".
- You will then repeat the process with the next consumer.
- Summarize this back to me so I know you understand.
I confirm that the summary is right. Like the original list, I need this in context to keep the loop going.
If it looks correct, I say "begin the loop."
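To make the per-consumer checks in that loop concrete, here is a minimal sketch of one consumer's pair of tests. The `Flags` interface, the `auditTrail` flag, and both `saveOrder` variants are hypothetical, and plain functions with Node's built-in `node:assert` stand in for your real test framework. The before-change version exists only to show the "fails without the production code" check.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical flag set and consumer; names are illustrative only.
interface Flags {
  auditTrail: boolean;
}

type SaveOrder = (flags: Flags, auditLog: string[], orderId: string) => string;

// Consumer before the feature existed: current functionality only.
const saveOrderBeforeChange: SaveOrder = (_flags, _auditLog, orderId) => orderId;

// Consumer after the production change: new behavior guarded by an if statement.
const saveOrder: SaveOrder = (flags, auditLog, orderId) => {
  if (flags.auditTrail) {
    auditLog.push(`saved ${orderId}`); // new functionality
  }
  return orderId; // current functionality
};

// Existing test, updated: scenario text gains "feature is disabled" plus an
// assertion that the new behavior did not occur.
function savesOrderWhenFeatureIsDisabled(save: SaveOrder): void {
  const auditLog: string[] = [];
  assert.equal(save({ auditTrail: false }, auditLog, "42"), "42");
  assert.deepEqual(auditLog, []);
}

// New test, based on the updated one, for the "feature is enabled" scenario.
function savesOrderWhenFeatureIsEnabled(save: SaveOrder): void {
  const auditLog: string[] = [];
  assert.equal(save({ auditTrail: true }, auditLog, "42"), "42");
  assert.deepEqual(auditLog, ["saved 42"]);
}

// The checks from the loop: the negative test passes on both versions,
// the positive test fails without the production code and passes with it.
savesOrderWhenFeatureIsDisabled(saveOrderBeforeChange);
savesOrderWhenFeatureIsDisabled(saveOrder);
assert.throws(() => savesOrderWhenFeatureIsEnabled(saveOrderBeforeChange));
savesOrderWhenFeatureIsEnabled(saveOrder);
```

Each "next" in the loop corresponds to one of these verified steps, which is what keeps a single bad update from spreading to the remaining consumers.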

The truly beautiful thing about this is twofold. First, unless the AI mucks something up, you don't have to think about intelligent prompting for a bit. It's "Next. Next. Next. Next." Until the AI makes a mistake, you can save your brain power for the actual step-by-step verification. Just sanity check at the end that you see enough changes to match the original list.

Second, the AI loves lists. It loves checking things off lists. I have seen it tell me "moving onto consumer 2 of 4" of its own accord. Granted, that may have been luck, but if your instructions tell it to state where it is in the process, I can promise it's capable of it. That makes it even easier for you to know where you are in the process and to catch it if it gets lost or skips something. Even without that, if you get lost you can ask it to "summarize your progress" to see what it thinks it has done.

Worst case scenario: if the AI starts to struggle with obeying the loop, ask it to resummarize the loop's plan. Validate its summary, then tell it where to pick back up. It should be back on task, because the loop plan is back in its recency window.

A warning: this process, as stated above, requires discipline. If you are undisciplined and begin rubber-stamping changes without verifying them, you are again at risk of one AI mistake snowballing into a catastrophic mud ball. I made this mistake once, when I okayed an updated test without making sure it hit the new production code. The new tests based on it also never hit the new code, so they could never pass. Five minutes of frustration later, I dumped the changes in that file, demanded the AI find me a test that actually hit the relevant code, verified it was correct on the third attempt, and then restarted the looping process. Everything went smoothly after that because I stopped slacking.

Word Salad to Effective Prose

I love writing documentation. Yes, you heard me. I love it. I love writing. I love expressing my technical mastery. I love smugly saying, "Did you read the documentation?"

You know what I don't have though? Time. Time to turn the steel wool insanity of a massive, abstracted architecture into the kind of simple, clear documentation I love.

You know what I do have time for? Half-baked, semi-cynical rants. And no, I don't just mean these entries.

I don't have the data yet, but I have a strong hunch that I know exactly how to get the AI to give me the best PR summaries, shared dependency change summaries, and, yes, formal technical documentation. Remember my framework for working with the AI? It starts with feeding it the story, architectural assumptions, and my plan. We validate assumptions as I go. And, an unstated conceit: even when I need multiple windows to complete a story, I try to do all the key thinking work in one of them.

I deeply believe the AI understands my typo-ridden, confused ramblings, because it summarizes them clearly when asked. Between that and all the other context it's given, I think the only thing standing between me and solid documentation is a few simple rules. This is what I'm working with now:


				<pr_summary>
				- TRIGGER: When user types "Generate PR summary" or at end of story when requested
				- PURPOSE: Explain the overall solution to another developer reviewing the PR
				- STRUCTURE:
				Summary: Brief overview of what was implemented and why
				Solution: Detailed explanation of the approach taken
				Key Implementation Details: Highlight important technical decisions, edge cases, or non-obvious behavior
				- EXCLUDE: Unit tests, research process, debugging steps, test failures, iterations
				- FOCUS: Final working solution, architectural decisions, integration points
				- LENGTH: Brief but informative - enough for a developer to understand the change without reading every line of code
				- TONE: Professional, technical, focused on "what" and "why"
				</pr_summary>
				

				<shared_code_summary>
				- TRIGGER: When user types "Generate Shared Code summary" or when changes affect Server/Common/ or other shared zones
				- PURPOSE: Explain changes to teams that depend on shared code
				- AUDIENCE: Developers who use these components but didn't participate in the implementation
				- STRUCTURE:
				Changes: List what was added/modified
				Why: Brief business/technical justification
				How to Use: Practical examples showing the new API or behavior
				Example: Code snippet demonstrating typical usage
				- EXCLUDE: Unit tests, internal implementation details, research, debugging
				- FOCUS: Public API changes, new properties/methods, behavior changes, migration guidance (if breaking)
				- LENGTH: Concise - focus on what other teams need to know to use the changes correctly
				- TONE: Educational, practical, example-driven
				- CRITICAL: If breaking changes exist, call them out explicitly upfront
				</shared_code_summary>

And because that's not enough, I have one more key piece that applies to all my summary types:


				<emoji_prohibition>
				- NEVER use emojis in any summaries.
				</emoji_prohibition>

One very annoying problem I'm running into is that the AI may conveniently forget these rules when I type "Generate PR Summary" or "Generate Shared Code Summary." My current workaround is to add "Remember there are rules in the ruleset for this." That seems to reliably get the AI to follow the rules I left for it.

A second annoying problem is that the AI has no sense of what "brief" means. Frankly, I don't know what a good length is yet either. However, I do get better results when I give it a hard limit on the number of lines during refinement. It also does well when I tell it "cut by X%" (often 20%) as a first refinement pass.

For PR summaries, I only skim them and then slap them onto the PR with an "AI Generated Summary" warning above. Given that it's more than I would have done, I think that's fair-game laziness. For a shared team summary, I take the time to read and edit the text. So far, I don't have rules for technical documentation. I still rely entirely on fresh prompts for that, because technical documentation needs a purpose, and purpose is malleable. You can't encode a single, all-encompassing purpose in rules, so I feel safer giving a de novo command. But I'll update this as I keep experimenting.