This Just Fixed 90% Of AI Coding

AI LABS

Transcript

00:00:00 What actually happens when you force a coding agent to follow the rules?
00:00:03 We all have a common struggle when working with Claude and other coding agents.
00:00:07 They tend not to follow instructions and ignore the CLAUDE.md file completely.
00:00:11 And even when we tried forcing TDD, the agent just tried to modify the test files itself.
00:00:15 So that's when we came across this plugin that was getting popular, gaining 58,000 stars in just 24 hours.
00:00:21 But that just shows what the hype cycle around AI tools is like.
00:00:25 This plugin promises strict enforcement of software development methodologies in the workflow.
00:00:30 But the question was whether it actually delivers on that.
00:00:33 Our team has seen similar workflows come out before, and most of them turned out to be just hype.
00:00:37 So we put this plugin into actual workflows to see if it's worth adopting in real-world projects or if it's just hype.
00:00:43 Superpowers is a plugin that enforces traditional software development methodology right inside the AI IDE you're using.
00:00:50 Now some people might think that existing agile frameworks like BMAD and OpenSpec already do the same thing.
00:00:56 But this one is different because it's not just another agent system for writing project specs.
00:01:01 It enforces that same agile methodology right in the workflow, with strict gates that make sure the agent doesn't proceed until the current step passes.
00:01:10 These gates are explicit checkpoints that prevent Claude from steering away from what it was instructed to do.
00:01:15 The core philosophy behind this plugin's approach is TDD and a systematic process over guessing.
00:01:20 It verifies results before claiming success, with instructions tailored to the common areas where AI usually fails.
00:01:28 It doesn't proceed to the next step until it gets the green signal from the user.
00:01:32 So in short, it builds in natively all the best practices we've covered in our previous videos, removing the manual setup we used to need.
00:01:40 The plugin emphasizes true red-green test-driven development and other common programming principles, like DRY and YAGNI, that we were taught when learning software development.
00:01:50 It's available for all the AI platforms.
00:01:52 But since our team was using Claude Code, we first copied the register-marketplace command and ran it in the Claude Code project we were using, after which we installed the plugin from the marketplace.
00:02:02 Once installed, and after a restart of Claude Code, the Superpowers plugin was available for use in the project.
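For reference, the install flow inside a Claude Code session looks roughly like the following. The marketplace and plugin names below are from the project's public README as we recall them and may have changed, so verify against the repository before copying:

```
# Inside a Claude Code session (names assumed; check the repo):
/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace
# Then restart Claude Code so the plugin's skills are picked up
```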
00:02:08 Now once we restarted Claude Code, we gave it a prompt that we wanted to build project management software similar to Trello.
00:02:15 It activated the brainstorming skill on its own, and instead of guessing what needed to be built, it used the skill to first identify what the project required.
00:02:24 It asked a lot of questions to clarify the app, who the project would be for, and the tech stack we wanted to use, and considered the issues each choice might have.
00:02:33 With the database selection, for example, it warned that our choice might not be the right one and could be a security problem, since the one we selected runs in the browser and cannot be accessed from the server side, so we changed it.
00:02:44 It kept clarifying the details until we were satisfied with the options.
00:02:48 Once it had confirmed everything with us, the next step was for it to present three approaches, from which we had to choose the one to implement.
00:02:55 So we chose the option we liked and suggested changes along with the selection.
00:02:58 Once that was done, it gave us the architecture design as well.
00:03:02 Next it gave us the UX design, in which it described how the boards would be handled.
00:03:06 It confirmed the whole project structure with us too.
00:03:09 And once all the design was approved, it documented everything in a docs folder.
00:03:13 And this is where the plugin beats others: it had built-in git instructions to commit each change, something other frameworks didn't do and we had to enforce manually.
00:03:22 Once the brainstorming skill had created the plans, the writing-plan skill was invoked; it wrote the implementation plan and committed it.
00:03:29 The plan broke the large application down into subtasks that were easier to implement.
00:03:33 Now you might think that Claude's built-in plan mode already does all this on its own.
00:03:37 But the main difference is that Claude Code's planning is just guidance for the agent on what it needs to do.
00:03:44 It asks only the tech-stack questions it actually thinks are needed and makes smaller decisions, like UI libraries, by itself.
00:03:52 Superpowers, on the other hand, is enforcement, meaning you cannot proceed to the next step until the current one passes, ensuring the plan is actually implemented.
00:04:01 Now once the planning phase was over, it asked how we would like to implement the plan, giving us two options, from which we selected sub-agent-driven implementation.
00:04:09 Claude also spawns sub-agents on its own, but the skills in sub-agent-driven implementation were different, because the plugin automatically set up a git worktree for each sub-agent so that their work did not affect each other.
00:04:20 The agents need to be isolated in worktrees to work well, because if they work in the same directory, they overwrite each other's work.
00:04:28 And this is the main thing it handles natively, on its own.
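To make the isolation concrete, here is a minimal sketch of the worktree pattern the plugin automates. The branch and directory names (`agent-board`, `task/board-ui`, etc.) are our own invention for illustration, not names the plugin actually uses:

```shell
# Create a throwaway repo to demonstrate per-agent isolation
git init -q demo && cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -q -m "initial commit"

# One worktree per sub-agent, each on its own branch and directory,
# so edits in one checkout can never clobber another
git worktree add -q ../agent-board -b task/board-ui
git worktree add -q ../agent-cards -b task/card-crud

# Each agent works in its own directory; branches are merged back when done
git worktree list
```

When a task is merged back, the plugin also cleans up, which in this sketch would be `git worktree remove ../agent-board`.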
00:04:31 Once the planning was finalized, Claude moved to the implementation phase.
00:04:34 It started a task, and once the task was done, it spun up a separate review sub-task to verify the implementation against the specs.
00:04:41 And once it had committed to git, it used another Superpowers skill, the code reviewer.
00:04:46 Only when the code quality was approved by the previous agents did it start the next task, iterating on the earlier ones until the quality bar was met.
00:04:54 Once each task was completed, reviewed, and committed to git, with the next task never starting before the previous one finished, it asked whether it should merge into main or create a PR.
00:05:04 We asked it to merge back into main quickly.
00:05:06 It then removed all the worktrees and committed the entire project to the main branch itself.
00:05:11 Now this process consumes a lot of the context window because of the sub-agents and multiple skills; for us, just one iteration used almost 50% of the context window, meaning we have to be careful when working this way.
00:05:24 The project it created was simple and had basic functionality.
00:05:27 We wanted the lists ordered by the current states, which were to do, in progress, and done.
00:05:32 And even though the individual cards were there, we wanted the lists to be movable as well.
00:05:36 So we went back to Claude Code and asked it to handle that, but it first started the way Claude normally does, without the plugin skills.
00:05:42 This must be because too much of the context had been consumed, and we had to remind it to use the Superpowers plugin.
00:05:48 After the reminder, it set off doing the tasks the same way it had previously.
00:05:52 Once we had gone through all the steps, Claude spawned agents to work in separate worktrees, and this is where these agents get better, because they use the test-driven development approach natively.
00:06:02 The agents first write tests for each part to be implemented.
00:06:05 And once the tests had been written, it ensured the agent wrote the code without any modifications to the test cases and made the tests pass.
00:06:13 The plugin skills used strong prompt cues that prevented it from modifying the tests itself, basically invalidating all the excuses Claude tends to make while trying to skip steps.
00:06:23 These cues are explicit instructions like "if there is a 1% chance of using a skill, use it".
00:06:29 This ensured that each task was done in a properly structured way.
00:06:32 One thing to note is that these agents performed tasks sequentially, so completing a task took longer than the way Claude does it natively.
00:06:41 But since it enforced strict guidelines, it ensured the application worked as intended.
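The red-green cycle being enforced is easiest to see in miniature. Below is a deliberately tiny, self-contained sketch using a hypothetical `slugify` helper (not from the video's app): the test is written first and must fail before any implementation exists, and the implementation is then written without ever touching the test file:

```shell
mkdir -p tdd-demo && cd tdd-demo

# Red: write the test first; it fails because slugify.sh doesn't exist yet
cat > test.sh <<'EOF'
#!/bin/sh
[ "$(./slugify.sh 'In Progress')" = "in-progress" ]
EOF
chmod +x test.sh
./test.sh || echo "red: failing as expected"

# Green: write the implementation; test.sh is never modified
cat > slugify.sh <<'EOF'
#!/bin/sh
printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '-'
EOF
chmod +x slugify.sh
./test.sh && echo "green: passing"
```

The plugin's value is that the agent is held to exactly this discipline: when a test fails, it must change the implementation, never the test.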
00:06:45 As we mentioned earlier, this plugin consumes context at a fast rate; only a few tasks left us with just 5% of the context window.
00:06:53 So before proceeding with any further tasks, we ran the compact command so that we wouldn't lose any context while Claude brainstormed the next task.
00:07:01 Once the conversation had been compacted, we gave it a prompt for the next feature we wanted to implement, and it set off the same way.
00:07:07 But the best part about this session was that it did not guess implementations on its own; it kept asking questions from multiple angles, making sure the app was built the way we wanted.
00:07:17 The plugin pushed Claude from all angles, clarifying edge cases in the brainstorming session, like how the columns would look when empty, something so small that Claude on its own might have just guessed and implemented.
00:07:29 Now the guidelines for getting the most out of this plugin are available in AI Labs Pro.
00:07:33 For those who don't know, it's our recently launched community where you get ready-to-use templates that you can plug directly into your projects, for this video and all previous ones.
00:07:42 If you've found value in what we do and want to support the channel, this is the best way to do it. The link's in the description.
00:07:48 Another strength of this plugin is its systematic debugging.
00:07:52 We encountered a bug where data wasn't saved across refreshes, gave Claude a vague prompt that didn't specify where we found the bug, and asked it to use systematic debugging to fix it.
00:08:01 It loaded the systematic debugging skill for the task. The skill's work was divided into four phases.
00:08:06 The first phase identified the root cause by asking us questions about it.
00:08:10 From our answers, it investigated and traced in the direction we gave, and found the file that was likely at fault.
00:08:16 Once the root cause was identified, phase two targeted isolating the bug, while phase three narrowed down the actual reason the bug occurred so it could be fixed.
00:08:25 Phase four applied the fix. The whole process made the debugging more structured than probing around the codebase for what went wrong, and it ended the fix with testing.
00:08:35 Now there are plenty of tasks that don't require the full plugin workflow, where using it would be overkill, like when we had to change the UI of the app and didn't want to wait 15 minutes for just a UI change.
00:08:47 For these kinds of tasks, we can work in a simpler way.
00:08:51 Since we didn't have a particular goal for how the app should look, we asked it to improve the UI and told it not to proceed to implementation, just to do the brainstorming and planning.
00:09:01 It started with brainstorming and asked which visual direction we wanted, with clarifying questions on different aspects of the design.
00:09:08 Claude then stopped after planning, as the prompt instructed, after which we asked it to implement the plan without using the process.
00:09:15 The whole app's UI was changed in significantly less time than the process-driven way would have taken, and it still committed the changes to git in the same format the process enforced.
00:09:25 The app went from a basic layout with minimal styling to an improved color scheme, hover states on cards, and a more functional layout overall.
00:09:32 This is what makes the framework practical: let Claude handle things without the process when it's already good at them, and bring in the full process for the cases where it tends to fumble the implementation.
00:09:43 That brings us to the end of this video. If you'd like to support the channel and help us keep making videos like this, you can do so with the super thanks button below.
00:09:51 As always, thank you for watching, and I'll see you in the next one.

Key Takeaway

The Superpowers plugin transforms AI coding agents from unpredictable assistants into disciplined developers by enforcing strict software engineering principles, automated testing, and isolated workflows.

Highlights

The Superpowers plugin for AI IDEs like Claude Code enforces strict agile methodologies and TDD to prevent agents from skipping steps or ignoring instructions.

A key feature of the plugin is the use of 'strict gates' that require user approval before the AI can proceed to the next phase of development.

Git worktrees give each sub-agent an isolated workspace, preventing agents from overwriting each other's work.

Timeline

Introduction to AI Coding Struggles and the Superpowers Plugin

The speaker addresses the common frustration where AI agents like Claude ignore system instructions or fail to follow Test-Driven Development (TDD) rules. They introduce the Superpowers plugin, which gained massive popularity by promising to enforce strict software development methodologies. Unlike simple specification agents, this tool uses 'strict gates' and explicit checkpoints to ensure the AI does not proceed until the current task is verified. The core philosophy centers on a systematic process over mere guessing, integrating principles like DRY and YAGNI. This section establishes the need for enforcement to prevent AI 'steering' and manual setup overhead.

Initial Setup and the Brainstorming Phase

After installing the plugin via the Claude Code marketplace, the team tests it by prompting the creation of a Trello-like project management app. The plugin triggers a specialized 'brainstorming skill' that prioritizes clarification over immediate execution, asking detailed questions about the tech stack and security. It identifies potential issues, such as browser-side database limitations, and suggests alternatives before finalizing architecture and UX designs. Documentation is automatically generated in a docs folder, and every design decision is committed to Git. This phase demonstrates how the plugin forces a level of project planning that native AI typically skips.

Planning and Sub-Agent Implementation with Git Worktrees

The video highlights the transition from planning to implementation using a 'writing plan' skill that breaks large tasks into manageable subtasks. A critical differentiator is the 'sub-agent driven implementation' mode, which sets up individual Git worktrees for different AI agents. This isolation prevents agents from overwriting each other's code, a common problem in standard multi-agent setups. Once a task is finished, a separate review sub-task verifies the code quality against the original specs before merging. This structured approach ensures that no new task begins until the previous one is fully validated and committed to the main branch.

Enforcing TDD and Managing Context Limits

The speaker notes that while the process is robust, it is highly resource-intensive, with a single iteration consuming nearly 50% of the AI's context window. They emphasize the importance of the 'compact' command to clear space while maintaining the session's logic for future features. The plugin shines in its native TDD approach, forcing the AI to write tests first and preventing it from modifying those tests to pass faulty code. Use of strong prompt cues like "if there is a 1% chance of using a skill, use it" keeps the agent on track. This section illustrates the trade-off between the slower, more expensive execution and the high-quality, working code it produces.

Systematic Debugging and Flexible Workflows

The final section covers the 'Systematic Debugging' skill, which follows a four-phase roadmap: identifying the root cause, isolating the bug, narrowing the reason, and applying the fix. This structured method is contrasted with the typical 'trial and error' approach AI uses, ending with mandatory testing to ensure the fix works. The speaker also provides a practical tip: users can bypass the heavy enforcement for simple UI tasks to save time. By asking the AI to brainstorm and plan but skip the 'process-driven' implementation, the user gets quick results that still follow established Git formats. The video concludes by framing the plugin as a tool to be used strategically where AI is most likely to fail.
