The ShadCN Loop Is The Greatest Fix For Your Broken UI

AAI LABS
Computing/Software · Small Business/Startups · Internet Technology

Transcript

00:00:00 Most of you already know ShadCN as one of the most widely used UI libraries, but using an AI agent to build with it can be problematic. If you're building one-shot landing pages, you won't have a huge problem, but if you're building a new app or implementing a new feature, things break, and they break other parts of the app as well. This isn't anything new; the problem has already been solved, and the solution is how engineers build apps now. AI agents always test the code that they write, but these agents become unreliable with large contexts, so we need a way to ensure that agents complete the work they're given. This is where the concept of agentic loops comes in, and Anthropic addresses it with the RALPH loop. To solve my UI problem, I tried to implement the RALPH loop, and at first it completely failed. But I soon learned that it wasn't because of the RALPH loop; it was because of the process I had implemented around it. RALPH is a new plugin released by Anthropic themselves, but it wasn't one of their original ideas: it's based on a technique by someone else, which Anthropic implemented and open-sourced. At its core, RALPH is a loop, and
00:01:00 if you know about Claude Code hooks, you'll recognize the mechanism: it uses stop hooks, which run when Claude stops outputting an answer. As soon as Claude stops, the agent is fed its initial prompt file again, which allows it to iteratively improve its work. Now, here's the important question: when does it actually break out of the loop? There's something called a completion promise, which can be any word you choose. When Claude believes its task is completed, it outputs this promise by itself; in this example, the promise is the word "complete". If the promise appears in Claude's output, the loop doesn't run again, so until Claude outputs the promise, it doesn't stop. This makes sure that Claude doesn't just quit whenever it wants. After you install the plugin, you'll have three commands: the RALPH loop command, a cancel command, and a help command. In the loop command, you provide the prompt that is fed to the agent again and again. Sometimes the agent may be given an impossible task that it can't solve, and it can get stuck in an infinite loop, so setting a max iteration count is a really good practice.
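To make the mechanics concrete, here is a minimal Python sketch of such a loop. This is an illustration of the idea only, not the actual plugin code; `run_agent` is a hypothetical stand-in for invoking the agent.

```python
# Minimal sketch of a RALPH-style agentic loop (illustration only).
PROMISE = "COMPLETE"    # the completion promise word
MAX_ITERATIONS = 10     # guard against infinite loops on impossible tasks

def ralph_loop(prompt, run_agent):
    output = ""
    for i in range(1, MAX_ITERATIONS + 1):
        output = run_agent(prompt)   # the same prompt is fed every time
        if PROMISE in output:        # the agent signals it is done
            return i, output
    return MAX_ITERATIONS, output    # give up after the cap

# Toy agent that only succeeds on its third attempt:
attempts = {"n": 0}
def toy_agent(prompt):
    attempts["n"] += 1
    return "COMPLETE" if attempts["n"] >= 3 else "still working"

iterations, final = ralph_loop("implement the feature", toy_agent)
# iterations == 3: the prompt was re-fed until the promise appeared
```

The completion promise plays the role of the exit condition: the loop never trusts the agent's silence, only the explicit promise word.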
00:01:55 I will leave the link to the repo below, because it has some good best practices for the prompts you can give to the RALPH loop, but in this video I'm only going to discuss the ones related to the UI workflow I'm about to show you. Let's say we want to implement two features in this app. One is a command palette: a menu for searching through the app and executing other commands. To make sure this new feature doesn't break other parts of the app, you would start with the tests. This is called test-driven development; if you're not familiar with it, you can ask Claude Code to set up the TDD structure for you, where it creates an end-to-end test folder, a screenshots folder for checking UI problems, and the corresponding tests. The other feature we're going to implement is a board view in the database, similar to what Notion allows us to do with its databases. If you've caught on: test-driven development is an approach where tests are written before the code is implemented, which means the initial tests will always fail.
00:02:49 So if I'm implementing the command palette feature, I wouldn't just start writing the code for it. Instead, I would first write elaborate tests for it. Then we write the minimum amount of code required to pass those tests. Once that's done, we refactor and add more functionality, and with every addition we make sure the tests still pass. Another interesting thing is that these tests are automated,
and Playwright can be imported and used for visual verification. If you think we're using the Playwright MCP to autonomously verify this through the browser, you're wrong. With TDD, you can take a screenshot for each functional behavior; for example, if the behavior is adding a card, the screenshot would show a card added to the board. So now all the AI agent has to do is look at those screenshots and make sure there are no problems with the way the ShadCN components have been implemented. These test files ensure that whenever something new is added, or while a feature is being built, all of our behavioral requirements are fulfilled; in our case, though, we want to use the screenshots purely for UI verification. But if we already have TDD, why do we need the RALPH loop?
00:03:51 As I already stated, with larger tasks and context windows becoming nearly full, these models abruptly quit their tasks and require constant human input. So I can have tests written beforehand for any function I want, then use the loop to instruct the agent on what to do, and it can work autonomously. By telling it what workflow to follow and giving it the condition for outputting the promise, it completes the task and exits the loop; in this case, the condition is passing all 25 unique tests. Using the RALPH slash command, I gave it a prompt so that it would iteratively build the command palette feature. In the prompt, we were basically telling it to implement the feature along with some basic requirements, which aren't all that important because the requirements can also be found in the tests, but we did outline the whole workflow. In that workflow, it was supposed to start by running the tests; it knows the tests will fail, and after that it needs to implement the components to make them pass. That's the whole goal. Now, if this were
a much broader task, chances are that when the context window fills up or Claude gets confused, it will quit automatically and never output the completion promise. Since it never outputs the promise, the prompt is fed back again and it has to start all over, meaning it keeps working on the task iteratively. But since this was a smaller task, it was actually able to implement everything in a single go: write out all the components and make all of the tests pass. After the tests pass, the workflow tells it to review all of the screenshots for the command palette. These are screenshots taken at different stages to make sure that the UI, whether it's ShadCN or any other component library you're using, is implemented correctly and there aren't any minor issues. After that, it should run the tests again and make sure they still pass after the UI changes. Since all of the tests were passing and the screenshots were reviewed, it output the completion promise; this is where the loop stopped and didn't continue. But there was a really big problem with this, which I didn't notice with the command palette feature because there were very few chances of UI errors there. However, when I moved on to implementing the board view, I realized there was a huge mistake in
the system. I started implementing the board with the same prompt; the requirements were different, of course, but the workflow was pretty much the same. Now, I was kind of surprised when it completed all of the requirements in one go. Don't get me wrong, it was actually making sure that all of the tests were passing, but while it was doing that, there were cases where the number of successful tests would actually decrease, because by changing something it would break something else. This is why TDD is really important: this recursive testing makes sure everything keeps working. But the main problem appeared after it had verified that it was done. When I went ahead and checked the UI, most things were implemented correctly, but it had completely missed some UI errors, such as this one. I also checked the screenshots, and the errors were showing up in those screenshots as well. So I asked it, and we analyzed what actually went wrong. The real issue was a process failure, specifically in how the UI was fixed. What happened was that it did pass all of the tests, because it was supposed to run the test files again and again, but there was no specific test for the UI other than the screenshots. It glanced at a few of them, ignored some of the UI errors it had seen, and completely skipped some files. So the main issue was that it output its promise statement prematurely and didn't verify whether the UI was
actually fixed. We went through a whole brainstorming session on how to fix this, and I even gave the prompt-writing best practices from the repo to Claude Code. In the end, we came up with some specific rules and a change in the process that would ensure the UI was always correct. This had nothing to do with the tests, because they're always going to run. The prompt we used for the command palette is really helpful when the feature or implementation is very large: Claude doesn't hallucinate that it has completed the task; instead, due to a full context window or the complexity of the task, it quits abruptly. Claude Code is already really autonomous, there's no doubt about that, but there are still issues like this that we need to fix. So we changed a number of
things in the main prompt. The first was a screenshot verification protocol: we added a simple prefix to each image that told Claude whether it had read that screenshot or not. But when I first implemented this, it still didn't read all of the images; it would read a few, write "verified" on them, and, just like before, quit early. To solve this, we encouraged it to think in a different way. We told it that after it renamed all the screenshots, it should not output the promise yet, meaning it should not consider the task completed; it should let the next iteration confirm completion, so at least two loops always run. In the next verification pass, Claude checks that all the files have the verified prefix. Of course, this meant we had to change the tests and separate image verification from the functional tests. The next iteration makes sure all the images have verified results, and if Claude missed any, it looks at them again and fixes the output. With this change, the small UI errors we were facing were finally fixed, and it was able to implement all of these features correctly. When it entered the next loop, it ran the tests again; since it found some errors, it fixed them, and because all the files had the word "verified" in them, it ran one final test. This time, it completed its task in two loops and fixed all the major UI errors in the app. Now let's talk about Automata. After teaching millions of people how to build with
AI, we started implementing these workflows ourselves and discovered we could build better products faster than ever before. We help bring your ideas to life, whether apps or websites. Maybe you've watched our videos thinking, "I have a great idea, but I don't have a tech team to build it." That's exactly where we come in. Think of us as your technical co-pilot: we apply the same workflows we've taught millions of people directly to your project, turning concepts into real, working solutions without the headaches of hiring or managing a dev team. Ready to accelerate your idea into reality? Reach out at hello@automata.dev. If you'd like to support the channel and help us keep making videos like this, you can use the Super Thanks button below. As always, thank you for watching, and I'll see you in the next one.

Key Takeaway

The RALPH loop combined with Test-Driven Development and a two-iteration screenshot verification process solves the problem of AI agents prematurely quitting complex UI tasks and missing visual errors in ShadCN-based applications.

Highlights

The RALPH loop is an Anthropic plugin that addresses AI agents quitting prematurely on complex tasks by continuously feeding the initial prompt until a completion promise is output

Test-Driven Development (TDD) combined with RALPH loop ensures UI features don't break existing app functionality by writing tests before implementing code

AI agents often fail to verify UI screenshots properly, leading to missed visual errors even when functional tests pass

A two-iteration verification process forces the agent to rename screenshots as 'verified' in one loop, then confirm all are verified in the next loop before completing

Playwright can automate visual verification by taking screenshots at different stages, letting the AI check for UI implementation issues from the images rather than through interactive browser control via MCP

The completion promise mechanism prevents agents from quitting early - the loop only stops when the agent outputs a specific word indicating task completion

Process failure, not the RALPH loop itself, was the root cause of UI verification issues - the workflow needed specific rules for screenshot verification

Timeline

Introduction to the ShadCN AI Agent Problem

The video introduces the core problem with using AI agents to build applications with ShadCN UI library. While building one-shot landing pages works fine, implementing new features or building apps causes breakage in other parts of the application. The speaker explains this isn't a new problem - it's already been solved by how engineers currently build apps. AI agents test their code but become unreliable with large contexts, necessitating a method to ensure agents complete their assigned work. This is where agentic loops come into play, with Anthropic's RALPH loop being the proposed solution.

Understanding the RALPH Loop Mechanism

The RALPH loop is explained as a new plugin released by Anthropic based on someone else's technique that they implemented and open-sourced. It uses stop hooks from Claude Code that trigger when Claude stops outputting an answer, at which point the AI agent receives its initial prompt again for iterative improvement. The critical mechanism is the completion promise - any word that Claude outputs when it believes its task is complete. For example, if the promise is the word 'complete', the loop only stops when this word appears in Claude's output. This prevents Claude from quitting prematurely whenever it wants, ensuring task completion before exiting the loop.
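The stop-hook side of this can be illustrated with a rough Python sketch. This is a hypothetical simplification, not the plugin's actual hook script; it assumes the documented Claude Code convention that a stop hook exiting with code 2 blocks the stop (with its stderr fed back to Claude), while exit code 0 lets Claude stop.

```python
# Hypothetical sketch of a stop-hook decision (not the actual plugin code).
import sys

PROMISE = "COMPLETE"

def decide(last_assistant_message: str, prompt_file_text: str) -> int:
    """Return the exit code a RALPH-style stop hook would use:
    0 allows Claude to stop, 2 blocks the stop and re-feeds the prompt."""
    if PROMISE in last_assistant_message:
        return 0                                  # promise found: loop ends
    print(prompt_file_text, file=sys.stderr)      # re-feed the initial prompt
    return 2                                      # block the stop

exit_code_done = decide("All 25 tests pass. COMPLETE", "build the feature")
exit_code_busy = decide("I think I'm finished", "build the feature")
# exit_code_done == 0, exit_code_busy == 2
```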

RALPH Loop Commands and Best Practices

After installing the RALPH plugin, three commands become available: the loop command, a cancel command, and a help command. The loop command requires providing the prompt that gets fed to the agent repeatedly. A critical best practice mentioned is setting a max iteration count because the agent might get stuck in an infinite loop when given impossible tasks. The repository contains best practices for prompts, but the video focuses specifically on practices related to the UI workflow. The speaker promises to demonstrate these practices through the implementation examples that follow.

Test-Driven Development Setup for UI Features

Two features are planned for implementation: a command palette for searching and executing commands, and a board view similar to Notion's database. The approach uses Test-Driven Development (TDD) where tests are written before code implementation to ensure new features don't break existing functionality. Claude Code can set up the TDD structure including an end-to-end test folder, a screenshots folder for checking UI problems, and corresponding tests. The key principle is that in TDD, tests are written first and will initially fail, then minimum code is written to pass those tests, followed by refactoring and adding more functionality while ensuring tests continue to pass.
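The red-green-refactor cycle described above can be sketched in miniature. The `filter_commands` helper is a hypothetical example, not code from the video's app:

```python
# Hypothetical red-green-refactor sketch for a command-palette filter.

# Red: the test is written first; it fails while filter_commands
# does not exist yet.
def test_filter_commands():
    assert filter_commands(["Open File", "Close Tab"], "open") == ["Open File"]
    assert filter_commands(["Open File", "Close Tab"], "zzz") == []

# Green: the minimum code that makes the test pass.
def filter_commands(commands, query):
    q = query.lower()
    return [c for c in commands if q in c.lower()]

test_filter_commands()  # passes now; refactoring comes next, tests re-run each time
```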

Automated Testing with Playwright for Visual Verification

The automated tests use Playwright for visual verification, but importantly, not through autonomous browser control via MCP. Instead, for each functional behavior (like adding a card), screenshots are taken showing the result (card added to board). The AI agent only needs to examine these screenshots to verify that ShadCN components have been implemented correctly without UI problems. These test files ensure that all behavioral requirements are fulfilled as features are built and new functionality is added. The screenshots serve purely for UI verification purposes, separating visual concerns from functional testing requirements.
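One way to picture the screenshot-per-behavior idea: each functional behavior is paired with an image file the agent can later review, instead of driving a browser interactively. The file layout below is a hypothetical sketch; in a real suite, Playwright's `page.screenshot(path=...)` would write the images.

```python
# Hypothetical sketch: one screenshot per functional behavior.
import pathlib, tempfile

behaviors = ["add-card", "move-card", "open-palette"]

shots = pathlib.Path(tempfile.mkdtemp())
for b in behaviors:
    # In a real test, a Playwright screenshot call would capture the UI
    # right after the behavior is exercised; here we just create the file.
    (shots / f"{b}.png").touch()

# The agent's later review step needs only the images, not a live browser:
missing = [b for b in behaviors if not (shots / f"{b}.png").exists()]
# missing == []  -> every behavior has visual evidence to inspect
```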

Why RALPH Loop is Necessary Despite Having TDD

The speaker addresses why the RALPH loop is needed when TDD already exists. The answer lies in how AI models behave with larger tasks and nearly-full context windows - they abruptly quit tasks and require constant human input. By writing tests beforehand for any desired function and using the loop to instruct the workflow, the agent can work autonomously. The loop is told what workflow to follow and given the condition for outputting the completion promise, which in this case is passing all 25 unique tests. This combination allows the agent to complete tasks and exit the loop only when truly finished.

First RALPH Loop Implementation - Command Palette Success

Using the RALPH slash command, the speaker provided a prompt to iteratively build the command palette feature. The prompt outlined basic requirements (though these were redundant since requirements exist in tests) and importantly detailed the entire workflow. The workflow started with running tests, knowing they would fail, then implementing components to make them pass. While a broader task might cause Claude to quit when the context window fills, this smaller task was completed in a single iteration. The agent successfully wrote all components, passed all tests, reviewed screenshots for the command palette, verified no UI issues existed, and ran tests again before outputting the completion promise to exit the loop.

Board View Implementation Reveals Major Process Failure

Implementing the board view with the same prompt structure revealed a significant problem despite initially appearing successful. While Claude ensured tests passed, there were cases where the number of successful tests decreased because changes broke other functionality - demonstrating why TDD with recursive testing is crucial. The main issue emerged after verification when checking the actual UI: most features were implemented correctly, but Claude had completely missed some UI errors. Upon examining the screenshots, the errors were clearly visible there too, meaning the agent had failed to properly review the visual verification materials.

Diagnosing the Root Cause - Process Not Tool Failure

Analysis revealed the issue was a process failure specifically in UI fixing, not a problem with the RALPH loop itself. The agent successfully ran test files repeatedly as designed, but lacked a specific test for UI beyond the screenshots. It only glanced at a few screenshots, ignored some UI errors it did see, and completely skipped some files. The critical problem was premature output of the completion promise without verifying if the UI was actually fixed. A brainstorming session ensued, including reviewing prompt writing best practices from the repository, ultimately leading to specific rules and process changes to ensure UI correctness.

Screenshot Verification Protocol - First Attempt

The solution involved changing the main prompt with no modifications needed to the functional tests themselves. The previous prompt was helpful for large features where Claude wouldn't hallucinate completion but would quit due to full context windows or task complexity. The first major change was implementing a screenshot verification protocol with a simple prefix for each image indicating whether Claude had read it or not. However, the initial implementation still failed - Claude would read a few screenshots, mark them as verified, and quit early just like before, demonstrating that the prefix alone wasn't sufficient to ensure thorough review.

Two-Loop Verification Solution for UI Completeness

The breakthrough came from changing how Claude thinks about task completion. After renaming all screenshots, Claude is explicitly told not to output the promise yet and not to consider the task complete - it must let the next iteration confirm completion, ensuring at least two loops run. Tests were modified to separate image verification from functional tests. In the next iteration, Claude verifies all files have the verified prefix, and if any are missed, it reviews them again and fixes the output. This change finally eliminated the small UI errors, enabling correct implementation of all features.
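The two-pass protocol could be sketched roughly as follows. File names and the promise word are hypothetical; the point is that the first pass renames without ever outputting the promise, and only the second pass may do so:

```python
# Hypothetical sketch of the two-pass "verified" prefix protocol.
import pathlib, tempfile

shots = pathlib.Path(tempfile.mkdtemp())
for name in ["palette.png", "board.png", "card.png"]:
    (shots / name).touch()

def review_pass(directory):
    """Loop iteration 1: inspect each image, then mark it with a
    verified- prefix. Never output the promise in this pass."""
    for img in list(directory.glob("*.png")):
        if not img.name.startswith("verified-"):
            # (here the agent would actually look at the screenshot)
            img.rename(img.with_name("verified-" + img.name))

def confirmation_pass(directory):
    """Loop iteration 2: output the promise only if every image is marked."""
    unreviewed = [p.name for p in directory.glob("*.png")
                  if not p.name.startswith("verified-")]
    return "COMPLETE" if not unreviewed else f"unreviewed: {unreviewed}"

review_pass(shots)                  # first loop: rename, no promise
result = confirmation_pass(shots)   # second loop: confirm, then promise
# result == "COMPLETE"
```

Separating the marking pass from the confirming pass is what forces at least two loop iterations, so a premature promise can never slip through on the first pass.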

Final Results and Automata Services Promotion

The improved process resulted in Claude entering the next loop, running tests, finding and fixing errors, and completing the task in two loops while fixing all major UI errors. The video concludes with promotion for Automata, the company that helps bring ideas to life by building apps and websites using the AI workflows they teach. They position themselves as a technical co-pilot for those with ideas but no tech team, applying the workflows taught in their videos to real projects. Viewers are invited to reach out at hello@automata.dev, support via Super Thanks, and the speaker signs off thanking viewers and promising more content.
