Most of you already know shadcn as one of the most widely used UI libraries, but using an AI agent to build with it can be problematic. If you're building one-shot landing pages, you won't have a huge problem, but if you're building a new app or implementing a new feature, things break, and they break other parts of the app as well. This isn't anything new; the problem has already been solved, and it's how engineers build apps now: AI agents always test the code that they write. But these agents become unreliable with large contexts, so we need a way to ensure that agents actually complete the work they're given. This is where the concept of agentic loops comes in, and Anthropic addresses it with the Ralph loop. To solve my UI problem, I tried to implement the Ralph loop, and at first it completely failed. But I soon learned that it wasn't the Ralph loop's fault; it was the process I had built around it. Ralph is a new plugin released by Anthropic themselves, but it wasn't originally their idea: it's based on a technique by someone else, which Anthropic implemented and open-sourced. Basically, Ralph is a loop.
If you know about Claude Code hooks: Ralph uses stop hooks, which run when Claude stops outputting an answer. As soon as it stops, the AI agent is fed its initial prompt file again, which allows it to iteratively improve its work. Now, here's the important question: when does it actually break out of the loop? There's something called a completion promise, which can be any word you choose. When Claude decides its task is complete, it outputs this promise by itself; in this case, the promise is the word "complete". If the promise appears in the returned output, the loop doesn't run again. So until Claude outputs the promise, it doesn't stop, which makes sure Claude doesn't just quit whenever it feels like it.
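For context, a Claude Code stop hook is just an entry in your settings file that runs a command whenever Claude finishes responding. As a rough sketch only (the schema is from memory of the hooks docs and may differ between versions, and the script path is a hypothetical stand-in for whatever Ralph actually registers), a stop-hook registration looks something like this:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "node .claude/ralph-stop-hook.js"
          }
        ]
      }
    ]
  }
}
```

The registered command can then inspect Claude's last output and, if the completion promise isn't there, block the stop so the original prompt gets fed back in.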
After you install the plugin, you'll have three commands: the Ralph loop command, a cancel command, and a help command. In the loop command, you provide the prompt that is fed to the agent again and again. Sometimes the agent gets an impossible task it can't solve and would get stuck in an infinite loop, so setting a max iteration count is a really good practice.
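Putting those pieces together, the control flow is easy to sketch. This is a minimal, hypothetical re-implementation in TypeScript, not Ralph's actual source: `runAgent` stands in for one full Claude run over the prompt file, and the promise word and iteration cap are the parameters you'd supply.

```typescript
// Minimal sketch of a Ralph-style agentic loop (not the real plugin code).
type Agent = (prompt: string) => string;

function ralphLoop(
  agent: Agent,
  prompt: string,
  promise: string,      // completion promise, e.g. "COMPLETE"
  maxIterations: number // safety cap so an impossible task can't loop forever
): { output: string; iterations: number } {
  let output = "";
  let iterations = 0;
  while (iterations < maxIterations) {
    output = agent(prompt); // feed the same prompt file back in
    iterations++;
    if (output.includes(promise)) break; // promise found: exit the loop
  }
  return { output, iterations };
}

// Toy agent that only "finishes" on its third run.
let runs = 0;
const toyAgent: Agent = () =>
  ++runs >= 3 ? "all tests pass COMPLETE" : "still failing";

const result = ralphLoop(toyAgent, "implement the feature", "COMPLETE", 10);
console.log(result.iterations); // 3
```

The key property is that exiting is controlled by the agent's output, not by the agent deciding to stop, while `maxIterations` keeps a stuck task from running forever.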
I'll leave the link to the repo below, because it has some good best practices for the prompts you can give to the Ralph loop, but in this video I'm only going to discuss the ones related to the UI workflow I'm about to show you. So let's say we want to implement two features in this app. One is a command palette: a menu for searching through the app and executing other commands. To make sure this new feature doesn't break other parts of the app, you start with the tests. This is called test-driven development; if you're not familiar with it, you can ask Claude Code to set up the TDD structure for you, where it creates an end-to-end test folder, a screenshots folder to check for UI problems, and the corresponding tests. The other feature we're going to implement is a board view in the database, similar to what Notion allows us to do with their
databases. If you've caught on: test-driven development is an approach where tests are written before the code is implemented, which means the initial tests will always fail. So if I'm implementing the command palette feature, I wouldn't just start writing the code for it; I'd first write elaborate tests for it. Then we write the minimum amount of code required to pass those tests. Once that's done, we refactor and add more functionality, and with every addition we make sure the tests still pass. Another interesting thing is that these tests are automated.
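To make that red-green cycle concrete, here's a tiny self-contained TypeScript sketch. The `filterCommands` helper is a hypothetical piece of a command palette, not code from the app in the video: the assertions are conceptually written first (and would fail against an empty stub), then the minimal implementation below makes them pass.

```typescript
// Step 1 (red): tests exist before the implementation does, so they fail.
// Step 2 (green): filterCommands is the minimum code that makes them pass.
// (Hypothetical example; the real app is exercised by Playwright e2e tests.)
interface Command {
  name: string;
  run: () => void;
}

function filterCommands(commands: Command[], query: string): Command[] {
  const q = query.trim().toLowerCase();
  if (q === "") return commands; // empty query shows everything
  return commands.filter((c) => c.name.toLowerCase().includes(q));
}

// Minimal hand-rolled checks standing in for the real test suite:
const commands: Command[] = [
  { name: "Open Settings", run: () => {} },
  { name: "New Board", run: () => {} },
  { name: "Search Pages", run: () => {} },
];

const hits = filterCommands(commands, "board").map((c) => c.name);
console.log(hits); // ["New Board"]
```

In the refactor step you'd add fuzzy matching, keyboard navigation, and so on, rerunning these checks after every change.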
00:03:09and playwright can be imported and used for visual verification if you think that we're using the
00:03:14playwright mcp to autonomously verify this through the browser you're wrong with tdd for each
00:03:19functional behavior you can take screenshots for example if the functional behavior is adding a card
00:03:25then the screenshot would show a card added into the board so now all the ai agent has to do is look
00:03:30at those screenshots and make sure there are no problems in the way that these shad cn components
00:03:34have been implemented these test files ensure that whenever something new is added or while a feature
00:03:40is being built all of our behavioral requirements are fulfilled but in our case we want to use the
00:03:45screenshots purely for ui verification but if we already have tdd why do we need the ralph loop
As I already stated, with larger tasks and context windows becoming nearly full, these models abruptly quit their tasks and require constant human input. So I can have tests written beforehand for any kind of functionality I want, then use the loop to instruct the agent and let it work autonomously. By telling it which workflow to follow and giving it the condition for when it may output the promise, it completes the task and exits the loop, which in this case is
when all 25 unique tests pass. So, using the Ralph slash command, I gave it a prompt so that it would iteratively build the command palette feature. In the prompt we were basically telling it to implement the feature along with some basic requirements, which aren't really important because the requirements can be found in the tests as well, but we did outline the whole workflow. In that workflow, it was supposed to start by running the tests; it knows the tests will fail, and after that it needs to implement the components to make them pass. That's the whole goal. If this were a much broader task, chances are that when the context window fills up or Claude gets confused, it will quit automatically and never output the completion promise. And since it never outputs the promise, the prompt is fed back again and it has to start over, meaning it keeps working on the task iteratively. But since this was a smaller task, it was actually able to implement everything in a single go: write out all the components and make all of the tests pass.
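For reference, a prompt in the spirit of the one described here might look like the following. This is my own illustrative reconstruction, not the exact prompt from the video; the file paths are assumptions, and "complete" is the chosen promise word.

```text
Implement the command palette feature described in e2e/command-palette.spec.ts.

Workflow:
1. Run the test suite. The tests are expected to fail at first.
2. Implement or fix components until all tests pass.
3. Review every screenshot in the screenshots/ folder for UI problems.
4. Run the tests again and confirm they still pass.

Only when all tests pass and the screenshots look correct, output the word: complete
```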
Now, after the tests pass, the workflow tells it to review all of the screenshots for the command palette. These are screenshots taken at different stages to make sure the UI, whether it's shadcn or any other component library you're using, is implemented correctly and that there aren't any minor issues. After that, it should run the tests again and make sure they still pass after the UI changes. Since all of the tests were passing and the screenshots were reviewed, it output the completion promise; this is where the loop stopped and didn't continue. But there was a really big problem with this that I didn't notice, because in the command palette feature there were very few chances of UI errors. When I moved on to implementing the board view, however, I realized there was a huge mistake in the system. I started by implementing the board with the same prompt; the requirements were different, of course, but the workflow was pretty much the same. Now, I was kind of surprised when it completed all of the requirements in one go. Don't get me wrong, it was actually making sure that all of the tests were passing, but while it was doing that, there were cases where the number of successful tests would actually decrease, because by changing one thing it would break another.
This is exactly why TDD is so important: this recursive testing makes sure everything keeps working. But the main problem showed up after it had verified it was done and I went ahead and checked the UI: most things were implemented correctly, but it had completely missed some UI errors, such as this one. I also checked the screenshots, and the errors were showing up in those as well. So I asked it, and we analyzed what actually went wrong. The real issue was a process failure, specifically around fixing the UI. It did pass all of the tests, because it was supposed to run the test files again and again, but there was no specific test for the UI other than the screenshots. It glanced at a few of them, ignored some of the UI errors it had seen, and skipped some files entirely. So the main issue was that it output its promise statement prematurely and didn't verify whether the UI was
actually fixed. We went through a whole brainstorming session on how to fix this, and I even gave the prompt-writing best practices from the repo to Claude Code, but in the end we came up with some specific rules and a change in the process that would ensure the UI is always correct. This had nothing to do with the tests, because those are always going to run. The prompt we used for the command palette is really helpful when the feature or implementation is very large, where Claude doesn't hallucinate that it has completed the task but instead, due to a full context window or the complexity of the task, quits abruptly. Claude Code is already really autonomous, there's no doubt about that, but there are still issues like this that we need to fix. So we changed a number of
things in the main prompt. The first was the screenshot verification protocol: we added a simple prefix to each image's name that told Claude whether it had read that screenshot or not. But when I first implemented this, it still didn't read all of the images; it would read a few, mark them as verified, and, just like before, quit early. To solve this, we encouraged it to think in a different way. We told it that after renaming all the screenshots it should not output the promise yet, meaning it should not consider the task completed, and it should let the next iteration confirm completion, so at least two loops always run. In that next iteration, Claude verifies that all the files carry the verified prefix. Of course, this meant we had to change the tests and separate the image verification from the functional tests. The follow-up iteration makes sure all the images have verified results, and if Claude missed any, it looks at them again and fixes the result.
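The renaming protocol itself is simple enough to sketch. Here's a hypothetical Node/TypeScript version of the two rules: prefix a screenshot's filename once it has actually been reviewed, and only allow the completion promise once every file carries the prefix. The `verified-` prefix and the folder layout are my assumptions, not the exact convention from the video.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const PREFIX = "verified-";

// Mark a screenshot as reviewed by renaming it with the prefix.
function markVerified(dir: string, file: string): void {
  if (!file.startsWith(PREFIX)) {
    fs.renameSync(path.join(dir, file), path.join(dir, PREFIX + file));
  }
}

// The promise may only be emitted once *every* screenshot carries the prefix,
// which forces at least one extra loop iteration purely for verification.
function allVerified(dir: string): boolean {
  return fs.readdirSync(dir).every((f) => f.startsWith(PREFIX));
}

// Demo against a throwaway directory:
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "screens-"));
fs.writeFileSync(path.join(dir, "add-card.png"), "");
fs.writeFileSync(path.join(dir, "open-palette.png"), "");

console.log(allVerified(dir)); // false: nothing reviewed yet
markVerified(dir, "add-card.png");
markVerified(dir, "open-palette.png");
console.log(allVerified(dir)); // true: safe to output the promise
```

Because the prefix lives in the filesystem, the check survives across loop iterations even after Claude's context resets, which is exactly what lets the second iteration audit the first.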
With this change, the small UI errors we were facing were finally fixed, and it was able to implement all of these features correctly. When it entered the next loop, it ran the tests again; since it found some errors, it fixed them, and because all the files carried the word "verified", it ran one final test. This time it completed its task in two loops and was able to fix all the major UI errors in the app. Let's talk about Automata now. After teaching millions of people how to build with
AI, we started implementing these workflows ourselves and discovered we could build better products faster than ever before. We help bring your ideas to life, whether it's apps or websites. Maybe you've watched our videos thinking, "I have a great idea, but I don't have a tech team to build it." That's exactly where we come in. Think of us as your technical co-pilot: we apply the same workflows we've taught millions of people directly to your project, turning concepts into real working solutions without the headaches of hiring or managing a dev team. Ready to accelerate your idea into reality? Reach out at hello@automata.dev. If you'd like to support the channel and help us keep making videos like this, you can do so using the Super Thanks button below. As always, thank you for watching, and I'll see you in the next one.