Gemini Conductor: Google's New Tool is Here To Fix AI Coding

AAI LABS

Transcript

00:00:00If you've been following the channel, you must be familiar with the different types of context
00:00:04engineering workflows that we've covered here. Well, Google also released another one. I wish
00:00:08I could say that it's better than other workflows. But the truth is that it's not. And there are many
00:00:13problems with this. Even if you argue that it's better for the Gemini ecosystem, it's still not
00:00:17good. Before we dive into why there was no need to release this, let's take a quick break to talk
00:00:22about Automata. After teaching millions of people how to build with AI, we started implementing these
00:00:27workflows ourselves. We discovered we could build better products faster than ever before. We help
00:00:32bring your ideas to life, whether it's apps or websites. Maybe you've watched our videos thinking
00:00:37I have a great idea but I don't have a tech team to build it. That's exactly where we come in.
00:00:42Think of us as your technical co-pilot. We apply the same workflows we've taught millions directly
00:00:48to your project, turning concepts into real working solutions without the headaches of hiring or
00:00:53managing a dev team. Ready to accelerate your idea into reality? Reach out at hello@automata.dev.
00:00:59Now before I explain why this is just another poor attempt at a context engineering
00:01:04workflow, let's first dive into how Conductor actually works. So this is the article and I'll
00:01:09have a link for this down in the description below. At the end, you'll get a command to actually install
00:01:13this as an extension in Gemini CLI. For those of you who don't know, extensions are sets of commands,
00:01:18MCPs, and other rules that are bundled together and made into a package that people can then
00:01:23host and share with others. Claude also has something similar called plugins. So to actually
00:01:27start the workflow, you run the install command. After installation, you can use Conductor's
00:01:32slash commands. You'll get these five commands that actually control Conductor and how
00:01:37you use the workflow. Now the very first command that you're going to use is the setup command.
00:01:41What this command does is first check if the existing Conductor files such as the setup state
00:01:46and the other files that tell it if a project has already been initialized are available or not.
00:01:51Instead of stories, it creates files called tracks and completes them one by one.
00:01:56After that, it initializes a new GitHub repo and asks what to build. To test it out, I created a
00:02:02simple project but I did want to test whether the architecture it made would actually be good. So just
00:02:07to actually test if it would recommend the things that I would actually need, I told it that it should
00:02:11be production ready and scalable to a larger number of users. After that, it created the product.md file
00:02:17which contained the actual concept of what I wanted to build. To actually refine and craft it, it
00:02:22started asking me questions and at the end, because the questions weren't actually leading anywhere and
00:02:27they were really simplistic, I just had it auto-generate everything. After I approved and it saved
00:02:32the product guide, it wanted to create another file, the product guidelines, which were mainly
00:02:36focused on the styling of the product and some design principles. I approved that and it saved
00:02:41the product guidelines as well. After that, it defined the technology stack and this is one of the
00:02:45reasons the workflow was not good. It messed up the tech stack it was offering me: it knew
00:02:50what my whole project was, and it still didn't recommend what was appropriate. After I had that
00:02:55corrected, I also approved the tech stack and it updated that MD file as well. It also has these
00:03:00files called code style guides. If I go into the actual folder, these are the only languages that it
00:03:05has and if it thinks we are going to be using any of these in the project, it adds them to our current
00:03:10project's code style guides during the initialization. The default workflow that it's using is actually
00:03:15pretty good. By default, it includes 80% code test coverage and while it was setting stuff up and
00:03:20writing the base components, it was making sure that the tests were being written as well and after
00:03:25completing tasks, it was testing them as well. At the same time, it was committing changes after every
00:03:30task and also using git notes so that we could track where and when something went
00:03:36wrong. After completing the initial setup, it created some high level product requirements so
00:03:40that we could get on the initial track. This is the first track that it was trying to implement.
00:03:45Again, this was too broad and needed to be broken into smaller tracks. This was too much to do in
00:03:50one track and there were a lot of chances to mess up if it was doing this much at the same time. So
00:03:55after you complete that, you can start your work by running the implement command and in the tracks
00:03:59folder, you have different tracks that it implements one by one. Each track has two files, a plan.md
00:04:05and a spec.md. The spec.md contains the objective and the technical details extracted from the tech
00:04:11stack and the information that we inputted at the start. The plan.md actually contains the tasks
00:04:16that it needs to implement one by one. When you're actually using the implement command, it looks at
00:04:20the tracks.md and looks at each track; based on its status, it knows what
00:04:25to do. An empty status means the track hasn't started, one marker means it's in progress, and another means
00:04:30the track has been completed. And as you can see, this current track is in progress. As for the other
00:04:34commands, the status command gives you a status report of what is currently going on and which
00:04:39tracks are being followed and which ones are not complete. If you use the new track command, it's
00:04:43going to ask you the different questions again for the new task. I also implemented it in a pre-existing
00:04:48repository and it went pretty much the same way. It was a little different because it would look at
00:04:52the existing files and just ask me clarifying questions and it didn't ask for a new track.
00:04:57I had to implement a new track myself as a new feature. And then there's revert, another really
00:05:02clever feature that actually mitigates any damage and is git aware. So it uses git to help out if the
00:05:08agent messes up anywhere. Now, currently the file management and structure isn't that bad. The way
00:05:13it implements new features or existing tasks into tracks and then keeps track of them is actually
00:05:18pretty good. But the way the instructions have been written or how these command files have been
00:05:22written does need work, because they're not properly managing the context loop, where it has to
00:05:27check everything and, if something changes, propagate that change correctly. Because even during
00:05:31this initial process, there were a lot of mistakes. The first mistake is that while it was asking for
00:05:36the creation of each document, it didn't really dissect my idea properly. And I had to guide it
00:05:41through a lot of the stuff. When I thought it was adequate, I just let it auto generate the rest of
00:05:46the content. And again, as I mentioned before, while defining the technology stack, it also missed a lot
00:05:50of things. Option B was good. But since I told it that I wanted a fully scalable app with a large
00:05:55number of users, it missed a lot of things that I had to clarify and explicitly tell it that it also
00:06:00needed and then it modified the plan. When the initial track was generated, I actually went in
00:06:05and looked at the plan and the specs that it had generated and the database schema was totally
00:06:10incomplete. It had missed a lot of things that were crucial to setting up the app and I had to
00:06:14guide it again and steer it in the right direction. Now, Gemini is actually a really good model. So I
00:06:19have to suspect that the commands that have been implemented are what's making it behave this way.
00:06:23And then the biggest reason I believe that even though the setup itself is actually good, there
00:06:27are a lot of problems in the main slash commands, and especially the workflow.md, is that it
00:06:33messed up a really big part after I told it that I wanted to switch from npm and instead use
00:06:38pnpm, since I had forgotten to mention it earlier. For some reason, it tried to make a backup first.
00:06:43And while doing that, it stated that it needed to remove the files made with npm. But it ended
00:06:48up removing the entire conductor folder itself, which contained all the planning files. After
00:06:52deleting that, it kept looking for the folder. And when it couldn't find it, it said that
00:06:57it would reconstruct the conductor folder using its context and everything that it had in its memory.
00:07:02So basically, it had to rewrite everything as opposed to what a normal context workflow should
00:07:07do, where the change should only affect the main context files and the files related to that
00:07:12specific task, which is what BMAD does to operate efficiently. Now, if I hadn't asked it to abruptly
00:07:17change something, maybe it would have gone well. But still, when it was initializing all the tasks
00:07:21and I asked it to start implementing the first track, it began and initialized the project and the
00:07:26other core services that I needed. Now when it came to configuring the environment variables for the
00:07:31Supabase connection, for some reason, it automatically marked the task as completed while
00:07:36clearly putting a dummy key in there. It didn't even ask me to set up the Supabase project or
00:07:40provide it with an actual key. And it automatically tried to push the database schema. Since there was
00:07:45no actual key, it failed. And then it asked me to double check the string. So even the tasks aren't
00:07:49being properly updated, and it wasn't really following them correctly. I honestly wouldn't use
00:07:54this right now for end-to-end spec development. BMAD is a much better option. And for small projects,
00:07:59I still make my own context files. That brings us to the end of this video. If you'd like to support
00:08:03the channel and help us keep making videos like this, you can do so by using the Super Thanks
00:08:08button below. As always, thank you for watching and I'll see you in the next one.

Key Takeaway

Google's new Gemini Conductor AI coding tool has good structural concepts like track-based task management and git integration, but suffers from critical bugs and poor context management that make it inferior to existing alternatives like BMAD.

Highlights

Google released Gemini Conductor, a new AI coding workflow tool that functions as an extension in Gemini CLI, but it has significant flaws compared to existing alternatives like BMAD

Conductor uses a track-based system (instead of stories) to manage tasks, with automatic GitHub integration, test coverage requirements, and git-aware revert capabilities

The setup process involves creating product.md, guidelines, tech stack definitions, and code style guides, but the AI makes poor architecture recommendations even for production-ready requirements

Major bugs include deleting the entire conductor folder when asked to switch from NPM to PNPM, and failing to properly validate task completion (marking tasks done with dummy API keys)

The tool includes useful features like 80% code test coverage by default, automatic git commits after tasks, and git notes for tracking, but the core workflow instructions need significant improvement

Conductor's context management is poorly implemented - it doesn't properly check and update the context loop, leading to mistakes in project planning and database schema generation

The reviewer recommends using BMAD for end-to-end development instead, as Conductor is currently unreliable for production use

Timeline

Introduction and Sponsor Segment

The video opens by positioning Gemini Conductor as Google's latest context engineering workflow tool, but immediately reveals it's not better than existing alternatives. The creator establishes credibility by mentioning previous coverage of similar workflows on the channel. A brief sponsor message introduces Automata, a service that helps non-technical founders build AI-powered apps and websites using the workflows taught in their videos. The sponsor positions themselves as a technical co-pilot that can turn ideas into working solutions without the need to hire or manage a development team.

What is Conductor and Initial Setup Process

Conductor is explained as an extension for Gemini CLI that bundles commands, MCPs, and rules into a shareable package (similar to Claude's plugins). The setup process begins with installation via a command, followed by five slash commands that control the workflow. The /setup command checks for existing Conductor files and initialization state, then creates 'tracks' (their version of stories) instead of traditional user stories. The system initializes a GitHub repo and prompts for project details. The creator tests it with a production-ready, scalable project to evaluate whether Conductor would recommend appropriate architecture choices. The tool creates a product.md file containing the project concept and asks clarifying questions, though these questions are described as simplistic and not particularly useful.
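The video doesn't show Conductor's internals, but the initialization check it describes is easy to sketch. A minimal version, assuming hypothetical file names (`conductor/` and `setup-state.json` are illustrative, not Conductor's documented layout):

```python
from pathlib import Path

# Hypothetical state file name -- the names Conductor actually uses may differ.
STATE_FILE = "setup-state.json"

def setup_mode(project_root: str) -> str:
    """Decide whether a /setup run should initialize from scratch or resume.

    Mirrors the check described above: if the conductor folder and its
    setup-state file already exist, the project counts as initialized.
    """
    conductor_dir = Path(project_root) / "conductor"
    if (conductor_dir / STATE_FILE).exists():
        return "resume"
    return "initialize"
```

The point of keeping this check first is that every later command can trust the state file rather than re-interrogating the repo.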

Product Guidelines and Tech Stack Configuration

After approving the product guide, Conductor creates product guidelines focused on styling and design principles. The technology stack definition phase reveals one of the workflow's major flaws - despite knowing the full project requirements, it recommends an inappropriate tech stack that needs manual correction. The system includes code style guides for specific languages, automatically adding relevant ones to the project during initialization. Default workflow settings include 80% code test coverage, which is actually a positive feature. The tool automatically writes tests alongside base components, tests completed tasks, commits changes after every task, and uses git notes for tracking issues - all good practices in theory.
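Conductor's own configuration format isn't shown in the video, but the 80% coverage gate it enforces is the same idea as pytest-cov's fail-under threshold; in a Python project the equivalent gate would look like:

```toml
[tool.pytest.ini_options]
# Fail the test run if overall coverage drops below 80%.
addopts = "--cov --cov-fail-under=80"
```

This is offered only as a familiar point of comparison, not as Conductor's actual config.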

Track System and Implementation Structure

Conductor creates high-level product requirements to generate the first track, but this initial track is too broad and needs breaking into smaller, more manageable pieces. The /implement command starts the actual work, referencing tracks in the tracks folder. Each track contains two files: spec.md (with objectives and technical details from the tech stack) and plan.md (with specific implementation tasks). The tracks.md file uses status indicators - empty means not started, a marker indicates in-progress, and another marker shows completion. Additional commands include /status for progress reports, /new-track for adding features, and /revert for git-aware rollback. The creator also tested Conductor on a pre-existing repository, where it analyzed existing files and asked clarifying questions without creating redundant tracks.
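The exact marker syntax in tracks.md isn't spelled out on screen. Assuming a checkbox-style convention (`[ ]` not started, `[~]` in progress, `[x]` complete — an assumption, not Conductor's documented format), a status report like the one /status produces could be sketched as:

```python
import re

# Assumed marker convention -- Conductor's real tracks.md syntax may differ.
STATUS = {" ": "not started", "~": "in progress", "x": "complete"}

def track_report(tracks_md: str) -> dict:
    """Map each track name to a human-readable status from checkbox lines."""
    report = {}
    for line in tracks_md.splitlines():
        m = re.match(r"- \[(.)\] (.+)", line)
        if m:
            marker, name = m.groups()
            report[name] = STATUS.get(marker, "unknown")
    return report
```

Keeping status in a single flat file like this is what lets the /implement command pick the next track without re-reading every plan.md.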

Critical Flaws in Context Management

Despite reasonable file management structure, the command files and instructions are poorly written and don't properly manage the context loop. During setup, Conductor failed to properly dissect the project idea, requiring extensive manual guidance. When defining the technology stack, it missed crucial components for a scalable app with large user capacity, only correcting after explicit instruction. The database schema generated for the initial track was completely incomplete, missing crucial elements necessary for app setup. The creator suspects Gemini itself is a capable model, but the implemented commands are causing poor behavior. The biggest problem occurred when switching from NPM to PNPM - Conductor attempted to create a backup but instead deleted the entire conductor folder containing all planning files, then tried to reconstruct everything from memory rather than following proper context workflow practices.
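The npm-to-pnpm failure is a classic destructive-migration bug: the removal step ran before the backup was verified, and it wasn't scoped to the files being migrated. A minimal guard against both mistakes, sketched with hypothetical paths and function names:

```python
import shutil
from pathlib import Path

def migrate_with_backup(target: Path, backup_root: Path, doomed: list) -> None:
    """Back up `target`, verify the copy, then remove only the named files.

    Guards against the failure described above: never delete before the
    backup is confirmed, and never delete a whole tree you still need.
    """
    backup = backup_root / (target.name + ".bak")
    shutil.copytree(target, backup)
    if not backup.is_dir():  # verify before any destructive step
        raise RuntimeError("backup failed; aborting migration")
    for name in doomed:  # scoped removal: named files only, never the tree
        path = target / name
        if path.is_file():
            path.unlink()
```

With a scoped removal like this, a planning folder that happens to live inside the project directory can never be collateral damage.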

Implementation Failures and Final Verdict

During actual implementation of the first track, Conductor initialized the project and core services but failed catastrophically at basic tasks. When configuring Supabase environment variables, it automatically marked the task complete while inserting dummy API keys, never asking the user to set up a Supabase project or provide real credentials. It then attempted to push the database schema with invalid keys, failed, and only then asked for verification. This demonstrates that tasks aren't being properly updated or followed correctly. The creator concludes that Conductor shouldn't be used for end-to-end spec development in its current state, recommending BMAD as a superior alternative. For small projects, the creator prefers making custom context files. The video ends with a call to support the channel via Super Thanks and a promise of future content.
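The Supabase failure is a missing completion gate: the task was marked done even though the environment value was a placeholder. A sketch of the kind of check that should run before a config task is marked complete (the variable names and placeholder patterns are illustrative, not Conductor's):

```python
import re

# Patterns that usually signal an unfilled placeholder rather than a real key.
PLACEHOLDER = re.compile(r"(dummy|changeme|your[-_]?key|xxx|<.*>)", re.IGNORECASE)

def env_task_complete(env: dict, required: list) -> bool:
    """Return True only if every required variable has a non-placeholder value."""
    for key in required:
        value = env.get(key, "")
        if not value or PLACEHOLDER.search(value):
            return False
    return True
```

A gate like this would have forced the agent to ask for real credentials instead of pushing the schema with a dummy key and failing.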
