Gemini Conductor: Google's New Tool is Here To Fix AI Coding

AAI LABS

Transcript

00:00:00If you've been following the channel, you must be familiar with the different types of context
00:00:04engineering workflows that we've covered here. Well, Google also released another one. I wish
00:00:08I could say that it's better than other workflows. But the truth is that it's not. And there are many
00:00:13problems with this. Even if you argue that it's better for the Gemini ecosystem, it's still not
00:00:17good. Before we dive into why there was no need to release this, let's take a quick break to talk
00:00:22about Automata. After teaching millions of people how to build with AI, we started implementing these
00:00:27workflows ourselves. We discovered we could build better products faster than ever before. We help
00:00:32bring your ideas to life, whether it's apps or websites. Maybe you've watched our videos thinking
00:00:37I have a great idea but I don't have a tech team to build it. That's exactly where we come in.
00:00:42Think of us as your technical co-pilot. We apply the same workflows we've taught millions directly
00:00:48to your project, turning concepts into real working solutions without the headaches of hiring or
00:00:53managing a dev team. Ready to accelerate your idea into reality? Reach out at hello@automata.dev.
00:00:59Now before I explain why this is just another poor attempt at a context engineering
00:01:04workflow, let's first dive into how Conductor actually works. So this is the article and I'll
00:01:09have a link for this down in the description below. At the end, you'll get a command to actually install
00:01:13this as an extension in Gemini CLI. For those of you who don't know, extensions are sets of commands,
00:01:18MCPs, and other rules that are bundled together and made into a package that people can then
00:01:23host and share with others. Claude also has something similar called plugins. So to actually
00:01:27start the workflow, you run the install command. After installation, you can use Conductor's
00:01:32slash commands. You'll get these five commands that actually control Conductor and how
00:01:37you use the workflow. Now the very first command that you're going to use is the setup command.
00:01:41What this command does is first check if the existing Conductor files such as the setup state
00:01:46and the other files that tell it if a project has already been initialized are available or not.
00:01:51Instead of stories, it creates files called tracks and completes them one by one.
00:01:56After that, it initializes a new GitHub repo and asks what to build. To test it out, I created a
00:02:02simple project but I did want to test whether the architecture it made would actually be good. So just
00:02:07to actually test if it would recommend the things that I would actually need, I told it that it should
00:02:11be production ready and scalable to a larger number of users. After that, it created the product.md file
00:02:17which contained the actual concept of what I wanted to build. To actually refine and craft it, it
00:02:22started asking me questions and at the end, because the questions weren't actually leading anywhere and
00:02:27they were really simplistic, I just had it auto-generate everything. After I approved and it saved
00:02:32the product guide, it wanted to create another file, the product guidelines, which were mainly
00:02:36focused on the styling of the product and some design principles. I approved that and it saved
00:02:41the product guidelines as well. After that, it defined the technology stack and this is one of the
00:02:45reasons the workflow was not good. It messed up the tech stack it was offering me: it knew
00:02:50what my whole project was, and it still didn't recommend what was appropriate. After I had that
00:02:55corrected, I also approved the tech stack and it updated that MD file as well. It also has these
00:03:00files called code style guides. If I go into the actual folder, these are the only languages that it
00:03:05has and if it thinks we are going to be using any of these in the project, it adds them to our current
00:03:10project's code style guides during the initialization. The default workflow that it's using is actually
00:03:15pretty good. By default, it includes 80% code test coverage and while it was setting stuff up and
00:03:20writing the base components, it was making sure that the tests were being written as well and after
00:03:25completing tasks, it was testing them as well. At the same time, it was committing changes after every
00:03:30task and also using git notes so that we could track where and when something went
00:03:36wrong. After completing the initial setup, it created some high level product requirements so
00:03:40that we could get on the initial track. This is the first track that it was trying to implement.
00:03:45Again, this was too broad and needed to be broken into smaller tracks. This was too much to do in
00:03:50one track and there were a lot of chances to mess up if it was doing this much at the same time. So
00:03:55after you complete that, you can start your work by running the implement command and in the tracks
00:03:59folder, you have different tracks that it implements one by one. Each track has two files, a plan.md
00:04:05and a spec.md. The spec.md contains the objective and the technical details extracted from the tech
00:04:11stack and the information that we inputted at the start. The plan.md actually contains the tasks
00:04:16that it needs to implement one by one. When you're actually using the implement command, it looks at
00:04:20the tracks.md and looks at each track; based on its status, it knows what
00:04:25to do. An empty status means the track hasn't started, one marker means it's in progress, and another means
00:04:30the track has been completed. And as you can see, this current track is in progress. As for the other
00:04:34commands, the status command gives you a status report of what is currently going on and which
00:04:39tracks are being followed and which ones are not complete. If you use the new track command, it's
00:04:43going to ask you the different questions again for the new task. I also implemented it in a pre-existing
00:04:48repository and it went pretty much the same way. It was a little different because it would look at
00:04:52the existing files and just ask me clarifying questions and it didn't ask for a new track.
00:04:57I had to implement a new track myself as a new feature. And then there's revert, another really
00:05:02clever feature that actually mitigates any damage and is git aware. So it uses git to help out if the
00:05:08agent messes up anywhere. Now, currently the file management and structure isn't that bad. The way
00:05:13it implements new features or existing tasks into tracks and then keeps track of them is actually
00:05:18pretty good. But the way the instructions have been written or how these command files have been
00:05:22written does need work, because they're not properly managing the context loop, where it has to
00:05:27check everything and, if something changes, propagate that change correctly. Because even during
00:05:31this initial process, there were a lot of mistakes. The first mistake is that while it was asking for
00:05:36the creation of each document, it didn't really dissect my idea properly. And I had to guide it
00:05:41through a lot of the stuff. When I thought it was adequate, I just let it auto generate the rest of
00:05:46the content. And again, as I mentioned before, while defining the technology stack, it also missed a lot
00:05:50of things. Option B was good. But since I told it that I wanted a fully scalable app with a large
00:05:55number of users, it missed a lot of things that I had to clarify and explicitly tell it that it also
00:06:00needed and then it modified the plan. When the initial track was generated, I actually went in
00:06:05and looked at the plan and the specs that it had generated and the database schema was totally
00:06:10incomplete. It had missed a lot of things that were crucial to setting up the app and I had to
00:06:14guide it again and steer it in the right direction. Now, Gemini is actually a really good model. So I
00:06:19have to suspect that the commands that have been implemented are what's making it behave this way.
00:06:23And then the biggest reason I believe that even though the setup itself is actually good, there
00:06:27are a lot of problems in the main slash commands, and especially the workflow.md, is that it
00:06:33messed up a really big part after I told it that I wanted to switch from npm and instead use
00:06:38pnpm, since I had forgotten to mention it earlier. For some reason, it tried to make a backup first.
00:06:43And while doing that, it stated that it needed to remove the files made with npm. But it ended
00:06:48up removing the entire conductor folder itself, which contained all the planning files. After
00:06:52deleting that, it kept looking for the folder. And when it couldn't find it, it said that
00:06:57it would reconstruct the conductor folder using its context and everything that it had in its memory.
00:07:02So basically, it had to rewrite everything as opposed to what a normal context workflow should
00:07:07do, where the change should only affect the main context files and the files related to that
00:07:12specific task, which is what BMAD does to operate efficiently. Now, if I hadn't asked it to abruptly
00:07:17change something, maybe it would have gone well. But still, when it was initializing all the tasks
00:07:21and I asked it to start implementing the first track, it began and initialized the project and the
00:07:26other core services that I needed. Now when it came to configuring the environment variables for the
00:07:31Supabase connection, for some reason, it automatically marked the task as completed while
00:07:36clearly putting a dummy key in there. It didn't even ask me to set up the Supabase project or
00:07:40provide it with an actual key. And it automatically tried to push the database schema. Since there was
00:07:45no actual key, it failed. And then it asked me to double check the string. So even the tasks aren't
00:07:49being properly updated, and it wasn't really following them correctly. I honestly wouldn't use
00:07:54this right now for end-to-end spec development. BMAD is a much better option. And for small projects,
00:07:59I still make my own context files. That brings us to the end of this video. If you'd like to support
00:08:03the channel and help us keep making videos like this, you can do so by using the Super Thanks
00:08:08button below. As always, thank you for watching and I'll see you in the next one.

Key Takeaway

Google's new Gemini Conductor AI coding tool has good structural concepts like track-based task management and git integration, but suffers from critical bugs and poor context management that make it inferior to existing alternatives like BMAD.

Highlights

Google released Gemini Conductor, a new AI coding workflow tool that functions as an extension in Gemini CLI, but it has significant flaws compared to existing alternatives like BMAD

Conductor uses a track-based system (instead of stories) to manage tasks, with automatic GitHub integration, test coverage requirements, and git-aware revert capabilities

The setup process involves creating product.md, guidelines, tech stack definitions, and code style guides, but the AI makes poor architecture recommendations even for production-ready requirements

Major bugs include deleting the entire conductor folder when asked to switch from NPM to PNPM, and failing to properly validate task completion (marking tasks done with dummy API keys)

The tool includes useful features like 80% code test coverage by default, automatic git commits after tasks, and git notes for tracking, but the core workflow instructions need significant improvement

Conductor's context management is poorly implemented - it doesn't properly check and update the context loop, leading to mistakes in project planning and database schema generation

The reviewer recommends using BMAD for end-to-end development instead, as Conductor is currently unreliable for production use

Timeline

Introduction and Sponsor Segment

The video opens by positioning Gemini Conductor as Google's latest context engineering workflow tool, but immediately reveals it's not better than existing alternatives. The creator establishes credibility by mentioning previous coverage of similar workflows on the channel. A brief sponsor message introduces Automata, a service that helps non-technical founders build AI-powered apps and websites using the workflows taught in their videos. The sponsor positions themselves as a technical co-pilot that can turn ideas into working solutions without the need to hire or manage a development team.

What is Conductor and Initial Setup Process

Conductor is explained as an extension for Gemini CLI that bundles commands, MCPs, and rules into a shareable package (similar to Claude's plugins). The setup process begins with installation via a command, followed by five slash commands that control the workflow. The /setup command checks for existing Conductor files and initialization state, then creates 'tracks' (their version of stories) instead of traditional user stories. The system initializes a GitHub repo and prompts for project details. The creator tests it with a production-ready, scalable project to evaluate whether Conductor would recommend appropriate architecture choices. The tool creates a product.md file containing the project concept and asks clarifying questions, though these questions are described as simplistic and not particularly useful.
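The video doesn't show Conductor's internals, but the initialization check it describes is easy to sketch. A minimal version, assuming hypothetical file names (`conductor/` and `setup-state.json` are illustrative, not Conductor's documented layout):

```python
from pathlib import Path

# Hypothetical state file name -- the names Conductor actually uses may differ.
STATE_FILE = "setup-state.json"

def setup_mode(project_root: str) -> str:
    """Decide whether a /setup run should initialize from scratch or resume.

    Mirrors the check described above: if the conductor folder and its
    setup-state file already exist, the project counts as initialized.
    """
    conductor_dir = Path(project_root) / "conductor"
    if (conductor_dir / STATE_FILE).exists():
        return "resume"
    return "initialize"
```

The point of keeping this check first is that every later command can trust the state file rather than re-interrogating the repo.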

Product Guidelines and Tech Stack Configuration

After approving the product guide, Conductor creates product guidelines focused on styling and design principles. The technology stack definition phase reveals one of the workflow's major flaws - despite knowing the full project requirements, it recommends an inappropriate tech stack that needs manual correction. The system includes code style guides for specific languages, automatically adding relevant ones to the project during initialization. Default workflow settings include 80% code test coverage, which is actually a positive feature. The tool automatically writes tests alongside base components, tests completed tasks, commits changes after every task, and uses git notes for tracking issues - all good practices in theory.
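Conductor's own configuration format isn't shown in the video, but the 80% coverage gate it enforces is the same idea as pytest-cov's fail-under threshold; in a Python project the equivalent gate would look like:

```toml
[tool.pytest.ini_options]
# Fail the test run if overall coverage drops below 80%.
addopts = "--cov --cov-fail-under=80"
```

This is offered only as a familiar point of comparison, not as Conductor's actual config.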

Track System and Implementation Structure

Conductor creates high-level product requirements to generate the first track, but this initial track is too broad and needs breaking into smaller, more manageable pieces. The /implement command starts the actual work, referencing tracks in the tracks folder. Each track contains two files: spec.md (with objectives and technical details from the tech stack) and plan.md (with specific implementation tasks). The tracks.md file uses status indicators - empty means not started, a marker indicates in-progress, and another marker shows completion. Additional commands include /status for progress reports, /new-track for adding features, and /revert for git-aware rollback. The creator also tested Conductor on a pre-existing repository, where it analyzed existing files and asked clarifying questions without creating redundant tracks.
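The exact marker syntax in tracks.md isn't spelled out on screen. Assuming a checkbox-style convention (`[ ]` not started, `[~]` in progress, `[x]` complete — an assumption, not Conductor's documented format), a status report like the one /status produces could be sketched as:

```python
import re

# Assumed marker convention -- Conductor's real tracks.md syntax may differ.
STATUS = {" ": "not started", "~": "in progress", "x": "complete"}

def track_report(tracks_md: str) -> dict:
    """Map each track name to a human-readable status from checkbox lines."""
    report = {}
    for line in tracks_md.splitlines():
        m = re.match(r"- \[(.)\] (.+)", line)
        if m:
            marker, name = m.groups()
            report[name] = STATUS.get(marker, "unknown")
    return report
```

Keeping status in a single flat file like this is what lets the /implement command pick the next track without re-reading every plan.md.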

Critical Flaws in Context Management

Despite reasonable file management structure, the command files and instructions are poorly written and don't properly manage the context loop. During setup, Conductor failed to properly dissect the project idea, requiring extensive manual guidance. When defining the technology stack, it missed crucial components for a scalable app with large user capacity, only correcting after explicit instruction. The database schema generated for the initial track was completely incomplete, missing crucial elements necessary for app setup. The creator suspects Gemini itself is a capable model, but the implemented commands are causing poor behavior. The biggest problem occurred when switching from NPM to PNPM - Conductor attempted to create a backup but instead deleted the entire conductor folder containing all planning files, then tried to reconstruct everything from memory rather than following proper context workflow practices.
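The npm-to-pnpm failure is a classic destructive-migration bug: the removal step ran before the backup was verified, and it wasn't scoped to the files being migrated. A minimal guard against both mistakes, sketched with hypothetical paths and function names:

```python
import shutil
from pathlib import Path

def migrate_with_backup(target: Path, backup_root: Path, doomed: list) -> None:
    """Back up `target`, verify the copy, then remove only the named files.

    Guards against the failure described above: never delete before the
    backup is confirmed, and never delete a whole tree you still need.
    """
    backup = backup_root / (target.name + ".bak")
    shutil.copytree(target, backup)
    if not backup.is_dir():  # verify before any destructive step
        raise RuntimeError("backup failed; aborting migration")
    for name in doomed:  # scoped removal: named files only, never the tree
        path = target / name
        if path.is_file():
            path.unlink()
```

With a scoped removal like this, a planning folder that happens to live inside the project directory can never be collateral damage.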

Implementation Failures and Final Verdict

During actual implementation of the first track, Conductor initialized the project and core services but failed catastrophically at basic tasks. When configuring Supabase environment variables, it automatically marked the task complete while inserting dummy API keys, never asking the user to set up a Supabase project or provide real credentials. It then attempted to push the database schema with invalid keys, failed, and only then asked for verification. This demonstrates that tasks aren't being properly updated or followed correctly. The creator concludes that Conductor shouldn't be used for end-to-end spec development in its current state, recommending BMAD as a superior alternative. For small projects, the creator prefers making custom context files. The video ends with a call to support the channel via Super Thanks and a promise of future content.
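The Supabase failure is a missing completion gate: the task was marked done even though the environment value was a placeholder. A sketch of the kind of check that should run before a config task is marked complete (the variable names and placeholder patterns are illustrative, not Conductor's):

```python
import re

# Patterns that usually signal an unfilled placeholder rather than a real key.
PLACEHOLDER = re.compile(r"(dummy|changeme|your[-_]?key|xxx|<.*>)", re.IGNORECASE)

def env_task_complete(env: dict, required: list) -> bool:
    """Return True only if every required variable has a non-placeholder value."""
    for key in required:
        value = env.get(key, "")
        if not value or PLACEHOLDER.search(value):
            return False
    return True
```

A gate like this would have forced the agent to ask for real credentials instead of pushing the schema with a dummy key and failing.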
