The paradigm of software engineering is shifting. In this project by Anthropic researcher Nicholas Carlini, 16 parallel instances of Claude Opus 4.6 built a C compiler in Rust from scratch with minimal human intervention; this was far more than simply asking an AI to code.
The resulting compiler consists of 100,000 lines of code, successfully built the Linux 6.9 kernel, and ran the classic game Doom. More important than these headline results, however, is what the project revealed about agentic workflows in practice, at a cost of roughly $20,000 (approx. 27 million KRW) in API fees. This article examines the engineering reality of systematically controlling and collaborating with AI, beyond merely writing good prompts.
In complex systems programming, a single agent quickly hits the limits of its context window: as a session grows, accumulated conversation history breeds hallucinations that cloud the model's current judgment. To solve this, Carlini ran 16 independent Docker containers and introduced the RALF (Refresh, Act, Learn, Feedback) loop.
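The cycle described above can be sketched as a fresh-context loop. This is an illustrative Python sketch, not Carlini's actual code: the function and variable names are hypothetical, and the "Act" step is stubbed out. The key property it demonstrates is that each iteration discards the raw conversation and rebuilds context only from durable, distilled notes.

```python
# Illustrative sketch of a fresh-context agent loop in the spirit of the
# RALF (Refresh, Act, Learn, Feedback) cycle. Helper logic is stubbed;
# in the real system, "Act" would be a bounded unit of agent work.

def ralf_loop(shared_notes: list[str], tasks: list[str]) -> list[str]:
    """Run each task with a fresh context seeded only from shared notes."""
    for task in tasks:
        # Refresh: discard prior conversation; rebuild context from durable notes.
        context = list(shared_notes)
        # Act: perform one bounded unit of work (stubbed here).
        result = f"completed {task} with {len(context)} notes"
        # Learn: distill the outcome into a durable note for future cycles.
        shared_notes.append(f"lesson from {task}")
        # Feedback: the next iteration sees only the distilled notes,
        # never the raw conversation history that causes drift.
        print(result)
    return shared_notes

notes = ralf_loop(["project conventions"], ["parse", "typecheck"])
```

The design choice worth noting is that hallucination risk is managed structurally (by resetting context) rather than by prompting the model to "be careful."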
Between cycles, agents record what they have learned in README.md and push it to Git.

The biggest risk when 16 agents run simultaneously is resource waste: if two agents attempt to fix the same bug, the result is code conflicts and doubled API costs. Instead of a separate, complex database, Carlini implemented a lightweight locking mechanism using plain text flags inside the Git repository.
Before starting a task, an agent creates a file named after that task in the current_tasks/ directory and pushes it. Because Git refuses non-fast-forward pushes, a second agent attempting to claim the same task has its push rejected and, after pulling, sees the existing marker file. This simple mechanism effectively eliminated race conditions between agents.
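The claim-or-fail semantics of that Git-based lock can be modeled locally with atomic file creation. This is an analogue, not the actual mechanism: in the real setup the "atomic" step is the rejected `git push`, while here it is `os.O_EXCL`, which fails if the file already exists. The directory and task names are illustrative.

```python
import os
import tempfile

# Local analogue of the Git-based task lock: an agent claims a task by
# creating a marker file; exactly one creator can succeed.

def try_claim_task(lock_dir: str, task: str) -> bool:
    """Atomically claim a task; return False if another agent holds it."""
    path = os.path.join(lock_dir, task)
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True   # we created the marker: the task is ours
    except FileExistsError:
        return False  # marker already exists: the task is taken

current_tasks = tempfile.mkdtemp()
first = try_claim_task(current_tasks, "fix-linker-bug")
second = try_claim_task(current_tasks, "fix-linker-bug")  # same task, rejected
```

In both the analogue and the real system, correctness rests on a single atomic operation rather than on agents checking and then acting, which would reintroduce the race.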
The highlight of this project was the use of GCC, an established tool, as an oracle: a strategy of systematically forcing the correct answer rather than letting the AI guess. When an error occurred in the massive Linux kernel build, Carlini automated a binary search to isolate the offending code.
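One common shape of such an oracle-guided binary search is to bisect over translation units, compiling one half with the new compiler and the rest with the trusted one, until a single faulty file remains. The sketch below is an assumed illustration of that pattern, with a mock `build_ok` predicate standing in for an actual kernel build.

```python
# Sketch of an oracle-guided binary search: find the one translation unit
# that breaks the build when compiled with the new compiler instead of GCC.
# `build_ok(tested)` answers: does the build succeed when `tested` files
# are compiled with the new compiler and all others with the oracle (GCC)?

def find_bad_file(files: list[str], build_ok) -> str:
    """Bisect to the single file that fails under the new compiler."""
    lo, hi = 0, len(files)  # invariant: the bad file is in files[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if build_ok(files[lo:mid]):
            lo = mid  # files[lo:mid] are clean; the bad file is in the other half
        else:
            hi = mid  # the bad file is in files[lo:mid]
    return files[lo]

units = [f"unit_{i}.c" for i in range(16)]
BAD = "unit_11.c"
# Mock oracle: the build fails iff the bad unit was compiled by the new compiler.
bad_file = find_bad_file(units, lambda tested: BAD not in tested)
```

The payoff is logarithmic cost: isolating one bad file among 16 takes 4 builds instead of 16, and the gap widens quickly at kernel scale.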
While the achievements were overwhelming, the generated compiler's output did not even match GCC at its lowest optimization level (-O0). The Claude agent legion showed clear limitations in high-level engineering areas, most notably code optimization.
From an engineering manager's perspective, $20,000 is by no means expensive: a task that would have required a professional five-person team for over three months was completed in roughly two weeks, implying a cost advantage of more than 10x over traditional labor. Companies considering this model should follow this decision tree.
| Question | Yes | No |
|---|---|---|
| Can the output be objectively verified through testing? | Proceed to next step | Unsuitable for adoption (risk of hallucination) |
| Is there an existing tool (Oracle) for comparison? | Adopt Oracle strategy | Continuous human monitoring required |
| Can the task be split into 100+ units? | Operate parallel agents | Single agent recommended |
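The decision tree in the table above can be restated as a small function. This is purely an illustrative encoding of the table; the thresholds and wording come from the article, while the function and parameter names are invented here.

```python
# The adoption decision tree from the table above, encoded as a function.

def adoption_advice(verifiable: bool, has_oracle: bool,
                    splittable_100: bool) -> list[str]:
    """Return recommendations for adopting a parallel-agent workflow."""
    if not verifiable:
        # Without objective verification, hallucinations cannot be caught.
        return ["Unsuitable for adoption (risk of hallucination)"]
    advice = ["Adopt Oracle strategy" if has_oracle
              else "Continuous human monitoring required"]
    advice.append("Operate parallel agents" if splittable_100
                  else "Single agent recommended")
    return advice

plan = adoption_advice(verifiable=True, has_oracle=True, splittable_100=False)
```

Note that verifiability acts as a hard gate, while the oracle and task-splitting questions are independent refinements; this mirrors the table's structure.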
Agents should also persist their working state to progress.json or a similar file before termination.

Anthropic's experiment signifies that the engineer's role has shifted from code writer to system designer and auditor. The critical competency now is not the ability to write algorithms directly, but the ability to design the logical constraints and verification harnesses that keep the AI agent legion from veering off track.
The $20,000 cost is not just an expense; it is a milestone marking how far automation can reach when backed by sophisticated human design. Companies should now focus on systematizing strategic human steering rather than fixating on AI autonomy alone.