DON'T WASTE SO MANY TOKENS! PI CODING AGENT vs OPENCODE with same local LLM.

Transcript

00:00:00 Hi everyone, this is a demo of Pi agent versus OpenCode, and we will test these

00:00:09 two harnesses on this example. This is a game that I vibe coded in my previous

00:00:20 video, this one, and in this video I'd like to test how to fix this game,

00:00:29 because there are some bugs. For instance, as you can see, the X marker won the

00:00:38 match, but the cells aren't highlighted. So we will try to do the same kind of fix

00:00:51 with a local LLM, which is Qwen 3.6 35B-A3B, which is in my opinion the best model

00:01:04 that you can run on your computer right now. So let's try first with Pi. So

00:01:16 this is Pi, and I will run it inside this directory, where the sources are in

00:01:30 separate files: I have index.html, game.js and style.js, and we will try

00:01:42 the same prompt in both harnesses and compare the results. I

00:01:55 will also use a timer to see how long the task takes. So this is the prompt. The

00:02:11 prompt is: make the cell cubes more visible and add space between them,

00:02:19 because, as you can see here, the cubes are very close to each other. Then we have

00:02:28 the second task, which is: improve the winner logic; the winning markers should

00:02:37 become green. This is another issue, because you don't see where the

00:02:46 player won with the markers. And yes, it started to follow my prompt. This is

00:02:59 Pi, so it starts analyzing the current directory, and here you can see the

00:03:09 context used, but maybe it's more interesting to see the time spent to

00:03:20 fix the game. It is working, and then we will do the same task with

00:03:30 OpenCode, and I will reset the repo to do the same kind of test. So now I will

00:03:41 pause the video for a while, and see you when it has finished fixing the game.

00:04:00 OK, done. It is still writing the report of the changes, and then we will test the

00:04:20 results. OK, done. Let's pause: 7 minutes and 44 seconds with Qwen 3.6. So let's

00:04:38 try the results. So this is the report; this is what happened technically in the

00:04:47 code, and as you can see, it partially read game.js multiple times, in

00:04:58 multiple parts. This is also a diff, so as you can see, it had to edit a lot

00:05:09 in the file. In total it's 9.4K tokens sent and 2.8K received, so this

00:05:23 is the context usage. So let's try the result. Reload, and as you

00:05:35 can see, now the cell cubes are more spaced, more separated from each other. So

00:05:44 let's try the game. I will start with the center cell, OK, and OK, I will let it

00:06:00 win. OK, perfect, so now the computer won, and as you can see, we have the cubes more

00:06:11 separated and also the winning markers highlighted, so it works. This was with

00:06:20 the Pi coding agent. Now we will do the same test with OpenCode, with the same

00:06:30 model and the same code. So I will reset the code. OK, so now the changes are back to

00:06:50 the bugged version, like this, and now we'll try the same prompt with OpenCode,

00:07:00 for the cells and for the win logic, and I will use the same model with

00:07:11 the Basico agent. Basico is a custom agent that I made, and I

00:07:27 made the Basico agent because it's much simpler than the default coding agent.

00:07:36 And the Basico agent is this:

00:07:56 it's just a simple markdown file: "You are Basico, a minimal agent." And yeah, I

00:08:07 didn't specify a lot here, just to use web fetch with the search engine tool,

00:08:15 which we won't use in this use case. So it's a very simple agent, just to

00:08:24 recreate similar conditions for OpenCode. And we are already using

00:08:34 12K of the context. It started with index.html and game.js, and also here we will

00:08:47 try the final result after the video pause. It is still running, with not

00:08:58 much feedback here. I also wanted to say that I tried the same test

00:09:07 with Gemma 4 26B-A4B, but it wasn't able to do the tool calling on this

00:09:20 kind of project. Gemma 4 was able to recreate the 3D tic-tac-toe game, but then

00:09:30 it wasn't able to do the tool calls to edit these files, so I did this test only

00:09:38 with Qwen 3.6, because I think it's the best for local scenarios like this.

00:09:48 Interesting, because it is filling in the to-dos. There are two tasks: one is

00:09:58 make the cell cubes more visible, and the other is fix the logic. So it will have a

00:10:07 little bit of overhead compared to Pi agent, but Pi agent was able to do

00:10:17 this kind of task without a to-do list in the middle. Maybe in more intricate

00:10:26 situations it could be useful to have a to-do list, but it's the LLM

00:10:35 that makes the bigger difference, in my opinion, not the harness. But we will

00:10:44 see.

00:10:56 you

00:11:27 OK, almost done. Both to-dos have been completed, but it still has to read and

00:11:40 then write to the file.

00:11:52 OK, it is writing the report. I hope it will finish soon; we are at 12

00:12:05 minutes, so it's more. But OK, it's finished, so pause. As you can see, the

00:12:15 context used is about 23K with OpenCode. Probably they report

00:12:26 the tokens used in a different way, but it seems that Pi used half the tokens

00:12:36 to fix the issues. This is the technical report: it opened game.js many times

00:12:46 to do the fixes. Let's try the game to see if the fixes actually

00:12:57 work. Reload, and it looks similar to the Pi version. The center cell, OK,

00:13:19 let's try to win the game. OK, I won, and as you can see, we got the same result

00:13:32 that we got with Pi, but with more tokens and more time spent to reach the

00:13:43 solution. So in this use case, OpenCode, which usually has many features

00:13:55 like guardrails and more prompt tweaks, produced the same solution that we got with

00:14:06 Pi, but Pi took less time and fewer tokens. So in conclusion, in my opinion, as I

00:14:18 said before, the LLM used is the most relevant and important part. The

00:14:28 harness is useful and important, but what matters more is the quality of the data

00:14:36 that it puts in the context, and in this situation, with the Pi coding agent, we

00:14:47 have less overhead and we got a good result even without a very big prompt for

00:14:58 the LLM. Let me know in the comments which open source coding agent harness

00:15:06 is your favorite, and see you in another video. Bye.

Key Takeaway

Pi Coding Agent outperforms OpenCode in this local LLM test, delivering the same 3D game fixes while using roughly half the tokens and finishing about 4 minutes faster.

Highlights

Pi Coding Agent fixes a bugged 3D tic-tac-toe game in 7 minutes and 44 seconds using the Qwen 3.6 35B-A3B model.
OpenCode needs about 12 minutes to complete the same task on identical hardware with the same Qwen 3.6 35B-A3B model.
Pi Coding Agent uses 9.4K tokens sent and 2.8K tokens received to fix the visual and logic issues in the game.
OpenCode reports about 23K tokens of context to reach the same functional result, although the two harnesses may count tokens differently.
Qwen 3.6 35B-A3B successfully makes the tool calls needed to edit the project files, whereas Gemma 4 26B-A4B fails to make these calls in the same setup.
Pi Coding Agent achieves the desired results without the overhead of creating an intermediate 'to-do' list during the reasoning process.
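The headline comparison can be checked with simple arithmetic. The Python sketch below uses only the figures reported in the video; it assumes OpenCode's "about 12 minutes" is exactly 720 seconds and that Pi's sent and received tokens should be summed. Since the speaker notes the harnesses may count tokens differently, the token percentage is indicative only.

```python
# Figures as reported in the video. OPENCODE_SECONDS is an approximation
# ("about 12 minutes"), and the two harnesses may count tokens differently.
PI_SECONDS = 7 * 60 + 44          # 7 min 44 s
OPENCODE_SECONDS = 12 * 60        # ~12 min, assumed exact here
PI_TOKENS = 9_400 + 2_800         # 9.4K sent + 2.8K received
OPENCODE_TOKENS = 23_000          # ~23K context reported by OpenCode


def compare(pi: float, other: float) -> tuple[float, float]:
    """Return (other - pi, percentage by which other exceeds pi)."""
    diff = other - pi
    return diff, 100.0 * diff / pi


time_saved, time_pct = compare(PI_SECONDS, OPENCODE_SECONDS)
token_diff, _ = compare(PI_TOKENS, OPENCODE_TOKENS)
# Reduction is measured relative to OpenCode's (larger) token count.
token_reduction_pct = 100.0 * token_diff / OPENCODE_TOKENS

print(f"Time saved: {time_saved // 60:.0f} min {time_saved % 60:.0f} s")
print(f"OpenCode took {time_pct:.0f}% longer")
print(f"Pi used {token_reduction_pct:.0f}% fewer tokens")
```

Under these assumptions, Pi finishes about 4 minutes 16 seconds sooner, OpenCode takes roughly 55% longer, and Pi uses roughly 47% fewer tokens.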

Timeline

Local LLM Performance Comparison Setup

A 3D tic-tac-toe game built via 'vibe coding' has visual bugs and a winner logic error: the winning cells are not highlighted.
Qwen 3.6 35B-A3B is the selected local model because, in the author's opinion, it is the best model that can currently run on a personal computer.
The comparison involves running Pi Coding Agent and OpenCode on the same directory containing index.html, game.js, and style.js files.

The test project is a 3D game in which the X markers can win without the winning row being highlighted. Both harnesses run against the same local model, so the model is held constant and only the harness changes. This section establishes the baseline: the code files under test and the model used for the comparison.
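One way to avoid a win that is detected but never shown is to have the win check return the cells that form the winning line, not just the winner. The Python sketch below is a hypothetical illustration, not the game's actual game.js code: it assumes a flat 3×3 board stored as a list of 9 cells, and the names `WIN_LINES`, `find_winner`, and `cell_colors` are made up for this example. A true 3D board would need a longer list of winning lines.

```python
from typing import Optional

# Hypothetical layout: 9 cells indexed 0..8 row by row, each "X", "O", or None.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def find_winner(board: list[Optional[str]]) -> tuple[Optional[str], tuple[int, ...]]:
    """Return (winning mark, winning cell indices), or (None, ()) if nobody has won."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a], (a, b, c)
    return None, ()


def cell_colors(board: list[Optional[str]], default: str = "white") -> list[str]:
    """Color the cells of the winning line green and leave the rest at the default."""
    _, line = find_winner(board)
    return ["green" if i in line else default for i in range(len(board))]


board = ["X", "O", None,
         "O", "X", None,
         None, None, "X"]
print(find_winner(board))   # ('X', (0, 4, 8))
print(cell_colors(board))
```

Returning the line together with the winner keeps detection and highlighting in one place, so the UI cannot announce a win without knowing which cells to color.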

Pi Coding Agent Execution and Resource Usage

Pi Coding Agent completes the requested CSS and JavaScript modifications in 7 minutes and 44 seconds.
Total token usage for this specific task is 9.4K sent and 2.8K received.
The agent modifies the game to include increased spacing between cell cubes and green highlights for winning markers.

The prompt asks the agent to make the cell cubes more visible and to improve the winner logic. Pi Coding Agent analyzes the directory, reads game.js in several partial passes, and applies a diff to the files. In the resulting game, when the computer player wins, the winning markers are highlighted as intended.
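Adding space between the cubes comes down to increasing the distance between neighboring cell centers. The Python sketch below is a hypothetical, renderer-agnostic illustration: the parameters `size` and `gap` and the function name `cell_centers` are assumptions, not names taken from game.js. Each center is offset by the cube size plus the gap, and the row is shifted so that the middle cube stays centered.

```python
def cell_centers(n: int, size: float, gap: float) -> list[float]:
    """Centers of n cubes of width `size` along one axis, separated by `gap`,
    with the whole row centered on 0 (hypothetical helper)."""
    step = size + gap                # distance between neighboring centers
    offset = (n - 1) * step / 2      # shift so the middle of the row sits at 0
    return [i * step - offset for i in range(n)]


print(cell_centers(3, 1.0, 0.0))   # [-1.0, 0.0, 1.0]: cube faces touch
print(cell_centers(3, 1.0, 0.2))   # [-1.2, 0.0, 1.2]: 0.2 units between faces
```

The same per-axis offsets can be applied to x, y, and z for a 3D grid. Because only `gap` changes, the cubes keep their size while the grid spreads out evenly around its center.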

OpenCode Performance and Token Overhead

OpenCode takes 12 minutes to finish the identical task, roughly 50% longer than the competing harness.
OpenCode's reported context usage reaches about 23K tokens, more than double the 9.4K tokens Pi reported sending, although the two harnesses may count tokens differently.
OpenCode adds overhead by generating intermediate 'to-do' lists before performing the actual file edits.

A custom 'Basico' agent, defined in a short markdown file, is used within OpenCode to keep the comparison fair, yet it still shows noticeable overhead. It eventually produces a working game with the correct spacing and winning highlights, but the process is slower and uses more tokens. The author suggests that OpenCode's extra features, such as guardrails and additional prompt tweaks, add overhead without improving the result in this case.

Comparative Analysis and Model Selection

The underlying LLM quality remains the most critical factor in successful coding outcomes.
Gemma 4 26B-A4B was able to generate the original game but could not perform the tool calls needed to edit the files in this project.
Minimalist harnesses like Pi Coding Agent provide better efficiency for local development by reducing unnecessary context overhead.

The results suggest that different harnesses can produce the same output with very different resource footprints. In the author's opinion, Qwen 3.6 35B-A3B is the best local model for tool-calling tasks like this one. The final analysis suggests that the quality of the data placed in the context matters more than the specific features of the agent harness.
