DON'T WASTE SO MANY TOKENS! PI CODING AGENT vs OPENCODE with same local LLM.

Luigi Tech

Transcript

00:00:00 Hi everyone, this is a demo of Pi agent versus OpenCode, and we will test these
00:00:09 two harnesses on this example. This is a game that I vibe coded in my previous
00:00:20 video, this one, and in this video I'd like to test how to fix this game
00:00:29 because there are some bugs. For instance, as you can see, the X marker won the
00:00:38 match but the cells aren't highlighted. So we will try to do the same kind of fix
00:00:51 with a local LLM, which is Qwen 3.6 35B A3B, which is in my opinion the best model
00:01:04 that you can run on your computer right now. So let's try first with Pi. So
00:01:16 this is Pi, and I will run it inside this directory where there are the sources in
00:01:30 separate files, so I have index.html, game.js and style.js, and we will try
00:01:42 the same prompt inside both harnesses and we will compare the results, and I also
00:01:55 will use a timer to see how long it will take to do the task. So this is the prompt. The
00:02:11 prompt is: make the cell cubes more visible and add space among them,
00:02:19 because as you can see here the cubes are very close to each other. And then we have
00:02:28 the second task, which is: improve the winner logic, the winning markers should
00:02:37 become green. And this is another issue, because you don't see where the
00:02:46 player won with the markers. And yeah, it started to follow my prompt, and this is
00:02:59 Pi, so it starts analyzing the current directory, and here you can see the
00:03:09 context used, but maybe it's more interesting to see the time spent to
00:03:20 fix the game. And yeah, it is working, and then we will do the same task with
00:03:30 OpenCode, and I will reset the repo to do the same kind of test. So now I will
00:03:41 pause the video for a while and see you when it finishes fixing the game.
00:04:00 OK, done. It is still writing the report of the changes and then we will test the
00:04:20 results. OK, done. Let's pause: 7 minutes and 44 seconds with Qwen 3.6. So let's
00:04:38 try the results. So this is the report, so this is what happened technically in the
00:04:47 code, and as you can see it partially read game.js multiple times in
00:04:58 multiple parts, and this is also a diff, so as you can see it had to make a lot of edits
00:05:09 to the file, and in total it's 9.4K tokens sent and 2.8K received. So this
00:05:23 is the result of the context usage. So let's try the result. So reload, and as you
00:05:35 can see, now the cell cubes are more spaced, more separated from each other. So
00:05:44 let's try the game. So I will start with the center cell, OK, and OK, I will let it
00:06:00 win. OK, perfect. So now the computer won and as you can see we have the cubes more
00:06:11 separated and also the winning markers highlighted, so it works. And this was with
00:06:20 the Pi coding agent. So now we will do the same test with OpenCode, with the same
00:06:30 model and same code. So I will reset the code. OK, so now the changes are back to
00:06:50 the bugged version, like this, and now we'll try the same prompt with
00:07:00 OpenCode, so for the cells and for the win logic, and I will use the same model with
00:07:11 Basico, and Basico is a custom agent that I made, and start. And I
00:07:27 made the Basico agent because it's much simpler than the default coding agent,
00:07:36 and the Basico agent is this:
00:07:56 it's just a simple markdown file, so "you are Basico, a minimal agent", and yeah, I
00:08:07 didn't specify a lot here, just to use web fetch with the search engine tool,
00:08:15 which we won't use in this use case. So it's a very simple agent, just
00:08:24 to recreate similar conditions for OpenCode, and we are already using
00:08:34 12K of the context. So it started with the index, game.js, and yeah, also here we will
00:08:47 try the final result after the video pause. It is still running with not
00:08:58 much feedback here, and I also wanted to say that I tried the same test
00:09:07 with Gemma 4 26B A4B, but it wasn't able to do the tool calling on this
00:09:20 kind of project. So Gemma 4 was able to recreate the 3D tic-tac-toe game but then
00:09:30 it wasn't able to do the tool calls to edit these files, so I did this test only
00:09:38 with Qwen 3.6 because I think it's the best for local scenarios like this.
00:09:48 Yeah, interesting, because it is filling in the to-dos, so there are two tasks: one is
00:09:58 make cell cubes more visible and the other is fix the logic. So it will have a
00:10:07 little bit of overhead compared to the Pi agent, but yeah, the Pi agent was able to do
00:10:17 this kind of task also without to-dos in the middle, but maybe in more intricate
00:10:26 situations it could be useful to have a to-do list. But yeah, it's the LLM
00:10:35 that makes the bigger difference in my opinion, and not the harness, but we will
00:10:44 see.
00:11:27 OK, almost done. Both to-dos have been completed, but it still has to read and
00:11:40 then write to the file.
00:11:52 OK, it is writing the report. I hope that then it will finish, and we are at 12
00:12:05 minutes, so it's more, but OK, it's finished, so pause. And as you can see, the
00:12:15 context used is about 23K with OpenCode, and probably they report
00:12:26 the tokens used in a different way, but it seems that Pi used half the tokens to
00:12:36 fix the issues. So this is the technical report, so it opened game.js many times
00:12:46 to do the fixes. So let's try the game to see if the fixes actually
00:12:57 work. So reload, and it seems similar to the Pi version. The center cell, OK,
00:13:19 let's try to win the game. OK, I won, and as you can see we got the same result
00:13:32 that we got with Pi but with more tokens and more time spent to do the
00:13:43 solutions. So in this use case, OpenCode, which usually has many features
00:13:55 like guardrails and more prompt tweaks, had the same solutions that we got with
00:14:06 Pi, which took less time and fewer tokens. So in conclusion, in my opinion, as I
00:14:18 said before, the LLM used is the most relevant and important part. The
00:14:28 harness is useful and important, but more important is the quality of the data
00:14:36 that it puts in the context, and in this situation with the Pi coding agent we
00:14:47 have less overhead and we got a good result even without a very big prompt in
00:14:58 the LLM. Let me know in the comments which is your preferred open source
00:15:06 coding agent harness, and see you in another video. Bye.

Key Takeaway

In this single local-LLM test, Pi Coding Agent delivers the same 3D game fixes as OpenCode while using roughly half the reported tokens and finishing about 4 minutes faster.

Highlights

  • Pi Coding Agent fixes a bugged 3D Tic-Tac-Toe game in 7 minutes and 44 seconds using the Qwen 3.6 35B A3B model.

  • OpenCode requires about 12 minutes to complete the same debugging task using identical hardware and the same Qwen 3.6 35B A3B model.

  • Pi Coding Agent utilizes 9.4K input tokens and 2.8K output tokens to solve the visual and logic issues in the game.

  • OpenCode consumes approximately 23K tokens of context to achieve the same functional results as Pi Coding Agent.

  • Qwen 3.6 35B A3B successfully executes the necessary tool calls for project editing, whereas Gemma 4 26B A4B fails to perform these calls in the same environment.

  • Pi Coding Agent achieves the desired results without the overhead of creating an intermediate 'to-do' list during the reasoning process.

Timeline

Local LLM Performance Comparison Setup

  • A 3D Tic-Tac-Toe game built via 'vibe coding' contains visual bugs and logic errors in the winning cell highlight system.
  • Qwen 3.6 35B A3B is the selected local model; the presenter considers it the best model currently runnable on a personal computer.
  • The comparison involves running Pi Coding Agent and OpenCode on the same directory containing index.html, game.js, and style.js files.

The test environment uses a specific 3D game project where the winning X markers are not highlighted. Testing runs against a local model rather than a hosted API. This section establishes the baseline by identifying the specific code files and the choice of model used for the benchmarks.

Pi Coding Agent Execution and Resource Usage

  • Pi Coding Agent completes the requested CSS and JavaScript modifications in 7 minutes and 44 seconds.
  • Total token usage for this specific task is 9.4K sent and 2.8K received.
  • The agent modifies the game to include increased spacing between cell cubes and green highlights for winning markers.

The prompt requires making cell cubes more visible and improving the winner logic. Pi Coding Agent analyzes the directory and applies a diff to the files efficiently. The resulting game state confirms that the computer player can win and trigger the correct visual feedback in the UI.
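The transcript does not show the actual changes to game.js, so the following is only a minimal sketch of the idea behind the second task: the winner check returns the cells of the winning line, so the UI can recolor exactly those markers green. The 3x3x3 board size, the dictionary board representation, and the names `all_lines` and `winning_line` are assumptions for illustration, and the sketch is written in Python rather than the game's JavaScript.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int, int]
N = 3  # assumed board size; the video does not state it


def all_lines(n: int = N) -> List[List[Cell]]:
    """Enumerate every straight line of n cells in an n x n x n cube."""
    # Keep one direction out of each +/- pair so each line is listed once.
    directions = [d for d in product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]
    lines = []
    for start in product(range(n), repeat=3):
        for dx, dy, dz in directions:
            end = (start[0] + dx * (n - 1),
                   start[1] + dy * (n - 1),
                   start[2] + dz * (n - 1))
            # A line is valid only if its last cell is still on the board.
            if all(0 <= c < n for c in end):
                lines.append([(start[0] + dx * i,
                               start[1] + dy * i,
                               start[2] + dz * i) for i in range(n)])
    return lines


def winning_line(board: Dict[Cell, str]) -> Optional[List[Cell]]:
    """Return the cells of the first fully matched line, or None."""
    for line in all_lines():
        marks = {board.get(cell) for cell in line}
        if len(marks) == 1 and None not in marks:
            return line
    return None


# Hypothetical position: X holds the main space diagonal.
board = {(0, 0, 0): "X", (1, 1, 1): "X", (2, 2, 2): "X", (0, 1, 0): "O"}
cells_to_highlight = winning_line(board)
print(cells_to_highlight)
```

Returning the cells instead of a plain win/lose flag is the design choice that matters here: the rendering code can then change the color of just those markers.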

OpenCode Performance and Token Overhead

  • OpenCode takes about 12 minutes to finish the identical task, roughly 55% longer than the competing harness.
  • The context usage reaches about 23K tokens, close to double the 12.2K total (9.4K sent plus 2.8K received) reported by Pi Coding Agent, although the two tools may count tokens differently.
  • OpenCode adds overhead by generating intermediate 'to-do' lists before performing the actual file edits.

A custom 'Basico' agent, defined in a simple markdown file, is used within OpenCode to keep the comparison fair, yet the session already uses about 12K of context near the start. While it eventually produces a working game with the correct spacing and winning highlights, the process is slower and more resource-intensive. The presenter notes that OpenCode usually ships more features, such as guardrails and prompt tweaks, which may explain part of the overhead in this run.
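The comparison can be checked with a few lines of arithmetic. This is a rough sketch using only the figures quoted in the video; the variable names are illustrative, the 12-minute OpenCode time is approximate, and the presenter notes that the two harnesses probably report tokens in different ways, so the token ratio is indicative at best.

```python
def to_seconds(minutes: int, seconds: int = 0) -> int:
    """Convert a minutes:seconds timer reading to seconds."""
    return minutes * 60 + seconds


# Figures quoted in the video.
pi_seconds = to_seconds(7, 44)
opencode_seconds = to_seconds(12)   # "12 minutes", approximate

pi_tokens = 9_400 + 2_800           # sent + received, as reported by Pi
opencode_tokens = 23_000            # "about 23K" context used in OpenCode

extra_seconds = opencode_seconds - pi_seconds
time_ratio = opencode_seconds / pi_seconds
token_ratio = opencode_tokens / pi_tokens

print(f"extra time: {extra_seconds // 60} min {extra_seconds % 60} s")
print(f"time ratio: {time_ratio:.2f}x")
print(f"token ratio: {token_ratio:.2f}x")
```

On these numbers OpenCode takes 4 minutes 16 seconds longer (about 1.55 times the duration) and reports about 1.89 times as many tokens as Pi's sent-plus-received total.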

Comparative Analysis and Model Selection

  • The underlying LLM quality remains the most critical factor in successful coding outcomes.
  • Gemma 4 26B A4B can recreate the 3D Tic-Tac-Toe game but fails at the tool calls needed to edit the files in this project.
  • Minimalist harnesses like Pi Coding Agent can be more efficient for local development because they add less context overhead.

The results suggest that different harnesses can produce the same output with noticeably different resource footprints, although this is a single run and the two tools may report tokens differently. Qwen 3.6 35B A3B is the only model tested that completed the tool-calling tasks in this project. The presenter concludes that the LLM and the quality of data placed in the context matter more than the specific features of the agent harness.
