AI Agents LOVE CLIs


Maximilian Schwarzmüller

Computing/Software, Small Business/Startups, Internet Technology

Transcript

00:00:00Now that AI agents are becoming more and more useful and more and more of a thing at least

00:00:05for some tasks, I think it's really interesting to see how we're kind of going full circle.

00:00:11And here's what I mean. If we take a look at the history of computers and the internet

00:00:16as a whole maybe, we could maybe draw a ease of use chart that looks something like this.

00:00:23Of course this is totally made up but you probably get my point here. We started in the 1970s

00:00:30or somewhere around there – don't nail me on a specific year – but in the early

00:00:36days when you and me – well, I wasn't even born – but when normal households didn't

00:00:41even have a computer, interacting with computers was mostly text-only through terminal user

00:00:47interfaces through the command-line essentially. And rich graphical user interfaces and rich

00:00:54websites and all that fun stuff – operating systems built for normal users – that only

00:01:01became a thing around the 90s, 2000s and of course kept on evolving until today. And it

00:01:09still is of course evolving, don't get me wrong, I'm not saying that this is all going

00:01:14away, but one thing that is clearly visible and easy to see is that with AI agents we

00:01:22have this strong trend back to text-only input, terminal user interfaces, CLI tools, markdown,

00:01:31JSON, all that basic stuff. And with that I don't just mean that we have tools like

00:01:37Claude Code that don't really come with a graphical user interface – though there is

00:01:43a desktop app but it's primarily consumed as a command-line tool – I don't just mean

00:01:48that. Instead I mean that all these AI agents, these agentic tools – however you want to

00:01:54call them – really excel at interacting with other command-line tools, other programs they

00:02:02can invoke via the command-line, they want simple text, simple formatted text like markdown,

00:02:09that is where they really really shine. And that's why more and more companies – like

00:02:15for example a few hours ago when I'm recording this, Google released more and more command-line

00:02:21tools. Like Google released a Google Workspace CLI. Believe it or not, that didn't exist

00:02:27until now and it's a tool you can use to interact with your Google Workspace services like Gmail,

00:02:35Google Drive, through an official CLI. Now there already were other solutions – like

00:02:41for example GOG CLI by Peter Steinberger, the creator of OpenClaw – he built it because

00:02:48he wanted a programmatic CLI-based way of interacting with Google services and that didn't exist

00:02:54until a few hours ago but now it does exist. And this is not a sponsored video by Google

00:02:59or anything like that, it's just interesting to see that more and more companies that offer

00:03:04services are releasing tools like this. MCP servers would kind of be a similar thing,

00:03:11though MCPs in my opinion have various disadvantages and I strongly believe we'll see CLI tools

00:03:18and APIs (and in the end CLI tools just wrap APIs) in the future as the main way of consuming

00:03:27services through agentic tools. And here's a concrete example for what I mean. Over the

00:03:32last couple of days and weeks I've been playing around with the Pi coding agent. Now

00:03:37the Pi coding agent is, you could say, an alternative to Codex or Claude Code. It's simpler in a good

00:03:46way, more limited regarding its features but very very powerful and you can use it with

00:03:51your Codex subscription for example. Now this video is not primarily about this agent and

00:03:57it doesn't matter really if you use this or Codex or Cursor or whatever, they all can

00:04:01get you there. But I also like this tool a lot and most importantly, just like Claude Code

00:04:07and Codex you can also use this tool for non-coding tasks despite its name. For example, it's

00:04:13actually this Pi agent that's being used internally by OpenClaw. So that's the heart,

00:04:19the logical heart of OpenClaw you could say. And then OpenClaw of course added way more

00:04:24to it like memory and channels like Telegram and WhatsApp and all of that fun stuff. But

00:04:30this is one agentic tool you could be running on your system to do stuff. You could also

00:04:35build your own agent of course. And I got a course on that where I also cover how AI

00:04:40agents actually work and what the difference to workflows is and often you maybe want a

00:04:44workflow and not a true agent. But I got a course on that if you want to dive a bit deeper

00:04:49into that. I also got courses on Claude Code and Codex if you want to learn more about these

00:04:54tools. But no matter which tool you're using, what's really really interesting and clear

00:04:58to see is how well they can interact with other CLI tools. Which makes a lot of sense

00:05:03because they've seen plenty of CLI work, of people using CLI tools like curl, like our command

00:05:10line commands like cd, ls, you know all these Linux commands. They've seen plenty of that

00:05:16in their training data. And they've not just seen standard Linux commands which they therefore

00:05:21know by heart. But most importantly they saw how to use these tools. How to chain CLI tools

00:05:28together. How to pipe results from one tool into another tool. They saw all of that and

00:05:35they excel at that. They also saw that they can use --help to learn more about a tool.

00:05:41And that puts them in a great position of using new tools as well. Tools they haven't seen

00:05:47in their training data like this new Google Workspace CLI for example.

00:05:52Of course if you want to use that through an agent it didn't see that in its training data.

00:05:57It doesn't know how to use that. But if you point it at it, if you maybe give it a link

00:06:01to the official docs, but even if you don't do that it will most likely be able to figure

00:06:05out how to use it by using --help and going from there. Because it's just yet another CLI

00:06:11tool. And large language models in the end excel at understanding and describing and using

00:06:17these CLI tools.

00:06:20And for example the other day, yesterday actually, I had a little problem. I needed to upload

00:06:26a PDF document to a website. And you know these sites that want you to upload a bunch of documents

00:06:32all in one document and that document must not exceed 5 megabytes in size? Yeah, I was

00:06:38on one of those sites. And naturally I had to compress that PDF document.

00:06:43Now I could have tried to find a website that does it for me. I'm not a huge fan of uploading

00:06:49my stuff to some random website though. So yeah, not sure. I could have also checked if

00:06:55there is some tool in my system that can help me with that. But I don't have the Adobe subscription

00:07:01anymore so it would have taken some research. And maybe in the end I would have uploaded

00:07:07it to some shaky website. Well, not with AI. Of course, I could have used Codex or Claude

00:07:13or Cursor or whatever to vibe code a little conversion/compression tool. That probably

00:07:19would have worked as well. But what I did, I spun up the Pi agent which I'm using with

00:07:26my Codex subscription. And I just asked it to take a look at that PDF file and please compress

00:07:33it while maintaining quality as much as possible.

00:07:36That was all. That was my only prompt here. And it essentially went to work, executed

00:07:41a bunch of commands in the command line, ran some little scripts. And by the way, I'm running

00:07:46this on my system, but I installed a guardrails extension. Pi (pi is the actual name) has

00:07:53this concept of extensions which you can install. So I installed an extension that prevents the

00:07:59agent from just erasing my hard drive, at least in the straightforward way. And I was also

00:08:06closely watching it whilst it described what it was about to do. So I let it do its thing

00:08:11and it ran a bunch of stuff here. And in the end it was done. And indeed it did successfully

00:08:18compress this document and made it significantly smaller. Now that's just a simple example maybe.

00:08:25And there would have been alternatives. My point just is it did that all in the command

00:08:29line in the terminal in the end by using our commands, our programs there. And of course,

00:08:36that all makes a lot of sense because we're talking about programs using a computer. And

00:08:41all these graphical user interfaces and rich websites were built for humans, for you and

00:08:46me. And that won't go away of course. But if we want to have little utility tools, AI agents

00:08:53running on our system that can at least do some of the tasks we are doing right now, then

00:08:59we need to give them a way of using the computer in a more efficient way. Because a graphical

00:09:03user interface, an app or a website built for a human is not the ideal way for a computer

00:09:09program of course. It would have to take a screenshot, figure out where the buttons are,

00:09:13move the mouse to a button, click that button, take a screenshot again to see what's on the

00:09:18new page. That's super inefficient, burns a lot of tokens and takes super long. And I mean,

00:09:24that's why we had the concept of APIs way before the advent of AI agents and large language

00:09:31models. Because if we are writing a program, doesn't matter if it's a website or an app.

00:09:37If we're writing a program and we want to interact with another program, with another service,

00:09:43of course in the past we already used an API and we didn't try to write a script that uses

00:09:49a website that's meant to be used by humans. That's why APIs exist and CLIs, command line

00:09:56programs, in the end are just wrappers around APIs, at least in the case of CLIs like the

00:10:03Google Workspace CLI. But that is exactly the kind of program we need and want for an agent

00:10:10to consume because it doesn't care about pretty buttons or anything like that. It wants a

00:10:15simple way of invoking various commands to get stuff done. And that is why this makes

00:10:22sense. That's also of course why we have markdown being more important now than ever and why

00:10:28most documentation pages already offer a little copy button like this, which makes it easy

00:10:32to copy the content as markdown so that you can paste it into your favorite large language

00:10:38model or chat session or coding tool. It's also why some websites support stuff like adding .md at the end

00:10:46of the URL to get this article in markdown because we're going towards a future where

00:10:52at least some services and some content will primarily be meant to be consumed by agents.

00:10:58I mean, take the documentation of a library or a framework like TanStack Start. If you're

00:11:03building a TanStack Start site these days, and of course doesn't matter which tech stack

00:11:09you use, you get my point, then you will likely do that with the help of some coding agent, Cursor,

00:11:15whatever. And if you want to tell those agents how to use the library, if you want to point

00:11:20them at a specific documentation article, you don't want to point them at a website like

00:11:25this. You don't want them to download the HTML code, which burns a lot of tokens unnecessarily.

00:11:32And that is kind of the same reason why CLI tools are becoming

00:11:38more and more important because we're moving towards a future where at least some tasks

00:11:42will be done with the help of AI agents or exclusively by AI agents. Which of course also

00:11:49means that if you are building some kind of service which is not primarily meant to be

00:11:54consumed by humans, you wanna strongly think about building a CLI as well as offering an

00:12:02API and whatever you need so that in the future, people can consume your service through agents.

00:12:09And of course, we're still super early here. The vast majority of people don't care about

00:12:14agents at all. And it's too early to tell how good AI agents will become and which kind of

00:12:20tasks they will be able to tackle in the future. Maybe we are kind of stuck at the current level

00:12:26where they can do some stuff, but definitely not all of that and still need human supervision.

00:12:31But even in that place, there are tasks that can be performed by agents and you can make

00:12:37them more useful and more powerful by giving them just the right tools that make it easy

00:12:42for agents to interact with our services, with websites and so on. And that's why we're kind

00:12:49of going full circle. Obviously, that does not mean that the graphical user interface

00:12:55and websites are going away and there will probably always be apps or websites that are

00:13:01meant to be consumed by humans that don't really make sense to be consumed by agents. I mean,

00:13:07something like Netflix. I don't see a huge advantage in an agent telling me what a certain

00:13:13movie is about. I guess I want to watch it. But for many services, especially in the SaaS

00:13:21business or in the professional services area, that definitely is the way forward. I think

00:13:28obviously early days, but definitely a clear development we can see here. At least that

00:13:34is my opinion. But as always, I want to find out what your opinion is, too. So please share

00:13:39it. Let me know what you think of that, what I maybe forgot or overlooked. And yeah, let's

00:13:44see what the world of CLI tools looks like in a year or two.

Key Takeaway

The rise of AI agents is shifting the computing paradigm back to command-line interfaces and structured text, as these formats are far more efficient for machine interaction than traditional graphical user interfaces.

Highlights

AI agents are driving a "full circle" return to text-based interfaces like CLIs and Markdown because they are more efficient for programmatic consumption than GUIs.
Large Language Models (LLMs) excel at using terminal tools because they can understand documentation, chain commands, and use "--help" to learn new tools on the fly.
Major companies like Google are releasing official CLI tools (e.g., Google Workspace CLI) to cater to the growing demand for agentic automation.
Using CLIs for AI agents significantly reduces token consumption and latency compared to "computer use" methods like taking screenshots and clicking buttons.
The speaker recommends that anyone building a service strongly consider offering a CLI alongside an API, so that people can consume the service through agents.
Markdown and simple text formats are becoming more important for documentation, with most documentation pages offering a copy-as-Markdown button and some sites serving Markdown when ".md" is appended to the URL.

Timeline

The Evolution of User Interfaces and the Return to Text

The speaker explores the history of computing, noting a shift from the text-heavy terminals of the 1970s to the rich graphical user interfaces (GUIs) of the 90s and 2000s. While GUIs were designed to make computers accessible to humans, the emergence of AI agents is creating a "full circle" trend back toward text-only inputs. This shift involves a renewed focus on terminal user interfaces, Markdown, and JSON formats. The speaker argues that while GUIs aren't disappearing, they are no longer the primary interface for the newest generation of digital workers. This section sets the stage for why "basic" formats are regaining technical dominance.

Why AI Agents Excel at Using Command-Line Tools

Claude Code is cited as an agent that is consumed primarily as a command-line tool, but the speaker's main point is that agentic tools excel at invoking other command-line programs and at processing simple formatted text such as Markdown. A major recent development is Google's release of an official Google Workspace CLI, which gives access to services like Gmail and Google Drive without a human-centric interface. Previously, users had to rely on community-made solutions like the GOG CLI by Peter Steinberger to achieve this level of automation. The speaker expects CLI tools, which ultimately wrap APIs, to become the main way agents consume services, and regards MCP servers as a similar approach with various disadvantages.

Agentic Capabilities: Chaining Commands and Learning on the Fly

The speaker introduces the Pi coding agent, a simpler alternative to Codex and Claude Code that also serves as the core of OpenClaw, to demonstrate agentic power. Large language models have seen plenty of command-line usage in their training data, making them proficient with commands like 'curl', 'ls', and 'cd'. They are particularly good at chaining tools together by piping the results of one tool into another. Even when faced with a tool not in their training data, agents can use the '--help' flag or documentation links to learn the syntax. This flexibility makes the CLI a robust bridge between AI reasoning and system execution.
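
The behaviour described above, reading a tool's help text and piping one tool into another, can be sketched in Python. This is only a minimal illustration and not how any particular agent is implemented: the helper names `read_help` and `pipe` are hypothetical, and the Python interpreter itself stands in for an unfamiliar CLI so that the sketch runs on any machine.

```python
import subprocess
import sys


def read_help(command: list[str]) -> str:
    """Run `<command> --help` and return whatever text the tool prints."""
    result = subprocess.run(
        [*command, "--help"],
        capture_output=True,
        text=True,
        timeout=30,
    )
    # Some tools print their help to stderr, so keep both streams.
    return result.stdout + result.stderr


def pipe(first: list[str], second: list[str]) -> str:
    """Feed the stdout of `first` into the stdin of `second`, like `first | second`."""
    producer = subprocess.run(first, capture_output=True, text=True, check=True)
    consumer = subprocess.run(
        second, input=producer.stdout, capture_output=True, text=True, check=True
    )
    return consumer.stdout


if __name__ == "__main__":
    python = [sys.executable]  # stand-in for "a CLI the agent has never seen"

    # Step 1: discover the interface by reading the help text.
    print(read_help(python).splitlines()[0])

    # Step 2: chain two commands, the second one transforming the first one's output.
    print(
        pipe(
            [*python, "-c", "print('hello from tool one')"],
            [*python, "-c", "import sys; print(sys.stdin.read().upper(), end='')"],
        )
    )
```

An agent would feed the text returned by `read_help` back into the model as context and then decide which command to run next; that decision loop is deliberately left out here.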

Real-World Example: Compressing PDFs via CLI Agent

A practical anecdote is shared regarding the struggle of compressing a PDF to meet a 5MB website upload limit without using shady third-party sites. Instead of researching a local tool (the speaker no longer has an Adobe subscription) or uploading the file to a random website, the speaker used the Pi agent with a single natural-language prompt to compress the file locally. The agent ran a series of commands and small scripts in the terminal while the speaker watched closely as it described each step. The speaker had also installed a 'guardrails' extension, which prevents the agent from erasing the hard drive, at least in straightforward ways. The success of this task illustrates how a general-purpose agent with CLI proficiency can stand in for specialized software.
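
The video does not say which commands the agent actually ran. As one plausible route, the sketch below assumes Ghostscript (`gs`) is installed and tries its quality presets from highest to lowest until the output fits the limit. The function names are hypothetical, and treating "5 megabytes" as 5 MiB is an assumption as well.

```python
import shutil
import subprocess
from pathlib import Path

SIZE_LIMIT_BYTES = 5 * 1024 * 1024  # assumption: "5 megabytes" means 5 MiB


def build_compress_command(src: Path, dst: Path, preset: str = "ebook") -> list[str]:
    """Build a Ghostscript command line that rewrites `src` as a smaller PDF."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",  # /printer, /ebook, /screen: decreasing quality
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={dst}",
        str(src),
    ]


def compress_pdf(src: Path, dst: Path) -> bool:
    """Try presets from highest to lowest quality; stop at the first that fits."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) is not installed")
    for preset in ("printer", "ebook", "screen"):
        subprocess.run(build_compress_command(src, dst, preset), check=True)
        if dst.stat().st_size <= SIZE_LIMIT_BYTES:
            return True
    return False  # even the lowest-quality preset was too large
```

Calling `compress_pdf(Path("in.pdf"), Path("out.pdf"))` keeps everything on the local machine, which matches the speaker's reluctance to upload documents to a third-party site.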

The Inefficiency of GUIs for Machines vs. Humans

This section explains the overhead agents incur when using human-centric interfaces. While a human enjoys buttons and layouts, an agent must take screenshots, work out where the buttons are, move the mouse, click, and take another screenshot to see the result, which 'burns a lot of tokens' and is 'super inefficient.' APIs and CLIs solve this by providing a direct path for the agent to communicate with the software, which is the same reason APIs existed long before AI agents. The speaker emphasizes that CLIs such as the Google Workspace CLI are essentially wrappers around APIs that give an agent a simple way of invoking commands. Ultimately, the 'pretty' elements of modern web design do nothing for an autonomous program.
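
The idea that a CLI is a thin wrapper around an API can be sketched with Python's `argparse`. Everything service-specific here is invented for illustration: the `acme` program name, the `files` subcommand, and the in-memory `api_list_files` function that stands in for a real HTTP call. The point is only the shape: parse arguments, call the API, print JSON that an agent can read.

```python
import argparse
import json
import sys

# Hypothetical stand-in for a real HTTP API client; the data is made up.
_FAKE_FILES = [
    {"id": "1", "name": "report-2024.pdf"},
    {"id": "2", "name": "notes.md"},
]


def api_list_files(query: str = "") -> list[dict]:
    """Pretend API call: return the files whose name contains `query`."""
    return [f for f in _FAKE_FILES if query in f["name"]]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="acme", description="Hypothetical service CLI")
    sub = parser.add_subparsers(dest="command", required=True)
    files = sub.add_parser("files", help="List files as JSON")
    files.add_argument("--query", default="", help="Substring to filter file names by")
    return parser


def run(argv: list[str]) -> str:
    """Parse CLI arguments, call the matching API function, return JSON text."""
    args = build_parser().parse_args(argv)
    if args.command == "files":
        return json.dumps(api_list_files(args.query))
    raise SystemExit(2)


if __name__ == "__main__":
    # Default to listing everything when no arguments are given.
    print(run(sys.argv[1:] or ["files"]))
```

Because `argparse` generates `--help` output automatically, a wrapper like this is also discoverable by an agent without any extra documentation.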

The Future of Documentation and Service Design

Looking forward, the speaker advises anyone building a service to strongly consider offering a CLI as well as an API so that people can consume it through agents. Most documentation pages already offer 'copy as Markdown' buttons, and some sites return Markdown when '.md' is appended to the URL, which spares agents from downloading token-heavy HTML when reading documentation such as that of TanStack Start. While some consumer services like Netflix will remain human-focused, the speaker sees agent consumption as the way forward for many SaaS and professional services, while stressing that it is still early days. The video concludes by inviting viewers to share their thoughts on how the CLI landscape will evolve over the next year or two.
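
The '.md' convention and the size difference between HTML and Markdown can be illustrated with a short sketch. It assumes a site that serves Markdown when ".md" is appended to the page path, which only some sites do. The `example.com` URL, the helper names, and both sample snippets are made up, and character count is used as a crude proxy for tokens rather than a real tokenizer.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url: str) -> str:
    """Append ".md" to the URL path, keeping query string and fragment intact."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    if not path.endswith(".md"):
        path += ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Made-up snippets showing the same content in both formats.
HTML_SAMPLE = (
    '<div class="doc"><h1 class="title">Install</h1>'
    '<p class="lead">Run <code>npm install</code>.</p></div>'
)
MARKDOWN_SAMPLE = "# Install\n\nRun `npm install`."


def size_ratio(markdown: str, html: str) -> float:
    """Character-count ratio; a crude proxy for tokens, not a real tokenizer."""
    return len(markdown) / len(html)


if __name__ == "__main__":
    print(markdown_url("https://example.com/docs/routing"))
    print(f"{size_ratio(MARKDOWN_SAMPLE, HTML_SAMPLE):.2f}")
```

Real pages carry far more markup than this toy snippet (navigation, scripts, styling), so the gap an agent sees in practice depends entirely on the site.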
