Terminal Computer Use
Gemini 3 topped Terminal-Bench with a simple harness, not an agent framework.
Excitement is shifting from MCP (often just context-window cruft) to terminal computer use as a more durable harness.
GUI computer use is a holy grail - a drop-in knowledge worker, with any task possible at the KVM layer. But it's slow, inefficient, and doesn't work well yet. The bash shell is powerful, text-native, ready now, and getting RL attention. The ingredients:
- Give every agent run an associated container runtime - several companies now offering solutions + the labs' own code execution features
- Give said agent a set of general tools: read file, edit file, and run bash command
- Use an existing agent SDK or CLI as the "inner loop" to manage concurrency, subagents, etc.
- Fill the container with great scripts and binaries, plus READMEs on how to use them (i.e. the Anthropic Skills concept)
- Still provide a few focused tools/MCP for mission-critical things (e.g. send an email, HITL, some DB operation with guardrails, etc.)
- Let the agent go wild:
  - Chain POSIX tools like awk, grep, sed, etc.
  - Leverage arbitrary packages and libraries (e.g. ffmpeg, whisper, Python libraries)
  - Act as a coding agent to generate code just in time for certain tasks
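A minimal sketch of the three general tools from the list above, assuming Python and a container where `bash` is on the PATH. The function names, the exact-match edit semantics, and the `dispatch` helper are illustrative assumptions, not any particular SDK's API; an existing agent SDK or CLI would supply the loop that calls them.

```python
import subprocess
from pathlib import Path


def read_file(path: str) -> str:
    """Return the full text of a file."""
    return Path(path).read_text(encoding="utf-8")


def edit_file(path: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new`.

    An empty `old` means "write `new` as the whole file" (hypothetical convention).
    """
    p = Path(path)
    if old == "":
        p.write_text(new, encoding="utf-8")
        return f"wrote {path}"
    text = p.read_text(encoding="utf-8")
    matches = text.count(old)
    if matches != 1:
        # Refuse ambiguous edits so the model has to be specific.
        return f"error: expected exactly one match, found {matches}"
    p.write_text(text.replace(old, new, 1), encoding="utf-8")
    return f"edited {path}"


def run_bash(command: str, timeout_s: int = 60) -> str:
    """Run a command with bash; return the exit code plus stdout and stderr."""
    try:
        proc = subprocess.run(
            ["bash", "-c", command],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return f"error: timed out after {timeout_s}s"
    return f"exit={proc.returncode}\n{proc.stdout}{proc.stderr}"


TOOLS = {"read_file": read_file, "edit_file": edit_file, "run_bash": run_bash}


def dispatch(name: str, args: dict) -> str:
    """Route a model-issued tool call to its implementation."""
    if name not in TOOLS:
        return f"error: unknown tool {name}"
    try:
        return TOOLS[name](**args)
    except Exception as exc:
        # Return errors as text so the model can read them and retry.
        return f"error: {exc}"
```

Returning errors as plain strings rather than raising is a deliberate choice here: the agent sees the failure in its next turn and can correct course. Everything else (awk pipelines, ffmpeg, just-in-time scripts) goes through `run_bash`, which is why the tool set can stay this small.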
Agent-directed loop + terminal possibilities + coding agent capabilities = broad coverage. Models have enormous skill collections in distribution that this unlocks.
I'm skeptical of stripped-down task-specific models. I want models that reason about my domain while having general skills. This approach fits my "big model" bias.
