Terminal Computer Use

Gemini 3 topped Terminal-Bench with a simple harness, not an agent framework.

Excitement is shifting from MCP (often just context-window cruft) to terminal computer use as a more durable harness.

GUI computer use is the holy grail: a drop-in knowledge worker, with any task possible at the KVM layer. But it's slow, inefficient, and doesn't work well yet. The bash shell is powerful, text-native, ready now, and getting RL attention. The ingredients:

  1. Give every agent run an associated container runtime (several companies now offer these, and the labs have their own code execution features)
  2. Give said agent a set of general tools: read file, edit file, and run bash command
  3. Use an existing agent SDK or CLI as the "inner loop" to manage concurrency, subagents, etc.
  4. Fill the container with great scripts and binaries, plus READMEs on how to use them (i.e. the Anthropic Skills concept)
  5. Still provide a few focused tools/MCP servers for mission-critical things (e.g. sending an email, human-in-the-loop approval, a DB operation with guardrails)
  6. Let the agent go wild
    • Chaining POSIX tools like awk, grep, and sed
    • Leveraging arbitrary packages and libraries (e.g. ffmpeg, whisper, Python libraries)
    • Acting as a coding agent to generate code just in time for certain tasks
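
To make ingredient 2 concrete, here is a rough sketch of the three general tools in Python. The function names, the "exactly one match" edit rule, and the `TOOLS` registry are my own illustrative assumptions, not any particular SDK's API. A real harness would execute these inside the per-run container rather than on the host.

```python
import subprocess
from pathlib import Path


def read_file(path: str) -> str:
    """Return the file's text so the model can inspect it."""
    return Path(path).read_text()


def edit_file(path: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new`.

    An empty `old` means "create or overwrite the file with `new`".
    (Hypothetical convention, chosen to keep the sketch small.)
    """
    p = Path(path)
    if old == "":
        p.write_text(new)
        return f"created {path}"
    text = p.read_text()
    matches = text.count(old)
    if matches != 1:
        # Refuse ambiguous edits and tell the model why.
        return f"error: expected exactly one match, found {matches}"
    p.write_text(text.replace(old, new))
    return f"edited {path}"


def run_bash(command: str, cwd: str, timeout: int = 30) -> str:
    """Run a shell command in `cwd` and return exit code plus output."""
    # Assumption: shell=True uses /bin/sh on POSIX systems; swap in bash
    # (and a container boundary) for real use.
    result = subprocess.run(
        command, shell=True, cwd=cwd,
        capture_output=True, text=True, timeout=timeout,
    )
    return f"exit={result.returncode}\n{result.stdout}{result.stderr}"


# Hypothetical registry a harness could use to dispatch model tool calls.
TOOLS = {"read_file": read_file, "edit_file": edit_file, "run_bash": run_bash}
```

Returning errors as plain strings (rather than raising) is a deliberate choice in this sketch: the model sees the failure as an observation and can correct itself on the next turn.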

Agent-directed loop + terminal possibilities + coding agent capabilities = broad coverage. Models already have an enormous range of skills in distribution, and this setup unlocks them.
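
A toy sketch of what "agent-directed loop" means here, under loud assumptions: `scripted_policy` is a hypothetical stand-in for a model call that just replays a fixed plan, and `run_shell` runs on the host via `/bin/sh` rather than in a container. The point is only the shape of the loop: the policy picks the next command, sees the output, and decides when to stop.

```python
import subprocess
from typing import Callable, Optional


def run_shell(command: str) -> str:
    """Run one shell command and return its combined, stripped output."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=30
    )
    return (result.stdout + result.stderr).strip()


def agent_loop(
    next_command: Callable[[list], Optional[str]], max_steps: int = 10
) -> list:
    """Ask the policy for a command, run it, record the observation, repeat."""
    transcript = []
    for _ in range(max_steps):
        command = next_command(transcript)
        if command is None:  # the policy, not the harness, decides when to stop
            break
        transcript.append((command, run_shell(command)))
    return transcript


def scripted_policy(transcript: list) -> Optional[str]:
    """Hypothetical stand-in for a model: replays a fixed two-step plan."""
    plan = [
        # Chain POSIX tools: count duplicate lines, then reshape with awk.
        "printf 'b\\na\\nb\\n' | sort | uniq -c | awk '{print $2\"=\"$1}'",
        # Let the shell do arithmetic.
        "echo $((2 + 3))",
    ]
    return plan[len(transcript)] if len(transcript) < len(plan) else None


if __name__ == "__main__":
    for command, output in agent_loop(scripted_policy):
        print(f"$ {command}\n{output}")
```

In a real setup the policy would be a model reading the full transcript, and `max_steps` is the kind of guardrail the harness keeps for itself.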

I'm skeptical of stripped-down task-specific models. I want models that reason about my domain while having general skills. This approach fits my "big model" bias.