AIE Code Debrief

Takeaways from AIE Code last week. Thanks @swyx and team.

Terminal Computer Use

See the full writeup: Terminal Computer Use

Just use our Agent SDK/CLI, don't build your own loops

Labs run RL on their harness and model together, so the two are a package deal. Much of the harness you don't want to build yourself anyway. I'm considering adopting one despite my own framework adventures.

Avoid context pollution

Consensus: don't use more than 50% of your context window. Keep it clean. Implications:

  1. Use subagents for focused work without polluting the main window
  2. Start over when off track - don't try to steer back
  3. Do research, store it, then leverage in fresh context
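The 50% rule can be sketched as a simple budget check. This is a minimal illustration, not any SDK's API: `ContextBudget`, its fields, and the 200,000-token window are hypothetical names and numbers I picked for the example.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Hypothetical helper: tracks how much of the window we allow ourselves."""

    window_tokens: int         # total context window of the model (assumed known)
    max_fraction: float = 0.5  # the "don't use more than 50%" rule of thumb

    def remaining(self, used_tokens: int) -> int:
        # Tokens left before we cross the self-imposed limit.
        return max(0, int(self.window_tokens * self.max_fraction) - used_tokens)

    def should_reset(self, used_tokens: int) -> bool:
        # True once usage reaches the limit: time to persist notes and start fresh,
        # or to hand the next chunk of work to a subagent.
        return used_tokens >= self.window_tokens * self.max_fraction


budget = ContextBudget(window_tokens=200_000)  # assumed window size
print(budget.remaining(60_000))
print(budget.should_reset(120_000))
```

How you count `used_tokens` depends on your harness; the point is only that the limit is checked explicitly rather than discovered when quality drops.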

My approach: have the agent review git log/diffs for related work, persist the summary to a file, reset context, then @-mention that file to begin. For adversarial verification, reset context again and review the change fresh against the changelog.
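A rough sketch of the persist step, assuming a local git checkout. In practice the agent writes the summary; this only captures the raw git material it would start from. The function names, the `handoff.md` filename, and the note format are all my own placeholders.

```python
import subprocess
from pathlib import Path


def git_output(*args: str) -> str:
    """Run a git command in the current repo and return its stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def write_handoff(path: Path, log_text: str, diff_stat: str) -> None:
    """Persist research notes so a fresh context can start from the file."""
    path.write_text(
        "Handoff notes (hypothetical format)\n\n"
        "Recent commits:\n" + log_text + "\n\n"
        "Diff summary:\n" + diff_stat + "\n"
    )


def snapshot_repo(path: Path) -> None:
    """Collect recent history and the last commit's diff stat, then save them."""
    log_text = git_output("log", "--oneline", "-n", "20")
    diff_stat = git_output("diff", "--stat", "HEAD~1")  # needs at least 2 commits
    write_handoff(path, log_text, diff_stat)
```

Calling `snapshot_repo(Path("handoff.md"))` from inside a repo writes the file; the new session then only needs that one file referenced, not the whole exploration that produced it.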

RL Sequencing and RL Environments

Several teams are working on orchestrating RL environments: keeping GPUs busy while staying on-policy when rollout times vary.

Cline's talk: all agent improvements come from RL and environment setup, not clever tricks.

Prime Intellect: "Environments are the web apps of the age."

Vibe coding and vibe engineering: need back-pressure + curation

Teams investing in 360 testing and composable codebases are getting results. Replit's batteries-included testing story impressed me: unit tests, then Playwright, then computer use.

Core idea: a lack of nines in per-step reliability compounds over longer trajectories. Verify rigorously.
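The arithmetic behind this, as a quick sketch. It assumes every step succeeds independently with the same probability, which is my simplification and not something claimed in the talks.

```python
def trajectory_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


# Compare one, two, and three nines over short and long trajectories.
for per_step in (0.9, 0.99, 0.999):
    for steps in (10, 100):
        p = trajectory_success(per_step, steps)
        print(f"per-step {per_step}: {steps} steps -> {p:.3f}")
```

Under that assumption, 99% per step over 100 steps finishes cleanly only about 37% of the time, while 99.9% per step gets to roughly 90%. Each extra nine matters more the longer the agent runs unattended.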

Bonus: trade compute for parallel execution - multiple trajectories, pick the best. Powerful in vibe coding without a developer steering.
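A minimal best-of-n sketch using Python's standard `concurrent.futures`. `run_trajectory` is a hypothetical stand-in for one full agent run, and its random score is a placeholder for whatever verifier you trust (for example, the fraction of tests passing).

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

Result = Tuple[str, float]  # (candidate label, verifier score)


def run_trajectory(seed: int) -> Result:
    """Hypothetical stand-in for one agent run plus its verification score."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    score = rng.random()       # placeholder: a real score comes from tests
    return f"candidate-{seed}", score


def best_of_n(n: int, run: Callable[[int], Result] = run_trajectory) -> Result:
    """Run n trajectories in parallel and keep the highest-scoring one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(run, range(n)))
    return max(results, key=lambda r: r[1])


label, score = best_of_n(4)
print(label, round(score, 3))
```

The selection is only as good as the score, which is where the back-pressure from testing comes in: without a reliable verifier, picking "the best" of several runs is a guess.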

AI code review is popular

Lots of code review companies were present. I've been asking my coding agent to review changes, but these startups may have harness secret sauce worth investigating: Graphite (NYC), CodeRabbit, Qodo, Greptile.

Miscellaneous / Try list

  • Cursor Composer 1 - less capable than frontier models, but the speed enables a different flow state
  • Proactive agents - Jules crawls your codebase and suggests work
  • Tacit code sharing - point to existing code as few-shot examples instead of packages
  • Gimlet Labs - coding agents generating and testing kernel fusion ideas