We have been running AI agents as full members of our engineering team for over a year. They write code, run tests, review pull requests, and deploy to production.
This paper shares what works and what does not:

- Agents excel at repetitive refactoring and test writing.
- Code review agents catch 30% more issues than human-only review.
- Agents struggle with ambiguous requirements and design decisions.
- Test-driven workflows dramatically improve agent code quality.
- Human-agent pair programming produces better results than either alone.
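To illustrate the test-driven point, here is a minimal sketch of the kind of loop such a workflow implies: an agent's patch is accepted only after it passes an existing test harness, and repeated failures escalate to a human. All names here (`run_tests`, `agent_propose_patch`, `test_driven_loop`) are hypothetical illustrations, not the paper's actual implementation.

```python
def run_tests(code: str) -> bool:
    """Stand-in test harness: checks a candidate implementation of add()."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        return namespace["add"](2, 3) == 5
    except Exception:
        return False

def agent_propose_patch(attempt: int) -> str:
    """Stub for an LLM call; returns a buggy draft first, then a fix."""
    if attempt == 0:
        return "def add(a, b):\n    return a - b"  # buggy first draft
    return "def add(a, b):\n    return a + b"

def test_driven_loop(max_attempts: int = 3):
    """Accept a patch only once the tests pass; otherwise retry or give up."""
    for attempt in range(max_attempts):
        patch = agent_propose_patch(attempt)
        if run_tests(patch):
            return patch
    return None  # escalate to a human after repeated failures
```

The key design choice is that the tests act as a hard gate: the agent iterates against a fixed, human-owned specification rather than judging its own output.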
We provide concrete workflow designs for teams looking to integrate AI agents into their engineering processes.