Copilot Workspace and the birth of Task-Oriented Programming

In 2023 we at GitHub Next invented an early form of task-oriented programming in a system called Copilot Workspace.

Copilot Workspace was the world’s first implementation of human-guided, task-oriented software development. It was the first interactive, structured AI-for-Code experience with the Task → Specification → Plan → Code pathway. It had flaws, which I’ll mention below.

We ran Copilot Workspace as a web-based demonstrator system from late 2023 to early 2025. We then pulled it down in early 2025 in favour of Agent Mode in VS Code.

The GitHub Next view of the project is best seen in the March 2024 Project page.

I gave a recap talk on Copilot Workspace in January 2025; the slides are below. A similar version of this talk was given at KCL in mid-2024.

When it appeared on stage at Universe in late 2023 it was jaw-dropping, and Cole Bemis later gave a recap of the journey from Concept to Code. Copilot Workspace had an immediate impact on the AI-for-Code space. The only other industry coding agent, Devin, later pivoted to much more interactive experiences, and Cursor Composer appeared in 2024 and incorporated ideas from Copilot Workspace. Vibe Coding was born, and software development was fundamentally changed forever.

The very earliest form of task-oriented programming we did at GitHub Next was a demonstrator of Extract, Edit, Apply – a design pattern for AI, which I initiated. Jonathan Carter then decided to make a larger project combining several initiatives towards a web-based IDE. Web-based IDE projects come and go like the summer, and I always saw Copilot Workspace as a concept demonstrator of ideas that belonged in traditional IDEs, or of parts that could be integrated into GitHub.com. In the end both things happened.

This was an incredibly fun project to work on and it had huge influence on all who saw it. Satya Nadella regularly referred to it to emphasise that task-oriented productivity activities would be at the heart of Microsoft and GitHub’s AI offerings.

In hindsight, we made some mistakes in the technical features. However, we also got a LOT of things right for the time. One was scaling: Copilot Workspace operated successfully over truly enormous repositories without relying on indexing. That was because we did early forms of agentic exploration of the content of the repository, relying on high-powered models to interpret the information they were seeing. No one else was doing it at the time, and in all our evals it was much superior and gave relatively quick results. Our aim with Copilot Workspace was to deliver first spec/plan results to the user in under about 30 seconds, which, given the models and techniques of the day, was incredibly challenging. The system also did sparse cloning of a slice of the repository, a surprisingly effective technique that is little known.
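To make the sparse-cloning idea concrete, here is a minimal sketch using stock git commands driven from Python. It is not the Copilot Workspace implementation: the function names, the choice of `--filter=blob:none` and `--depth=1`, and the example URL and paths are all assumptions for illustration. It also assumes a reasonably recent git with `sparse-checkout` support.

```python
import subprocess
from typing import List


def sparse_clone_commands(repo_url: str, dest: str, paths: List[str]) -> List[List[str]]:
    """Build the git commands for cloning only a slice of a repository.

    Hypothetical helper for illustration; not taken from Copilot Workspace.
    """
    return [
        # Partial, shallow clone: file contents are fetched lazily, and the
        # working tree starts out sparse (top-level files only).
        ["git", "clone", "--filter=blob:none", "--sparse", "--depth=1", repo_url, dest],
        # Widen the working tree to just the directories relevant to the task.
        ["git", "-C", dest, "sparse-checkout", "set", *paths],
    ]


def sparse_clone(repo_url: str, dest: str, paths: List[str]) -> None:
    """Run the commands; raises CalledProcessError if git fails."""
    for cmd in sparse_clone_commands(repo_url, dest, paths):
        subprocess.run(cmd, check=True)
```

Calling `sparse_clone("https://example.com/big-repo.git", "work", ["src/api", "docs"])` would materialise only those two directories plus the top-level files, which is what keeps the cost roughly independent of total repository size.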

Copilot Workspace’s biggest technical downfall was the lack of a solid code-validation story for the generated code. Being a web-based editor, the server of course couldn’t build or test the code, and we had no “local dev” setup available. We decided to rely on an ephemeral GitHub Codespace for backing compute when needed. The Codespace was, however, relatively slow to start (unless prebuilds were configured), and not solid enough to always assume it was there. This meant we never integrated the build feedback into the core AI logic for spec, plan and code. This in turn meant that the crucial build/repair loops of modern AI coding agents were not properly implemented and integrated into the flow, and Task → Spec → Plan → Code ended at Code without any real iteration by the AI.
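For readers unfamiliar with the term, a build/repair loop can be sketched in a few lines. This is a generic illustration of the pattern, not code from any particular agent: `generate` and `build` are hypothetical callables standing in for the model call and the build/test step, and `BuildResult` is an assumed shape for the feedback.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class BuildResult:
    ok: bool
    log: str = ""


def build_repair_loop(
    generate: Callable[[str, Optional[str]], str],  # (task, feedback) -> code
    build: Callable[[str], BuildResult],            # code -> build/test outcome
    task: str,
    max_attempts: int = 3,
) -> Tuple[str, bool, int]:
    """Generate code, build it, and feed failures back to the generator.

    Returns (last code, whether it built, number of attempts used).
    """
    feedback: Optional[str] = None
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)
        result = build(code)
        if result.ok:
            return code, True, attempt
        # The build log becomes the guidance for the next attempt.
        feedback = result.log
    return code, False, max_attempts


# Stand-ins for a model and a build step, for demonstration only.
def fake_generate(task: str, feedback: Optional[str]) -> str:
    return "fixed" if feedback else "broken"


def fake_build(code: str) -> BuildResult:
    return BuildResult(ok=(code == "fixed"), log="error: broken")
```

With the stand-ins above, `build_repair_loop(fake_generate, fake_build, "add a flag")` succeeds on the second attempt. The point of the pattern is the feedback edge from `build` back to `generate`, which is exactly the edge that needs reliable backing compute.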

Another technical flaw, in retrospect, was that we didn’t embrace chat as both the output of the coding agent (e.g. the place to show its spec and plan) and the place to give guidance. We were too reluctant to embrace chat-to-code. We put strong emphasis on the plan being editable and invested a lot in the UX and controls to do this. In practice, modern vibe coding systems use a much simpler chat-log UX that feels less structured and less laboured, making more efficient use of the developer’s concentration while keeping them “in the flow”. We had a good flow, but not as good as you see today.

Another flaw was the over-emphasis on using GitHub Issues as a source of tasks. This seemed so natural for a GitHub-oriented demonstrator, and revolutionary at the time, but the truth is that issues are often low-quality task specifications, requiring some elaboration and negotiation.

Finally, Cursor Composer added two crucial things to this kind of flow, to really make vibe-coding fire. First, they moved task-orientation into a more traditional IDE setting (a VS Code clone). Second, they used the local context of the file the developer is looking at, and the selection or cursor position, to help localize a task. The first point is obviously needed and we always knew that. The second point is magic, as it means the task orientation can be for either very local tasks or very global ones.

Still, despite its flaws, Copilot Workspace was conceptually groundbreaking and we supported tens of thousands of users through late 2024. A brief effort was made to turn it into a much more widely adopted product, which we delivered on. However, the gap between that and a long-term, well-integrated product is large, and Cursor set a blinding pace. By early 2025 it was appropriate to let VS Code Agent Mode, GitHub Spark, Copilot Coding Agent and other initiatives, which we had also influenced both directly and indirectly, take over the mantle of task-oriented programming at GitHub.

I’m somewhat frustrated that the pressures of developing this demonstrator meant we didn’t write a paper or technical report on it, and also didn’t fully benchmark the underlying AI logic. I did write a small section on the origins of the project as of April 2024. Equally, the field was so incredibly fast-moving at this time that pausing to do so would have meant pausing the project for months. I’m incredibly grateful for the chance to make a small contribution to the development of task-oriented programming.
