Field report · Microsoft Foundry Hosted Agents
Building an artifact-producing agent was possible, but not smooth.
This project built a CLI-driven web app generator: the CLI sends a prompt to a Foundry hosted agent,
the agent runs GitHub Copilot SDK inside a custom container, generates a static frontend app, packages it,
and returns output/app.zip through hosted-session files.
Start here
Concise issue list
1. Too many overlapping primitives. Responses vs invocations, custom agents vs hosted agents, and base64 ZIP vs session files all looked plausible for the same scenario.
2. Artifact return path has hidden ceilings. SSE idle timeout, session file download size, and Responses payload size each forced a late workaround.
3. Region availability was easy to miss. The first Foundry resource was created before discovering hosted agents were only available in a specific region, forcing resource recreation.
4. Container plumbing leaked into product work. Dockerfile, ACR connection, Bicep, managed identity, and region availability dominated early implementation time.
5. Observability was not default-on enough. Server request logs and session filesystem state were hard to inspect until explicit monitor commands and custom logging were added.
6. Skills and auth needed custom design. The Web App Builder skill had to be vendored into the image, while GitHub App OAuth token forwarding had no ready template.
7. No server-side conversation history for hosted agents. Verified 2026-04-26: store=true, conversation_id, and previous_response_id are silently ignored on the hosted-agent endpoint, and the project-level /protocols/openai/responses path returns 404 for hosted agents. Hosted-agent containers and managed conversation history are mutually exclusive surfaces in Foundry today; picking a custom container means DIY history (we ship local JSONL).
Scenario comparison
Foundry Hosted Agents vs Anthropic Managed Agents
| Concern | Foundry path used here | Anthropic Managed Agents fit | Verdict |
|---|---|---|---|
| Durable work state | Hosted session VM and filesystem persist across turns, but state is implicit. | Session event log and wake/resume model are first-class and queryable. | Works, less explicit |
| Streaming | Raw SSE through proxy; required a 15s keepalive to survive long tool calls. | Streaming is part of the managed agent abstraction. | Foundry footgun |
| Artifact download | Agent writes output/app.zip; CLI downloads from session files; ~4 MiB ceiling was hit. | Native Files API for generated artifacts. | Direct gap |
| Skills | Vendored into the image and wired through Copilot SDK skillDirectories. | Native skills configuration on the agent. | More manual |
| Conversation history | Not available for hosted agents. store, conversation_id, and previous_response_id are silently ignored; project-level Responses returns 404. Workaround: client-side JSONL under ~/.web-app-gen/history/. | Durable append-only event log on every session; messages.list(session_id) returns full history. | Direct gap |
| Container control | Full custom container, arbitrary tools, enterprise Azure wiring. | No custom runtime image to own. | Foundry advantage |
| Enterprise fit | Azure tenancy, managed identity, Key Vault, private networking are natural. | Simpler platform surface, but less Azure-native control. | Foundry advantage |
What was built
Project architecture
The generated web apps are frontend-only and do not call Foundry. Foundry is only the generator backend: a hosted session receives the prompt, runs the Copilot SDK, writes files into the session sandbox, and repackages the result for download.
    CLI / CI / Product UI
    ├─ create or resume Foundry hosted session
    ├─ send prompt through Responses
    ├─ stream progress from hosted agent
    └─ download output/app.zip from session files

    Control Plane
    ├─ GitHub App OAuth for product users
    ├─ token broker and session metadata
    └─ Foundry adapter boundaries for tests

    Foundry Hosted Agent Container
    ├─ Node/TypeScript Responses-compatible HTTP server
    ├─ @github/copilot-sdk with per-user gitHubToken
    ├─ vendored Web App Builder skill directory
    ├─ output/app static files
    └─ output/app.zip artifact package
On each turn, the agent writes output/app/, packages output/app.zip, and returns it through session files.
Hosted runtime
Responses handler, streaming, permission guard, Copilot SDK runner, output validation, and ZIP packaging.
Generator client
Foundry REST client, session commands, download flow, preview server, and static app validation.
Product boundary
GitHub OAuth, token storage abstraction, product session records, and Foundry handoff contracts.
Developer experience
Where the team struggled
Agents repeatedly confused invocations with Responses and hosted agents with fully managed agents. The right answer became Responses + Hosted Sessions + session files, but that path was not obvious up front.
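The path that finally worked can be pinned down with a small sketch of the request the CLI sends. The endpoint path, header name, and model value below are illustrative assumptions, not documented Foundry surface; only the Responses-style field shape is taken from this project.

```typescript
// Minimal Responses-style request body the CLI sends to the hosted agent.
// Field names follow the OpenAI Responses wire shape; values are placeholders.
interface ResponsesRequest {
  model: string;
  input: string;
  stream: boolean;
}

function buildGenerationRequest(prompt: string): ResponsesRequest {
  return { model: "hosted-agent", input: prompt, stream: true };
}

// Usage sketch (hypothetical endpoint and session header):
// await fetch(`${FOUNDRY_ENDPOINT}/responses`, {
//   method: "POST",
//   headers: { "content-type": "application/json", "x-session-id": sessionId },
//   body: JSON.stringify(buildGenerationRequest("Build a todo app")),
// });
```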
A Foundry resource was created first, then recreated in the hosted-agents-supported region after learning the feature was region-limited. This is small friction, but it breaks the expected “create resource, then add agent” flow.
ACR connection, AcrPull role assignment, preview region constraints, and azd path quirks had to be discovered and patched.
GitHub App OAuth is the right product attribution model, but Foundry samples did not provide a direct user-token-to-hosted-agent pattern.
The CLI saw terminated streams while the server kept working. A heartbeat fixed it, but the timeout was undocumented and surfaced as a transport failure.
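The fix is small once known. A sketch with Node's built-in HTTP server: an SSE comment line every 15 seconds resets the proxy's idle timer without emitting a client-visible event. The interval comes from this report; the handler shape is illustrative, not the project's actual server.

```typescript
import { createServer, ServerResponse } from "node:http";

// Write an SSE comment line every 15s; comments are ignored by
// EventSource clients but keep the proxy's idle timer from firing.
function startKeepalive(res: ServerResponse, intervalMs = 15_000): () => void {
  const timer = setInterval(() => res.write(": keepalive\n\n"), intervalMs);
  return () => clearInterval(timer);
}

// Sketch of the hosted server's streaming handler (not bound here;
// the real runtime would call server.listen on the hosted port).
const server = createServer((req, res) => {
  res.writeHead(200, {
    "content-type": "text/event-stream",
    "cache-control": "no-cache",
  });
  const stop = startKeepalive(res);
  // ... run the long tool call, writing real events as they arrive ...
  req.on("close", stop);
});
```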
Large output/app.zip downloads failed around 4,194,292 bytes. Deflate compression mitigated the current app, but large generated assets will need a real large-file primitive.
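The mitigation can be checked up front: deflate the payload with Node's built-in zlib and compare against the ceiling before attempting the session-file download. The ceiling value is the one observed in this project; the function is a sketch, not the shipped packaging code.

```typescript
import { deflateSync } from "node:zlib";

// Observed session-file download ceiling from this project.
const DOWNLOAD_CEILING_BYTES = 4_194_292;

// Deflate the artifact and report whether it fits under the observed
// ceiling, so the agent can fail fast instead of producing an
// undownloadable ZIP.
function checkArtifactSize(raw: Buffer): { deflated: number; fits: boolean } {
  const deflated = deflateSync(raw, { level: 9 }).length;
  return { deflated, fits: deflated <= DOWNLOAD_CEILING_BYTES };
}
```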
A 30-minute spike confirmed the missing server-side history. store=true, conversation_id, and previous_response_id are silently ignored on the hosted-agent endpoint; the project-level Responses path returns 404 for hosted agents (it only routes to prompt agents). Foundry forces a choice: managed history (prompt agents, no container) or a custom container (hosted agents, no history). We ship local JSONL under ~/.web-app-gen/history/ as the workaround; it works, but does not roam across machines.
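The local JSONL workaround is small. A sketch of the append/read pair, assuming one file per hosted session under ~/.web-app-gen/history/ (the directory comes from this report; the record shape is an assumption):

```typescript
import { appendFileSync, existsSync, mkdirSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";

interface HistoryRecord {
  role: "user" | "assistant";
  content: string;
  at: string; // ISO timestamp
}

const HISTORY_DIR = join(homedir(), ".web-app-gen", "history");

// Append one turn as a single JSON line; one file per hosted session.
function appendTurn(sessionId: string, rec: HistoryRecord): void {
  mkdirSync(HISTORY_DIR, { recursive: true });
  appendFileSync(join(HISTORY_DIR, `${sessionId}.jsonl`), JSON.stringify(rec) + "\n");
}

// Read the full local history for a session (empty if none recorded).
function readHistory(sessionId: string): HistoryRecord[] {
  const file = join(HISTORY_DIR, `${sessionId}.jsonl`);
  if (!existsSync(file)) return [];
  return readFileSync(file, "utf8")
    .split("\n")
    .filter(Boolean)
    .map((line) => JSON.parse(line) as HistoryRecord);
}
```

Append-only JSONL keeps writes atomic enough for a single-user CLI, which is why it was good enough here despite not roaming across machines.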
Concrete asks
What Foundry could improve
| Gap | Developer-visible symptom | Suggested fix |
|---|---|---|
| Artifact-producing guidance | Multiple valid-looking designs across Responses, invocations, base64, files, and URLs. | Publish a decision guide: “Use Responses + Hosted Sessions + /files for artifact-producing agents.” |
| SSE timeout | Long tool calls terminate the stream with no clear app-level error. | Own keepalive in the hosting adapter and document proxy idle behavior. |
| File download limit | Downloads fail or truncate near 4 MiB. | Return HTTP 413 with a clear message, raise the limit, or provide chunked/pre-signed downloads. |
| Observability | Request-handler issues require redeploying extra console.log statements. | Make handler stdout/stderr, session files, and idle lifecycle visible in monitor output. |
| Region availability | Developer creates a Foundry resource, then learns hosted agents require a different region and must recreate it. | Validate region support at resource creation time and surface hosted-agent availability in the portal, CLI, and templates. |
| Container onboarding | ACR, roles, Bicep, region support, and generated paths needed manual repair. | Ship a working custom-container template that auto-wires registry access and validates region support. |
| Skills | Skill loading is image-specific and path-specific. | Add a native skill configuration field, or document the Copilot SDK skill-directory pattern explicitly. |
| Conversation history for hosted agents | No way to get server-side history for custom-container agents. store, conversation_id, and previous_response_id are silently ignored on the only endpoint that accepts hosted-agent traffic. | Either route hosted-agent Responses through the same conversation store that prompt agents use, or document clearly that history is application-owned for hosted agents and ship a sample. |
Reference path
Production deployment steps
The production path is doable, but it is more complicated than the product scenario deserves. A simple artifact-producing agent currently requires resource planning, container infrastructure, identity wiring, hosted-agent schema work, app-level token brokering, and custom artifact handling.
Verify region support before creating resources. In this project, the first resource had to be recreated after discovering hosted agents were only available in a specific region.
Package the Node/TypeScript Responses server, @github/copilot-sdk, the Web App Builder skill directory, readiness/health endpoints, and output packaging logic into the image.
Create or reuse ACR, wire the Foundry project connection, and grant AcrPull to the project managed identity so hosted sessions can start the container.
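The registry wiring reduces to a handful of Azure CLI commands. A sketch, assuming the resource names (myacr, my-rg, my-foundry-project-identity) are placeholders and that the project's managed identity is a user-assigned identity; the exact identity resource depends on how the Foundry project was created.

```shell
# Create (or reuse) the registry that will hold the agent image.
az acr create --name myacr --resource-group my-rg --sku Basic

# Look up the Foundry project's managed-identity principal id
# (identity name is a placeholder; adjust for your project).
PRINCIPAL_ID=$(az identity show --name my-foundry-project-identity \
  --resource-group my-rg --query principalId -o tsv)

# Grant AcrPull so hosted sessions can pull the container image.
az role assignment create \
  --assignee "$PRINCIPAL_ID" \
  --role AcrPull \
  --scope "$(az acr show --name myacr --resource-group my-rg --query id -o tsv)"
```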
Keep agent.yaml, azure.yaml, and Bicep aligned with the current hosted-agent preview schema, including env vars, port mapping, readiness probe, and session configuration.
Each user signs in through the GitHub App; the control plane refreshes and forwards a time-bounded user token to the hosted agent so Copilot SDK usage is attributed to the licensed user.
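The token-forwarding contract can be kept honest with a small guard in the control plane: refuse to forward a user token whose expiry falls inside the expected generation window. The field names below are hypothetical; GitHub App user tokens do carry an expiry when token expiration is enabled on the app.

```typescript
interface BrokeredToken {
  value: string;
  expiresAt: Date;
}

// Only forward tokens that will outlive the expected generation run,
// plus a safety margin, so the Copilot SDK call cannot die mid-turn.
function isForwardable(
  token: BrokeredToken,
  expectedRunMs: number,
  marginMs = 60_000,
): boolean {
  return token.expiresAt.getTime() - Date.now() > expectedRunMs + marginMs;
}
```

A guard like this turns a mid-run 401 into an up-front refresh, which is much easier to surface to the product user.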
The CLI or product UI creates/resumes a Foundry hosted session, sends the generation prompt, streams progress, and relies on session workspace persistence for multi-turn edits.
The agent writes output/app/ and repackages output/app.zip on every turn. The CLI downloads via session files, with compression required to stay under the observed ~4 MiB download ceiling.
Use server logs, session IDs, explicit SSE keepalives, request IDs, and artifact-size checks because the default failure modes can look like stream termination, HTTP 500, or missing files.
Most of these steps are infrastructure ceremony around a narrow product goal: run an agent for a while and return a ZIP. A paved Foundry template should collapse region validation, ACR wiring, hosted-agent schema, streaming keepalive, skills loading, and large-artifact return into one opinionated deployment path.
Bottom line
Foundry is powerful; the artifact path needs productization.
Foundry is a strong fit when Azure control, custom containers, managed identity, and enterprise networking are mandatory. For this specific “long-running agent produces a downloadable artifact” scenario, Anthropic Managed Agents currently provide a more obvious happy path: fewer primitives, first-class files, native skills, and an event stream that doubles as the operational log.
The Foundry ask is not more primitives. It is a paved road: opinionated docs, default keepalive, visible logs, deploy-time validation, and a large-artifact return primitive.
Report generated from the implementation notes and task-session retrospective for web-app-gen-in-foundry.