Field report · Microsoft Foundry Hosted Agents

Building an artifact-producing agent was possible, but not smooth.

This project built a CLI-driven web app generator: the CLI sends a prompt to a Foundry hosted agent; the agent runs the GitHub Copilot SDK inside a custom container, generates a static frontend app, packages it, and returns output/app.zip through hosted-session files.

Runtime: Hosted Agent
Agent Core: Copilot SDK
Artifact: Static ZIP
Main Risk: Hidden Limits

Start here

Concise issue list

  1. Too many overlapping primitives. Responses vs invocations, custom agents vs hosted agents, and base64 ZIP vs session files all looked plausible for the same scenario.
  2. Artifact return path has hidden ceilings. SSE idle timeout, session file download size, and Responses payload size each forced a late workaround.
  3. Region availability was easy to miss. The first Foundry resource was created before discovering hosted agents were only available in a specific region, forcing resource recreation.
  4. Container plumbing leaked into product work. Dockerfile, ACR connection, Bicep, managed identity, and region availability dominated early implementation time.
  5. Observability was not default-on enough. Server request logs and session filesystem state were hard to inspect until explicit monitor commands and custom logging were added.
  6. Skills and auth needed custom design. The Web App Builder skill had to be vendored into the image, while GitHub App OAuth token forwarding had no ready template.
  7. No server-side conversation history for hosted agents. Verified 2026-04-26: store=true, conversation_id, and previous_response_id are silently ignored on the hosted-agent endpoint, and the project-level /protocols/openai/responses path returns 404 for hosted agents. Hosted-agent containers and managed conversation history are mutually exclusive surfaces in Foundry today; picking a custom container means DIY history (we ship local JSONL).

Scenario comparison

Foundry Hosted Agents vs Anthropic Managed Agents

Concern | Foundry path used here | Anthropic Managed Agents fit | Verdict
Durable work state | Hosted session VM and filesystem persist across turns, but state is implicit. | Session event log and wake/resume model are first-class and queryable. | Works, less explicit
Streaming | Raw SSE through proxy; required a 15s keepalive to survive long tool calls. | Streaming is part of the managed agent abstraction. | Foundry footgun
Artifact download | Agent writes output/app.zip; CLI downloads from session files; ~4 MiB ceiling was hit. | Native Files API for generated artifacts. | Direct gap
Skills | Vendored into the image and wired through Copilot SDK skillDirectories. | Native skills configuration on the agent. | More manual
Conversation history | Not available for hosted agents: store, conversation_id, and previous_response_id are silently ignored, and project-level Responses returns 404. Workaround: client-side JSONL under ~/.web-app-gen/history/. | Durable append-only event log on every session; messages.list(session_id) returns full history. | Direct gap
Container control | Full custom container, arbitrary tools, enterprise Azure wiring. | No custom runtime image to own. | Foundry advantage
Enterprise fit | Azure tenancy, managed identity, Key Vault, private networking are natural. | Simpler platform surface, but less Azure-native control. | Foundry advantage

What was built

Project architecture

The generated web apps are frontend-only and do not call Foundry. Foundry is only the generator backend: a hosted session receives the prompt, runs the Copilot SDK, writes files into the session sandbox, and repackages the result for download.

CLI / CI / Product UI
  ├─ create or resume Foundry hosted session
  ├─ send prompt through Responses
  ├─ stream progress from hosted agent
  └─ download output/app.zip from session files

Control Plane
  ├─ GitHub App OAuth for product users
  ├─ token broker and session metadata
  └─ Foundry adapter boundaries for tests

Foundry Hosted Agent Container
  ├─ Node/TypeScript Responses-compatible HTTP server
  ├─ @github/copilot-sdk with per-user gitHubToken
  ├─ vendored Web App Builder skill directory
  ├─ output/app static files
  └─ output/app.zip artifact package
End-to-end flow: the CLI sends prompts through Responses, the hosted agent runs Copilot SDK with the Web App Builder skill, writes output/app/, packages output/app.zip, and returns it through session files.
agent/ · Hosted runtime
Responses handler, streaming, permission guard, Copilot SDK runner, output validation, and ZIP packaging.

cli/ · Generator client
Foundry REST client, session commands, download flow, preview server, and static app validation.

control-plane/ · Product boundary
GitHub OAuth, token storage abstraction, product session records, and Foundry handoff contracts.

Developer experience

Where the team struggled

Design
Choosing the right Foundry primitive took multiple rounds.

The team repeatedly confused invocations with Responses, and hosted agents with fully managed agents. The right answer turned out to be Responses + Hosted Sessions + session files, but that path was not obvious up front.

Deploy
Hosted-agent region support was discovered too late.

A Foundry resource was created first, then recreated in the hosted-agents-supported region after learning the feature was region-limited. This is small friction, but it breaks the expected “create resource, then add agent” flow.

Deploy
Custom container deployment required manual Azure glue.

ACR connection, AcrPull role assignment, preview region constraints, and azd path quirks had to be discovered and patched.

Secrets
Token forwarding and secret refs were opaque.

GitHub App OAuth is the right product attribution model, but Foundry samples did not provide a direct user-token-to-hosted-agent pattern.

Runtime
SSE idle timeout broke long tool calls.

The CLI saw terminated streams while the server kept working. A heartbeat fixed it, but the timeout was undocumented and surfaced as a transport failure.

Artifacts
A ~4 MiB session file download limit appeared late.

Large output/app.zip downloads failed at around 4,194,292 bytes. Deflate compression keeps the current app under the ceiling, but large generated assets will need a real large-file primitive.
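A minimal sketch of the size guard, assuming the observed ceiling of 4,194,292 bytes. The real packager emits a deflated ZIP; gzip stands in here only to show how much repetitive frontend output compresses.

```typescript
import { gzipSync } from "node:zlib";

// Observed session-file download ceiling in this project, in bytes.
// Treat it as an observation, not a documented limit.
export const SESSION_FILE_LIMIT = 4_194_292;

export function fitsSessionFileLimit(artifact: Buffer): boolean {
  return artifact.byteLength <= SESSION_FILE_LIMIT;
}

// Stand-in for the ZIP deflate step: maximum-level gzip.
export function compressed(artifact: Buffer): Buffer {
  return gzipSync(artifact, { level: 9 });
}
```

Checking the packaged size before upload lets the agent fail with a clear message instead of the CLI seeing a truncated download.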

History
Hosted agents have no server-side conversation history.

A 30-minute spike confirmed it. store=true, conversation_id, and previous_response_id are silently ignored on the hosted-agent endpoint; the project-level Responses path returns 404 for hosted agents (it only routes to prompt agents). Foundry forces a choice: managed history (prompt agents, no container) or custom container (hosted agents, no history). We ship local JSONL under ~/.web-app-gen/history/ as the workaround — it works, but doesn't roam across machines.

Concrete asks

What Foundry could improve

Gap | Developer-visible symptom | Suggested fix
Artifact-producing guidance | Multiple valid-looking designs across Responses, invocations, base64, files, and URLs. | Publish a decision guide: “Use Responses + Hosted Sessions + /files for artifact-producing agents.”
SSE timeout | Long tool calls terminate the stream with no clear app-level error. | Own keepalive in the hosting adapter and document proxy idle behavior.
File download limit | Downloads fail or truncate near 4 MiB. | Return HTTP 413 with a clear message, raise the limit, or provide chunked/pre-signed downloads.
Observability | Request-handler issues require redeploying extra console.log statements. | Make handler stdout/stderr, session files, and idle lifecycle visible in monitor output.
Region availability | Developer creates a Foundry resource, then learns hosted agents require a different region and must recreate it. | Validate region support at resource creation time and surface hosted-agent availability in the portal, CLI, and templates.
Container onboarding | ACR, roles, Bicep, region support, and generated paths needed manual repair. | Ship a working custom-container template that auto-wires registry access and validates region support.
Skills | Skill loading is image-specific and path-specific. | Add a native skill configuration field, or document the Copilot SDK skill-directory pattern explicitly.
Conversation history for hosted agents | No server-side history for custom-container agents; store, conversation_id, and previous_response_id are silently ignored on the only endpoint that accepts hosted-agent traffic. | Route hosted-agent Responses through the same conversation store that prompt agents use, or document clearly that history is application-owned and ship a sample.

Reference path

Production deployment steps

The production path is doable, but it is more complicated than the product scenario deserves. A simple artifact-producing agent currently requires resource planning, container infrastructure, identity wiring, hosted-agent schema work, app-level token brokering, and custom artifact handling.

1 · Region
Create the Foundry project in a hosted-agents-supported region.

Verify region support before creating resources. In this project, the first resource had to be recreated after discovering hosted agents were only available in a specific region.

2 · Image
Build the hosted-agent custom container.

Package the Node/TypeScript Responses server, @github/copilot-sdk, the Web App Builder skill directory, readiness/health endpoints, and output packaging logic into the image.

3 · Registry
Push the image to Azure Container Registry and connect it to Foundry.

Create or reuse ACR, wire the Foundry project connection, and grant AcrPull to the project managed identity so hosted sessions can start the container.

4 · Agent
Deploy the hosted-agent definition.

Keep agent.yaml, azure.yaml, and Bicep aligned with the current hosted-agent preview schema, including env vars, port mapping, readiness probe, and session configuration.

5 · Auth
Wire product-user GitHub App OAuth.

Each user signs in through the GitHub App; the control plane refreshes and forwards a time-bounded user token to the hosted agent so Copilot SDK usage is attributed to the licensed user.
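Since Foundry samples offered no ready pattern, the token broker is application code. A minimal sketch, assuming a short-lived brokered token and a custom header the hosted agent reads before handing the value to the Copilot SDK as gitHubToken; the header name and refresh margin are our conventions, not anything Foundry prescribes.

```typescript
export interface BrokeredToken {
  token: string;
  expiresAt: number; // epoch ms
}

// Refresh when less than a minute remains, so a token never expires
// mid-generation inside the hosted session.
const EXPIRY_MARGIN_MS = 60_000;

export function needsRefresh(t: BrokeredToken, now = Date.now()): boolean {
  return t.expiresAt - now < EXPIRY_MARGIN_MS;
}

// Hypothetical header the control plane sets on calls to the hosted agent.
export function agentHeaders(t: BrokeredToken): Record<string, string> {
  return { "x-github-user-token": t.token };
}
```

Keeping the token in a per-request header, rather than baking it into the container, is what makes per-user Copilot attribution possible.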

6 · Invoke
Call the hosted session through Responses.

The CLI or product UI creates/resumes a Foundry hosted session, sends the generation prompt, streams progress, and relies on session workspace persistence for multi-turn edits.

7 · Files
Package and download the generated app.

The agent writes output/app/ and repackages output/app.zip on every turn. The CLI downloads via session files, with compression required to stay under the observed ~4 MiB download ceiling.
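Because downloads near the ceiling can truncate silently, the CLI should validate the artifact before unpacking. A sketch of one cheap check: a ZIP file begins with the local-file-header signature PK\x03\x04, and a truncated or error-page download usually fails it.

```typescript
// True if the buffer starts with the ZIP local-file-header signature.
export function looksLikeZip(buf: Buffer): boolean {
  return (
    buf.length >= 4 &&
    buf[0] === 0x50 && // 'P'
    buf[1] === 0x4b && // 'K'
    buf[2] === 0x03 &&
    buf[3] === 0x04
  );
}
```

Pairing this with a byte-count check against the expected size catches both truncation and the HTML error bodies that sometimes come back in place of file content.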

8 · Ops
Add monitoring and failure diagnostics.

Use server logs, session IDs, explicit SSE keepalives, request IDs, and artifact-size checks because the default failure modes can look like stream termination, HTTP 500, or missing files.
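A sketch of the structured log line that makes those failure modes correlatable across CLI, proxy, and hosted agent; the field names are our convention, not a Foundry schema.

```typescript
// One JSON object per line, carrying the IDs needed to join
// CLI-side symptoms to server-side request handling.
export function logLine(fields: {
  requestId: string;
  sessionId: string;
  event: string;
  detail?: string;
}): string {
  return JSON.stringify({ at: new Date().toISOString(), ...fields });
}
```

Emitting these to stdout means the same lines show up in whatever monitor surface Foundry eventually exposes, with no extra wiring.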

Why this feels too complicated

Most of these steps are infrastructure ceremony around a narrow product goal: run an agent for a while and return a ZIP. A paved Foundry template should collapse region validation, ACR wiring, hosted-agent schema, streaming keepalive, skills loading, and large-artifact return into one opinionated deployment path.

Bottom line

Foundry is powerful; the artifact path needs productization.

Foundry is a strong fit when Azure control, custom containers, managed identity, and enterprise networking are mandatory. For this specific “long-running agent produces a downloadable artifact” scenario, Anthropic Managed Agents currently provide a more obvious happy path: fewer primitives, first-class files, native skills, and an event stream that doubles as the operational log.

The Foundry ask is not more primitives. It is a paved road: opinionated docs, default keepalive, visible logs, deploy-time validation, and a large-artifact return primitive.

Report generated from the implementation notes and task-session retrospective for web-app-gen-in-foundry.