Foundry — Project Recap (2-Week Window)

01 · Project Identity

What This Is, Right Now

Foundry is a multi-tenant agentic delivery platform that transforms requirements into working code. It structures delivery knowledge and powers AI agents that reason about full project context, decompose tasks, generate code in sandboxes, and push changes autonomously.

Elevator Pitch

Agencies and delivery teams feed in plans and conversations. Foundry decomposes them into structured requirements, reasons about implementation, provisions AI sandboxes, and ships code to repos.

Stage: Active development (v0.1.0). Solo founder build by Quintin Henry. Reference client: Burlington Medical (118 requirements, 8 skills, 7 workstreams). Engagement-type agnostic: supports platform migrations, greenfield builds, and system integrations.

Stack

Frontend: Next.js 16 + React 19 + Tailwind 4.1

Backend: Convex (reactive BaaS, 55+ tables)

Auth: Clerk (multi-tenant orgs)

AI: Claude (Opus 4.6 / Sonnet 4.5)

Sandbox: Cloudflare Workers + Docker

Desktop: Tauri 2 (Rust + Vite)

Shape

268 Convex backend files

405 UI component files (34 domains)

562 Next.js app files (52 routes)

2,794 lines in schema.ts

02 · Architecture Snapshot

System as It Exists Today

In development, the system runs as four distributed processes; in production, it deploys to managed platforms. The diagram shows conceptual modules and data flow, not individual files.

flowchart TB
    subgraph Client["Browser / Desktop"]
        A["Next.js 16<br/>52 routes, thin wrappers"]
        B["Tauri Desktop<br/>Same @foundry/ui components"]
    end
    subgraph Backend["Convex Cloud"]
        C["Schema<br/>55+ tables, 6 domains"]
        D["Server Functions<br/>Queries, mutations, actions"]
        E["AI Actions<br/>Claude API, context assembly"]
        F["Webhooks<br/>GitHub, Atlassian, Clerk"]
    end
    subgraph AILayer["AI Inference"]
        G["Agent Worker<br/>Hono + Anthropic SDK"]
        H["Analysis Routes<br/>/analyze-requirement<br/>/analyze-task-subtasks"]
    end
    subgraph Sandbox["Sandbox System"]
        I["Sandbox Worker<br/>Durable Objects"]
        J["Docker Containers<br/>Ephemeral Claude Code envs"]
    end
    subgraph External["External Services"]
        K["GitHub App<br/>Repos, PRs, webhooks"]
        L["Clerk<br/>Auth, orgs, JWT"]
        M["Claude API<br/>3-tier model deployment"]
    end
    A -->|"WebSocket"| D
    B -->|"WebSocket"| D
    D --> C
    D --> E
    E --> G
    G --> H
    D -->|"HTTP"| I
    I --> J
    F --> D
    K --> F
    L --> A
    L --> D
    E --> M
    style Client fill:#eff6ff,stroke:#2563eb,color:#0f172a
    style Backend fill:#f8fafc,stroke:#3b82f6,color:#0f172a
    style AILayer fill:#fefce8,stroke:#d97706,color:#0f172a
    style Sandbox fill:#f0fdf4,stroke:#16a34a,color:#0f172a
    style External fill:#f1f5f9,stroke:#94a3b8,color:#0f172a

03 · Recent Activity

What Happened and Why

Six major workstreams in two weeks, grouped by theme. The dominant pattern: deepening AI integration and building the observability layer for agent-driven delivery.

Apr 3–4 · Codebase Analysis

AI-powered requirement analysis with review queue

3 new tables, 11 UI components, 2 agent worker routes. Runs Claude against connected GitHub repos to determine implementation status of each requirement. Human review queue for approve/reject. Results feed into the context assembly pipeline (Layer 7). The biggest feature of the window.

Apr 3 · PR #32

Agent Activity dashboard-first redesign

Replaced the flat chronological list (scored C+ in the usability audit) with dashboard metrics, traces grouped by requirement, inline expansion, and 6 audit trail sections. Added modelId tracking and instrumented 9 previously untracked AI operations.

Apr 2–3 · PR #31

Ubiquitous repo picker across tasks and workstreams

Added repositoryIds field to tasks and workstreams tables. RepoBadge component, RepoCreateModal, RepoPickerDropdown, settings page. GitHub repos now visible everywhere work is managed.

Apr 1 · PR #30

Design context pipeline with AI vision analysis

6 new tables. Drag-and-drop asset upload, Claude vision analysis, token parsing (JSON/CSS/SCSS), cascade resolution (program > workstream > requirement), sandbox injection, post-build visual diff scoring. Full test suite.

Mar 31 · PR #29

Task verification pipeline + service fixes

Added a verification results endpoint, graceful error handling for re-verification, and debounced connection-loss banners. Fixed an HTTP handler ordering bug involving the Stripe route.

Apr 4 · Infrastructure

Biome linter + formatter with enforcement hooks

Replaced ESLint/Prettier with Biome. PostToolUse hook auto-fixes on every edit. Pre-commit hook blocks lint errors. Bulk format rollout with .git-blame-ignore-revs.

Apr 4 · Planning

Test coverage 90% initiative specced

Spec for bringing coverage from 28% to 90% using 4 parallel builder agents. Domain-clustered strategy, pre-commit gate, PostToolUse coverage warning hook.

Dominant Theme

AI Observability

Three of the four PRs (#30, #31, #32) add visibility into what AI agents are doing. The codebase analysis feature closes the loop: AI now analyzes its own implementation progress against requirements.

Supporting Theme

Quality Infrastructure

Biome enforcement, test coverage spec, audit trail instrumentation. The codebase is shifting from "build fast" to "build with guardrails."

Also Shipped

Google Drive import source (PR #27)
Mission Control consolidation (PR #26)
Service resilience phases 1–3 (PR #24)
Sprint usability updates (PR #25)
UX overhaul (PR #23)
Billing system (PR #22)

04 · Decision Log

Why Things Are the Way They Are

Key design decisions from this window. Extracted from commit messages and planning docs. This is the highest-value section for fighting cognitive debt.

Biome over ESLint + Prettier

Single tool for format + lint across all 6 workspaces. Enforced via PostToolUse hook (auto-fixes on every edit) and pre-commit hook (blocks errors).

Why: Running two separate tools with overlapping configuration was causing rule conflicts. Biome is faster and natively supports Tailwind CSS directives.

Dashboard-first for Agent Activity

Landing page is now health metrics (acceptance rate, velocity, token spend, coverage), not a chronological list. Trace drill-down groups executions by requirement.

Why: Usability audit scored the flat list at C+ (71/100). It served neither monitoring ("is everything healthy?") nor tracing ("what happened to REQ-042?"). Dashboard answers monitoring; grouped traces answer tracing.
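The monitoring/tracing split can be sketched as two pure functions over execution records: one aggregates health metrics, the other groups executions by requirement. This is a minimal sketch, not the Agent Activity implementation. The `Execution` shape, its field names, and the definition of acceptance rate (accepted ÷ reviewed, ignoring pending items) are assumptions, and only acceptance rate and token spend are covered.

```typescript
// Hypothetical execution record; field names are illustrative assumptions.
interface Execution {
  id: string;
  requirementId: string; // e.g. "REQ-042"
  status: "accepted" | "rejected" | "pending";
  tokens: number;
  startedAt: number; // epoch ms
}

interface DashboardMetrics {
  acceptanceRate: number | null; // accepted / (accepted + rejected); null if nothing reviewed yet
  totalTokens: number;
  executionCount: number;
}

// Monitoring view: "is everything healthy?"
function computeMetrics(executions: Execution[]): DashboardMetrics {
  let accepted = 0;
  let reviewed = 0;
  let totalTokens = 0;
  for (const e of executions) {
    totalTokens += e.tokens;
    if (e.status !== "pending") {
      reviewed += 1;
      if (e.status === "accepted") accepted += 1;
    }
  }
  return {
    acceptanceRate: reviewed === 0 ? null : accepted / reviewed,
    totalTokens,
    executionCount: executions.length,
  };
}

// Tracing view: "what happened to REQ-042?" Executions are grouped by
// requirement, and each group is sorted oldest-first so the trace reads in order.
function groupByRequirement(executions: Execution[]): Map<string, Execution[]> {
  const groups = new Map<string, Execution[]>();
  for (const e of executions) {
    const group = groups.get(e.requirementId) ?? [];
    group.push(e);
    groups.set(e.requirementId, group);
  }
  groups.forEach((group) => group.sort((a, b) => a.startedAt - b.startedAt));
  return groups;
}
```

Keeping both views as pure functions over the same records means the dashboard numbers and the drill-down can never disagree about what happened.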

AI analysis with human review queue

Codebase analysis runs Claude against repos, but results go through a review queue before updating requirement status. Batch approve/reject supported.

Why: Even at 90%+ accuracy, auto-approval is too risky: the roughly 10% of results that are wrong could be costly. The review queue adds accountability and lets the team build confidence in the AI's judgment over time.
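The gate can be sketched as a pure state update in which an analysis result changes a requirement's status only after a human approves it. This is a sketch under stated assumptions: the `AnalysisResult` shape, the status values, and the `reviewBatch` helper are hypothetical, not Foundry's actual review-queue API.

```typescript
type ImplementationStatus = "not_started" | "partial" | "implemented";
type ReviewState = "pending" | "approved" | "rejected";

// Hypothetical shape of one AI analysis result waiting in the queue.
interface AnalysisResult {
  id: string;
  requirementId: string;
  proposedStatus: ImplementationStatus;
  review: ReviewState;
}

// Applies one batch decision. Only pending results are touched, and only
// approved results write through to requirement status; rejected results are
// recorded but leave the requirement unchanged. Inputs are not mutated.
function reviewBatch(
  queue: AnalysisResult[],
  requirementStatus: Map<string, ImplementationStatus>,
  ids: Set<string>,
  decision: "approved" | "rejected",
): { queue: AnalysisResult[]; requirementStatus: Map<string, ImplementationStatus> } {
  const nextStatus = new Map(requirementStatus);
  const nextQueue = queue.map((r) => {
    if (!ids.has(r.id) || r.review !== "pending") return r;
    if (decision === "approved") nextStatus.set(r.requirementId, r.proposedStatus);
    return { ...r, review: decision };
  });
  return { queue: nextQueue, requirementStatus: nextStatus };
}
```

Because the AI's output only ever lands in `proposedStatus`, a wrong analysis costs a reviewer's click rather than a corrupted requirement status.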

repositoryIds as arrays, not single values

Tasks and workstreams can reference multiple GitHub repositories. RepoBadge shows the primary, picker allows multi-select.

Why: Real-world delivery often spans multiple repos (frontend + backend + infra). Single repo binding was too constraining for the multi-repo reality of enterprise delivery.

Design context as cascading pipeline

Design tokens cascade program > workstream > requirement with merge semantics. Snapshots are immutable—created at task creation time.

Why: Sandboxes need frozen design context when they start, but the overall design evolves. Cascade gives inheritance; snapshots give stability. The immutable snapshot pattern prevents sandbox drift.
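Cascade-then-snapshot can be sketched as a merge in which more specific levels win, followed by a frozen copy taken at task creation. This assumes tokens are a flat name-to-value map and that "merge semantics" means a more specific key overrides the same key from a broader level; the real token shape, merge rules, and the `resolveDesignContext`/`createSnapshot` names are hypothetical.

```typescript
// Hypothetical flat token map, e.g. { "color.primary": "#2563eb" }.
type DesignTokens = Record<string, string>;

interface DesignSnapshot {
  readonly taskId: string;
  readonly createdAt: number;
  readonly tokens: Readonly<DesignTokens>;
}

// Cascade: program < workstream < requirement. A key defined at a more
// specific level overrides the same key from a broader level.
function resolveDesignContext(
  program: DesignTokens,
  workstream: DesignTokens = {},
  requirement: DesignTokens = {},
): DesignTokens {
  return { ...program, ...workstream, ...requirement };
}

// Snapshot: copy the resolved tokens and freeze them at task creation, so a
// running sandbox never sees later edits to program or workstream tokens.
function createSnapshot(taskId: string, tokens: DesignTokens, now: number = Date.now()): DesignSnapshot {
  return Object.freeze({
    taskId,
    createdAt: now,
    tokens: Object.freeze({ ...tokens }),
  });
}
```

The copy inside `createSnapshot` matters as much as the freeze: freezing the resolved object alone would still let a caller holding that reference be surprised later, whereas the copied snapshot is fully decoupled from the cascade inputs.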

4-agent parallel strategy for test coverage

Domain-clustered agents: source-control+pipeline, discovery+audit, tasks+programs+skills, videos+sandbox+layout. Each agent writes ~30–50 test files.

Why: 150+ new test files is too much work for a single session. Clustering by domain avoids merge conflicts between agents. Builder agents run in bypassPermissions mode for speed.

05 · State of Things

Working, In Progress, Broken, Blocked

Shipped & Stable: 12 (PRs #21–#32 merged)

In Progress: 3 (active branches + specs)

Degraded: 2 (test coverage, large files)

Blocked: 0 (no external blockers)

Working & Stable

Core delivery pipeline (requirements, skills, tasks, workstreams)
Sandbox execution system (10-stage provisioning, Docker containers)
Agent Activity dashboard with audit trail
Design context pipeline with AI vision analysis
Repository picker across tasks and workstreams
Task verification pipeline
Google Drive import source
Service resilience layer (auto-reconnect, health monitoring)
Billing system (3 tiers)
Biome lint + format enforcement
GitHub App + Atlassian integrations
Clerk multi-tenant auth with row-level security

In Progress

Codebase analysis: on the development branch, not yet merged to main
Semantic code search: on the semantic-code-search branch; adds vector embeddings and cosine similarity search
Test coverage initiative: spec written and 4-agent strategy designed, not yet executed

Degraded

Test coverage at 28%: 153 of 261 source files in apps/web have zero tests, and packages/ui has only 46 tests for 405 source files.
12 unmerged branches: stale feature branches are accumulating and may need cleanup

06 · Mental Model Essentials

The 10 Things to Hold in Your Head

Key invariants, non-obvious coupling, and gotchas that will bite you if you forget them.

Every query must use .withIndex(), never .filter(). Convex filter causes full table scans and kills reactive performance. Define indexes in schema.ts for every query pattern.
Clerk wraps Convex, never the reverse. The Convex client needs the Clerk JWT. Breaking the provider nesting order breaks authentication silently.
All feature UI lives in packages/ui/, not apps/web/. Page files are 3–7 line wrappers. If you add logic to a page file, you break the shared component model with the desktop app.
Mutations cannot call Node.js APIs. Only actions can, and only from files marked "use node". Utility files shared between mutations and actions therefore need separate entry points: if a shared util uses "use node", importing it from a mutation will fail.
assertOrgAccess() is mandatory on every query and mutation. It enforces row-level security; skip it and you get cross-tenant data leaks. Exception: health check endpoints must skip auth because they run before Clerk initializes.
params and searchParams are Promises in Next.js 16 and must be awaited; headers() and cookies() are async too. On the client, pass "skip" to useQuery until auth state has resolved.
Sandbox orchestrator is 4,142 lines with a formal state machine. The ALLOWED_TRANSITIONS map governs all lifecycle changes. Don't add transitions without updating the map—the system will silently reject them.
Webhooks follow the durable event buffer pattern. Store raw event → scheduler.runAfter(0) for async processing → return 200 OK immediately. Failed operations get exponential backoff retry (up to 5 attempts, 1h cap).
Design context cascades then snapshots. Program > workstream > requirement merge. Snapshots are immutable—created at task creation. Don't mutate a snapshot expecting sandboxes to pick up the change.
Never use purple/violet in UI. Design system rule. Blue/slate palette for AI features and interactive elements. Enforced by code review and Biome (informally).
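The retry policy in the webhook item can be sketched as a capped exponential backoff schedule. The 5-attempt limit and the 1-hour cap come from the text; the 5-minute base delay, the doubling factor, treating the 5 attempts as retries, and the `nextRetryDelayMs` name are assumptions. In Convex, the returned delay is the kind of value that would be passed to `ctx.scheduler.runAfter` when re-queuing the buffered event.

```typescript
const MAX_ATTEMPTS = 5; // from the text: up to 5 attempts
const MAX_DELAY_MS = 60 * 60 * 1000; // from the text: 1h cap
const BASE_DELAY_MS = 5 * 60 * 1000; // assumption: the base delay is not stated in the source

// Returns the delay before retry number `attempt` (1-based), doubling each
// time and clamped to the cap, or null once the retry budget is exhausted
// and the stored event should be marked as failed.
function nextRetryDelayMs(attempt: number, baseMs: number = BASE_DELAY_MS): number | null {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  if (attempt > MAX_ATTEMPTS) return null;
  return Math.min(baseMs * 2 ** (attempt - 1), MAX_DELAY_MS);
}
```

With these assumed numbers the schedule is 5, 10, 20, 40, then 60 minutes (80 clamped to the cap). Because the raw event is already stored before the 200 OK goes back, every retry can re-read it instead of depending on the provider to redeliver.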

07 · Cognitive Debt Hotspots

Where Understanding Is Weakest

Areas where the code changed faster than documentation and tests could follow. Each flagged with severity and a concrete action.

High

`convex/sandbox/orchestrator.ts` — 4,142 lines, 3 changes in 2 weeks

The largest file in the codebase. Contains the 10-stage sandbox provisioning state machine, session management, fleet orchestration, and auto-commit logic. Changed 3 times this window but has no inline documentation for the state machine transitions.

Action: Add a block comment at the top of orchestrator.ts documenting the 10 provisioning stages and the ALLOWED_TRANSITIONS map. Extract the state machine into a separate stateMachine.ts module.
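One possible shape for the extracted stateMachine.ts module is a typed transition map plus a guard that checks every lifecycle change against it. The stage names below are placeholders, not the orchestrator's actual 10 stages, and `tryTransition` is a hypothetical helper. As a deliberate design choice, the guard returns a result with a reason instead of rejecting silently, so an unlisted transition shows up in logs rather than disappearing.

```typescript
// Placeholder stage names; the real orchestrator defines 10 provisioning stages.
type SandboxState =
  | "queued"
  | "provisioning"
  | "cloning"
  | "running"
  | "committing"
  | "completed"
  | "failed";

// Every legal lifecycle change must appear here; anything else is rejected.
// Keying by SandboxState makes the compiler flag a state with no entry.
const ALLOWED_TRANSITIONS: Readonly<Record<SandboxState, readonly SandboxState[]>> = {
  queued: ["provisioning", "failed"],
  provisioning: ["cloning", "failed"],
  cloning: ["running", "failed"],
  running: ["committing", "failed"],
  committing: ["completed", "failed"],
  completed: [],
  failed: ["queued"], // hypothetical: allow a manual retry
};

type TransitionResult =
  | { ok: true; state: SandboxState }
  | { ok: false; state: SandboxState; reason: string };

// Validates a transition against the map. On rejection the current state is
// kept and a reason is returned so the caller can log it.
function tryTransition(from: SandboxState, to: SandboxState): TransitionResult {
  if (ALLOWED_TRANSITIONS[from].includes(to)) return { ok: true, state: to };
  return { ok: false, state: from, reason: `transition ${from} -> ${to} is not in ALLOWED_TRANSITIONS` };
}
```

A block comment documenting the stages would then sit directly above `ALLOWED_TRANSITIONS`, which keeps the documentation next to the single place that defines legal behavior.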

High

Codebase analysis feature — 11 UI components, 0 test files

The newest and largest feature has zero tests. packages/ui/src/codebase-analysis/ has 11 components (ReviewQueue, AnalysisConfigPanel, TaskAnalysisPanel, etc.) with no coverage. The agent worker routes (/analyze-requirement, /analyze-task-subtasks) are also untested.

Action: Prioritize in the test coverage initiative. This is a high-complexity feature with AI integration—tests here prevent the hardest-to-debug regressions.

Medium

Activity page rebuild — 14 files, 0 tests

Complete rewrite of the Agent Activity page, with 14 component files in packages/ui/src/activity/ covering dashboard metrics, trace drill-down, audit trail sections, and coverage detail. None of them have tests, despite this being the primary monitoring surface.

Action: Assign to Agent 3 or 4 of the test coverage initiative. Focus on ActivityDashboard metric calculations and TraceDetailSections rendering.

Medium

`convex/schema.ts` — 2,794 lines, 10 changes in 2 weeks

The single source of truth for the data model is approaching 3,000 lines. Every feature adds tables and indexes here. The file was changed 10 times in 2 weeks—the highest-churn file in the codebase.

Action: Consider a domain-split approach (schema fragments that merge at build time) or at minimum add section comments delineating the 6 functional domains.
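A domain split works in Convex because `defineSchema` takes a plain object of tables, so per-domain fragment objects can be spread into it. Plain spreading, however, lets a later fragment silently overwrite a same-named table. The sketch below merges fragments and fails loudly on a collision; the merged object is what would be passed to `defineSchema`. Plain objects stand in for `defineTable(...)` results, and the fragment and table names in the usage are hypothetical.

```typescript
// Stand-in for a table definition; in Convex this would be a defineTable(...) result.
type TableDef = { indexes: string[] };
type SchemaFragment = Record<string, TableDef>;

// Merges per-domain fragments into one table map, throwing on duplicate table
// names instead of letting a later spread overwrite an earlier table.
function mergeSchemaFragments(fragments: Record<string, SchemaFragment>): SchemaFragment {
  const merged: SchemaFragment = {};
  const owner = new Map<string, string>(); // table name -> domain that defined it
  for (const [domain, tables] of Object.entries(fragments)) {
    for (const [tableName, def] of Object.entries(tables)) {
      const existing = owner.get(tableName);
      if (existing !== undefined) {
        throw new Error(`table "${tableName}" is defined in both ${existing} and ${domain}`);
      }
      merged[tableName] = def;
      owner.set(tableName, domain);
    }
  }
  return merged;
}
```

Tracking ownership in a `Map` rather than probing `merged` avoids false collisions with inherited object keys, and the error names both domains so a conflict points straight at the two fragment files.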

Medium

`semantic-code-search` branch diverging from development

This branch adds vector embeddings, cosine similarity search, and analysis UX improvements. It's been open while codebase analysis features shipped on development. Merge distance is growing.

Action: Either merge soon or rebase against current development. The longer it stays diverged, the more painful the merge—especially since both branches touch the analysis feature.

Low

12 unmerged feature branches accumulating

Branches like feat/add-design-analysis, fix-agent-logs, ubiquitous-github-picker appear to be stale (work merged via other branch names). They add noise to branch listings.

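Action: Audit and delete stale branches. Run git branch --merged development | grep -v main | grep -v development to find candidates. Note that --merged only lists branches whose commits are reachable from development; branches whose work landed under other branch names or via squash merges won't appear and need a manual check.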

08 · Next Steps

Where Momentum Was Pointing

Inferred from recent activity, open specs, and project trajectory. Not prescriptive—just the direction of travel.

Immediate

Execute test coverage initiative

The spec is written (spec.md). Four parallel builder agents, clustered by domain. Target: 28% → 90% across apps/web and packages/ui. A pre-commit gate and a PostToolUse hook will enforce coverage afterwards.
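The pre-commit gate can be sketched as a threshold check over a coverage summary. This assumes an Istanbul-style `json-summary` report (a `total` object whose `lines`, `statements`, `functions`, and `branches` entries each carry a `pct` field); the spec as summarized here does not name the test runner, reporter, or which metrics are gated, so the report shape, the choice to check all four metrics, and the `checkCoverage` helper are assumptions.

```typescript
// Assumed Istanbul-style json-summary shape (only the fields used here).
interface Metric {
  pct: number;
}
interface CoverageSummary {
  total: { lines: Metric; statements: Metric; functions: Metric; branches: Metric };
}

type MetricName = keyof CoverageSummary["total"];

// Returns a message for each metric below the threshold; an empty array
// means the gate passes. The default of 90 mirrors the initiative's target.
function checkCoverage(summary: CoverageSummary, threshold = 90): string[] {
  const names: MetricName[] = ["lines", "statements", "functions", "branches"];
  const failures: string[] = [];
  for (const name of names) {
    const pct = summary.total[name].pct;
    if (pct < threshold) failures.push(`${name}: ${pct}% < ${threshold}%`);
  }
  return failures;
}
```

A hook script would read the summary file, call `checkCoverage`, print the failures, and exit non-zero when the array is non-empty.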

Short-Term

Merge semantic code search

The semantic-code-search branch adds vector embeddings to replace GitHub code search API in requirement analysis. Should be merged before the branches diverge further.
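The core of such a search is ranking stored code-chunk embeddings by cosine similarity to the query embedding. Below is a brute-force sketch, assuming embeddings are already computed and held in memory; the `CodeChunk` shape and the `topK` helper are hypothetical, and the branch may instead delegate ranking to a vector index (Convex offers vector search from actions).

```typescript
interface CodeChunk {
  path: string;
  embedding: number[];
}

// cos(a, b) = (a · b) / (|a| |b|); returns 0 if either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force ranking: score every chunk, sort by descending score, keep the best k.
function topK(query: number[], chunks: CodeChunk[], k: number): Array<{ path: string; score: number }> {
  return chunks
    .map((c) => ({ path: c.path, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Brute force is O(n·d) per query, which is fine for a single repository's chunks; an indexed approach becomes worthwhile as the number of connected repos grows.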

Ongoing

Merge development → main

12 PRs have been merged to development but not yet promoted to main. The gap between the branches represents the full body of work from this 2-week window.

Tech Debt

Orchestrator decomposition

The 4,142-line sandbox orchestrator is the biggest risk to maintainability. Extract the state machine, provisioning stages, and fleet management into focused modules before the next feature touches it.

Trajectory

Deepening AI integration

The codebase analysis feature is the seed of a closed-loop system: requirements → AI analysis → implementation status → agent task assignment → sandbox execution → PR. The next features likely close more gaps in this loop.
