Agentic workflow platform

Author. Run.
Diagnose. Improve.

The full lifecycle of agentic workflows — from versioned DAG definitions to per-node diagnosis, with evals that measure improvement by the numbers, not by feel.

Inspect each node: exact prompt, context, model, token usage

Compare eval scores before and after prompt iterations

Provider-agnostic — swap models without changing the DAG

SDLC Theme · story/ARC-427 · run_9f2k4a · 4m 32s elapsed

Active

context:ARC-427.work_item

model:sonnet-4-6

provider:anthropic

rag:codebase · 3 docs

tokens:8.2k in · 2.1k out

theme:sdlc · v2.1.0

implement · tool calls

read_file packages/core/ports/runs.py · 1.2s

read_file apps/api/features/runs/models.py · 0.8s

read_file docs/decisions/ADR-0018.md · 0.4s

write_file apps/api/features/runs/routes.py

context surface

work_item: ARC-427 · Add run action endpoints

rag · 3 docs: ADR-0018 · runs/ports.py · runs/models.py

upstream: plan.output · scope.output

prompt: sdlc/implement.jinja2 · v4

18:42:03 [scope] Work item resolved · 9 tasks scoped

18:44:17 [plan] Architecture plan complete · ADR-0018 applied

18:46:31 [impl] RAG resolved · 3 context docs attached

18:47:02 [impl] 3 files read · writing routes.py

18:47:09 [impl] POST /runs/{id}/actions · 47 lines

Active Runs

Pass Rate: 94%

Stories

Avg Tokens: 8.2k

Recent Runs · last 24h · 8 total

▶ run_9f2k4a · ARC-427 · Implement run actions · 4m 32s · ● running

▶ run_8k1p3e · ARC-425 · WorkGraph coordinator · 2m 11s · ● running

✓ run_7m3k1b · ARC-421 · Implement runs routes · 3m 18s · 9m ago

✓ run_5n8j2c · ARC-419 · Fix auth middleware · 5m 44s · 22m ago

✓ run_3p4h7d · ARC-415 · Add eval integration · 4m 08s · 1h ago

✓ run_2q9r6f · ARC-411 · Tracing adapter · 6m 22s · 2h ago

WorkGraph · Sprint 4

Sprint 4 · Foundation · 8 / 12 complete

2 in progress · 2 queued · 0 blocked · 8 done

Quality Gates · last 12 runs · devloop stage

vendor-boundary · 12/12 ✓

import-linter · 12/12 ✓

banned-patterns · 12/12 ✓

eval-pass-rate · 10/12 ↗

pyright-strict · 11/12 ↗

Provider Mix · last 30d

anthropic · sonnet-4-6: 72%

anthropic · haiku-4-5: 20%

openai · gpt-4o

Eval score progression · sdlc/implement.jinja2

What changed, by prompt version

v1 · 58% ✗ · missing context scope: no work_item in the prompt surface

v2 · 71% ✗ · added RAG context: codebase + ADR-0018 attached

v3 · 84% ✗ · tightened output schema: strict JSON mode enforced

v4 · 94% ✓ · injected upstream task outputs: plan.output threaded in

Eval dimensions · v4 score breakdown

code-correctness · 96%

output-schema · 97%

test-coverage · 92%

arch-adherence · 88%

token-efficiency · 91%

+36 pp · eval score · 58% → 94%

−18% · token cost · 4.1k → 3.4k avg

+3/12 · pass rate · was 7/12 → 10/12

● v4 promoted to production · eval: implement-quality · baseline: v1 · 2h ago

The improvement loop

Every step captured.
Every failure diagnosable.

The closed loop that separates a governed platform from a pile of scripts.

Author

Define versioned workflow DAGs — tasks, prompts, context sources, agents.

Run

Execute against real work. Live status per node. Partial re-runs.

Observe

Per-node: exact prompt, resolved context, RAG data, model, output.

Diagnose

Isolate the failing node. Inspect its full input surface. Experiment.

Improve

Run evals against a baseline. Promote what works. Prove it got better.

Platform capabilities

Everything the loop needs.

Workflow Orchestration

Define versioned DAGs of agentic tasks. Author once, run repeatedly, with live per-node status.
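A minimal sketch of what a versioned DAG definition could look like, assuming a Python domain model; the platform's real schema is not shown on this page. `TaskSpec`, `WorkflowDef`, and the `sdlc/scope.jinja2` and `sdlc/plan.jinja2` template names are hypothetical. The scope, plan, and implement tasks and their dependencies mirror the run shown above, and the standard-library `graphlib.TopologicalSorter` derives an execution order and rejects cycles.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # stdlib, Python 3.9+


@dataclass(frozen=True)
class TaskSpec:
    """One node in a workflow DAG (hypothetical schema)."""

    name: str
    prompt: str  # versioned prompt template reference
    depends_on: tuple[str, ...] = ()


@dataclass(frozen=True)
class WorkflowDef:
    """A named, versioned DAG of tasks (hypothetical schema)."""

    name: str
    version: str
    tasks: tuple[TaskSpec, ...]

    def execution_order(self) -> list[str]:
        # TopologicalSorter expects {node: predecessors}; static_order()
        # raises graphlib.CycleError if the dependencies contain a cycle.
        graph = {task.name: set(task.depends_on) for task in self.tasks}
        return list(TopologicalSorter(graph).static_order())


sdlc = WorkflowDef(
    name="sdlc",
    version="2.1.0",
    tasks=(
        TaskSpec("scope", "sdlc/scope.jinja2"),
        TaskSpec("plan", "sdlc/plan.jinja2", depends_on=("scope",)),
        TaskSpec("implement", "sdlc/implement.jinja2", depends_on=("plan", "scope")),
    ),
)

print(sdlc.version, sdlc.execution_order())  # 2.1.0 ['scope', 'plan', 'implement']
```

Freezing the dataclasses is one way to treat each version as immutable, so a changed DAG becomes a new version rather than an edit in place.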

Per-node Observability

Every task captures its exact input surface — prompt, context, RAG data, upstream outputs.
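One way to model the captured input surface, sketched under the assumption that each node writes a single immutable record per run. `NodeCapture`, its fields, `summary()`, and the `@v4` version suffix are hypothetical; the example values are loosely taken from the `implement` node in the run view above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeCapture:
    """Everything one task saw and produced during a run (hypothetical record)."""

    run_id: str
    node: str
    prompt_ref: str            # template name and version
    rendered_prompt: str       # the exact text sent to the model
    context: dict[str, str]    # resolved context sources, e.g. the work item
    rag_docs: tuple[str, ...]  # retrieved documents attached to the prompt
    upstream: dict[str, str]   # outputs of predecessor nodes, by node name
    provider: str
    model: str
    tokens_in: int
    tokens_out: int
    output: str = ""

    def summary(self) -> str:
        # One-line digest, similar to the per-node header in the run view
        return (
            f"{self.node} · {self.provider}/{self.model} · "
            f"{self.tokens_in} in · {self.tokens_out} out · "
            f"rag: {len(self.rag_docs)} docs · "
            f"upstream: {', '.join(sorted(self.upstream))}"
        )


capture = NodeCapture(
    run_id="run_9f2k4a",
    node="implement",
    prompt_ref="sdlc/implement.jinja2@v4",
    rendered_prompt="<rendered template text>",
    context={"work_item": "ARC-427"},
    rag_docs=("ADR-0018", "runs/ports.py", "runs/models.py"),
    upstream={"plan": "<plan output>", "scope": "<scope output>"},
    provider="anthropic",
    model="sonnet-4-6",
    tokens_in=8200,
    tokens_out=2100,
)
print(capture.summary())
# implement · anthropic/sonnet-4-6 · 8200 in · 2100 out · rag: 3 docs · upstream: plan, scope
```

Storing the rendered prompt alongside the template reference matters for diagnosis: the template says what was intended, the rendered text says what the model actually received.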

Isolated Re-run

Re-run any single task without restarting the pipeline. Nodes that already passed keep their outputs and are not re-executed.
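A sketch of an isolated re-run, assuming each task is a function of its upstream outputs and that a prior run's outputs are stored by node name. `rerun_node`, `TaskFn`, and the stub tasks are hypothetical, not the platform's API.

```python
from collections.abc import Callable

# Hypothetical task signature: upstream outputs by node name -> this node's output.
TaskFn = Callable[[dict[str, str]], str]


def rerun_node(
    target: str,
    deps: dict[str, tuple[str, ...]],
    tasks: dict[str, TaskFn],
    prior_outputs: dict[str, str],
) -> dict[str, str]:
    """Re-execute one node using upstream outputs recorded by an earlier run.

    Upstream nodes are not re-executed; their stored outputs are reused.
    Returns a copy of the outputs with only `target` replaced.
    """
    missing = [dep for dep in deps[target] if dep not in prior_outputs]
    if missing:
        raise KeyError(f"no recorded output for upstream node(s): {missing}")
    upstream = {dep: prior_outputs[dep] for dep in deps[target]}
    outputs = dict(prior_outputs)
    outputs[target] = tasks[target](upstream)
    return outputs


# Usage: only "implement" runs; "scope" and "plan" are served from the prior run.
calls: list[str] = []


def make_task(name: str) -> TaskFn:
    def run(upstream: dict[str, str]) -> str:
        calls.append(name)  # record which tasks actually executed
        return f"{name}({', '.join(sorted(upstream))})"
    return run


deps = {"scope": (), "plan": ("scope",), "implement": ("plan", "scope")}
tasks = {name: make_task(name) for name in deps}
prior = {"scope": "scope-out", "plan": "plan-out", "implement": "failed-out"}

new_outputs = rerun_node("implement", deps, tasks, prior)
print(calls)                     # ['implement']
print(new_outputs["implement"])  # implement(plan, scope)
```

This sketch leaves any downstream nodes with their previous outputs; a fuller version would likely mark them stale so they can be re-run in turn.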

Eval & Feedback Loop

Attach measurable checks to any task. Quality scores feed back into workflow definitions.
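A minimal sketch of a baseline comparison, assuming each prompt version is scored on the same fixed set of eval cases and that promotion requires a minimum gain in percentage points. `EvalRun`, `gain_pp`, `should_promote`, the threshold, and the two-case scores are illustrative; only the 58% and 94% means match the figures above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRun:
    """Scores for one prompt version on a fixed eval set (hypothetical shape)."""

    version: str
    scores: dict[str, float]  # eval case id -> score in [0, 1]

    @property
    def mean(self) -> float:
        return sum(self.scores.values()) / len(self.scores)


def gain_pp(baseline: EvalRun, candidate: EvalRun) -> float:
    """Mean-score change in percentage points, on identical eval cases."""
    if baseline.scores.keys() != candidate.scores.keys():
        raise ValueError("baseline and candidate must be scored on the same cases")
    return (candidate.mean - baseline.mean) * 100


def should_promote(baseline: EvalRun, candidate: EvalRun, min_gain_pp: float = 1.0) -> bool:
    # Hypothetical policy: promote only if the gain clears a minimum threshold.
    return gain_pp(baseline, candidate) >= min_gain_pp


v1 = EvalRun("v1", {"case-a": 0.50, "case-b": 0.66})
v4 = EvalRun("v4", {"case-a": 0.90, "case-b": 0.98})
print(round(gain_pp(v1, v4), 1))                       # 36.0
print(should_promote(v1, v4), should_promote(v4, v1))  # True False
```

Requiring identical case IDs keeps the comparison honest: a version cannot look better simply because it was scored on easier cases.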

Provider-agnostic

No vendor name in the domain layer. Swap models and providers without touching your DAG.
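A ports-and-adapters sketch of the provider-agnostic claim, assuming the domain layer depends only on a `typing.Protocol` and that tasks reference models by a config string. `ModelPort`, `Completion`, `FakeAdapter`, `register`, `resolve`, and the `provider:model` key format are hypothetical; a real adapter would call a vendor SDK, which is deliberately not shown.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Completion:
    text: str
    tokens_in: int
    tokens_out: int


class ModelPort(Protocol):
    """Domain-layer interface: nothing here names a vendor."""

    def complete(self, prompt: str) -> Completion: ...


class FakeAdapter:
    """Stand-in adapter. A real one would wrap a vendor SDK (not shown)."""

    def __init__(self, label: str) -> None:
        self.label = label

    def complete(self, prompt: str) -> Completion:
        # Whitespace word count as a crude token proxy, for illustration only
        return Completion(f"[{self.label}] {prompt}", len(prompt.split()), 1)


# Registry populated at the infrastructure edge, keyed by "provider:model".
ADAPTERS: dict[str, ModelPort] = {}


def register(model_ref: str, adapter: ModelPort) -> None:
    ADAPTERS[model_ref] = adapter


def resolve(model_ref: str) -> ModelPort:
    try:
        return ADAPTERS[model_ref]
    except KeyError:
        raise LookupError(f"no adapter registered for {model_ref!r}") from None


register("anthropic:sonnet-4-6", FakeAdapter("sonnet"))
register("openai:gpt-4o", FakeAdapter("gpt-4o"))

# The task holds only a config string; swapping models changes config, not the DAG.
task_model = "anthropic:sonnet-4-6"
print(resolve(task_model).complete("plan the change").text)  # [sonnet] plan the change
task_model = "openai:gpt-4o"
print(resolve(task_model).complete("plan the change").text)  # [gpt-4o] plan the change
```

Vendor names appear only in registration strings at the edge, which is the kind of boundary a check such as the `vendor-boundary` gate above could enforce.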

Theme System

Domain workflows packaged as themes. SDLC is the flagship. Extend to any domain without touching core.
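A sketch of how themes might plug into core, assuming core exposes a registry and each theme registers its workflows and prompt location. `Theme`, `ThemeRegistry`, the prompt directories, and the second `support` theme are hypothetical; only the `sdlc` name, version `2.1.0`, and `story` workflow come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Theme:
    """A domain package: workflows plus the prompts they use (hypothetical shape)."""

    name: str
    version: str
    workflows: tuple[str, ...]
    prompt_dir: str


class ThemeRegistry:
    """Core depends only on this registry; themes are registered from outside it."""

    def __init__(self) -> None:
        self._themes: dict[str, Theme] = {}

    def register(self, theme: Theme) -> None:
        if theme.name in self._themes:
            raise ValueError(f"theme {theme.name!r} is already registered")
        self._themes[theme.name] = theme

    def get(self, name: str) -> Theme:
        return self._themes[name]

    def names(self) -> list[str]:
        return sorted(self._themes)


registry = ThemeRegistry()
registry.register(Theme("sdlc", "2.1.0", ("story",), "prompts/sdlc"))
# A second domain plugs in the same way, with no change to core code.
registry.register(Theme("support", "0.1.0", ("ticket",), "prompts/support"))
print(registry.names())  # ['sdlc', 'support']
```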

Ready to close the loop?

Govern your agentic pipelines.
Improve them scientifically.

Internal-first. Provider-agnostic. Built to improve.
