# Software Factory Notes for Attractor

These are working notes on a transcript in which Justin McCarthy describes StrongDM's AI software-factory approach. The goal is not to copy that system, but to surface ideas that seem important for Attractor, a local-first DOT-graph workflow engine for unattended AI software work.

## Core frame

The transcript's central lesson is that useful AI autonomy does not require models to be perfect. The important threshold is when models move from **never working** to **sometimes working**. Once models sometimes complete useful work, a workflow engine can add validation, retries, checkpoints, and human gates to turn unreliable attempts into a more reliable process.

For Attractor, the key question is not:

> Can the agent do the whole task flawlessly?

It is:

> Can Attractor turn “sometimes works” into a controlled process through validation, fixup loops, checkpoints, and human escalation?

That is very close to Attractor's reason to exist.
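
A rough back-of-the-envelope model shows why "sometimes works" can be enough. The sketch below assumes (for illustration only; neither assumption comes from the transcript) that attempts succeed independently with probability `p` and that validation never mislabels a result. Under those assumptions, the chance that at least one of `k` attempts passes is `1 - (1 - p)^k`.

```python
def success_within(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` tries succeeds.

    Assumes independent attempts and a validator that never mislabels a
    result. Both are simplifications made for illustration.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - p) ** attempts


if __name__ == "__main__":
    # Hypothetical per-attempt success rate of 30%.
    for k in (1, 3, 5, 10):
        print(k, round(success_within(0.3, k), 3))
```

With a hypothetical `p = 0.3`, five validated attempts succeed about 83% of the time and ten attempts about 97% of the time. Real attempts are rarely independent and real validators are imperfect, so this is an upper-bound intuition rather than a prediction.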

## 1. “Do not write code” becomes “do not read code”

The speaker argues that once AI can generate code much faster than humans can review it, human code review becomes the bottleneck. The provocative extension is: if you are serious about throughput, you cannot merely stop writing every line manually; you also have to stop reading every line manually.

For Attractor, this suggests that humans should not usually be asked to inspect raw diffs. Instead, humans should inspect evidence:

- validation reports,
- test results,
- type/lint/security output,
- summaries of changed behavior,
- risk assessments,
- failing cases,
- explicit decision gates.

A human gate should be an escalation point for ambiguity, not a routine pause after every generated diff.

Example shape:

```dot
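digraph example {
  // Happy path: implement, test, then judge the evidence.
  implement -> run_tests -> judge_results
  judge_results -> fixup [label="FAILURE"]
  judge_results -> human_gate [label="UNCERTAIN"]
  judge_results -> exit [label="PASS"]
  fixup -> run_tests  // re-validate after each fix
}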
```

## 2. Validation is the mold

The strongest metaphor in the talk is injection molding. Code generation without validation is like injection molding without a mold: molten plastic everywhere. The mold is validation.

For Attractor, this means:

- the workflow graph alone is not enough;
- the agent prompt alone is not enough;
- the real shape of the output comes from validation nodes.

Agent claims such as “I completed the task” should carry almost no weight by themselves. The graph should ask: **what evidence proves this?**

Validation should be represented explicitly using existing Attractor primitives:

- `tool` nodes for tests, linters, typecheckers, benchmarks, security scans;
- `agent` nodes for LLM-as-judge review when deterministic checks are insufficient;
- failure edges back to fixup agents;
- human gates for unresolved ambiguity.
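
As a minimal sketch of the first bullet, a deterministic check can be reduced to "run a command, turn its exit status into an edge label". These notes do not describe Attractor's actual tool-node contract, so the function name `run_tool_node` and the shape of the result dictionary are hypothetical.

```python
import subprocess


def run_tool_node(command: list[str], timeout_s: float = 60.0) -> dict:
    """Run a check command and translate its exit status into an edge label.

    Hypothetical helper: the labels mirror the PASS/FAILURE edge labels used
    in the example graphs, not a documented Attractor API.
    """
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        # A check that cannot run or never finishes is treated as a failure.
        return {"label": "FAILURE", "exit_code": None, "output": str(exc)}
    label = "PASS" if proc.returncode == 0 else "FAILURE"
    return {
        "label": label,
        "exit_code": proc.returncode,
        "output": proc.stdout + proc.stderr,  # evidence for a fixup agent
    }
```

The point of the design is that the label comes from the tool's exit status, never from the agent's own description of what it did.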

## 3. “You only have what you can validate”

Treat this phrase as a guiding principle for Attractor workflow design: each desired property needs an artifact that demonstrates it.

| Desired property | Possible validation artifact |
| --- | --- |
| Code works | unit/integration tests pass |
| Code is formatted | formatter/linter output |
| Types are correct | typechecker output |
| Requirement is met | acceptance test or LLM judge report |
| Secure enough | security scan or threat-model review |
| No regression | existing test suite and benchmark comparison |
| User intent preserved | generated scenarios, examples, or human gate |

If a workflow has no validation node for a claim, the workflow does not really know that claim is true.
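
One way to make this principle checkable is a small coverage report that lists the claims no validation node backs. This is a sketch under assumptions: the claim strings and the `validators` mapping are invented for illustration and are not an Attractor data model.

```python
def unvalidated_claims(
    claims: set[str], validators: dict[str, list[str]]
) -> set[str]:
    """Return the claims that no validation node covers.

    `validators` maps a (hypothetical) validation node name to the claims
    its output is taken as evidence for.
    """
    covered = {claim for covers in validators.values() for claim in covers}
    return claims - covered


if __name__ == "__main__":
    claims = {"code works", "types are correct", "secure enough"}
    validators = {
        "unit_tests": ["code works"],
        "typecheck": ["types are correct"],
    }
    # "secure enough" has no validator, so the workflow cannot claim it.
    print(unvalidated_claims(claims, validators))
```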

## 4. Every “what about?” becomes a validation node

The transcript repeatedly says that every “what about?” belongs in validation:

- What about tests?
- What about security?
- What about performance?
- What about durability?
- What about edge cases?
- What about customer behavior?

For Attractor, these questions should become visible graph structure.

Example:

```dot
digraph validation {
  // Forward path: each concern is its own validation node.
  implement -> unit_tests
  unit_tests -> lint
  lint -> typecheck
  typecheck -> security_review
  security_review -> performance_check
  performance_check -> acceptance_judge
  acceptance_judge -> exit

  // Any failed check routes to fixup, which restarts the chain.
  unit_tests -> fixup [label="FAILURE"]
  lint -> fixup [label="FAILURE"]
  typecheck -> fixup [label="FAILURE"]
  security_review -> fixup [label="FAILURE"]
  performance_check -> fixup [label="FAILURE"]
  acceptance_judge -> fixup [label="FAILURE"]
  fixup -> unit_tests
}
```

The workflow graph becomes a concrete map of the team's concerns.
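
Because the structure is so regular, a list of concerns can be turned into graph text mechanically. The generator below is only a sketch: the function name is hypothetical, it assumes node names are plain DOT identifiers (letters, digits, underscores), and it says nothing about which node attributes Attractor itself expects.

```python
def validation_chain(checks: list[str], fixup: str = "fixup") -> str:
    """Emit DOT text: implement -> checks... -> exit, with FAILURE edges.

    Assumes every name is a plain DOT identifier, so no quoting is done.
    """
    if not checks:
        raise ValueError("at least one check is required")
    lines = ["digraph validation {"]
    path = ["implement", *checks, "exit"]
    # Forward path through every check in order.
    for src, dst in zip(path, path[1:]):
        lines.append(f"  {src} -> {dst}")
    # Every check can fail into the shared fixup node.
    for check in checks:
        lines.append(f'  {check} -> {fixup} [label="FAILURE"]')
    # After a fix, validation restarts from the first check.
    lines.append(f"  {fixup} -> {checks[0]}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(validation_chain(["unit_tests", "lint", "typecheck"]))
```

Adding a new "what about?" then means adding one name to the list rather than hand-editing several edges.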

## 5. Closed-loop bug fixing

The talk describes a bug report that was not written by a human and not handled by a human. The system observed bad behavior, wrote up the issue, fixed it, and validated the fix.

For Attractor, this maps directly to a local closed loop:

1. an agent implements a change;
2. validation finds an issue;
3. the validation output becomes an artifact;
4. a fixup agent consumes that artifact;
5. validation runs again;
6. the loop continues until validation passes, retries are exhausted, or a human gate is reached.

The “bug report” may be transient run evidence rather than a Jira ticket:

- `test-output.txt`,
- `lint-report.json`,
- `judge-finding.md`,
- `security-review.md`,
- `perf-regression.md`.

Attractor's local git-backed checkpoints are a natural place to preserve this evidence trail.
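
The six steps above can be sketched as a bounded loop. Everything here is hypothetical scaffolding: `validate` and `fixup` stand in for whatever tool and agent nodes a real graph would run, and the `HUMAN_GATE` label is an illustrative name for the escalation outcome.

```python
from typing import Callable, Optional


def fixup_loop(
    validate: Callable[[], Optional[str]],
    fixup: Callable[[str], None],
    max_fixups: int = 3,
) -> tuple[str, list[str]]:
    """Alternate validation and fixup until pass or retry exhaustion.

    `validate` returns None on success, or a failure report (the transient
    "bug report"). Returns the final label plus every report collected.
    """
    reports: list[str] = []
    fixups_used = 0
    while True:
        report = validate()
        if report is None:
            return "PASS", reports
        reports.append(report)  # preserve the evidence trail
        if fixups_used >= max_fixups:
            return "HUMAN_GATE", reports  # retries exhausted: escalate
        fixup(report)  # the fixup agent consumes the artifact
        fixups_used += 1
```

Returning the accumulated reports alongside the label keeps the evidence available whether the run ends in a pass or an escalation.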

## 6. Human as “Supreme Court”

The speaker uses a judicial metaphor: most cases should be handled automatically, while hard or undecidable cases are kicked up to humans.

For Attractor, human gates should be used for:

- ambiguity,
- policy decisions,
- product tradeoffs,
- risk acceptance,
- missing intent,
- expensive decisions,
- cases where multiple fixes are plausible and the direction matters.

This suggests a workflow style where human review is not a blanket step. Instead, a human gate is reached only through a specific edge label such as `UNCERTAIN`, `NEEDS_DECISION`, or `RISK_ACCEPTANCE`.

## 7. “Apply more tokens” means gather better evidence

The transcript says that for undecidable cases, the cure is often to “apply more tokens.” This should not be read only as “use a bigger model.” It often means gathering better context:

- logs,
- traces,
- screenshots,
- failing examples,
- benchmark output,
- customer examples,
- docs,
- previous bug reports,
- transcripts,
- telemetry,
- generated scenarios.

For Attractor, this suggests workflows should often have evidence-gathering tool nodes before diagnosis or judgment nodes.

Example:

```dot
digraph evidence { collect_logs -> collect_screenshots -> summarize_evidence -> diagnose }
```

The agent should not be asked to reason from vibes if the graph can collect hard local evidence first.
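
A minimal sketch of what a `summarize_evidence`-style step might do before any model is involved: gather whatever evidence files exist into one labeled block. The function name, the file names, and the choice to keep only the tail of each file are all assumptions made for illustration.

```python
from pathlib import Path


def bundle_evidence(run_dir: Path, names: list[str], max_chars: int = 4000) -> str:
    """Concatenate available evidence files into one labeled context block.

    Missing files are reported explicitly so a later judge can see the gap.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    sections = []
    for name in names:
        path = run_dir / name
        if not path.is_file():
            sections.append(f"## {name}\n(missing)")
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # Assumption: for logs and test output, the tail is the useful part.
        sections.append(f"## {name}\n{text[-max_chars:]}")
    return "\n\n".join(sections)
```

Marking absent evidence as `(missing)` rather than silently skipping it lets the diagnosing step distinguish "no errors logged" from "no log collected".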

## 8. Synthetic validation data

The talk describes bootstrapping validation with synthetic conversations and scenarios, then judging whether the outputs are plausible, redundant, wrong, or useful.

For Attractor, this suggests a reusable pattern:

1. generate scenarios from a seed intent;
2. run the implementation against those scenarios;
3. judge the outputs;
4. collect failures;
5. fix the implementation;
6. expand the scenario corpus.

Example:

```dot
digraph scenarios {
  seed_scenarios -> generate_cases -> run_cases -> judge_cases
  judge_cases -> fixup [label="FAILURE"]
  judge_cases -> expand_corpus [label="PASS"]
  fixup -> run_cases  // rerun the scenarios after a fix
  expand_corpus -> exit
}
```

This is especially relevant for agentic systems, CLIs, UX flows, workflow engines, and other systems where handwritten tests may not cover enough behavior.

## 9. Error bars instead of boolean correctness

The speaker argues that many systems pretend correctness is boolean, while in practice there are hidden error bars. This is especially true for stochastic or agentic systems.

Attractor workflows may eventually need probabilistic validation patterns:

- run 100 generated cases;
- pass if at least 95 satisfy the judge;
- escalate if failures are clustered or severe;
- sample different trajectories;
- preserve representative failures as artifacts.

A validation node might produce a structured result like:

```json
{
  "passed": 97,
  "failed": 3,
  "threshold": 95,
  "decision": "PASS"
}
```

This may imply future Attractor support for richer result labels than simple success/failure, though v0 can approximate this with tools that emit labeled outcomes.
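
A tool that emits such a labeled outcome could be as small as the sketch below. The function name and the `severe_failures` parameter are hypothetical; the output fields mirror the JSON example, and `UNCERTAIN` reuses the escalation label from the earlier graph examples.

```python
def threshold_decision(
    results: list[bool], threshold: int, severe_failures: int = 0
) -> dict:
    """Summarize per-case judge results into a single labeled outcome.

    `threshold` is the minimum number of passing cases. Any severe failure
    escalates regardless of the pass count (an illustrative policy).
    """
    passed = sum(results)
    failed = len(results) - passed
    if severe_failures > 0:
        decision = "UNCERTAIN"  # escalate to a human gate
    elif passed >= threshold:
        decision = "PASS"
    else:
        decision = "FAILURE"
    return {
        "passed": passed,
        "failed": failed,
        "threshold": threshold,
        "decision": decision,
    }
```

Calling `threshold_decision([True] * 97 + [False] * 3, threshold=95)` yields the same fields and values as the JSON example.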

## 10. Digital twin universe

StrongDM apparently cloned external SaaS integrations so they could test at high volume without depending on real Salesforce, Google Drive, and similar systems. The important idea is not the specific services; it is that local or fake worlds can make validation economical.

For Attractor, this aligns strongly with local-first execution:

- local simulators,
- fake services,
- mock APIs,
- fixture worlds,
- ephemeral test repos,
- generated environments.

Attractor should be good at creating or entering a local test universe, running the agent inside it, and validating behavior before touching real systems.
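
The simplest form of a local test universe is a throwaway directory populated from fixtures. The helper below is a sketch with a hypothetical name and signature; a real setup might instead start fake services or clone an ephemeral repo.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def in_fixture_world(files: dict[str, str], action: Callable[[Path], T]) -> T:
    """Materialize `files` in a temporary directory, run `action`, clean up.

    `files` maps relative paths to text content. The directory is deleted
    when the function returns, so nothing real is ever touched.
    """
    with tempfile.TemporaryDirectory(prefix="attractor-fixture-") as tmp:
        root = Path(tmp)
        for rel, content in files.items():
            target = root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content, encoding="utf-8")
        return action(root)


if __name__ == "__main__":
    count = in_fixture_world(
        {"src/app.txt": "hello", "README.md": "fixture"},
        lambda root: sum(1 for p in root.rglob("*") if p.is_file()),
    )
    print(count)
```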

## 11. Workflows as feedback-control systems

The speaker references oscillation, PID controllers, overshoot, undershoot, and stabilization. This maps closely onto the Attractor metaphor: in dynamical systems, an attractor is a state that a system tends to settle into.

An Attractor workflow is not just a checklist. It is a feedback-control system:

- generation creates output;
- validation measures output;
- fixup changes output;
- validation measures again;
- human gates handle unstable or ambiguous states.

Important workflow-design question:

> Does this graph have a stable convergence path, or can it loop forever?

Attractor workflows therefore need practical control features:

- retry limits,
- loop counters,
- checkpoint history,
- failure summaries,
- escalation gates.

The goal is convergence toward a desired state, not endless agent activity.
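
A loop counter can be sketched as a graph walker that refuses to visit any node more than a fixed number of times and escalates instead. This is a toy executor under assumptions: the edge table, the `outcome_of` callback, and the terminal node names `exit` and `human_gate` are illustrative, not Attractor's engine.

```python
from collections import Counter
from typing import Callable

# (node, outcome label) -> next node
Edges = dict[tuple[str, str], str]


def run_graph(
    edges: Edges,
    outcome_of: Callable[[str, int], str],
    start: str,
    max_visits: int = 3,
) -> list[str]:
    """Walk the graph, escalating instead of revisiting a node too often.

    `outcome_of(node, visit_number)` returns the label for that visit.
    Raises KeyError if a node has no edge for the returned label.
    """
    visits: Counter = Counter()
    trace = [start]
    node = start
    while node not in ("exit", "human_gate"):
        visits[node] += 1
        if visits[node] > max_visits:
            trace.append("human_gate")  # loop guard tripped: escalate
            break
        label = outcome_of(node, visits[node])
        node = edges[(node, label)]
        trace.append(node)
    return trace
```

With this guard, a graph whose tests never pass ends at `human_gate` after a bounded number of fixups instead of looping forever, and the returned trace doubles as a simple checkpoint history.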

## 12. The scarce resource is “right tokens”

At the end, the speaker says the hard part is getting the right tokens, especially from customers' brains. This is a useful correction to “just automate coding.”

The highest-value Attractor workflows may be those that transform vague intent into executable validation:

```text
vague intent -> concrete criteria -> validation artifacts -> implementation
```

This suggests workflows for:

- clarifying intent,
- collecting examples,
- extracting acceptance tests from conversation,
- identifying missing requirements,
- asking targeted human questions,
- turning product intuition into validation criteria.

Coding is only one stage. The deeper leverage is encoding intent into a mold that future generation can fill.

## Compressed principles for Attractor

1. The graph is the factory line.
2. Validation nodes are the mold.
3. Agent claims are worthless without evidence.
4. Human gates are for uncertainty, not routine review.
5. Failures should loop into fixup automatically.
6. Every “what about?” should become a test, judge, tool, or gate.
7. The system should collect better context before asking the model to decide.
8. Attractor is a feedback-control system, not just a task runner.
9. The durable artifact is not merely generated code; it is the graph plus the validation evidence trail.

## Working conclusion

Attractor should not be designed around the idea that “AI writes code.” It should be designed around this stronger frame:

> AI proposes changes into a local validation/control loop that converges toward a desired state, with humans handling only explicitly surfaced uncertainty.

That frame connects the transcript directly to Attractor's DOT graph model, checkpointing, validation idioms, and human-gate design.