01 · Foundation
Hand-tested, not scraped.
18 hours per tool, 24-prompt battery, 14-day real-workflow trial. Two human graders independent, AI grader logged but not in the mean.
cod_01 · refactor file · 8.2s
cod_02 · trace race cond · 12.4s
cod_07 · schema migration · 6.1s
cod_12 · OAuth2 · retry