Headline numbers

296,457

File instances across 5 mature OSS systems — 50–330× larger than PROMISE / AEEEM benchmarks. 10-year commit window.

0.8998

AUC of the best classifier (Random Forest, Paper A). XGBoost a close second at 0.8955.

43.82%

Defects captured in the top 10% of risk-ranked files — a 4.37× lift over uniform random and 81.4% of the oracle ceiling.
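
How a figure of this kind is typically computed can be sketched as follows. This is a minimal illustration and not the paper's evaluation code: the function name `capture_at_top`, the binary defect labels, and the toy data are all assumptions. Files are ranked by predicted risk, the top slice is inspected, and the share of defective files found there is compared against a uniform random baseline and against a perfect (oracle) ranking.

```python
def capture_at_top(scores, labels, top_percent=10):
    """Return (captured, lift, oracle_ratio) for the top slice of risk-ranked files.

    scores: predicted risk per file (higher means riskier).
    labels: 1 if the file is defective, else 0 (binary labels are an assumption).
    """
    n = len(scores)
    k = max(1, n * top_percent // 100)          # number of files inspected
    total_defective = sum(labels)
    if total_defective == 0:
        raise ValueError("no defective files to capture")

    # Rank file indices from highest to lowest predicted risk.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:k])   # defective files in the top slice

    captured = hits / total_defective           # share of all defects found
    lift = (hits * n) / (total_defective * k)   # captured / (k / n), vs. uniform random
    oracle_hits = min(k, total_defective)       # best any ranking could do with k files
    return captured, lift, hits / oracle_hits


# Hypothetical toy data: 20 files, 5 of them defective.
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35,
              0.30, 0.28, 0.25, 0.22, 0.20, 0.15, 0.12, 0.10, 0.05, 0.01]
toy_labels = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0,
              0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

captured, lift, oracle_ratio = capture_at_top(toy_scores, toy_labels)
print(f"captured={captured:.1%} lift={lift:.2f}x oracle_ratio={oracle_ratio:.1%}")
```

On real data the exact lift depends on how the cutoff is rounded and how ties in the scores are broken, so small differences from a hand calculation are expected.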

68%

Effort reduction from manual test authoring to RAITG (Paper E): 184h → 58.9h across 312 requirements.
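
The 68% figure follows directly from the two hour totals quoted above. The short check below reproduces it and adds a per-requirement breakdown, which is an illustrative derivation and not a figure reported in the source. The helper name `effort_reduction` is hypothetical.

```python
def effort_reduction(before_hours, after_hours):
    """Relative reduction in effort, as a fraction of the original effort."""
    return (before_hours - after_hours) / before_hours


MANUAL_HOURS = 184.0   # manual test authoring, from the headline figure
RAITG_HOURS = 58.9     # RAITG pipeline, from the headline figure
REQUIREMENTS = 312     # requirements covered, from the headline figure

reduction = effort_reduction(MANUAL_HOURS, RAITG_HOURS)
print(f"effort reduction: {reduction:.1%}")  # prints "effort reduction: 68.0%"

# Derived per-requirement averages in minutes (not reported in the source).
manual_min = MANUAL_HOURS * 60 / REQUIREMENTS
raitg_min = RAITG_HOURS * 60 / REQUIREMENTS
print(f"per requirement: {manual_min:.1f} min manual vs {raitg_min:.1f} min RAITG")
```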

94.1%

Requirement coverage achieved by the LLM + rule-verifier pipeline, up from 71.2% manual baseline.

96.3%

First-pass verification rate: the share of generated tests that pass the deterministic rule check on the first try.

Where the research lives in production

A direct line from each paper to a running service or deployment.