Headline contributions
- C1: A 296,457-instance multi-repository defect-prediction dataset
50–330× larger than PROMISE / AEEEM; uniform feature schema across Elasticsearch, Spring Boot, Hadoop, Kafka, and Express; 10-year commit window; defect ratio 18.61%. Distributed as part of Paper A's reproducibility bundle.
- C2: Honest multi-classifier benchmark with statistical significance testing
Six classifiers (LR, DT, RF, GB, XGB, MLP) evaluated with stratified 5-fold CV, paired t-tests and Wilcoxon tests, calibration analysis, and Strobl-bias-aware feature importance. RF performs best, with AUC 0.8998.
- C3: Leave-one-repository-out transfer-learning study
Cross AUC 0.867, cross F1 0.631 across 5 OSS systems. Identification of the AUC–F1 asymmetry and defect-rate mismatch as the dominant degradation factor.
- C4: Risk-based test prioritisation deployed on real CI/CD
Top 10% of files capture 43.82% of defects (4.37× lift; 81.4% of oracle ceiling). FastAPI microservice on GitHub Actions with advisory and gating rollouts.
- C5: Self-healing locator framework with mutation-based evaluation
Eight DOM feature families; tree-ensemble ranking over 2,400 mutation events across seven refactor classes; calibrated confidence gating.
- C6: Verified LLM test generation across multiple domains
312 requirements / 3 domains; 68% effort reduction; 94.1% requirement coverage; 96.3% first-pass verification; symbolic mutation indicators for adequacy.
- C7: AI governance side-rail design
Payload controls, content redaction, author privacy hashing, prompt versioning, and audit trails, all designed as part of the architecture rather than retrofitted.
- C8: Disclosure-first reproducibility practice
The file_age_days sign-bug disclosure (and the corrected re-extraction with full before/after diff) is the worked example of how this program treats reproducibility.
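
The prioritisation figures quoted under C4 (share of defects captured in the top 10% of files, lift, and percentage of the oracle ceiling) can be illustrated with a small calculation. The sketch below is not the program's actual evaluation code. The function names, the per-file defect-count input, and the definition of the oracle as a ranking by true defect counts are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def capture_at_top(scores: Sequence[float], defects: Sequence[int], k: int) -> float:
    """Share of all defects that fall in the k highest-scored files."""
    total = sum(defects)
    if total == 0:
        raise ValueError("no defects to capture")
    # Rank files by descending score; ties keep their input order.
    ranked = sorted(zip(scores, defects), key=lambda pair: pair[0], reverse=True)
    return sum(d for _, d in ranked[:k]) / total


def prioritisation_summary(
    scores: Sequence[float], defects: Sequence[int], fraction: float = 0.10
) -> Dict[str, float]:
    """Capture, lift, and oracle ratio for the top `fraction` of files.

    Assumption: the oracle ceiling is the capture obtained when files are
    ranked by their true defect counts.
    """
    if len(scores) != len(defects):
        raise ValueError("scores and defects must have the same length")
    n = len(scores)
    k = max(1, round(n * fraction))
    model = capture_at_top(scores, defects, k)
    oracle = capture_at_top(defects, defects, k)
    return {
        "capture": model,               # fraction of defects in the top k files
        "lift": model / (k / n),        # improvement over a random ordering
        "oracle_ratio": model / oracle, # fraction of the best achievable capture
    }


if __name__ == "__main__":
    # Hypothetical toy data: 10 files, risk scores, and defect counts.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    toy_defects = [2, 0, 3, 0, 0, 1, 0, 0, 0, 0]
    print(prioritisation_summary(toy_scores, toy_defects, fraction=0.2))
```

Under these definitions, a random ordering has an expected lift of 1.0, so a lift above 1 means the ranking concentrates defects in the inspected files, and the oracle ratio shows how much headroom remains.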