Headline contributions

  1. C1: A 296,457-instance multi-repository defect-prediction dataset

    50–330× larger than PROMISE / AEEEM; uniform feature schema across Elasticsearch, Spring Boot, Hadoop, Kafka, and Express; 10-year commit window; defect ratio 18.61%. Distributed as part of Paper A's reproducibility bundle.

  2. C2: Honest multi-classifier benchmark with statistical significance testing

    Six classifiers (LR, DT, RF, GB, XGB, MLP) with stratified 5-fold CV, paired-t and Wilcoxon tests, calibration analysis, and Strobl-bias-aware feature importance. RF best at AUC 0.8998.
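As a minimal sketch of the paired significance test named in C2, the snippet below computes a paired t statistic over per-fold scores of two classifiers. The function name `paired_t_statistic` and the fold scores in the usage line are hypothetical illustrations, not the program's code or results.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-fold scores.

    Both lists must come from the same CV folds, in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two folds")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all fold differences are identical")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-fold AUCs for two classifiers (illustrative numbers only).
t_value, dof = paired_t_statistic(
    [0.90, 0.89, 0.91, 0.90, 0.90],
    [0.88, 0.88, 0.89, 0.87, 0.89],
)
```

In practice a statistics library would likely be used instead, for example `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon`, which also return p-values. With only five folds the tests have limited power, so the statistic should be read with care.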

  3. C3: Leave-one-repository-out transfer-learning study

    Cross-repository AUC 0.867 and F1 0.631 across 5 OSS systems. Identifies the AUC–F1 asymmetry and pinpoints defect-rate mismatch as the dominant degradation factor.
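The leave-one-repository-out protocol in C3 can be sketched as a split generator: each repository is held out once as the test set while the rest form the training set. The function name `leave_one_repo_out` and the toy labels are hypothetical; the source does not show how the splits were produced.

```python
from collections import defaultdict


def leave_one_repo_out(repo_labels):
    """Yield (held_out_repo, train_indices, test_indices), one per repository.

    `repo_labels[i]` names the repository that instance i came from.
    """
    by_repo = defaultdict(list)
    for idx, repo in enumerate(repo_labels):
        by_repo[repo].append(idx)
    for held_out in sorted(by_repo):
        test_idx = by_repo[held_out]
        train_idx = sorted(
            i
            for repo, idxs in by_repo.items()
            if repo != held_out
            for i in idxs
        )
        yield held_out, train_idx, test_idx


# Toy example: four instances drawn from three repositories.
for repo, train_idx, test_idx in leave_one_repo_out(
    ["kafka", "hadoop", "kafka", "express"]
):
    print(repo, train_idx, test_idx)
```

scikit-learn's `LeaveOneGroupOut` produces equivalent splits when the repository name is passed as the `groups` argument.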

  4. C4: Risk-based test prioritisation deployed on real CI/CD

    Top 10% of files capture 43.82% of defects (4.37× lift; 81.4% of oracle ceiling). FastAPI microservice on GitHub Actions with advisory and gating rollouts.
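The capture, lift, and oracle-ceiling figures in C4 are ranking metrics. One plausible way to compute them is sketched below, assuming one risk score and one binary defect label per file. The function name and the oracle definition (a perfect ranking that lists every defective file first) are assumptions for illustration and may differ from the program's exact definitions.

```python
def capture_at_top_fraction(scores, labels, fraction=0.10):
    """Return (capture, lift, share_of_oracle) for the top `fraction` of files.

    scores: predicted risk per file (higher means riskier).
    labels: 1 if the file is defective, else 0.
    """
    n = len(scores)
    total_defects = sum(labels)
    if n == 0 or total_defects == 0:
        raise ValueError("need at least one file and one defect")
    # Number of files inspected, rounded to the nearest whole file.
    k = max(1, round(fraction * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    found = sum(labels[i] for i in ranked[:k])
    capture = found / total_defects          # share of all defects found
    lift = capture / (k / n)                 # gain over random inspection
    oracle_capture = min(k, total_defects) / total_defects
    return capture, lift, capture / oracle_capture
```

Under these definitions, lift compares the model against inspecting the same number of files at random, while the oracle share compares it against the best achievable ranking at that inspection budget.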

  5. C5: Self-healing locator framework with mutation-based evaluation

    Eight DOM feature families; tree-ensemble ranking over 2,400 mutation events across seven refactor classes; calibrated confidence gating.

  6. C6: Verified LLM test generation across multiple domains

    312 requirements / 3 domains; 68% effort reduction; 94.1% requirement coverage; 96.3% first-pass verification; symbolic mutation indicators for adequacy.

  7. C7: AI governance side-rail design

    Payload controls, content redaction, author privacy hashing, prompt versioning, and audit trails, all designed as part of the architecture rather than retrofitted.
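Author privacy hashing, one of the C7 controls, could look like the sketch below. The source does not specify the scheme, so the keyed HMAC-SHA-256 construction, the e-mail normalisation, and the name `pseudonymise_author` are all assumptions made for illustration.

```python
import hashlib
import hmac


def pseudonymise_author(author_email: str, secret_key: bytes) -> str:
    """Map an author e-mail to a stable pseudonym using a keyed hash.

    The key makes the mapping hard to reverse by hashing guessed e-mails,
    which a plain unsalted hash would not prevent.
    """
    # Assumed normalisation: trim whitespace and lower-case the address.
    normalised = author_email.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()


# Hypothetical key and address; a real key would come from a secret store.
token = pseudonymise_author("Dev@Example.org ", b"demo-key-not-for-production")
```

The same author always maps to the same token under a given key, so per-author features remain usable while identities stay out of prompts and logs.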

  8. C8: Disclosure-first reproducibility practice

    The file_age_days sign-bug disclosure (and the corrected re-extraction with full before/after diff) is the worked example of how this program treats reproducibility.