Everything in one product
Each capability maps directly to a paper, and every claim on this page is backed by published or in-flight research.
Defect-Driven Test Prioritisation
Run the production model gb-paper1-v4-fixed_age against any pull request. The top 10% of files by predicted risk capture 43.82% of defects.
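To make the "top 10% of files" figure concrete, here is a minimal sketch of how a top-k defect coverage number can be computed. It assumes per-file risk scores and historical per-file defect counts are available; `top_k_defect_coverage` and its inputs are hypothetical illustrations, not TestForge AI's API.

```python
from typing import Sequence


def top_k_defect_coverage(risk_scores: Sequence[float],
                          defect_counts: Sequence[int],
                          k: float = 0.10) -> float:
    """Share of all defects that fall in the top-k fraction of files by risk."""
    if len(risk_scores) != len(defect_counts):
        raise ValueError("risk_scores and defect_counts must have the same length")
    if not 0 < k <= 1:
        raise ValueError("k must be in (0, 1]")
    total = sum(defect_counts)
    if total == 0:
        return 0.0
    # Number of files inspected: fraction k of the codebase, rounded, at least one.
    n_top = max(1, round(k * len(risk_scores)))
    # Rank file indices from highest to lowest predicted risk.
    ranked = sorted(range(len(risk_scores)), key=lambda i: risk_scores[i], reverse=True)
    captured = sum(defect_counts[i] for i in ranked[:n_top])
    return captured / total
```

A figure such as 43.82% at k = 0.10 would then mean that inspecting the riskiest tenth of files covers that share of the defects later found.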
LLM-Powered Test Generation
Requirements in, executable tests out. A deterministic rule verifier runs in the loop, with a 96.3% first-pass verification rate.
Self-Healing Automation
Tree-ensemble locator ranking recovers broken selectors at runtime with confidence-gated heal-vs-flag decisions.
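A minimal sketch of confidence-gated heal-vs-flag logic, assuming a tree ensemble that scores each candidate selector and a fixed confidence threshold. The toy decision stumps in `TOY_TREES`, the feature names, and the 0.8 threshold are hypothetical stand-ins for a trained model, not the production ranker.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional

# Hypothetical: each "tree" maps a candidate locator's features to a match probability.
Tree = Callable[[dict], float]

# Hypothetical toy ensemble of decision stumps standing in for trained trees.
TOY_TREES: List[Tree] = [
    lambda f: 0.9 if f["text_match"] else 0.2,
    lambda f: 0.95 if f["same_tag"] else 0.3,
    lambda f: max(0.0, 1.0 - f["dom_distance"] / 10),
]


@dataclass
class HealDecision:
    action: str               # "heal" or "flag"
    selector: Optional[str]   # replacement selector when healing, else None
    confidence: float         # ensemble score of the best candidate


def ensemble_score(trees: List[Tree], features: dict) -> float:
    # Average the per-tree probabilities, as a bagged tree ensemble would.
    return mean(tree(features) for tree in trees)


def heal_or_flag(trees: List[Tree], candidates: Dict[str, dict],
                 threshold: float = 0.8) -> HealDecision:
    """Rank candidate selectors; heal only when the top score clears the threshold."""
    if not candidates:
        return HealDecision("flag", None, 0.0)
    scores = {sel: ensemble_score(trees, feats) for sel, feats in candidates.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return HealDecision("heal", best, scores[best])
    # Low confidence: do not guess a replacement; flag the step for a human.
    return HealDecision("flag", None, scores[best])
```

Raising the threshold trades fewer automatic heals for fewer silently wrong ones, which is the point of gating on confidence rather than always taking the top-ranked locator.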
Cross-Repository Risk Profiles
Transfer validated with leave-one-repository-out evaluation, for teams whose codebases don't look like the benchmark set.
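Leave-one-repository-out validation holds out each repository in turn and trains on all the others, so every score reflects a codebase the model never saw during training. The fold generator below is a hypothetical sketch of that split, assuming data is already grouped by repository name; it is not the production transfer pipeline.

```python
from typing import Dict, Iterator, List, Tuple, TypeVar

Row = TypeVar("Row")


def leave_one_repo_out(
    data: Dict[str, List[Row]],
) -> Iterator[Tuple[str, List[Row], List[Row]]]:
    """Yield (held_out_repo, train_rows, test_rows), one fold per repository."""
    if len(data) < 2:
        raise ValueError("need at least two repositories")
    for held_out in data:
        # Train on every other repository; test only on the held-out one.
        train = [row for repo, rows in data.items() if repo != held_out for row in rows]
        yield held_out, train, list(data[held_out])


# Usage: a hypothetical baseline that predicts the training-set defect rate.
labels_by_repo = {"repo-a": [1, 0, 0, 1], "repo-b": [0, 0, 1], "repo-c": [1, 1, 0, 0]}
for repo, train, test in leave_one_repo_out(labels_by_repo):
    predicted_rate = sum(train) / len(train)
    actual_rate = sum(test) / len(test)
    print(f"{repo}: predicted {predicted_rate:.2f}, actual {actual_rate:.2f}")
```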
CI/CD & Webhook Integration
GitHub Actions, advisory and gating rollouts, pre-merge risk advice, and a stable HTTP contract for IDE plugins and CLI tools.
AI Governance Layer
Payload controls, content redaction, author privacy hashing, prompt versioning, and verifiable audit trails on every call.
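Author privacy hashing and content redaction can be sketched with a keyed hash: the same author always maps to the same opaque token, but the token cannot feasibly be linked back to the email without the secret key. This assumes HMAC-SHA256 and email-based author identity; the function names, token format, and email pattern are hypothetical, not the platform's actual scheme.

```python
import hashlib
import hmac
import re

# Hypothetical pattern for author emails appearing in commit metadata or payload text.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def hash_author(email: str, secret_key: bytes) -> str:
    """Map an author email to a stable, keyed pseudonym (HMAC-SHA256 hex digest)."""
    # Normalise so the same author gets the same token regardless of case or whitespace.
    normalized = email.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def redact_emails(text: str, secret_key: bytes) -> str:
    """Replace every email in text with a short pseudonymous author token."""
    return EMAIL_RE.sub(
        lambda m: "<author:" + hash_author(m.group(0), secret_key)[:12] + ">", text
    )
```

Using a keyed HMAC rather than a bare SHA-256 matters here: author emails are low-entropy, so an unkeyed hash could be reversed by hashing a list of known addresses.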
Post-Execution Defect Attribution
Fuse failure signals with SHAP-explained priors to produce triage advice, not just ranked file lists.
Quality Analytics Dashboard
Real-time view of risk distribution, top-k coverage, and self-heal vs. flag rates across services.
How TestForge AI works
Four steps from sign-up to a pre-merge gate that catches defects before they ship.
01 Connect your repository
Authorise TestForge AI on GitHub via OAuth. It mirrors the commit history through the Repository Analytics Engine and produces a per-file process-metric snapshot.
02 Calibrate the risk model
The cross-repository transfer pipeline from Paper B selects the best starting point; an in-team calibration pass then applies Platt scaling so predicted probabilities match your team's defect rate.
03 Wire into CI/CD
Install the GitHub Actions workflow. Start in advisory mode (PR comments only), then promote to gating mode, where the check can block merges, once the team is comfortable.
04 Generate & verify tests
Submit a requirement to the RAITG service and receive Playwright and BDD specs, the rule-verifier report, and an adequacy score based on symbolic mutation indicators.
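Step 02's calibration pass uses Platt scaling, which fits a two-parameter sigmoid that maps raw risk scores to probabilities. A minimal sketch follows, assuming raw scores and binary defect labels from the team's own history. It uses plain full-batch gradient descent on log loss and omits the target smoothing and Newton-style solver of Platt's original formulation; `fit_platt` and `platt_probability` are hypothetical names.

```python
import math
from typing import Sequence, Tuple


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def platt_probability(score: float, a: float, b: float) -> float:
    # Platt's form: P(defect | score) = 1 / (1 + exp(a * score + b)).
    return _sigmoid(-(a * score + b))


def fit_platt(scores: Sequence[float], labels: Sequence[int],
              lr: float = 1.0, epochs: int = 10000) -> Tuple[float, float]:
    """Fit (a, b) by full-batch gradient descent on the log loss."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("need equal-length, non-empty scores and labels")
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            # With z = a*s + b and p = sigmoid(-z), dLogLoss/dz = y - p.
            residual = y - platt_probability(s, a, b)
            grad_a += residual * s
            grad_b += residual
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Usage: calibrate raw scores against a small, hypothetical labelled history.
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
had_defect = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_platt(raw_scores, had_defect)
print(f"P(defect | score=0.75) = {platt_probability(0.75, a, b):.3f}")
```

Because only two parameters are fitted, the ranking of files is unchanged; calibration adjusts how the scores translate into probabilities for your defect rate, which is what threshold-based gating depends on.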
What sets TestForge AI apart
Research-backed, not vibe-coded
Every claim on the product page maps to a published or in-flight paper with reproducible artefacts.
Calibration before accuracy
Production selection criteria privilege calibration, top-k stability, and threshold sensitivity over raw AUC.
Multi-language, multi-framework
Java, JavaScript, and Python, with Go on the roadmap. Frameworks: Selenium, Playwright, Cypress, pytest, JUnit.
Verified LLM generation
Deterministic rule verifier in every loop. The LLM never gets the last word.
Governance is architecture
Payload controls and prompt receipts are not opt-in; they're how the platform talks to models.
From research to production
The exact code that produced each paper's numbers is the code running in production.