Metrics & Reporting
Good metrics guide decisions. Bad metrics get gamed. Learn how to measure what matters, report with context, and avoid the trap of turning numbers into targets.
1 The Hook — Why This Matters
In 2020, a Wellington SaaS company made "100% test case execution" a hard target for every sprint. By quarter three, execution rates were indeed at 100%. But customer-reported bugs had doubled. What happened?
The team gamed the metric. Complex integration tests were quietly skipped and marked N/A. Simple UI tests were duplicated to inflate the count. Testers stopped logging defects on Friday afternoons because a high defect count looked bad on the Monday report. The metric became the mission, and quality suffered.
This is Goodhart's Law in action: "When a measure becomes a target, it ceases to be a good measure." Metrics are powerful, but only when they serve the truth. As a Test Lead, your job is to choose metrics that reveal reality — and report them with the context that makes them meaningful.
2 The Rule — The One-Sentence Version
Measure trends over time, pair numbers with narrative, and never let a metric become more important than the quality it represents.
A single number is a photograph. A trend is a movie. "85% pass rate" tells you almost nothing. "Pass rate dropped from 95% to 85% this week, driven by 7 new critical defects in the payment module, with 2 blockers unresolved" tells a story that demands action.
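The difference between a snapshot and a trend statement can be sketched in a few lines. This is a minimal illustration only: the function name `describe_pass_rate` and its parameters are hypothetical, and the numbers are the ones from the example sentence above.

```python
def describe_pass_rate(previous: float, current: float, reason: str) -> str:
    """Turn two pass-rate readings (in percent) into a one-line trend statement."""
    delta = current - previous
    if delta == 0:
        return f"Pass rate held steady at {current:.0f}%. {reason}"
    direction = "dropped" if delta < 0 else "rose"
    return (
        f"Pass rate {direction} from {previous:.0f}% to {current:.0f}% "
        f"({delta:+.0f} points). {reason}"
    )


# The bare snapshot: a number with no direction and no cause.
print("Pass rate: 85%")

# The same reading with a previous value and a reason attached.
print(describe_pass_rate(
    95, 85,
    "Driven by 7 new critical defects in the payment module; 2 blockers unresolved.",
))
```

The function forces the author to supply a previous value and a reason, which is the point: the narrative is a required input, not an optional extra.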
3 The Analogy — Think Of It Like...
A pilot's instrument panel during a storm.
The altimeter says 10,000 feet. That number alone means nothing. Is it climbing or descending? Is that rate of change safe? What do the other instruments say? A good pilot reads the panel as a system, not as isolated digits. A good Test Lead reads metrics the same way: execution rate next to defect density next to escape rate next to team sentiment. One number in isolation is dangerous.
4 Watch Me Do It — Step by Step
Here is a real NZ example: a release report for an e-commerce platform. Follow this structure for every report you build.
- Choose metrics that answer stakeholder questions. Executives ask: "Can we release?" Developers ask: "What is broken?" Testers ask: "What should I test next?" One report cannot answer everything. Build tiered reporting.
- Build daily reports for operations. Focus on execution progress and blockers. What percentage of planned tests ran? What is blocked and why? Who is blocked and needs support?
- Build weekly reports for management. Add trend analysis and risk updates. Is defect density rising or falling? Are we finding bugs faster than we fix them? What risks have emerged?
- Build release reports for decision-makers. Summarise quality, known issues, and recommendations. Include Defect Detection Percentage (DDP), escaped defects, and a clear go/no-go recommendation.
- Calculate DDP, the king of quality metrics. DDP = Bugs found in testing / (Bugs found in testing + Bugs found in production). A DDP above 90% means your testing is effective. Below 80% means bugs are escaping.
- Include qualitative context with every quantitative metric. Numbers without story are noise. Pair every metric with a sentence of interpretation.
| Metric | Value | Interpretation |
|---|---|---|
| Test execution | 342/360 (95%) | 18 tests remaining, all blocked by environment issue #4421 |
| Defect density | 2.1 per feature | Down from 3.4 last release; regression suite catching more pre-dev |
| DDP | 94% | 47 caught in testing, 3 in production (all P3) |
| Critical issues | 0 open | Last critical fixed and verified 2024-03-14 |
| Escaped defects | 3 (all P3) | All cosmetic / edge-case; no customer impact reported |
| Recommendation | Proceed with release. Monitor payment gateway closely for 48 hours post-deploy. | |
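As a rough sketch of the DDP step and the "pair every metric with interpretation" step, the snippet below recomputes the DDP row of the report from its raw counts (47 found in testing, 3 in production). The helper names `ddp` and `report_row` are hypothetical, chosen only for this illustration.

```python
def ddp(found_in_testing: int, found_in_production: int) -> float:
    """Defect Detection Percentage: share of known defects caught before production."""
    total = found_in_testing + found_in_production
    if total == 0:
        raise ValueError("DDP is undefined when no defects have been found")
    return 100.0 * found_in_testing / total


def report_row(metric: str, value: str, interpretation: str) -> str:
    """Format one report row; refuse a number that has no interpretation attached."""
    if not interpretation.strip():
        raise ValueError(f"{metric}: every metric needs a sentence of interpretation")
    return f"| {metric} | {value} | {interpretation} |"


rate = ddp(found_in_testing=47, found_in_production=3)
print(report_row("DDP", f"{rate:.0f}%", "47 caught in testing, 3 in production (all P3)"))
# prints: | DDP | 94% | 47 caught in testing, 3 in production (all P3) |
```

Note that DDP only counts defects that have been found so far. It can drift downward after release as production issues continue to surface, so it is worth recalculating once the release has been live for a while.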
5 When to Use It / When NOT to Use It
✅ Use metrics and reporting when...
- You need to communicate testing status to stakeholders
- You want to identify trends and early warnings
- You are making go/no-go release decisions
- You need to justify testing investment or headcount
- You are improving process maturity (CMMI, TMMi)
- You want to track the effectiveness of test automation
❌ Avoid metrics when...
- The team is small enough for hallway communication
- Metrics would be used punitively rather than constructively
- You only have vanity metrics with no action attached
- The cost of collecting the metric exceeds its value
- You have not established a baseline trend yet
- Leadership is known to game metrics for bonuses
Before you introduce a new metric, ask:
- Can the metric be acted upon, or is it pure observation with no lever to pull?
- Will the metric create perverse incentives? (Optimise for pass rate, and you may lose quality.)
- Do you have a baseline to compare against, or are you measuring change from nothing?
- Who will see this metric, and how might they misinterpret it?
6 Common Mistakes — Don't Do This
🚫 Reporting vanity metrics
I used to think: "Total test cases executed: 1,247" looked impressive in a slide deck.
Actually: Without context, that number is meaningless. Were they meaningful tests? Did they find bugs? What is the trend? A report with fewer, contextualised metrics beats a report with impressive but empty numbers. Always ask: "So what?" after every metric.
🚫 Hiding bad news
I used to think: If I bury the critical defect count deep in the appendix, the release meeting will go smoother.
Actually: Bad news does not get better with age. The earlier stakeholders know about a blocker, the more options they have. A Test Lead who hides risks loses credibility forever. Put the risks at the top of the report. Use red text if you have to.
🚫 Metrics without trends
I used to think: A single weekly snapshot was enough.
Actually: "Defect density is 2.1" is uninterpretable without knowing whether it was 1.5 last month or 4.0. Trends reveal velocity, acceleration, and decay. Always show at least three data points. Better yet, show a sparkline or mini-chart.
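If your reporting tool has no charting, a text sparkline is one low-cost way to show a trend inline. The sketch below is an assumption-laden illustration: the function name `sparkline` is hypothetical, the three-point minimum simply encodes the rule of thumb above, and the sample series mixes values mentioned in this module (4.0, 3.4, 2.1) purely for demonstration.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest


def sparkline(values: list[float]) -> str:
    """Render a series as a compact text chart, one block character per data point."""
    if len(values) < 3:
        raise ValueError("show at least three data points to make a trend readable")
    lo, hi = min(values), max(values)
    if hi == lo:
        return BARS[0] * len(values)  # flat series: all bars the same height
    scale = (len(BARS) - 1) / (hi - lo)
    return "".join(BARS[round((v - lo) * scale)] for v in values)


# Illustrative defect-density readings, oldest first.
history = [4.0, 3.4, 2.1]
print(f"Defect density: {history[-1]} per feature  {sparkline(history)}")
```

The bars are scaled to the minimum and maximum of the series itself, so the sparkline shows direction and shape, not absolute magnitude. Keep the latest value printed next to it.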
When this technique fails
Metrics reporting fails when it becomes a checkbox exercise divorced from action: you report impressive numbers monthly, but nobody uses them to improve. It also fails when metrics incentivise gaming: testers pass more tests to hit targets, not to find more bugs. Finally, without clear baselines and trends, a single metric snapshot creates confusion rather than confidence.
7 Now You Try — Report Detective
Scenario: You are reviewing two release reports for the same project, two weeks apart.
| Metric | Week 1 | Week 2 |
|---|---|---|
| Test execution | 120/200 (60%) | 198/200 (99%) |
| Defects found | 34 | 2 |
| Defects fixed | 12 | 35 |
| Critical open | 8 | 0 |
The project manager says: "Week 2 looks great. Can we release?" What questions do you ask?
🧠 Ask these questions before agreeing:
- Why did defect findings drop from 34 to 2? Did testing get shallower, or did quality genuinely improve?
- How many of the 35 "fixed" defects were re-tested and closed, versus just marked fixed?
- What are the 2 tests still not executed? Are they high-risk areas?
- What is the DDP trend? Are we finding fewer bugs because there are fewer bugs, or because we stopped looking?
Key insight: A sudden drop in defects found combined with a spike in execution rate is a red flag for diluted testing. Goodhart's Law strikes again.
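The red-flag pattern in this exercise can be turned into a simple automated prompt. The sketch below is a heuristic only: `WeeklySnapshot`, `looks_diluted`, and both thresholds are hypothetical choices for illustration, not established industry values, and the sketch assumes `planned` is greater than zero. A `True` result means "ask the questions above", not "testing was gamed".

```python
from dataclasses import dataclass


@dataclass
class WeeklySnapshot:
    executed: int
    planned: int
    defects_found: int

    @property
    def execution_rate(self) -> float:
        return self.executed / self.planned


def looks_diluted(
    prev: WeeklySnapshot,
    curr: WeeklySnapshot,
    min_execution_jump: float = 0.20,  # assumed: a jump of 20+ points counts as a spike
    max_found_ratio: float = 0.25,     # assumed: finding under 25% of last week's count is a sharp drop
) -> bool:
    """Flag a week where execution spiked while defects found dropped sharply."""
    if prev.defects_found == 0:
        return False  # no baseline to compare against
    execution_jump = curr.execution_rate - prev.execution_rate
    found_ratio = curr.defects_found / prev.defects_found
    return execution_jump >= min_execution_jump and found_ratio <= max_found_ratio


week1 = WeeklySnapshot(executed=120, planned=200, defects_found=34)
week2 = WeeklySnapshot(executed=198, planned=200, defects_found=2)
print(looks_diluted(week1, week2))  # prints: True
```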
8 Self-Check — Can You Actually Do This?
Answer each question before reading the answer beneath it. If you got all three, you are ready to practice.
Q1. What is Goodhart's Law, and why does it matter for test metrics?
Goodhart's Law states: "When a measure becomes a target, it ceases to be a good measure." In testing, this means that if you set a hard target like "100% test execution," people will game it by skipping complex tests or inflating counts. Metrics must guide behaviour toward quality, not replace quality itself.
Q2. How do you calculate Defect Detection Percentage (DDP)?
DDP = Bugs found in testing / (Bugs found in testing + Bugs found in production). A DDP of 94% means you caught 94% of all known bugs before production. It is one of the strongest indicators of testing effectiveness.
Q3. Why is "trend over time" more valuable than a single metric snapshot?
A single snapshot is a photograph with no context. A trend reveals whether quality is improving or degrading, whether a spike is an anomaly or a pattern, and whether interventions are working. "Defect density is 2.1" is uninterpretable without knowing what it was last month.
9 Interview Prep — What They Will Ask
Q1. What happens when you set "100% test case execution" as a hard target?
People game it. Testers skip complex tests, duplicate simple ones, and mark skipped tests as N/A. This is Goodhart's Law. A better target is "95% execution with 0 critical defects open and all blockers documented." The metric should serve quality, not replace it.
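One way to make that composite target concrete is a release gate that checks all three conditions together and reports which ones failed. This is a sketch under stated assumptions: `Blocker`, `gate_failures`, and the rule that an empty reason means "undocumented" are hypothetical, and the sample figures (342/360, issue #4421) are borrowed from the release report example earlier in this module.

```python
from dataclasses import dataclass


@dataclass
class Blocker:
    issue: str
    reason: str  # assumption: an empty reason counts as undocumented


def gate_failures(
    executed: int,
    planned: int,
    critical_open: int,
    blockers: list[Blocker],
    min_execution_pct: int = 95,
) -> list[str]:
    """Return the reasons the composite target is not met; an empty list means it is met."""
    failures = []
    # Integer arithmetic avoids floating-point surprises at exactly 95%.
    if planned <= 0 or executed * 100 < min_execution_pct * planned:
        failures.append(f"execution below {min_execution_pct}% ({executed}/{planned})")
    if critical_open > 0:
        failures.append(f"{critical_open} critical defect(s) still open")
    undocumented = [b.issue for b in blockers if not b.reason.strip()]
    if undocumented:
        failures.append("undocumented blockers: " + ", ".join(undocumented))
    return failures


failures = gate_failures(342, 360, 0, [Blocker("#4421", "environment issue")])
print("PASS" if not failures else "FAIL: " + "; ".join(failures))
# prints: PASS
```

Returning the list of failed conditions, not a bare pass/fail, keeps the narrative attached to the number. A composite gate is harder to game than a single target, but it is still a target, so review it the same way you would any other metric.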
Q2. How do you report testing status to a non-technical executive?
I use a traffic-light summary with three sections: (1) Current state — green/yellow/red with one sentence each; (2) Risks that could change the state — what I am watching; (3) Decision needed — what I need from them. No jargon. No raw defect counts without interpretation. Executives need to make decisions, not read spreadsheets.
Q3. Which test metrics do you consider leading indicators versus lagging indicators?
Lagging indicators tell you what already happened: defect escape rate, DDP, MTTR. Leading indicators predict future problems: test preparation backlog, environment downtime, requirement churn, and code complexity trends. A good lead uses both: leading indicators to prevent fires, lagging indicators to measure how well prevention worked.
Q4. How do you handle a stakeholder who dismisses your quality red flag because "the metrics look fine"?
I bring the narrative that the numbers cannot tell. I explain the context: "Execution is at 98%, but the 2% not run are payment integration tests, and we have never released without them." I also bring risk data: "If this fails in production, the estimated impact is $50K per hour." Numbers get attention; stories combined with numbers get action.