Mid-Level · Core Skill

Debugging Failures

Start with the assertion message, then work backwards through screenshots, logs, and network traces. Debugging is a science, not guesswork.

Mid-Level · ISTQB CTAL-TAE v2.0 (K3: Apply) · ~10 min read + exercise

1 The Hook — Why This Matters

In 2020, a Christchurch e-commerce company had a checkout test fail for three straight days in CI. The error was TimeoutError: Waiting for locator(".pay-now-btn"). The team reran the suite twelve times, blaming "flakiness." On the third day, a mid-level engineer opened the Playwright trace and noticed the button was actually rendering as .pay-now-button after a deployment.

The real culprit? A frontend developer had renamed the class to match the company's new BEM naming convention. The test was not flaky. It was precisely correct. The team had wasted twelve CI runs and three days of confidence because nobody followed a systematic debugging process.

Random reruns and vague blame hide real defects. A structured approach turns every failure into actionable intelligence in minutes, not days.

2 The Rule — The One-Sentence Version

Start with the assertion message, then work backwards through screenshots, logs, and network traces in that order.

The assertion tells you what failed. Screenshots tell you where it failed. Logs and traces tell you why it failed. Skip a step and you are guessing.

3 The Analogy — Think Of It Like...

Detective work at a crime scene.

The body tells you what happened (the assertion). But you do not arrest the first person you see. You search the scene for fingerprints (screenshots), interview witnesses (logs), and check CCTV footage (network traces). Each layer narrows the suspect list. Debugging without evidence is like a detective guessing the murderer based on vibes.

4 Watch Me Do It — Step by Step

Here is the five-step process I use on every failure, whether it is local or in CI.

  1. Read the assertion message first. Expected vs actual is the most honest sentence in the output. It tells you exactly what the test thought should happen and what actually happened. Do not skim it.
  2. Check the stack trace for your code. Skip the framework frames. Find the last line that points to a file you wrote. That line number is where the test realised something was wrong. It is not always where the bug lives, but it is your starting coordinates.
  3. Open the screenshot or trace viewer. In Playwright, the trace shows DOM state, network, and console at every step. In Selenium, the screenshot at failure shows what the browser actually rendered. If the element is missing, you have a rendering or timing issue. If it is present but wrong, you have a logic issue. (A minimal capture sketch follows this list.)
  4. Check console and network logs. A frontend JavaScript error can kill a button's event listener. A 500 from the API can leave a form in an unexpected state. These show up in console and network tabs, not in the test assertion.
  5. Reproduce locally with the same seed and data. If the failure only happens in CI, mirror the CI environment locally: same browser, same viewport, same test order. If you cannot reproduce it, check for timing issues, shared state, or environment differences.
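
Steps 3 and 4 only work if the artifacts exist. Here is a minimal capture sketch using the Playwright Python sync API; the URL and selector echo the story above, and the output file names are assumptions, not a prescribed layout.

from playwright.sync_api import expect, sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context()
    # Record DOM snapshots, screenshots, and network for every step
    context.tracing.start(screenshots=True, snapshots=True, sources=True)
    page = context.new_page()
    try:
        page.goto("https://shop.example.test/checkout")  # hypothetical URL
        expect(page.locator(".pay-now-btn")).to_be_visible(timeout=5_000)
    except Exception:
        page.screenshot(path="failure.png")  # step 3 evidence: what actually rendered
        raise
    finally:
        # Keep the trace either way; open it with `playwright show-trace trace.zip`
        context.tracing.stop(path="trace.zip")
        browser.close()
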
Real-world scenario: Form field not found during checkout

Assertion: TimeoutError: locator("[name='card-number']") did not appear within 5000ms

  1. Screenshot shows a payment form is loaded, but the card field is not visible. The form displays only a dropdown for "Payment method."
  2. Network tab shows a request to /api/payment-methods that took 8 seconds to complete. The card field only renders after this API call finishes.
  3. Root cause: The test has a 5-second timeout, but the API takes around 8 seconds in this CI environment (slower network). The form renders correctly; the test's wait is too short.
  4. Fix: Wait explicitly for the /api/payment-methods response (or for the network to go idle) before asserting on the card field, as in the sketch below.
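
A minimal sketch of that fix with the Playwright Python sync API, assuming the API call fires on navigation to the checkout page; the URL and helper name are made up for illustration.

from playwright.sync_api import Page, expect

def open_checkout_and_wait_for_payment_form(page: Page) -> None:
    # Block until /api/payment-methods actually responds, however slow the
    # CI network is, instead of relying on the locator's 5-second default.
    with page.expect_response(
        lambda response: "/api/payment-methods" in response.url and response.ok,
        timeout=30_000,
    ):
        page.goto("https://shop.example.test/checkout")  # hypothetical URL

    # Only now should the card field have rendered
    expect(page.locator("[name='card-number']")).to_be_visible()
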
Debugging checklist
Source             | What it tells you            | Tool
Assertion message  | Expected vs actual           | Test runner output
Stack trace        | Line in your code            | Test runner output
Screenshot         | Visual state at failure      | Playwright / Selenium
Trace / video      | Step-by-step DOM + network   | Playwright trace
Console logs       | Frontend JS errors           | Browser devtools
Network logs       | API failures / slow calls    | HAR / Playwright network
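
The last two rows only produce evidence if something is listening before the failure happens. A small sketch, again with the Playwright Python sync API, that collects console output and failing responses up front; the helper name and the list arguments are assumptions.

from playwright.sync_api import Page

def attach_log_collectors(page: Page, console_log: list, bad_responses: list) -> None:
    # Console row: frontend JS errors and warnings land here, not in the assertion
    page.on("console", lambda msg: console_log.append(f"[{msg.type}] {msg.text}"))

    # Network row: record 4xx/5xx responses worth reading before blaming the test
    def record_bad_response(response):
        if response.status >= 400:
            bad_responses.append(f"{response.status} {response.url}")

    page.on("response", record_bad_response)
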
Pro tip: If a test fails once and passes on rerun, do not dismiss it. Capture the trace, screenshot, and CI artifacts before retrying. Flakiness is a real bug with a real cause, usually timing or shared state.

5 When to Use It / When NOT to Use It

✅ Use this process when...

  • Any test fails, locally or in CI
  • A previously green suite starts failing
  • You are investigating intermittent failures
  • A deployment triggers unexpected test breakage
  • You are onboarding and need to understand the codebase

❌ Don't skip this when...

  • You are tempted to rerun without reading the error
  • You blame "flakiness" without evidence
  • You debug in CI instead of reproducing locally first
  • You ignore failures because "they usually pass"
  • You change test code to match broken behaviour

Before you start debugging, ask:

  • Do you have the full assertion message, stack trace, and at least one screenshot or trace?
  • Can you reproduce the failure locally under the same conditions as CI?
  • Is this a test bug or an application bug? (Check by doing the test steps manually.)
  • Have you checked for obvious environment differences: browser version, viewport, network?

6 Common Mistakes — Don't Do This

🚫 Rerunning without investigating

I used to think: If it passes on the second try, the first failure was just noise.
Actually: Rerunning without reading the error teaches you nothing and hides real defects. The Christchurch team burned twelve CI runs before someone opened a trace. Always read the error before hitting retry.

🚫 Blaming "flakiness" without evidence

I used to think: Some tests are just flaky; it is part of life.
Actually: Flakiness is a symptom, not a diagnosis. It usually means a race condition, shared state, or an implicit dependency on execution order. Attach timestamps, screenshots, and logs to the ticket, then treat it like any other bug. Name the root cause, not the symptom.

🚫 Debugging in CI instead of locally

I used to think: If it fails in CI, I should add console logs and push commits to debug it.
Actually: CI is expensive and slow. Download the artifacts (trace, screenshot, video, logs) and reproduce locally first. If you cannot reproduce locally, check for environment differences: viewport size, browser version, network latency, or database seed data.
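
For the environment-difference check, here is a sketch of pinning the local browser context to what CI uses rather than whatever your laptop defaults to. The specific values are assumptions; copy the real ones from your CI config.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)  # CI is almost always headless
    context = browser.new_context(
        viewport={"width": 1280, "height": 720},  # CI default, not your monitor's
        timezone_id="Pacific/Auckland",
        locale="en-NZ",
    )
    page = context.new_page()
    # ...replay the failing test steps here under CI-like conditions...
    browser.close()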

When this technique fails

Debugging fails when you do not have artifacts (no traces, no screenshots), when the failure is truly environmental (production-only race condition), or when the test itself is flawed and you are trying to debug the wrong thing. Always distinguish between a test bug and an application bug before diving into detective work.

7 Now You Try — Interview Warm-Up

🎯 Interactive Exercise

Diagnose this failure:

FAILED test_checkout.py::test_pay_now - 
  TimeoutError: locator("[data-testid='pay-now']") 
  did not appear within 5000ms

You open the trace and see the button is visible on screen with text "Pay Now". What is your next debugging step?

Next step: inspect the element in the trace DOM snapshot.

The button is visible, so it is not a rendering issue. Check whether:

  1. The data-testid attribute is actually present (the visible text might be a different element).
  2. The element is inside an iframe or shadow DOM.
  3. A JavaScript error prevented hydration (check console logs).
  4. The locator is too strict (e.g., exact match when partial would work).

Tip: Visible does not mean locatable. Always verify the exact DOM attributes in the trace snapshot.
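
One way to run those checks directly against the page, sketched with the Playwright Python sync API; the iframe selector and the helper name are assumptions for illustration.

from playwright.sync_api import Page, expect

def inspect_pay_now(page: Page) -> None:
    # 1. Does the element showing the text also carry the test id?
    visible_button = page.get_by_role("button", name="Pay Now")
    print("data-testid on the visible button:", visible_button.get_attribute("data-testid"))

    # 2. If checkout renders inside a payment iframe, a plain page locator
    #    never reaches it; a frame locator does (iframe selector assumed).
    expect(
        page.frame_locator("iframe[title='payment']").get_by_test_id("pay-now")
    ).to_be_visible()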

8 Self-Check — Can You Actually Do This?

Try to answer each question before reading the answer. If you get all three, you're ready to practice.

Q1. Why should you read the assertion message before opening a screenshot?

The assertion tells you exactly what the test expected and what it received. Without that context, a screenshot is just a picture. The assertion narrows your search: if it says "expected 3 items, found 2," you know to look for a missing element, not a colour change.

Q2. What does a passing screenshot but failing assertion usually indicate?

It often means a timing issue (the element appeared after the assertion ran), a locator mismatch (the element is visible but the selector targets a different one), or hidden state (the element is in the DOM but not interactable due to opacity or a parent overlay).

Q3. What is the difference between a test bug and an application bug?

A test bug means the test is wrong: outdated selector, wrong expected value, or missing setup. An application bug means the test is correct and the software under test is broken. Distinguish them by reproducing manually: if the manual steps match the test and the defect is real, it is an application bug. If the manual steps work, the test is the problem.

9 Interview Prep — What They'll Ask

Q1. "A test fails only in CI but passes locally. How do you debug it?"

I download all CI artifacts first: trace, screenshot, video, and logs. I compare the CI environment to my local one: browser version, viewport, timezone, and database seed. If I still cannot reproduce it, I run the test in a Docker container that mirrors CI. Common culprits are timing issues, shared state, and headless-only rendering differences.

Q2. "What is your approach to a suite with 10% flaky tests?"

I treat flakiness as a quality metric, not individual bad luck. I categorise failures by root cause: race conditions, shared state, environment drift, or data collisions. I fix the highest-impact flakes first, adding explicit waits or isolating data. For tests I cannot immediately fix, I quarantine them in a separate CI job so they do not erode trust in the main suite.
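
Quarantining can be as simple as a custom pytest marker, sketched below; the marker name is an assumption and needs to be registered in your pytest config.

import pytest

# Registered in pytest.ini / pyproject.toml, e.g.:
#   markers = ["quarantine: known-flaky, runs in a separate CI job"]

@pytest.mark.quarantine
def test_pay_now_intermittent_failure():
    ...  # unchanged test body; the main CI job runs `pytest -m "not quarantine"`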

Q3. "How do you decide whether a failing test is worth fixing versus deleting?"

If the test covers a critical user journey, I fix it. If it duplicates coverage already tested elsewhere and is more expensive to maintain than the value it provides, I delete it after confirming the overlap. A test that fails constantly and is ignored by the team is worse than no test at all; it trains people to distrust CI.

Q4. "What artifacts do you configure your test runner to keep on failure?"

Screenshots, HTML DOM snapshots, browser console logs, network HAR files, videos, and Playwright traces. In Playwright, the trace viewer is the single most valuable artifact because it combines DOM, network, and console in one interactive file. I retain these for at least seven days in CI storage.
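
If you use pytest-playwright, its command-line options for screenshots, video, and tracing on failure cover most of this. The conftest.py hook below is a hand-rolled sketch of just the screenshot part, assuming a `page` fixture is in play; the artifacts directory is an assumption.

# conftest.py
import pathlib

import pytest


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.when == "call" and report.failed:
        page = item.funcargs.get("page")  # only present for browser tests
        if page is not None:
            pathlib.Path("artifacts").mkdir(exist_ok=True)
            page.screenshot(path=f"artifacts/{item.name}.png", full_page=True)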