Senior · Non-Functional Technique

Performance Testing Deep Dive

Load testing is not performance testing. Load testing asks "what breaks?" Performance testing asks "why?" Finding the bottleneck requires profiling, baseline data, SLO targets, and the discipline to measure before and after.

Senior ISTQB CTAL-TTA 5.3 — K4 Analyse ~14 min read + exercise

1 The Hook — Why This Matters

In 2021, a major NZ bank's payment processing system began slowing down during peak business hours (9am-11am, 1pm-3pm). Payment processing took 2 seconds at 8am, 8 seconds at 10am. Customers complained. The team ran a load test: "yes, it's slow under load." A junior engineer suggested "buy more servers!" They doubled the server count. Payments were still slow. The team commissioned a performance consultant at NZD 5,000/day. Within two days, the consultant found the real bottleneck: a database query in the transaction ledger was doing a full table scan instead of using an index. The index had been accidentally dropped during a deployment. Rebuilding the index took 30 minutes and cost NZD 0. The consultant's fee could have been saved with proper performance testing and profiling.

Performance testing without profiling is expensive guesswork. Profiling without baselines is chasing shadows. You must measure, analyse, and verify fixes systematically.

2 The Rule — The One-Sentence Version

Establish a baseline, define SLOs, profile for bottlenecks, make one change, re-measure, verify improvement, repeat. Measure p95 and p99, not just averages.

Performance testing has five layers: (1) Baselines — what is the current performance? (2) SLOs — what performance do we need? (3) Load Testing — what load causes issues? (4) Profiling — where is the bottleneck? (5) Optimisation & Verification — did the fix work? Test each layer independently, and always re-measure after changes.

3 The Analogy — Think Of It Like...

Analogy

Diagnosing a car that's running slowly, not just "driving faster."

You notice the car is slow (baseline). The owner wants it to do 100km/h on the motorway (SLO). You drive it and it's struggling at 60km/h (load test). Now, where's the problem? You could add more fuel, change the oil, upgrade the engine. But if the real problem is the parking brake is on, none of that helps. A mechanic would: check the baseline (tachometer reads 2000 rpm at 60km/h when it should read 1000), diagnose the brake issue (profiling), release the brake (fix), re-measure (now 1000 rpm at 60km/h), verify improvement (yes, the brake was the bottleneck). Without profiling, you'd waste money on engine upgrades.

4 Watch Me Do It — Step by Step

Here is a real NZ example: an API that returns user account summaries is slow. Follow these steps to find and fix the bottleneck.

  1. Establish a baseline Make 100 requests to the API in serial (one after another) and measure each response time. Record p50, p95, and p99. Note: p50 is the median, p95 is the value that 95% of requests come in at or under, and p99 is the value that 99% come in at or under. Don't rely on averages; they hide outliers.
    // Baseline: serial requests
    const times = [];
    for (let i = 0; i < 100; i++) {
      const start = Date.now();
      await fetch('/api/account-summary');
      times.push(Date.now() - start);
    }
    times.sort((a, b) => a - b);
    // With 100 sorted samples, the k-th percentile is the k-th value, i.e. index k - 1
    console.log({
      p50: times[49],  // median: 120ms
      p95: times[94],  // 95th percentile: 450ms
      p99: times[98]   // 99th percentile: 2100ms (outlier)
    });

    Found: p50 is 120ms (acceptable), but p99 is 2100ms (bad). The typical request is fast, yet the slowest 1% of requests take two seconds or more. Investigate those p99 outliers.
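The fixed indices in the snippet above only work for exactly 100 samples. A small helper removes that restriction. The sketch below uses the nearest-rank method; the function name `percentile` and the sample latencies are illustrative and not part of the original walkthrough.

```javascript
// Nearest-rank percentile: the smallest sample such that at least
// p% of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up latencies in milliseconds, for illustration only.
const latenciesMs = [110, 95, 120, 130, 2100, 105, 115, 125, 450, 100];
console.log({
  p50: percentile(latenciesMs, 50),
  p95: percentile(latenciesMs, 95),
  p99: percentile(latenciesMs, 99),
});
```

With only ten samples, p95 and p99 both land on the single slowest request, which is a reminder that tail percentiles need a reasonably large sample to mean much.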

  2. Define SLOs (Service Level Objectives) What response time do you need? Example targets for NZ fintech: p95 < 200ms, p99 < 500ms. For payment processing, a tighter example: p95 < 100ms, p99 < 300ms. Define SLOs based on user experience and business requirements, not on what the system currently achieves.

    SLO: p95 < 200ms, p99 < 500ms. Current: p95 = 450ms, p99 = 2100ms. We're failing the SLO on both p95 and p99.
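The comparison between SLO and current measurements can be made mechanical. The following is a minimal sketch using the numbers from this step; the `checkSlo` function and the shape of its result are assumptions made for illustration.

```javascript
// SLO targets from the walkthrough, in milliseconds.
const slo = { p95: 200, p99: 500 };

// Returns one entry per SLO target, stating whether the measured value meets it.
function checkSlo(measured, targets) {
  return Object.entries(targets).map(([metric, limit]) => ({
    metric,
    limit,
    actual: measured[metric],
    ok: measured[metric] < limit,
  }));
}

// Current measurements from the baseline in step 1.
const report = checkSlo({ p95: 450, p99: 2100 }, slo);
console.log(report);
```

Keeping the targets in one object means the same check can be reused unchanged when you re-measure after a fix.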

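  3. Load test to find the breaking point Gradually increase concurrent requests (1, 5, 10, 50, 100, 200 concurrent users) and measure response times at each level. Find the level at which p95 climbs sharply (or, for a service that normally meets its SLO, first exceeds it). This is your breaking point.
    // Ramp up: 10 concurrent users
    const timeRequest = async () => {
      const start = Date.now();
      await fetch('/api/account-summary');
      return Date.now() - start;  // latency of this one request in ms
    };
    const latencies = await Promise.all(Array.from({ length: 10 }, timeRequest));
    latencies.sort((a, b) => a - b);
    // Average the per-request latencies (total wall time / 10 would not be a response time)
    const avgResponseTime = latencies.reduce((sum, t) => sum + t, 0) / latencies.length;
    // Nearest-rank p95: the value at position ceil(95% of n), counting from 1
    const p95 = latencies[Math.ceil((95 * latencies.length) / 100) - 1];
    console.log({concurrency: 10, avgResponseTime, p95});
    // Repeat with 20, 50, 100, 200 concurrent users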

    Found: At 50 concurrent users, p95 jumps to 1000ms, more than double the 450ms serial baseline. The system starts struggling at around 50 concurrent users.
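The manual ramp in this step can be automated so that every concurrency level is measured the same way. This is a rough sketch, not a replacement for k6 or JMeter: the function names are invented here, `request` is any promise-returning function you supply, and the global `performance` object assumes Node.js 16 or later.

```javascript
// Fires `concurrency` requests at once and resolves to each request's latency in ms.
// `request` is any function that returns a promise (an assumption of this sketch).
async function measureAt(concurrency, request) {
  const timedCalls = Array.from({ length: concurrency }, async () => {
    const start = performance.now();
    await request();
    return performance.now() - start;
  });
  return Promise.all(timedCalls);
}

// Nearest-rank 95th percentile of a list of latencies.
function p95(latencies) {
  const sorted = [...latencies].sort((a, b) => a - b);
  return sorted[Math.ceil((95 * sorted.length) / 100) - 1];
}

// Steps through the levels in order and returns the first one whose p95
// exceeds limitMs, or null if every level stays within the limit.
async function findBreakingPoint(levels, limitMs, request) {
  for (const concurrency of levels) {
    const value = p95(await measureAt(concurrency, request));
    console.log({ concurrency, p95: Math.round(value) });
    if (value > limitMs) return concurrency;
  }
  return null;
}
```

A call might look like `findBreakingPoint([1, 5, 10, 50, 100, 200], 200, () => fetch(url).then((res) => res.arrayBuffer()))`, where `url` is the absolute address of the endpoint under test. A single burst per level is a crude measure; a real load tool sustains each level for a period and collects far more samples.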

  4. Profile to find the bottleneck Use CPU profiling (flame graphs), memory profiling (heap snapshots or heap dumps), and database profiling (slow query logs). Run the API under load and capture profiles.
    # Node.js profiling with clinic.js
    clinic doctor -- node app.js   # overall diagnosis: CPU, event loop, I/O, GC
    clinic flame -- node app.js    # flame graph showing which functions consume CPU
    # While the profiler is running, drive load from another terminal:
    artillery run load-test.yml

    Found: CPU profiling shows 45% of CPU time is spent in calculateAccountBalance(). Database profiling shows this function runs a query: SELECT * FROM transactions WHERE user_id = ? ORDER BY date DESC. The query takes 500ms (no index on user_id).
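Once a profile points at a suspect, a lightweight timer around that one function can confirm the finding and give you a number to compare after the fix. This is manual instrumentation rather than profiling, and the sketch below is only one way to do it: the `timed` wrapper is invented for illustration, and the `calculateAccountBalance` shown here is a stand-in that simply waits 5 ms.

```javascript
// Wraps an async function and accumulates how many times it ran and for how long.
function timed(name, fn, stats = new Map()) {
  const wrapped = async (...args) => {
    const start = performance.now();
    try {
      return await fn(...args);
    } finally {
      const entry = stats.get(name) ?? { calls: 0, totalMs: 0 };
      entry.calls += 1;
      entry.totalMs += performance.now() - start;
      stats.set(name, entry);
    }
  };
  wrapped.stats = stats;
  return wrapped;
}

// Hypothetical stand-in for the real function; it just waits 5 ms.
const calculateAccountBalance = timed('calculateAccountBalance', async () => {
  await new Promise((resolve) => setTimeout(resolve, 5));
  return 0;
});

calculateAccountBalance().then(() => {
  console.log(Object.fromEntries(calculateAccountBalance.stats));
});
```

Dividing `totalMs` by `calls` gives the mean time per call, which you can set against the 500ms query time reported by the database profiler.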

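  5. Optimise: add the missing index The transactions table has millions of rows. Without an index on user_id, the query does a full table scan (reads every row). With the index, the database reads only the rows for that user. Because the query also sorts by date, a composite index on (user_id, date) may let the database skip the sort as well.
    -- Add index
    CREATE INDEX idx_transactions_user_id ON transactions(user_id);
    -- Verify the index is used (42 is an example user_id; substitute a real value for the ? placeholder)
    EXPLAIN SELECT * FROM transactions WHERE user_id = 42 ORDER BY date DESC;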
  6. Re-measure and verify improvement Run the same load test again and capture new baseline metrics. Compare p95 and p99 before and after.
    // Before: p50=120ms, p95=450ms, p99=2100ms (at 50 concurrent users, p95=1000ms)
    // After: p50=50ms, p95=120ms, p99=300ms (at 50 concurrent users, p95=150ms)
    // We now meet SLO: p95 < 200ms ✓, p99 < 500ms ✓
  7. Test memory usage under load Even if response time is fast, memory might grow without bound. Run the load test for 10 minutes and monitor memory. If memory keeps climbing and never levels off, a leak is likely.

    Pattern: Use heap dump analysis (jmap in Java, heap snapshots in Node.js) to find which objects are growing. Common causes: cached objects never evicted, event listeners never removed, database connections not closed.
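Before reaching for heap dumps, it helps to confirm that memory really is trending upward. The sketch below samples `process.memoryUsage().heapUsed` inside a Node.js process and fits a straight line to the readings. The function names are invented for this example, and what counts as a worrying slope is for you to decide.

```javascript
// Samples heap usage at a fixed interval and resolves to the readings in bytes.
function sampleHeap(durationMs, intervalMs) {
  return new Promise((resolve) => {
    const samples = [];
    const timer = setInterval(() => {
      samples.push(process.memoryUsage().heapUsed);
    }, intervalMs);
    setTimeout(() => {
      clearInterval(timer);
      resolve(samples);
    }, durationMs);
  });
}

// Least-squares slope of the readings, in bytes per sample.
// A slope that stays clearly positive over a long run suggests a leak.
function growthPerSample(samples) {
  const n = samples.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = samples.reduce((sum, y) => sum + y, 0) / n;
  let numerator = 0;
  let denominator = 0;
  samples.forEach((y, x) => {
    numerator += (x - meanX) * (y - meanY);
    denominator += (x - meanX) ** 2;
  });
  return numerator / denominator;
}
```

For the 10-minute run described above, `sampleHeap(10 * 60 * 1000, 5000)` would collect a reading every five seconds. Heap usage normally rises and falls with garbage collection, so judge the overall trend rather than any single reading.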

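  8. Test cache effectiveness If your system uses caching (Redis, Memcached), measure cache hit rate. A low hit rate (< 80%) means a large share of requests still fall through to the slower backing store. Optimise by increasing cache size, using better cache keys, or lengthening the TTL so entries are not evicted before they are reused.
    // Monitor cache metrics (metrics stands for whatever counters your cache client exposes)
    const cacheHits = metrics.cacheHits;
    const cacheMisses = metrics.cacheMisses;
    const total = cacheHits + cacheMisses;
    const hitRate = total === 0 ? 0 : cacheHits / total;  // avoid dividing by zero before any traffic
    console.log({hitRate: (hitRate * 100).toFixed(2) + '%'}); // Target: > 80%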
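If your cache client does not expose hit and miss counters, you can count them yourself at the point where the cache is read. The sketch below uses an in-memory `Map` as a stand-in for Redis or Memcached; the `CountingCache` class and the sequence of user IDs are made up for illustration.

```javascript
// A Map-backed cache that counts hits and misses. Stands in for Redis or Memcached.
class CountingCache {
  constructor() {
    this.store = new Map();
    this.hits = 0;
    this.misses = 0;
  }

  // Returns the cached value, or loads, stores, and returns it on a miss.
  getOrLoad(key, load) {
    if (this.store.has(key)) {
      this.hits += 1;
      return this.store.get(key);
    }
    this.misses += 1;
    const value = load(key);
    this.store.set(key, value);
    return value;
  }

  // Hit rate in the range 0..1; 0 when nothing has been requested yet.
  hitRate() {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}

const cache = new CountingCache();
for (const userId of [1, 2, 1, 1, 3, 2, 1, 1, 2, 1]) {
  cache.getOrLoad(userId, (id) => ({ id }));
}
console.log(`${(cache.hitRate() * 100).toFixed(2)}%`); // prints "70.00%"
```

Three distinct users produce three misses and seven hits. This stand-in never evicts anything, so it cannot show the effect of cache size or TTL; it only demonstrates where the counters belong.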
  9. Test for regressions in CI/CD Add performance tests to your CI/CD pipeline. On every PR, run a load test and fail the build if p95 degrades by more than 10% against the stored baseline. This catches performance regressions before they ship.
    // CI/CD performance gate (runLoadTest is a placeholder for your own load-test runner)
    const baselineP95Ms = 150;  // from previous measurement
    const currentP95Ms = await runLoadTest();
    if (currentP95Ms > baselineP95Ms * 1.1) {  // 10% threshold
      throw new Error(`Performance regression: p95 increased from ${baselineP95Ms}ms to ${currentP95Ms}ms`);
    }
Pro tip: Always measure on the same hardware under the same conditions. Run tests at the same time of day (to avoid network congestion variations). Use k6 or Apache JMeter for repeatable, scriptable load tests. Use clinic.js (Node.js), Java Flight Recorder (Java), or Datadog APM for profiling.

5 When to Use It / When NOT to Use It

✅ Prioritise performance testing when...

  • The application is user-facing (e.g., web, mobile, API)
  • Performance directly affects user experience (slow = churn)
  • You process high traffic (payments, messaging, streaming)
  • You have SLA/SLO requirements (99.9% uptime, p95 < 100ms)
  • You've made architectural changes (database, caching, async)
  • You're preparing for peak load (Black Friday, tax deadline)

❌ Don't fall into these traps...

  • Running load tests without a baseline (you won't know if you improved)
  • Optimising without profiling (guessing where the bottleneck is)
  • Reporting only the average or the median (p50) instead of p95/p99
  • Testing on different hardware than production (results won't transfer)
  • Ignoring memory and GC overhead (fast responses can still hide a slow memory leak)
  • Not testing for regressions in CI (performance degrades silently)

6 Common Mistakes — Don't Do This

❌ Optimising without profiling

I used to think: The API is slow, so I'll add caching and use async. That should help.
Actually: The NZ bank wasted time adding servers when the real problem was a missing database index. Profiling (CPU, memory, database) reveals the true bottleneck. Without profiling, you're guessing. The consultant saved days of blind optimisation by profiling first.

❌ Using average response time instead of percentiles

I used to think: If average response time is 100ms, that's good.
Actually: Average hides outliers. If 99% of requests take 80ms and 1% take 10 seconds, the average is about 179ms (0.99 × 80 + 0.01 × 10,000), a figure that matches no real request: most users see 80ms and 1% wait 10 seconds. Always report p50, p95, p99. SLOs should target p95 and p99, not the average.
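The arithmetic in this example is easy to check. The sketch below builds 100 made-up samples matching the description (99 requests at 80ms, one at 10 seconds) and compares the mean with the median and the maximum.

```javascript
// 100 made-up samples matching the example: 99 requests at 80 ms, one at 10 s.
const samplesMs = [...Array(99).fill(80), 10000];

const mean = samplesMs.reduce((sum, ms) => sum + ms, 0) / samplesMs.length;
const sorted = [...samplesMs].sort((a, b) => a - b);
const p50 = sorted[Math.ceil(sorted.length / 2) - 1]; // nearest-rank median
const max = sorted[sorted.length - 1];

console.log({ mean, p50, max }); // { mean: 179.2, p50: 80, max: 10000 }
```

The mean is more than double what a typical request takes and well under a fiftieth of what the slowest one takes, so it describes neither group of users.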

❌ Testing on different hardware than production

I used to think: Performance results on my laptop are good enough.
Actually: Production hardware is different. CPUs, memory, network, disks vary. Measure on hardware identical to production or use cloud environments (AWS, Azure) that mimic production. Otherwise, your results won't transfer.

7 Now You Try — Interview Warm-Up

🎯 Interactive Exercise

Question: Your API's p95 response time is 500ms, but your SLO is p95 < 200ms. You have two options: (1) add more servers (costs NZD 500/month), or (2) profile and optimise. What do you do, and why?

Think through the logic before reading the answer.

Best answer: Profile first.

Why: Adding servers might help (if the bottleneck is CPU or I/O saturation), but if the bottleneck is a missing database index (like the NZ bank), adding servers does nothing. You'd waste NZD 500/month and still miss the SLO.

Process: (1) Profile: capture CPU, memory, database metrics under load. (2) Find the bottleneck: maybe it's a slow query, maybe it's garbage collection, maybe it's a missing cache. (3) Optimise: add the index, enable caching, etc. (4) Re-measure: verify p95 is now < 200ms. (5) If still not met, and CPU is at 95%, then add servers.

Cost savings: Profiling takes 2-4 hours. Fixes such as an index or a cache cost NZD 0 in infrastructure. If the fix meets the SLO, you save NZD 500/month for as long as the service runs. Adding servers first risks spending that money without solving the problem.

8 Self-Check — Can You Actually Do This?

If you got all three, you're ready to own performance.

Q1. What's the difference between p50, p95, and p99?

p50 (median): 50% of requests are faster, 50% are slower. p95: 95% of requests are faster than this value. p99: 99% of requests are faster. For SLOs, you should target p95 and p99, not p50. If p50=100ms but p99=2s, you're failing 1% of users, which is unacceptable at scale.

Q2. What's a baseline and why is it important?

A baseline is a measurement of current performance under defined conditions. Before optimising, you measure the baseline (e.g., p95=450ms). After optimising, you re-measure and compare (e.g., p95=150ms). Without a baseline, you can't tell if your optimisation actually helped. The baseline also helps you spot regressions in CI/CD.

Q3. How do you identify the bottleneck without guessing?

Use profiling tools: CPU profiler (shows which functions consume CPU), memory profiler (shows heap growth and GC), database profiler (shows slow queries and lock contention). Run the application under realistic load and capture profiles. The profiles tell you exactly where time is spent. Then, optimise that specific area and re-measure to verify improvement.

9 Interview Prep — Common Questions

Q. "How do you establish a performance baseline?"

I make many sequential requests to the system and measure response times (at least 100 requests; a stable p99 needs considerably more, because with 100 samples it rests on one or two data points). I calculate p50, p95, p99, and min/max. I document the test conditions (hardware, concurrent users, time of day) so the baseline is reproducible. Then, after any changes, I re-run the same test and compare. Baselines are critical: without them, you can't tell if you improved or regressed.
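One way to keep a baseline reproducible is to store the statistics together with the conditions they were measured under. The sketch below is an illustration only: `describeBaseline`, its field names, and the sample latencies are all invented here. Note that the `os` calls describe the machine running the script, so the server under test would need to be recorded separately.

```javascript
const os = require('os');

// Builds a baseline record: latency statistics plus the conditions they were
// measured under, so that a later run can be compared like for like.
function describeBaseline(latenciesMs, { concurrency, label }) {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const pct = (p) => sorted[Math.ceil((p * sorted.length) / 100) - 1];
  return {
    label,
    recordedAt: new Date().toISOString(), // captures date and time of day
    conditions: {
      concurrency,
      cpuCount: os.cpus().length,
      totalMemoryBytes: os.totalmem(),
      platform: `${os.platform()} ${os.release()}`,
      nodeVersion: process.version,
    },
    samples: sorted.length,
    min: sorted[0],
    p50: pct(50),
    p95: pct(95),
    p99: pct(99),
    max: sorted[sorted.length - 1],
  };
}

// Made-up latencies in milliseconds, for illustration only.
const baseline = describeBaseline([120, 110, 450, 95, 130], {
  concurrency: 1,
  label: 'account-summary',
});
console.log(JSON.stringify(baseline, null, 2));
```

Committing a record like this next to the test script gives the CI gate something concrete to compare against.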

Q. "How do you approach profiling a slow application?"

I use three profilers: (1) CPU profiler to see which functions consume CPU time (flame graphs). (2) Memory profiler to detect leaks and GC overhead. (3) Database profiler to find slow queries. I run the application under representative load, capture profiles, and analyse them. The profiles show where the time is going. Then, I make one targeted fix, re-measure, and verify improvement. I iterate until I meet the SLO targets.

Q. "What's the difference between load testing and stress testing?"

Load testing applies expected and gradually increasing load to check whether the system still meets its SLOs, and to find the load at which it stops doing so (p95 exceeds the SLO). Stress testing pushes load beyond normal levels to find the absolute breaking point (the system becomes unavailable) and to see how it fails. Load testing answers "at what load do we miss our targets?" Stress testing answers "how much abuse can the system take?" Both are important: load testing helps you set capacity limits; stress testing helps you understand failure modes.

Q. "How do you prevent performance regressions?"

I add performance tests to CI/CD. On every PR, I run a load test and measure p95. If p95 degrades by more than 10% compared to baseline, I fail the build. This catches regressions before they ship. I also monitor production (APM tools like Datadog, New Relic) and alert if p95 exceeds SLO. Combining CI/CD gates with production monitoring prevents silent performance degradation.