Automation · SDET Interview Prep

30 questions across six categories — framework design, Playwright, API testing, CI/CD quality gates, code quality, and real technical scenarios. These are the questions NZ hiring managers ask developer-testers and SDETs at mid-to-senior level, with model answers and TypeScript code where it matters.

Mid – Senior SDET / Developer-Tester · 30 Q&As · ~45 min read
How to use this page: Read the question and form your own answer before reading the model answer. For technical scenarios, draft an approach in full before checking. The model answers are a ceiling, not a script: adapt them to your own experience.

1 Framework Design

These questions test whether you understand why design decisions exist, not just whether you can recite patterns. NZ SDET roles at companies like Xero, Trade Me, Pushpay, and ANZ NZ expect you to defend your choices.

Q1: Why would you use the Page Object Model, and when would you not use it?

Model answer

Page Object Model (POM) separates the concerns of “how to interact with a page” from “what the test is asserting.” When a selector changes — for example, the Xero dashboard redesigns its navigation — you update one page object rather than hunting through 40 test files. This makes the suite dramatically cheaper to maintain. POM also gives test authors a fluent API that reads like user behaviour: loginPage.enterCredentials(user).submit() rather than raw locator chains repeated everywhere. The trade-off is overhead: each new page needs a corresponding object, and for a small suite of 10–15 tests the abstraction adds more code than it saves. I would skip POM for throwaway scripts, spike investigations, or test suites with five or fewer pages. For any suite that a team will maintain across multiple sprints — which describes most NZ SaaS products — POM pays for itself quickly. A useful rule of thumb: if any locator appears in more than two tests, it belongs in a page object.
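The idea can be sketched as a small login page object. This is a minimal illustration, not a prescribed design: `PageLike` and `LocatorLike` are hypothetical stand-ins for the small part of Playwright's `Page` and `Locator` API used here (a real suite would import `Page` from '@playwright/test'), and the selectors and the `/login` path are assumptions.

```typescript
// Hypothetical minimal interfaces: the subset of Playwright's Page/Locator
// API this sketch relies on. A real suite imports `Page` from '@playwright/test'.
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}

interface PageLike {
  goto(url: string): Promise<unknown>;
  locator(selector: string): LocatorLike;
}

interface Credentials {
  username: string;
  password: string;
}

// Page object: the only place that knows the login page's selectors.
class LoginPage {
  private readonly page: PageLike;

  // Assumed selectors; a redesign means updating these three lines only.
  private readonly usernameInput = '[data-testid="username"]';
  private readonly passwordInput = '[data-testid="password"]';
  private readonly submitButton = '[data-testid="login-submit"]';

  constructor(page: PageLike) {
    this.page = page;
  }

  async goto(): Promise<void> {
    await this.page.goto('/login');
  }

  async login(user: Credentials): Promise<void> {
    await this.page.locator(this.usernameInput).fill(user.username);
    await this.page.locator(this.passwordInput).fill(user.password);
    await this.page.locator(this.submitButton).click();
  }
}
```

A spec then calls `await loginPage.login(user)` and never touches a selector, so a navigation redesign changes the page object and nothing else.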

Q2: How would you structure a large Playwright suite for a team of five engineers?

Model answer

I’d organise it into three layers: page objects under src/pages/, reusable fixtures under src/fixtures/, and test specs under tests/ mirroring the application’s feature areas. Shared utilities (auth helpers, API clients, test data builders) live in src/helpers/. Each engineer owns a feature area, so merge conflicts are rare. The playwright.config.ts defines multiple projects (chromium, webkit, and a mobile viewport) so cross-browser coverage is declarative rather than copy-pasted. I’d enforce that no test depends on another test’s state: each spec must set up and tear down its own data. In practice this means using API calls to seed data rather than driving the UI into a known state before every test, which also cuts execution time noticeably because the slow UI setup steps disappear. Code review for test PRs follows the same standards as production code: no raw selectors in spec files, no magic strings, no skipped tests without a linked issue in Jira or GitHub.

// src/fixtures/auth.fixture.ts
import { test as base, type Page } from '@playwright/test';
import { LoginPage } from '../pages/LoginPage';

type AuthFixtures = {
  authenticatedPage: Page;
};

export const test = base.extend<AuthFixtures>({
  authenticatedPage: async ({ page }, use) => {
    // Setup: log in before handing the page to the test
    const loginPage = new LoginPage(page);
    await loginPage.goto();
    await loginPage.login(process.env.TEST_USER!, process.env.TEST_PASS!);
    await use(page);
    // Code placed after use() runs as teardown. The built-in page fixture
    // closes the page itself, so nothing extra is needed here.
  },
});

Q3: Explain the test pyramid and how it guides automation investment decisions.

Model answer

The test pyramid (Mike Cohn) describes three tiers: a wide base of fast unit tests, a middle layer of integration/API tests, and a narrow top of E2E UI tests. The proportions reflect cost-to-value ratios: unit tests run in milliseconds and are cheap to fix; E2E tests take seconds each, depend on browser state, and fail for reasons unrelated to the code under test (network, timing, third-party widgets). In practice at a NZ fintech like Pushpay, the pyramid means: unit tests cover all business logic in pure functions; API tests cover the contract between services; and E2E tests cover only the critical user journeys that can’t be verified any other way — checkout, payment submission, login with RealMe. The pyramid breaks down when teams skip unit tests and automate everything at the E2E layer — this is the “ice cream cone anti-pattern” and it produces suites that are slow, flaky, and expensive to maintain. When advising a team on where to add automation, I always ask: “Is there a lower-level test that would catch the same defect faster?”

Q4: How do you decide what to automate versus what to leave as manual testing?

Model answer

The decision comes down to four factors: frequency, stability, value, and cost-to-automate. If a test runs every sprint, covers stable functionality, catches regressions that cost a lot to miss, and can be expressed as a deterministic assertion, automate it. If a test runs once for a release milestone, depends on highly volatile UI, requires human judgement about visual correctness or usability, or involves a workflow that changes every two weeks, keep it manual. Exploratory testing is inherently manual because it relies on human curiosity to find unexpected defects; no script can replicate that. In NZ, accessibility testing with assistive technologies also sits in the manual column: automated tools like axe catch only a fraction of WCAG issues (around 30% is a commonly quoted figure), but testing with a screen reader against the WCAG criteria that the NZ Government Web Accessibility Standard is built on requires a human. My rule of thumb: automate the regression suite for confidence; keep human testers focused on exploration, new features, and risk areas the automation can’t reach.

Q5: How do you manage test data in an automation suite?

Model answer

Test data strategy depends on the environment. For unit and integration tests I use builders or factories that generate data in memory — no external dependencies, no shared state. For E2E tests against a shared staging environment, I favour API-driven setup: before each test, I call the backend API to create the required entities (a customer, an order, a payment record), run the test, then clean up via API after. This is faster than driving the UI to a known state and avoids test interdependency. For data that must persist across a suite run — reference data, configuration — I maintain it in seed scripts that can be re-run idempotently. I avoid shared static test accounts because they cause race conditions when tests run in parallel. In NZ healthcare or government projects with Privacy Act 2020 obligations, production data is never used in test environments; synthetic data generators or anonymised snapshots are required, and that constraint shapes the whole test data strategy from day one.

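// src/helpers/data-builder.ts
import type { APIRequestContext } from '@playwright/test';

export async function createTestCustomer(request: APIRequestContext) {
  // One suffix for both fields so name and email stay correlated
  const suffix = Date.now();
  const response = await request.post('/api/customers', {
    data: {
      name: `Test Customer ${suffix}`,
      email: `test+${suffix}@example.nz`,
      region: 'Auckland',
    },
  });
  // Fail fast if seeding did not work, rather than in a later assertion
  if (!response.ok()) {
    throw new Error(`Customer setup failed with status ${response.status()}`);
  }
  return response.json(); // caller is responsible for cleanup
}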
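For the unit and integration layer, the in-memory builder mentioned in the answer might look like the sketch below. The `TestCustomer` shape and the `buildCustomer` name are assumptions for illustration; the field names simply mirror the API helper above.

```typescript
// Hypothetical customer shape; not a documented schema.
interface TestCustomer {
  name: string;
  email: string;
  region: string;
}

// Monotonic counter keeps generated values unique within one process.
let customerSequence = 0;

// Builder: sensible defaults, with per-test overrides for the fields that matter.
function buildCustomer(overrides: Partial<TestCustomer> = {}): TestCustomer {
  customerSequence += 1;
  return {
    name: `Test Customer ${customerSequence}`,
    email: `test+${customerSequence}@example.nz`,
    region: 'Auckland',
    ...overrides,
  };
}

// Usage: only the field under test is spelled out.
const wellingtonCustomer = buildCustomer({ region: 'Wellington' });
```

Because nothing leaves the process, these builders carry no shared state between test files and need no cleanup.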

Q6: What makes a good test fixture, and how do you avoid fixture bloat?

Model answer

A good test fixture sets up exactly the state a test needs — no more. Fixture bloat happens when teams create one massive shared fixture that sets up the entire application, then every test inherits state it doesn’t need. This slows down the suite and creates hidden coupling: changing the fixture breaks tests that have nothing to do with the change. In Playwright, fixtures compose cleanly because they’re lazy — a fixture only runs if a test requests it. I split fixtures by concern: an authenticatedPage fixture handles login; a testCustomer fixture creates and destroys a customer record; a productWithInventory fixture sets up a product with stock. Tests compose just what they need. The discipline is to never put two unrelated concerns in the same fixture. If a test needs both an authenticated session and a test order, those are two separate fixtures that compose rather than one monolithic setup block. Code-review feedback for fixtures follows the same rule: “does this setup serve exactly one concern?”

2 Playwright-Specific

Playwright is the dominant E2E tool in NZ automation roles as of 2025–2026. Expect deep technical questions, not surface-level “what is Playwright” questions.

Q7: Why would you choose Playwright over Selenium for a new project?

Model answer

Playwright has four decisive advantages over Selenium for most modern web applications. First, auto-waiting: Playwright waits for elements to be actionable before interacting, so there are no explicit waitForElement calls scattered through the codebase, which eliminates a major source of flakiness. Second, network interception built in: mocking API responses, testing offline states, and stubbing slow endpoints require no extra libraries. Third, parallel execution is built into the test runner: tests run across worker processes, each in its own isolated browser context, whereas Selenium needs separate driver sessions (and often a grid), making parallelism more complex and resource-heavy. Fourth, the trace viewer gives a full timeline of every action, screenshot, and network request once tracing is switched on with a single config option, which is invaluable when debugging failures in GitHub Actions at 2am. Selenium still makes sense for legacy suites with years of investment, Java-centric teams, or when you need to test Internet Explorer (rare but exists in some NZ government procurement contexts). For a greenfield project with a team comfortable with TypeScript, Playwright is the clear choice.

Q8: How do you handle dynamic elements — for example, a table that loads asynchronously and whose row count varies between runs?

Model answer

The key is to wait for a specific condition rather than a fixed count or a raw timeout. Playwright’s waitFor and expect(locator).toHaveCount() with a timeout assertion are the right tools — they poll until the condition is met or the timeout expires. For a table that loads asynchronously, I’d wait for the loading spinner to disappear (or a data attribute indicating load is complete) before asserting on the row count. If the row count is genuinely variable because it depends on database state, I’d assert on a minimum (“at least one row exists”) or set up the test data such that I know exactly how many rows to expect. I avoid page.waitForTimeout() almost entirely — it’s a smell that usually indicates a missing waitFor condition. Where I do use it is in a very small number of visual regression tests where I need a CSS animation to complete before a screenshot, and even then I prefer page.waitForFunction() against an animation state property.

// Wait for async table to load, then assert
await expect(page.locator('[data-testid="loading-spinner"]')).toBeHidden();
await expect(page.locator('table tbody tr')).toHaveCount(3, { timeout: 10_000 });

// Or: wait for at least one row if count is variable
await expect(page.locator('table tbody tr').first()).toBeVisible();
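To make the "poll until the condition is met or the timeout expires" idea concrete, here is a simplified stand-alone sketch. It illustrates the retry loop only and is not how Playwright implements its assertions; `pollUntil` and its parameters are hypothetical names.

```typescript
// Polls `condition` until it returns true or `timeoutMs` elapses.
// Unlike a fixed sleep, it returns as soon as the condition holds.
async function pollUntil(
  condition: () => boolean | Promise<boolean>,
  timeoutMs = 5_000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await condition()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Wait one interval before checking again
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The contrast with page.waitForTimeout() is that a fixed sleep always costs the full duration and still fails if the page is slower than expected, while a polling wait costs only as long as the condition takes.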

Q9: How do you run Playwright tests in parallel, and what are the gotchas?

Model answer

Playwright runs spec files in parallel by default using worker processes; tests within a single spec file run serially. Increasing workers in playwright.config.ts speeds things up but exposes any test data coupling. The most common gotcha is shared test accounts: if two parallel workers both log in as the same user and one test modifies that user’s state, the other test sees unexpected state and fails intermittently. The fix is isolated test data — each worker creates its own user or entity via API before the test runs. The second gotcha is database sequence numbers or auto-increment IDs: tests that assert on specific IDs will collide in parallel. Use stable identifiers like email addresses or names with a timestamp suffix. A third gotcha is test environment capacity: running 10 workers against a staging environment that can only handle 5 concurrent sessions will cause timeouts. I usually set workers: 4 for CI and workers: 2 for a shared staging environment, confirmed by load testing the environment first.

// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  workers: process.env.CI ? 4 : 2,
  fullyParallel: true, // also runs tests within a single file in parallel
  use: {
    baseURL: process.env.BASE_URL ?? 'https://staging.example.nz',
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
});
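The "isolated test data" fix can be sketched as a small helper that produces identifiers unique per worker and per call. The helper name and the email format are assumptions; in a real fixture the worker index would come from Playwright's `testInfo.workerIndex`.

```typescript
// Counter distinguishes calls made within the same millisecond.
let uniqueCounter = 0;

// Builds an email that is unique per worker and per call, so parallel
// workers never contend for the same account.
function uniqueTestEmail(workerIndex: number, now: number = Date.now()): string {
  uniqueCounter += 1;
  return `test+w${workerIndex}-${now}-${uniqueCounter}@example.nz`;
}
```

Assertions then target this email rather than an auto-increment ID, which sidesteps the ID-collision problem as well.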

Q10: A Playwright test that passes locally fails in CI. Walk me through how you debug it.

Model answer

My first step is always the trace: download the .zip artifact from the CI run and open it with npx playwright show-trace. The trace gives me a timeline of every action, screenshot at each step, network requests, and console errors — usually the failure is obvious within 30 seconds. If the trace shows the page is blank or loading slowly, the culprit is likely a timing difference: CI environments have slower CPU and network than my MacBook. I check whether the failing assertion has a generous enough timeout and whether it depends on a network request that takes longer in CI. If the page content is different from local, I check environment variables — is the CI run hitting the right base URL, the right API endpoint? The third category is font or rendering differences that affect visual regression tests: I solve this by running screenshot tests headlessly in Docker locally, matching the CI container image exactly. I also enable screenshot: 'only-on-failure' and video: 'retain-on-failure' in CI config so I always have evidence without having to re-run.
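The evidence-collection settings mentioned above could be combined as in this config fragment. It is a sketch to merge into an existing playwright.config.ts, and the retry count of 2 is an assumption; note that `trace: 'on-first-retry'` only produces a trace when retries are enabled.

```typescript
// playwright.config.ts (fragment): keep failure evidence from CI runs
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retries only in CI; 'on-first-retry' tracing needs at least one retry
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: 'on-first-retry',       // record a trace when a failed test is retried
    screenshot: 'only-on-failure', // screenshot at the point of failure
    video: 'retain-on-failure',    // record video, keep it only for failures
  },
});
```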

Q11: When do you use the Playwright Trace Viewer versus other debugging approaches?

Model answer

The Trace Viewer is my first choice when debugging a CI failure or a test I can’t reproduce locally, because it gives me the exact state of the page at every step without needing to re-run anything. It’s also invaluable for intermittent failures: I enable trace: 'on-first-retry' so a trace is recorded only when a failed test is retried for the first time, not on every run. For local debugging of a test I’m actively writing, I use --ui mode, which gives a live interactive view and lets me re-run individual tests with time-travel debugging. For understanding what network requests a page makes (useful for mocking or for diagnosing third-party failures) I use page.on('request', ...) listeners in a temporary diagnostic script. page.pause() drops me into the inspector mid-test, which is useful when I want to interact with the page manually at a specific point. The rule: Trace Viewer for post-mortem analysis; UI mode for active development; page.pause() for pinpointing a specific step.

Q12: How do you set up Playwright in a CI pipeline from scratch?

Model answer

The setup involves four steps: install dependencies and browsers, run the tests headlessly, upload artifacts on failure, and publish a report. Using GitHub Actions, I add a workflow file that caches node_modules and the Playwright browser binaries separately: browsers are large and slow to download, and caching them cuts CI time from roughly 3 minutes to 30 seconds. I use the official mcr.microsoft.com/playwright Docker image, which ships with all browser dependencies pre-installed, eliminating the “missing library on Ubuntu” class of failures. Test results go to a JUnit XML report, which a test-reporter action turns into the PR checks summary (GitHub Actions does not parse JUnit XML by itself); traces and videos upload as artifacts on failure. For NZ teams using Buildkite (common at Xero and Trade Me), the pattern is the same but the YAML syntax differs. I set PWTEST_SKIP_TEST_OUTPUT=1 to keep CI logs readable and only attach the full trace file when a test actually fails.

# .github/workflows/playwright.yml (excerpt)
- name: Install Playwright browsers
  run: npx playwright install --with-deps chromium webkit
- name: Run Playwright tests
  run: npx playwright test
  env:
    BASE_URL: ${{ secrets.STAGING_URL }}
    TEST_USER: ${{ secrets.TEST_USER }}
    TEST_PASS: ${{ secrets.TEST_PASS }}
- name: Upload trace on failure
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: playwright-report
    path: playwright-report/
    retention-days: 7

3 API Testing

SDET roles in NZ increasingly expect you to test APIs in code, not just via Postman GUIs. These questions probe whether you understand API testing at the engineering level.

Q13: How do you test REST APIs with code rather than a GUI tool?

Model answer

I use either Playwright’s built-in request context or a dedicated library like supertest (for Node.js backends) or axios with Jest. The advantage of code over a GUI tool is that tests live in the repository, run in CI without human intervention, and can assert on the full response structure using typed schemas. A typical API test in Playwright sends a request, asserts on the HTTP status code, validates the response body against a schema (using Zod or similar), and checks key fields by value. I also test edge cases: empty bodies, malformed JSON, missing required fields, fields outside valid ranges. For APIs that require authentication, I obtain a JWT in a beforeAll block and attach it to subsequent requests — I don’t hardcode tokens. I version tests alongside the API code so they’re reviewed in the same PR, making it clear what the contract is supposed to be. In NZ SaaS products this often integrates with OpenAPI specs: I generate schema validation from the openapi.yaml so the tests always reflect the current documented contract.

// Playwright API test
import { test, expect } from '@playwright/test';

// Assumed to be populated in a beforeAll hook (for example via an auth helper), never hardcoded
let token: string;

test('POST /api/orders returns 201 with valid payload', async ({ request }) => {
  const response = await request.post('/api/orders', {
    headers: { Authorization: `Bearer ${token}` },
    data: { customerId: 'cust_123', items: [{ sku: 'ABC', qty: 2 }] },
  });

  expect(response.status()).toBe(201);
  const body = await response.json();
  expect(body).toMatchObject({ status: 'pending', customerId: 'cust_123' });
  expect(body.orderId).toMatch(/^ord_/);
});

Q14: What is contract testing and when would you use it?

Model answer

Contract testing verifies that a consumer and a provider agree on the shape of an API interaction, without requiring both services to run simultaneously. The consumer writes a contract — “I expect the payments service to return an object with orderId, status, and amount” — and the provider verifies it can fulfil that contract. Pact is the most common framework for this in NZ teams using microservices. The use case is microservice environments where integration tests are expensive or slow: instead of spinning up five services for every PR, each service tests its own side of the contract independently. I’d reach for contract testing when: the team has more than two services communicating via HTTP or Kafka; when E2E environment availability is a bottleneck; or when a downstream service team wants to refactor without breaking consumers. Contract testing does not replace integration tests for complex business flows — it ensures the interface is stable, not that the business logic is correct. At a NZ insurer with 15 microservices, contract testing cut the time-to-detect breaking changes from days (waiting for E2E to run) to minutes.
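The consumer/provider split can be illustrated with a deliberately simplified check. This is not Pact's API: the contract format, `paymentContract`, and `verifyContract` are hypothetical, and a real framework also covers requests, status codes, and nested structures.

```typescript
// Simplified illustration of a consumer-driven contract check (not Pact).
type FieldType = 'string' | 'number' | 'boolean';
type ConsumerContract = Record<string, FieldType>;

// Consumer side: the fields the consumer relies on, and their types.
const paymentContract: ConsumerContract = {
  orderId: 'string',
  status: 'string',
  amount: 'number',
};

// Provider side: returns the contract violations for a sample response.
// Extra fields in the response are allowed; only relied-upon fields are checked.
function verifyContract(
  contract: ConsumerContract,
  providerResponse: Record<string, unknown>,
): string[] {
  const violations: string[] = [];
  for (const [field, expectedType] of Object.entries(contract)) {
    const actualType = typeof providerResponse[field];
    if (actualType !== expectedType) {
      violations.push(`${field}: expected ${expectedType}, got ${actualType}`);
    }
  }
  return violations;
}
```

The provider team runs this verification in its own pipeline, so renaming `amount` fails their build in minutes rather than surfacing in a shared E2E environment days later.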

Q15: Explain the difference between a mock and a stub in the context of API testing.

Model answer

A stub is a simplified replacement for a dependency that returns a pre-defined response without caring how it’s called. A mock is a stub with expectations: it not only returns a canned response but also verifies that it was called in the expected way — correct endpoint, correct payload, correct number of times. In practice: if I want to test that my checkout service handles a 402 response from a payment gateway, I use a stub that always returns 402 — I don’t care about verifying the call itself, just the behaviour under that condition. If I want to verify that placing an order always triggers exactly one call to the notification service with the correct customer ID, I use a mock that asserts on invocation count and arguments. The confusion arises because many testing frameworks use these terms loosely or interchangeably. In Playwright, page.route() is effectively a stub for network requests; test frameworks like Jest have jest.fn() which can act as either. When I discuss this in an interview I clarify: “mock for verifying behaviour, stub for controlling state.”
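A hand-rolled sketch of the distinction, without any mocking library. All names here (`PaymentGateway`, `OrderNotifier`, `placeOrder`) are hypothetical, and the rule that a declined payment sends no notification is an assumption made for the example.

```typescript
// Hypothetical interfaces for the two dependencies discussed above.
interface PaymentGateway {
  charge(amountCents: number): Promise<{ statusCode: number }>;
}

interface OrderNotifier {
  notify(customerId: string): Promise<void>;
}

// Stub: returns a canned 402 and does not care how it is called.
const decliningGateway: PaymentGateway = {
  charge: async () => ({ statusCode: 402 }),
};

// Mock: records calls so the test can verify the interaction afterwards.
class NotifierMock implements OrderNotifier {
  readonly calls: string[] = [];

  async notify(customerId: string): Promise<void> {
    this.calls.push(customerId);
  }
}

// Code under test (hypothetical): notify only when the charge succeeds.
async function placeOrder(
  customerId: string,
  amountCents: number,
  gateway: PaymentGateway,
  notifier: OrderNotifier,
): Promise<'confirmed' | 'payment_required'> {
  const { statusCode } = await gateway.charge(amountCents);
  if (statusCode === 402) return 'payment_required';
  await notifier.notify(customerId);
  return 'confirmed';
}
```

A test using the stub asserts on the return value ('payment_required'); a test using the mock asserts on `calls` (exactly one entry, with the right customer ID).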

Q16: How do you test authentication in API tests without hardcoding credentials?

Model answer

Credentials come from environment variables injected at runtime, never from the codebase. In CI, they’re stored as GitHub Actions secrets or in a secrets manager like AWS Secrets Manager or HashiCorp Vault. Locally, I use a .env.test file that’s in .gitignore. The pattern I use is a getAuthToken() helper that reads credentials from environment variables, calls the auth endpoint to exchange them for a JWT, and caches the token for the duration of the test run via a beforeAll block. I never use production credentials in test environments — I use dedicated test accounts. For OAuth 2.0 flows, I usually bypass the redirect flow in tests by calling the token endpoint directly with client credentials or a refresh token, rather than automating the browser-based login, which is brittle. For environments with SSO (common in NZ government and enterprise), I set up a service account with a long-lived API key scoped to testing permissions only, and rotate it monthly.

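// src/helpers/auth.ts
import type { APIRequestContext } from '@playwright/test';

// Module-level cache: one token per worker process
let cachedToken: string | null = null;

export async function getAuthToken(request: APIRequestContext): Promise<string> {
  if (cachedToken) return cachedToken;

  const response = await request.post('/auth/token', {
    data: {
      username: process.env.TEST_USER!,
      password: process.env.TEST_PASS!,
    },
  });
  if (!response.ok()) {
    throw new Error(`Auth request failed with status ${response.status()}`);
  }

  const { access_token } = (await response.json()) as { access_token: string };
  cachedToken = access_token;
  return access_token;
}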

Q17: How do you test error responses in an API — for example, 400, 404, and 500 status codes?

Model answer

Each error code represents a distinct contract that clients depend on, so I test each one explicitly. For 400 (bad request), I submit payloads with missing required fields, invalid types, out-of-range values, and malformed JSON — and assert that the response body includes a machine-readable error code and a human-readable message. For 404, I request a resource with an ID that doesn’t exist and verify the response structure matches the API’s documented error envelope. For 500, I either trigger it by putting the dependency in an error state (if I control the environment) or mock the dependency to return an error and verify the API degrades gracefully rather than leaking a stack trace. Stack trace leakage in 500 responses is a security issue under OWASP and a Privacy Act concern if the trace includes personal data — I flag it as a defect, not just a test failure. I document the expected error contract alongside the test code so the next developer knows what the API is supposed to do, not just what it currently does.
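Two of these checks lend themselves to small reusable helpers. This is a sketch under assumptions: the `{ code, message }` envelope is a hypothetical contract, and the stack-trace check is a heuristic that only recognises Node-style frames.

```typescript
// Hypothetical error envelope: a machine-readable code plus a human-readable message.
interface ErrorEnvelope {
  code: string;
  message: string;
}

// Type guard: does the parsed response body match the documented envelope?
function isErrorEnvelope(body: unknown): body is ErrorEnvelope {
  if (typeof body !== 'object' || body === null) return false;
  const candidate = body as Record<string, unknown>;
  return typeof candidate.code === 'string' && typeof candidate.message === 'string';
}

// Heuristic: looks for Node-style "at fn (file:line:col)" frames anywhere in
// the serialised body. Other runtimes format stack frames differently.
function leaksStackTrace(body: unknown): boolean {
  const serialised = JSON.stringify(body) ?? '';
  return /\bat\s+\S+\s+\(?[^\s()]+:\d+:\d+\)?/.test(serialised);
}
```

In an API test these would run against the parsed body of each 4xx and 5xx response, alongside the status-code assertion.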

4 CI/CD & Quality Gates

SDET roles own the pipeline, not just the tests. Expect questions about how you protect the main branch and how you make test results actionable for the whole team.

Q18: How do you integrate automated tests into a CI pipeline so they block a merge?

Model answer

The mechanism is a required status check on the main branch protection rule in GitHub (or the equivalent in Bitbucket or GitLab, both common in NZ). The CI workflow runs on every pull request and posts a status check — pass or fail — back to the PR. If the check fails, the “Merge” button stays greyed out. The workflow runs in stages: lint and type-check first (fast, fails cheap), then unit tests, then integration tests, then E2E tests. Each stage only runs if the previous stage passed, so a type error doesn’t burn 10 minutes of E2E time. I also run E2E tests in a deployment preview environment (Vercel, Netlify, or a staging slot) rather than against the main branch, so tests run against the actual build artifact being reviewed. For NZ teams using Buildkite, I use pipeline steps with soft_fail for flaky tests under investigation and hard fail for stable tests. Test results are surfaced directly in the PR via a GitHub Actions test reporter, so developers see which specific tests failed without leaving GitHub.

Q19: What is a quality gate and what metrics would you include in one?

Model answer

A quality gate is an automated pass/fail check against a defined set of metrics that a build must satisfy before it can proceed — to merge, to deploy, or to release. SonarQube’s quality gate is the most common implementation in enterprise NZ (ANZ NZ, government). The metrics I’d include for an automation project: test pass rate (100% required), code coverage on new code (typically 80% line coverage, configurable), zero critical or blocker issues from static analysis, no new security vulnerabilities (checked against OWASP or Snyk), and Playwright test execution time under a defined threshold (so the pipeline doesn’t silently degrade). I also include a flakiness check: if any test has failed and passed in the same 7-day window without a code change, it’s flagged for investigation before the PR can merge. Quality gates should be owned by the team, not imposed by QA alone — the metrics need to reflect what the team genuinely cares about, or they’ll be gamed or bypassed.
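Expressed as code, a gate is just a function from metrics to a list of failed conditions. The sketch below mirrors the metrics listed above; the type names, field names, and threshold values are assumptions, not SonarQube's model.

```typescript
// Metrics a pipeline would collect; the shape is hypothetical.
interface BuildMetrics {
  testPassRate: number;        // 0..1
  newCodeLineCoverage: number; // 0..1
  criticalIssues: number;
  newVulnerabilities: number;
  e2eDurationMinutes: number;
}

interface GateThresholds {
  minNewCodeCoverage: number;
  maxE2eDurationMinutes: number;
}

// Returns the failed conditions; an empty list means the gate passes.
function evaluateQualityGate(metrics: BuildMetrics, thresholds: GateThresholds): string[] {
  const failures: string[] = [];
  if (metrics.testPassRate < 1) failures.push('test pass rate below 100%');
  if (metrics.newCodeLineCoverage < thresholds.minNewCodeCoverage) {
    failures.push('new-code coverage below threshold');
  }
  if (metrics.criticalIssues > 0) failures.push('critical or blocker static-analysis issues');
  if (metrics.newVulnerabilities > 0) failures.push('new security vulnerabilities');
  if (metrics.e2eDurationMinutes > thresholds.maxE2eDurationMinutes) {
    failures.push('E2E run exceeded time budget');
  }
  return failures;
}
```

Returning every failed condition, rather than stopping at the first, gives the developer the whole picture in one pipeline run.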

Q20: How do you handle flaky tests in CI without just deleting or skipping them?

Model answer

Flaky tests are a symptom, not a bug in themselves — they indicate a real problem that intermittently occurs. My approach is: quarantine, diagnose, fix, graduate. When a test becomes flaky I move it to a @quarantine tag that runs on a separate non-blocking pipeline, so it stops polluting the main suite pass rate. I open a ticket immediately with the failure evidence. Then I diagnose: is it a timing issue (add proper wait conditions), a test data collision (isolate data per test), an environment issue (fix the environment), or a genuine race condition in the application (raise with dev team)? I enable Playwright’s retries: 2 on CI only — a test that passes on retry is still flagged as flaky in the report. I do not use retries to mask genuine failures. Once the root cause is fixed and the test passes 20 consecutive times across different conditions, it graduates back to the main suite. I track a flakiness rate metric over time: more than 5% of tests flaking in a 30-day window is a signal the strategy needs review.
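The flakiness-rate metric can be computed from per-run results. This is a sketch: the record shape is hypothetical, and "rate" is defined here as the share of distinct tests that flaked at least once in the window, which is one reasonable reading of the 5% figure.

```typescript
// One record per test per pipeline run; the shape is hypothetical.
interface TestRunRecord {
  testId: string;
  // 'flaky' = failed first, then passed on retry (the label Playwright reports use)
  outcome: 'passed' | 'failed' | 'flaky';
}

// Share of distinct tests that flaked at least once in the supplied window.
function flakinessRate(records: TestRunRecord[]): number {
  const allTests = new Set(records.map((r) => r.testId));
  const flakyTests = new Set(
    records.filter((r) => r.outcome === 'flaky').map((r) => r.testId),
  );
  return allTests.size === 0 ? 0 : flakyTests.size / allTests.size;
}

// Threshold from the answer above: more than 5% signals a strategy review.
const FLAKINESS_REVIEW_THRESHOLD = 0.05;
```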

Q21: How do you publish test results as PR annotations so developers see failures in-context?

Model answer

GitHub Actions does not parse JUnit XML on its own, but a reporter action can turn an uploaded report into test annotations. I configure Playwright to emit a JUnit report alongside the HTML report: reporter: [['junit', {outputFile: 'results.xml'}], ['html']]. The dorny/test-reporter GitHub Action (or a similar test-results parser) then reads the XML and posts failing test names and messages as annotations directly on the PR checks tab. For inline annotations on the diff (pointing at specific lines of test code), I use a custom action or the GitHub Checks API to post annotations with file, line, and message fields; Playwright’s built-in github reporter is another option for failure annotations. This means a developer sees “LoginPage.spec.ts line 42: expect(heading).toHaveText(‘Dashboard’), received: ‘Login’” directly in the PR without downloading logs. In NZ teams using Buildkite, the junit-annotate plugin provides a similar feature. The goal is to make test failures zero-friction to diagnose: if reading a failure requires navigating to a separate reporting tool, developers will skip it.
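A lighter-weight mechanism than the Checks API is a GitHub Actions workflow command: a step that prints a line of the form `::error file=...,line=...::message` produces an annotation. The sketch below formats such a line; the `FailureAnnotation` shape and helper names are assumptions, and the escaping follows the rules used by the Actions toolkit as I understand them.

```typescript
// Workflow-command escaping: message data escapes %, CR and LF;
// property values additionally escape ':' and ','.
function escapeData(value: string): string {
  return value.replace(/%/g, '%25').replace(/\r/g, '%0D').replace(/\n/g, '%0A');
}

function escapeProperty(value: string): string {
  return escapeData(value).replace(/:/g, '%3A').replace(/,/g, '%2C');
}

// Hypothetical shape for one failed assertion.
interface FailureAnnotation {
  file: string;
  line: number;
  message: string;
}

// Formats the command; printing the result to stdout inside a workflow
// step is what makes GitHub attach the annotation.
function toErrorCommand(annotation: FailureAnnotation): string {
  const file = escapeProperty(annotation.file);
  return `::error file=${file},line=${annotation.line}::${escapeData(annotation.message)}`;
}
```

For example, a failure at line 42 of tests/login.spec.ts would be emitted with `console.log(toErrorCommand({ file: 'tests/login.spec.ts', line: 42, message: '...' }))`.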

5 Code Quality

Test code is production code. These questions probe whether you treat your automation suite with the same rigour you’d apply to application code.

Q22: How do you keep test code maintainable as the application evolves?

Model answer

Maintainability comes from the same principles that apply to application code: single responsibility, DRY, meaningful names, and layered abstraction. The biggest maintainability risk in automation is selector brittleness — tests that break every time a developer renames a CSS class or restructures the DOM. I solve this by advocating for data-testid attributes on interactive elements: they’re stable by convention, don’t change with styling or refactoring, and signal intent to future developers. I enforce this as a team norm via a linting rule that flags any Playwright locator using a CSS class or XPath that doesn’t include a semantic attribute. Beyond selectors, I require all page interaction logic to live in page objects — if a spec file contains a page.locator() call, that’s a code review failure. Tests should read like user stories, not like implementation details. I also schedule quarterly “automation health” sessions where the team reviews the suite for dead tests (covering features that no longer exist), duplicated coverage, and tests that consistently take more than 30 seconds.

Q23: Explain the DRY principle in the context of test automation and where it can be taken too far.

Model answer

DRY (Don’t Repeat Yourself) in tests means: shared locators live in page objects, shared setup logic lives in fixtures, shared assertions live in custom matchers. Repeating the same page.locator('#submit-btn') in 30 tests is a maintenance tax — one button rename breaks 30 tests. However, DRY can be taken too far in tests in a way that doesn’t apply to production code. Tests benefit from being explicit and readable even at the cost of some repetition. If abstracting the assertion into a shared helper makes the test harder to read — “what exactly is assertOrderConfirmed() checking?” — that abstraction is too deep. The principle I use: DRY the how (interactions, selectors, setup), but allow some repetition in the what (assertions, expected values) if it makes each test self-documenting. A test should be understandable by someone who has never seen the codebase before, in under two minutes. If it requires reading three levels of helper functions to understand what it’s testing, the abstraction has gone too far.

Q24: How do you approach code review for automation PRs?

Model answer

I treat automation code reviews with the same rigour as production code reviews, but with a checklist tuned to test-specific concerns. I check: does the test name describe what it verifies, not how? Does the test set up its own data and clean up after itself? Are all locators in page objects rather than scattered in spec files? Are there any waitForTimeout calls (smell for timing hacks)? Are there any hardcoded credentials or environment-specific values? Does the test fail for the right reason — i.e., would it catch the defect it’s meant to catch? That last point matters: I occasionally ask PR authors to demonstrate their test fails when the feature is broken (by temporarily commenting out the functionality). A test that passes regardless of the application state is worse than no test. I also flag tests that are too broad (multiple unrelated assertions in a single test) and tests that are too coupled (relying on a previous test to have run). Good automation code review is a teaching opportunity, not a gatekeeping exercise.

Q25: What makes a test unreliable, and how do you prevent it from the start?

Model answer

Unreliable tests share a small set of root causes: shared mutable state (tests that depend on a previous test having run, or on a database that another test modified concurrently), timing assumptions (using sleep(1000) instead of waiting for a condition), environment sensitivity (hardcoded ports, base URLs, or time zones that differ between local and CI), and non-deterministic data (assertions on auto-incremented IDs or timestamps without proper tolerances). Prevention starts with design principles enforced in code review: every test creates its own data via API before running and deletes it after; no sleeps, only Playwright’s auto-waiting and explicit waitFor conditions; all environment-specific values come from environment variables; no test asserts on an auto-incremented ID. I also enable Playwright’s forbidOnly: true in CI so no developer can accidentally commit a test.only() that causes the rest of the suite to be skipped. A reliable test is one where the only reason it fails is that the application behaviour has changed — every other source of failure is infrastructure noise that erodes trust in the suite.

6 Technical Scenarios

These are the “tell me how you’d solve this” questions. There’s no single right answer — interviewers are evaluating your reasoning process, trade-off awareness, and whether you ask clarifying questions before diving in.

Q26: Your Playwright suite takes 45 minutes to run in CI. How do you fix it?

Model answer

First I’d profile the suite to understand where the time goes: Playwright’s built-in HTML reporter shows per-test duration, so I can identify the slowest 20% of tests quickly. If a handful of tests account for most of the time, I look at what they’re doing: are they driving the UI through multi-step setup flows that could be replaced by direct API calls? Next I’d look at parallelism: can I increase workers on each runner, or distribute the suite across multiple CI runners using Playwright’s sharding feature? Sharding across four parallel runners can cut 45 minutes to roughly 12 without changing a single test, provided the tests are already independent of each other. Beyond sharding, I question test layering: are there E2E tests covering scenarios that a faster API test or unit test could cover at a fraction of the cost? Moving 30% of the suite down the pyramid is often the highest-impact change. I’d also check for unnecessary network requests: tests that call real third-party APIs (email, payment, SMS) can be 10× slower than tests that mock those calls. Finally, I’d reuse authenticated sessions via storageState, so each test loads saved cookies and local storage instead of logging in through the UI every time.

// playwright.config.ts: sharding across 4 CI runners
// Runner 1: npx playwright test --shard=1/4
// Runner 2: npx playwright test --shard=2/4
// ...
import { defineConfig } from '@playwright/test';

export default defineConfig({
  workers: 4, // parallel workers within each runner
  use: {
    storageState: 'playwright/.auth/user.json', // reuse login session
  },
});
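The “45 minutes down to about 12” figure is simple arithmetic, sketched below. The function name and the 0.75-minute per-runner overhead (checkout, dependency install, browser download) are assumptions for illustration. Real shards are rarely perfectly even, so treat the result as a best case.

```typescript
// Rough best-case wall-clock time for an evenly sharded suite.
// overheadMinutes is an assumed fixed cost paid by every runner.
function estimateShardedMinutes(
  totalMinutes: number,
  shards: number,
  overheadMinutes: number,
): number {
  if (shards < 1) {
    throw new Error("shards must be at least 1");
  }
  return totalMinutes / shards + overheadMinutes;
}

const estimate = estimateShardedMinutes(45, 4, 0.75);
console.log(estimate); // prints 12
```

Because the overhead is paid on every runner, each extra shard saves less than the last. That is why moving tests down the pyramid often beats adding a fifth or sixth runner.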

Q27: A test is failing intermittently — roughly 1 in 5 runs. Walk me through how you debug and resolve it.

Model answer

A 20% failure rate usually points to a race condition or shared-state issue rather than a consistently broken feature, although the race can sit in the application as well as in the test. My first step is to look at every failure instance and check whether the failure message is consistent: if the assertion is the same each time, it’s likely a timing issue in the test; if the message varies (sometimes element not found, sometimes wrong value), it suggests shared state pollution from a parallel test. I enable retries with trace: 'on-first-retry' (traces are only recorded when a failed test is retried) and trigger 10 runs to collect traces from actual failures. In the trace, I look at the network tab: is the test asserting on data that arrives after the assertion fires? If so, I need a more explicit wait condition. I also check whether the test creates its own data or depends on a shared resource. If two workers are touching the same database record, I need to isolate the data. I then run the test in a loop locally (npx playwright test --repeat-each=20 --workers=1) to try to reproduce it in isolation. If it passes serially and only fails in parallel, that points strongly to a shared resource issue. I don’t consider the fix done until the test passes 30 consecutive times in CI with retries: 0.
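The 30-run bar is not arbitrary. As a rough worked example, assuming each run fails independently with the same 20% probability (a simplification), the chance that a still-broken test passes n times in a row is 0.8 to the power n. The function name below is hypothetical.

```typescript
// Probability that a test with the given failure rate passes `runs` times in a row,
// assuming runs are independent (a simplifying assumption).
function chanceOfLuckyStreak(failureRate: number, runs: number): number {
  return Math.pow(1 - failureRate, runs);
}

const after10 = chanceOfLuckyStreak(0.2, 10);
const after30 = chanceOfLuckyStreak(0.2, 30);

console.log(`${(after10 * 100).toFixed(1)}%`); // prints 10.7%
console.log(`${(after30 * 100).toFixed(2)}%`); // prints 0.12%
```

Ten green runs still leave about a one-in-ten chance that nothing was fixed. Thirty green runs bring that down to roughly one in a thousand.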

Q28: You’re asked to add automation to a legacy system that has no APIs and a non-standard UI framework. How do you approach it?

Model answer

Before writing a single line of automation, I’d assess the risk-to-effort ratio. Legacy systems with no APIs and non-standard UI frameworks (older Flex, WinForms, or heavily custom JavaScript rendering) are among the hardest things to automate reliably. The first question I’d ask the business: what problem are we trying to solve? If the goal is regression confidence before a migration, a narrowly scoped suite covering the 5–10 most critical user journeys may be the right investment. If the goal is broad coverage of a system that’s being decommissioned in 6 months, the ROI calculation probably doesn’t favour automation at all. If we proceed, I’d look at what test hooks are available: can I interact with the DOM, or is it a canvas-based UI? Does the app expose any accessibility attributes (ARIA roles) I can target? Can I add data-testid attributes in a controlled way? If the system is truly opaque, I’d consider visual regression testing as a lightweight safety net. I’d also advocate for investing in adding an API layer during the stabilisation work — even a thin REST wrapper over the database unlocks far better test coverage than UI automation alone.

Q29: The development team is resistant to investing time in test automation. How do you make the case?

Model answer

The case for automation needs to be in the language the team already cares about — cycle time, release frequency, and incident rate, not “testing best practice.” I start by gathering current data: how many hours per sprint does the team spend on manual regression? How many production incidents have been caused by regressions that a regression suite would have caught? How many releases have been delayed by late-discovered bugs? Once I have the numbers, I propose a small, targeted pilot: automate the top 5 critical user journeys, measure the time saved in the next three sprints, and let the data make the argument. I also frame automation as a developer benefit, not a QA initiative — developers get faster feedback on their own PRs (minutes, not days), and they can confidently refactor knowing the suite will catch regressions. In NZ contexts where teams work across multiple time zones (common at Xero, Datacom, or companies with Indian or Australian teams), automation enables async quality assurance: the suite runs overnight and the NZ morning standup starts with a clear green/red signal. That framing resonates with engineering leads who have been burned by “we thought it was fine” deployments.

Q30: You’ve been asked to build an automation framework from scratch for a new NZ SaaS product. Walk through your approach.

Model answer

Week one: discovery and decisions. I talk to the team to understand the tech stack, release cadence, and risk profile. I settle on tooling (Playwright with TypeScript for E2E, Jest for unit), agree on the test data strategy (API-driven setup, Privacy Act 2020 compliance for any customer data), and decide on CI platform (GitHub Actions, unless the team already uses Buildkite). I set up the skeleton in the first PR: folder structure, playwright.config.ts, a single passing smoke test, and the GitHub Actions workflow. Getting CI running from day one means the team sees green from the start. Week two: foundations. I write the first page objects for the most-used flows (login, core navigation), the auth fixture, and the API helper. I document the conventions in a TESTING.md in the repository — naming rules, locator strategy, data isolation approach — so any developer can contribute tests without a briefing from me. Week three onward: coverage expansion. I prioritise by risk: payment flows and auth before edge-case UI. I also schedule a demo with the team after the first two weeks so they see the value early and start contributing. The biggest mistake I’d avoid is building a sophisticated framework that only the SDET understands — the framework should be accessible enough that a developer can write a test for their own feature without specialised knowledge.

// Project structure from day one
tests/
  auth/
    login.spec.ts
  checkout/
    payment-flow.spec.ts
src/
  pages/
    LoginPage.ts
    CheckoutPage.ts
  fixtures/
    auth.fixture.ts
    testData.fixture.ts
  helpers/
    api.ts           // API client for test data setup
    dataBuilder.ts   // factory functions for test entities
playwright.config.ts
TESTING.md           // team conventions: locators, naming, data strategy
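To make the “factory functions for test entities” idea concrete, here is a minimal sketch of what helpers/dataBuilder.ts might contain. The TestUser shape, the buildUser name, and the example.test email domain are all hypothetical. A real product would mirror its own API schema.

```typescript
// helpers/dataBuilder.ts (sketch): factory for unique, overridable test users.
interface TestUser {
  email: string;
  name: string;
  role: "admin" | "member";
}

let sequence = 0;

function buildUser(overrides: Partial<TestUser> = {}): TestUser {
  sequence += 1;
  // Timestamp plus a random suffix keeps emails unique across tests and
  // makes collisions between parallel workers unlikely (not impossible).
  const unique = `${Date.now()}-${sequence}-${Math.random().toString(36).slice(2, 8)}`;
  return {
    email: `test-user-${unique}@example.test`,
    name: `Test User ${sequence}`,
    role: "member",
    ...overrides, // caller-supplied fields win over defaults
  };
}

// Usage: defaults for most tests, overrides only where the test cares.
const member = buildUser();
const admin = buildUser({ role: "admin" });
console.log(member.role, admin.role); // prints member admin
```

Each test states only the fields it depends on, so specs stay short and no two tests share a record.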