BDD & Three Amigos · Lesson 2

Writing Good Gherkin

The agreed examples from a Three Amigos session are only useful if they are written down precisely. Gherkin is the plain-language grammar for doing that — readable enough for a business owner, exact enough to automate. This lesson teaches the craft of writing it well.

Behaviour-Driven Development · Lesson 2 of 2 · ~30 min read · ~75 min with exercises

1 The Hook

A team at a fictional council, Manawa District, automated their rates-payment scenarios in Gherkin. A year later a tester opened the feature file to understand how a part-payment was meant to behave. This is what greeted her:

Scenario: Test part payment
  Given I navigate to "https://rates.manawa.govt.nz/login"
  And I type "ratepayer01" into the field with id "txtUser"
  And I type "Passw0rd!" into the field with id "txtPass"
  And I click the button with id "btnLogin"
  And I click the link with text "Pay rates"
  And I type "150" into the field with id "txtAmount"
  And I click "btnSubmit"
  Then I see "Thank you"

She still had no idea what the rule was. Was $150 a part-payment of a larger bill? What was the bill? What should happen to the balance? The scenario described clicking a website, not the behaviour of part-payments. It was tied so tightly to the screen that when the login page was redesigned six months later, every scenario in the file broke at once — not because the rates logic changed, but because a button id did. The feature file documented the UI of a moment, and nothing about the rules.

Good Gherkin is the opposite of this. It describes what the system does in the language of the business, not how a user clicks through a screen. Written well, the same scenario survives a UI redesign untouched, reads like a sentence a ratepayer would understand, and tells the next tester exactly what the rule is. This lesson is about writing that kind of Gherkin — and recognising the kind above so you never ship it.

2 The Rule

Gherkin describes behaviour, not clicks. A scenario states a context (Given), an action (When), and an outcome (Then) in the language of the business — never the language of the user interface. If a scenario mentions buttons, fields, or URLs, it is documenting the screen, and it will break when the screen changes while telling you nothing about the rule.

3 The Analogy

A recipe versus a video of someone’s hands.

A good recipe says “cream the butter and sugar until pale.” It tells you the intent, so you can do it with any bowl, any whisk, in any kitchen. A bad recipe would be a silent video of one particular person’s hands in one particular kitchen — “move the yellow handle 14 times, then reach to the left drawer.” The moment you stand in a different kitchen, the instructions are useless, even though the cake is identical.

Imperative, UI-coupled Gherkin is the video of the hands — click this exact button, in this exact field. Declarative Gherkin is the recipe — “the ratepayer makes a part-payment of $150.” The recipe survives a new kitchen the way good Gherkin survives a redesigned screen. Write the recipe, not the video.

4 The Anatomy of a Scenario

Gherkin has a small, fixed grammar. A Feature groups related scenarios. Each Scenario is one concrete example, built from steps that start with keywords:

Given — the context that is already true before the action. The starting state of the world. (“Given a ratepayer with an outstanding rates bill of $1,200”.)
When — the single action under test. The event that triggers the behaviour. (“When the ratepayer pays $150”.)
Then — the expected outcome. What must be observably true afterwards. (“Then the outstanding balance is $1,050”.)
And / But — extra Given, When, or Then steps, so the scenario reads naturally instead of repeating a keyword.

Here is the Manawa part-payment scenario from the Hook, rewritten as good Gherkin:

Feature: Rates part-payment

Scenario: A part-payment reduces the outstanding balance
  Given a ratepayer with an outstanding rates bill of $1,200
  When the ratepayer pays $150
  Then the outstanding balance is $1,050
  And the payment is recorded as a part-payment

Read it aloud — it is a sentence a ratepayer would understand, and it states the rule (a $150 payment reduces a $1,200 bill to $1,050) with no mention of a button, a field, or a URL. One scenario should test one behaviour: one Given context, one When action, one or a few Then outcomes. If you find yourself writing a second When in the middle, you almost certainly have two scenarios.
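To make the three keywords concrete, here is a minimal Python sketch of what the automation beneath this scenario might do. The `RatesAccount` class, its `pay` method, and the rule for labelling a part-payment are hypothetical stand-ins invented for illustration; they are not taken from any real rates system or BDD library.

```python
from dataclasses import dataclass, field


@dataclass
class RatesAccount:
    """Hypothetical domain object standing in for the real rates system."""
    outstanding: int                               # whole dollars, for simplicity
    payments: list = field(default_factory=list)   # (amount, kind) tuples

    def pay(self, amount: int) -> None:
        # Assumed rule: a payment smaller than the balance is a part-payment.
        kind = "part-payment" if amount < self.outstanding else "full payment"
        self.outstanding -= amount
        self.payments.append((amount, kind))


def part_payment_scenario() -> RatesAccount:
    # Given a ratepayer with an outstanding rates bill of $1,200
    account = RatesAccount(outstanding=1200)
    # When the ratepayer pays $150
    account.pay(150)
    # Then the outstanding balance is $1,050
    assert account.outstanding == 1050
    # And the payment is recorded as a part-payment
    assert account.payments[-1] == (150, "part-payment")
    return account
```

Given sets up state, When performs exactly one action, and Then only inspects the result. Nothing in the sketch touches a screen, which is why the scenario text can stay the same when the UI changes.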

Pro tip: A single “When” per scenario is the strongest rule of thumb in Gherkin. The When is the one action you are testing. Two Whens means two actions, which means two behaviours, which means two scenarios — split them.

5 Declarative vs Imperative

This is the single most important distinction in writing Gherkin, and the one the Hook turned on.

Imperative steps describe how a user operates the interface — the clicks, the fields, the keystrokes. Declarative steps describe what the user is trying to do, in business language, leaving the “how” to the automation underneath.

Imperative (avoid):
When I type "0800123456" into the field "txtPhone"
And I select "Auckland" from the dropdown "ddlRegion"
And I click the button "btnContinue"

Declarative (prefer):
When the applicant submits their contact details for Auckland

The declarative version is shorter, reads like business, survives a UI change (the dropdown can become radio buttons and the scenario is untouched), and tells the reader the intent rather than the mechanics. The imperative version breaks the moment a field id changes, buries the rule under clicks, and is unreadable to anyone who is not looking at the screen. The “how” — which field, which button — belongs in the automation step definitions, hidden beneath the Gherkin, never in the scenario itself.
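The idea of pushing the "how" into step definitions can be sketched in plain Python. Real BDD tools provide their own step matching; the tiny `step` registry below is a hand-rolled stand-in so the example runs on its own, and `FakeContactPage` is a hypothetical page object that records UI actions instead of driving a browser. The field ids and phone number are reused from the imperative example above.

```python
import re

STEPS = []  # (compiled pattern, function) pairs; a stand-in for a BDD framework's registry


def step(pattern):
    """Register a function as the definition for steps matching `pattern`."""
    def decorator(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return decorator


class FakeContactPage:
    """Hypothetical page object; records UI actions instead of driving a browser."""

    def __init__(self):
        self.actions = []

    def type_into(self, field_id, text):
        self.actions.append(("type", field_id, text))

    def select(self, dropdown_id, option):
        self.actions.append(("select", dropdown_id, option))

    def click(self, button_id):
        self.actions.append(("click", button_id))


@step(r"^the applicant submits their contact details for (?P<region>.+)$")
def submit_contact_details(page, region):
    # The "how" lives here. If the dropdown becomes radio buttons,
    # only this function changes; the scenario text stays the same.
    page.type_into("txtPhone", "0800123456")
    page.select("ddlRegion", region)
    page.click("btnContinue")


def run_step(text, page):
    """Run the first registered definition whose pattern matches the step text."""
    for pattern, func in STEPS:
        match = pattern.match(text)
        if match:
            return func(page, **match.groupdict())
    raise LookupError(f"No step definition matches: {text!r}")
```

Calling `run_step("the applicant submits their contact details for Auckland", FakeContactPage())` performs the three UI actions without any of them appearing in the scenario. The step text is passed here without its leading keyword, which is a simplification of this sketch.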

Pro tip: Quick test — read your scenario to someone who has never seen the screen. If they understand what behaviour is being checked, it is declarative. If they say “what is txtPhone?”, it is imperative and coupled to the UI.

6 Backgrounds

When every scenario in a feature shares the same setup, repeating it in each one is noise. A Background is a block of Given steps that runs before every scenario in the feature, so the common context is stated once.

Feature: KiwiSaver contribution rate change

Background:
  Given a member with an active KiwiSaver account
  And the member is currently contributing at 3%

Scenario: Member raises their rate to a valid value
  When the member changes their rate to 8%
  Then the new rate of 8% applies from the next pay run

Scenario: Member enters an invalid rate
  When the member changes their rate to 5%
  Then the change is rejected with the list of valid rates

The shared context — an active account contributing at 3% — is declared once in the Background and applies to both scenarios. Keep Backgrounds short and made only of Given steps that genuinely apply to every scenario. The moment a Background grows long, or you find yourself adding a Given that only some scenarios need, it has become a liability — it hides important context from the reader, who now has to scroll up to understand any single scenario.
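One way to picture how a Background behaves is as setup that is rebuilt before every scenario, so no scenario inherits state from the one before it. The Python sketch below is illustrative only: `background`, `change_rate`, and `run_feature` are hypothetical names, the context is a plain dictionary, and the set of valid rates is borrowed from Exercise 3 later in this lesson.

```python
VALID_RATES = {3, 4, 6, 8, 10}  # assumed valid rates, as stated in Exercise 3


def background():
    """Build a fresh context, as the Background's Given steps would."""
    # Given a member with an active KiwiSaver account
    # And the member is currently contributing at 3%
    return {"account_active": True, "rate": 3, "rejected": False}


def change_rate(context, new_rate):
    """Hypothetical action under test: apply the new rate only if it is valid."""
    if new_rate in VALID_RATES:
        context["rate"] = new_rate
    else:
        context["rejected"] = True


def scenario_valid_rate(context):
    change_rate(context, 8)        # When the member changes their rate to 8%
    assert context["rate"] == 8    # Then the new rate applies (pay-run timing not modelled)


def scenario_invalid_rate(context):
    change_rate(context, 5)        # When the member changes their rate to 5%
    assert context["rejected"]     # Then the change is rejected
    assert context["rate"] == 3    # and the Background's starting rate is untouched


def run_feature(scenarios):
    """Run each scenario against its own fresh Background context."""
    results = {}
    for scenario in scenarios:
        context = background()     # the Background runs before every scenario
        scenario(context)
        results[scenario.__name__] = context
    return results
```

Running `run_feature([scenario_valid_rate, scenario_invalid_rate])` shows the second scenario starting from 3% again, not from the 8% the first scenario left behind.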

Pro tip: Never put a When or a Then in a Background. A Background is shared context, not a shared action or outcome. If you feel the urge, the action belongs in each scenario where it is the thing under test.

7 Scenario Outlines

When the same behaviour needs checking against several sets of values, writing one scenario per value is repetitive. A Scenario Outline states the behaviour once with placeholders, and an Examples table supplies the rows. It is Specification by Example from Lesson 1, written as runnable Gherkin.

Here is the Auckland Council rates-rebate rule from Lesson 1, including the boundary case the Three Amigos had to decide:

Scenario Outline: Rates rebate by income and dependants
  Given a ratepayer with an income of <income> and <dependants> dependants
  When their rebate is assessed
  Then the rebate is <rebate>

  Examples:
    | income  | dependants | rebate |
    | 32000   | 0          | $290   |
    | 32000   | 2          | $410   |
    | 45000   | 0          | $0     |
    | 39000   | 0          | $290   |

The outline runs once per row, substituting each <placeholder>. Four data cases, one readable scenario. This is where boundary and negative rows from a Three Amigos session land naturally — the row at exactly the threshold, the row that produces $0. A scenario outline is the right tool when one behaviour varies by data; it is the wrong tool when you are tempted to cram several different behaviours into one table, which makes it unreadable.
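The substitution itself is simple enough to sketch in a few lines of Python. This is not how any particular BDD tool implements it; `expand_outline` is a hypothetical helper that only shows the mechanics of turning one outline plus four rows into four concrete scenarios. It does not model the rebate rule.

```python
import re

OUTLINE = [
    "Given a ratepayer with an income of <income> and <dependants> dependants",
    "When their rebate is assessed",
    "Then the rebate is <rebate>",
]

# The Examples table from the outline above, one dict per row.
EXAMPLES = [
    {"income": "32000", "dependants": "0", "rebate": "$290"},
    {"income": "32000", "dependants": "2", "rebate": "$410"},
    {"income": "45000", "dependants": "0", "rebate": "$0"},
    {"income": "39000", "dependants": "0", "rebate": "$290"},
]


def expand_outline(steps, rows):
    """Return one list of concrete steps per Examples row."""
    def fill(step_text, row):
        # Replace each <name> with that row's value for the column `name`.
        return re.sub(r"<(\w+)>", lambda m: row[m.group(1)], step_text)

    return [[fill(step_text, row) for step_text in steps] for row in rows]
```

For the first row, `expand_outline(OUTLINE, EXAMPLES)[0]` yields the steps "Given a ratepayer with an income of 32000 and 0 dependants", "When their rebate is assessed", and "Then the rebate is $290".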

Pro tip: Each Examples row should exercise the same behaviour with different data. If one row needs a different Then than the others, that row is a different behaviour — give it its own scenario rather than bending the outline to fit.

8 Gherkin as Living Documentation

The hidden payoff of good Gherkin is documentation that cannot go stale. Because the scenarios are executed as automated tests, they must match the system’s real behaviour or the build fails. That makes the feature files a description of the system that is always true — living documentation.

Contrast that with a Word specification in a shared drive. It is accurate the day it is written and decays from there, because nothing forces it to stay in step with the code. Six months on, no one trusts it. A Gherkin feature file cannot drift the same way: if the behaviour changes and the scenario is not updated, the automated test breaks and someone has to reconcile them. The documentation and the system are chained together.

This is only true if the Gherkin is declarative and business-readable. The imperative click-script from the Hook documents nothing a business owner can use. Declarative scenarios — “a $150 payment reduces a $1,200 bill to $1,050” — can be read by a BA, a product owner, or an auditor, and they double as the test suite. One artefact serves as the requirement, the test, and the documentation at once. That is the whole promise of BDD, and it stands or falls on how you write the Gherkin.

Pro tip: A good test of living documentation: could a new ratepayer-services manager read your feature file and learn the rates rules without asking anyone? If yes, it is living documentation. If they need a developer to translate it, it is just automated clicking.

9 Common Mistakes

🚫 Writing imperative, UI-coupled steps

Why it happens: Recording or transcribing clicks feels concrete and is easy to automate first time.
The fix: Steps full of fields, buttons, and URLs break the moment the screen changes and hide the rule. Write declarative steps in business language — “the ratepayer makes a part-payment of $150” — and keep the click mechanics down in the step definitions, out of the scenario.

🚫 Putting more than one behaviour in a scenario

Why it happens: It feels efficient to test login, then payment, then a receipt all in one scenario.
The fix: One scenario, one behaviour, ideally one When. If you have two Whens, you have two behaviours — split them, so a failure points at exactly one rule and the scenario stays readable.

🚫 Overloading the Background

Why it happens: Anything shared by two scenarios gets pushed up into the Background to avoid repetition.
The fix: A long Background, or one holding context only some scenarios need, hides important state from the reader. Keep Backgrounds to a few Given steps that truly apply to every scenario, and never put a When or Then there.

🚫 Writing Gherkin alone, after the code is built

Why it happens: The team treats Gherkin as a test-automation format rather than the record of a Three Amigos conversation.
The fix: Gherkin written solo after the fact just describes what was built — it can never catch a requirement gap, because the requirements are already set in code. The scenarios should come from the agreed examples in the Three Amigos session, before the build.

10 Now You Try

Three graded exercises: spot the anti-patterns in bad Gherkin, rewrite it declaratively, then build a scenario outline with edge and negative cases. Write your answer, then compare it to the model answer.

🔍 Exercise 1 of 3 — Spot the Anti-Patterns

The scenario below, for a fictional ANZ online transfer, is riddled with Gherkin anti-patterns. Identify at least 4 distinct problems and name each one (for example: imperative/UI-coupled, multiple behaviours, missing When, vague Then, UI selectors).

    Scenario: Test transfer
      Given I go to "https://anz.co.nz/login"
      And I type "user99" in "#username" and "Pass123" in "#pwd"
      And I click "#loginBtn"
      And I click the "Transfer" tab
      And I type "500" in "#amount" and click "#submit"
      Then it works
      And I log out

Model answer

There are at least six problems; any four correctly named earn full marks.

1. Imperative / UI-coupled steps: the scenario describes typing into "#username", "#pwd", "#amount" and clicking "#loginBtn", "#submit". These break when the UI changes and hide the rule. They belong in step definitions, not the scenario.

2. URLs and selectors in steps: "https://anz.co.nz/login" and CSS ids (#username) couple the scenario to one exact screen.

3. Login noise / multiple behaviours: logging in is not the behaviour under test (a transfer is). The login should be a declarative Given ("Given a logged-in customer") or a Background, not three steps of navigating, typing, and clicking. Logging out at the end is a second unrelated behaviour.

4. Vague Then: "Then it works" asserts nothing checkable. A good Then states the observable outcome: the balance decreased by $500, the payee was credited, a confirmation reference was issued.

5. Missing When: the scenario has no When step at all. The action is buried in a Given-chain step ("type 500 and click submit"), and the real When ("the customer transfers $500 to a saved payee") is never stated.

6. Unclear context: there is no Given establishing the starting balance or the destination account, so the outcome cannot be verified.

Strong answers name the imperative/UI-coupling and the vague "it works" Then as the two most damaging. The fix is Exercise 2.

🔧 Exercise 2 of 3 — Rewrite It Declaratively

Rewrite the broken ANZ transfer scenario from Exercise 1 as good declarative Gherkin. Use a clear Given/When/Then, business language with no UI selectors, exactly one behaviour (the transfer), and a checkable Then. Add a second scenario for a negative case — an attempted transfer that exceeds the available balance.

Model answer

Feature: Transfer between accounts

Scenario: A successful transfer reduces the source balance
  Given a logged-in customer with $2,000 in their everyday account
  And a saved payee "Power Bill"
  When the customer transfers $500 to "Power Bill"
  Then the everyday account balance is $1,500
  And the payee is credited with $500
  And a confirmation reference is issued

Scenario: A transfer that exceeds the available balance is declined
  Given a logged-in customer with $300 in their everyday account
  And a saved payee "Power Bill"
  When the customer attempts to transfer $500 to "Power Bill"
  Then the transfer is declined with an insufficient-funds message
  And the everyday account balance is unchanged at $300

What makes this strong: no URLs or selectors (the login is a single declarative Given, not a run of UI steps); exactly one behaviour per scenario with one clear When; a checkable Then that states the balance and the outcome rather than "it works"; and a real negative case that asserts the balance is unchanged, a check that is commonly missed. The "how" of clicking is left to the step definitions underneath, where it belongs.
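As a rough illustration of what could sit beneath these two scenarios, here is a Python sketch against an in-memory stand-in for the bank. Everything in it is hypothetical: the `Bank` class, the `InsufficientFunds` exception, and the `REF-n` reference format are invented for the example and say nothing about how a real banking system behaves.

```python
class InsufficientFunds(Exception):
    """Raised when a transfer exceeds the available balance."""


class Bank:
    """Hypothetical in-memory stand-in for the banking system under test."""

    def __init__(self, everyday_balance):
        self.everyday = everyday_balance
        self.payees = {}       # payee name -> total amount credited
        self.references = []   # confirmation references issued so far

    def save_payee(self, name):
        self.payees[name] = 0

    def transfer(self, amount, payee):
        if amount > self.everyday:
            raise InsufficientFunds(f"cannot transfer ${amount} from ${self.everyday}")
        self.everyday -= amount
        self.payees[payee] += amount
        reference = f"REF-{len(self.references) + 1}"  # made-up reference format
        self.references.append(reference)
        return reference


def successful_transfer():
    bank = Bank(2000)                              # Given $2,000 in the everyday account
    bank.save_payee("Power Bill")                  # And a saved payee "Power Bill"
    reference = bank.transfer(500, "Power Bill")   # When the customer transfers $500
    assert bank.everyday == 1500                   # Then the balance is $1,500
    assert bank.payees["Power Bill"] == 500        # And the payee is credited with $500
    assert reference in bank.references            # And a confirmation reference is issued
    return bank


def declined_transfer():
    bank = Bank(300)                               # Given $300 in the everyday account
    bank.save_payee("Power Bill")
    declined = False
    try:
        bank.transfer(500, "Power Bill")           # When the customer attempts to transfer $500
    except InsufficientFunds:
        declined = True
    assert declined                                # Then the transfer is declined
    assert bank.everyday == 300                    # And the balance is unchanged at $300
    assert bank.payees["Power Bill"] == 0          # nothing was credited to the payee
    return bank
```

The negative case checks two things: that the transfer was refused, and that no money moved. Asserting only the error message would let a defect that declines the transfer but still debits the account slip through.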

🏗️ Exercise 3 of 3 — Build a Scenario Outline

Write a Scenario Outline with an Examples table for a fictional KiwiSaver contribution-rate change, where valid rates are 3%, 4%, 6%, 8%, and 10%. Use placeholders for the entered rate and the expected result. Include at least 5 example rows, and make sure at least two are negative or boundary cases (for example an invalid rate, or a value just outside the allowed set). Use a Background for the shared member context.

Model answer

Feature: KiwiSaver contribution rate change

Background:
  Given a member with an active KiwiSaver account
  And the member is currently contributing at 3%

Scenario Outline: Changing the contribution rate
  When the member changes their rate to <rate>
  Then <result>

  Examples:
    | rate | result                                              |
    | 4%   | the new rate of 4% applies from the next pay run    |
    | 8%   | the new rate of 8% applies from the next pay run    |
    | 10%  | the new rate of 10% applies from the next pay run   |
    | 5%   | the change is rejected with the list of valid rates |
    | 0%   | the change is rejected with the list of valid rates |
    | -2%  | the change is rejected as an invalid value          |

What makes this strong: the Background holds the shared context (active account, currently at 3%) once; the outline tests one behaviour (changing the rate) across data; valid rows (4%, 8%, 10%) confirm the happy path; and the negative/boundary rows (5% just outside the set, 0%, a negative value) force the rejection path. A weak answer lists only valid rates — it never exercises the rule that rejects invalid ones, which is exactly where the defect hides. Note each row keeps the same behaviour, with only the data and expected result changing.
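The Examples table can also be read as a table-driven test. The Python sketch below assumes a rule that matches the model answer's rows: negative values are rejected as invalid, other values outside the allowed set are rejected with the list of valid rates, and everything else applies. `change_rate_result` and `failing_rows` are hypothetical names, and the rule is inferred from the table, not from any real KiwiSaver system.

```python
VALID_RATES = (3, 4, 6, 8, 10)  # valid contribution rates stated in the exercise


def change_rate_result(rate):
    """Return the expected-result text for an entered rate (assumed rule)."""
    if rate < 0:
        return "the change is rejected as an invalid value"
    if rate not in VALID_RATES:
        return "the change is rejected with the list of valid rates"
    return f"the new rate of {rate}% applies from the next pay run"


# The Examples table from the model answer, as (rate, result) rows.
EXAMPLES = [
    (4, "the new rate of 4% applies from the next pay run"),
    (8, "the new rate of 8% applies from the next pay run"),
    (10, "the new rate of 10% applies from the next pay run"),
    (5, "the change is rejected with the list of valid rates"),
    (0, "the change is rejected with the list of valid rates"),
    (-2, "the change is rejected as an invalid value"),
]


def failing_rows(rows):
    """Return the rows whose expected result the rule does not produce."""
    return [(rate, expected) for rate, expected in rows
            if change_rate_result(rate) != expected]
```

With the assumed rule, `failing_rows(EXAMPLES)` is empty. If the table held only the 4%, 8%, and 10% rows, a rule that wrongly accepted 5% would pass just as cleanly, which is the gap the negative rows close.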

11 Self-Check

Q1: What is the difference between imperative and declarative Gherkin, and which should you write?

Imperative steps describe how a user operates the screen — the clicks, fields, and selectors. Declarative steps describe what the user is doing in business language. Write declarative: it is readable, survives a UI redesign, and states the rule. The “how” belongs in the step definitions underneath, never in the scenario.

Q2: What do Given, When, and Then each represent?

Given is the context already true before the action — the starting state. When is the single action under test — the trigger. Then is the expected, observable outcome. A scenario should have one When; two Whens means two behaviours.

Q3: When should you use a Scenario Outline, and what may an Examples row not do?

Use a Scenario Outline when the same behaviour needs checking against several sets of data — one outline, many rows. An Examples row may not change the behaviour: if a row needs a different Then because it is really a different rule, give it its own scenario instead of bending the outline.

Q4: What may a Background contain, and what must it never contain?

A Background may contain only Given steps — shared context that genuinely applies to every scenario in the feature, stated once. It must never contain a When or a Then, because those are actions and outcomes that belong in the individual scenarios under test. Keep it short, or it hides context from the reader.

Q5: Why is declarative Gherkin called “living documentation” when a Word spec is not?

Because the scenarios are executed as automated tests, so they must match the real behaviour or the build fails — the documentation cannot silently drift. A Word spec decays from the day it is written because nothing forces it to stay in step with the code. This only holds if the Gherkin is declarative and business-readable; an imperative click-script documents nothing useful.

12 Interview Prep

Real questions asked in NZ QA interviews for BDD and automation roles. Read the model answers, then practise your own version.

“A teammate writes Gherkin full of ‘click the button with id btnSubmit’. What is wrong with that, and how would you fix it?”

It is imperative and coupled to the UI. Two problems follow: it breaks the moment a developer renames a button or redesigns the screen, even though no business rule changed; and it hides the actual rule under a list of clicks, so no business reader can use it. I would rewrite the steps declaratively — “When the customer submits the payment” instead of the click sequence — and push the ‘how’ (which button, which field) down into the step definitions where it can change without touching the scenario. The scenario should read like a sentence about the behaviour, not a script for operating a screen.

“When would you reach for a Scenario Outline versus a Background?”

They solve different problems. A Scenario Outline is for the same behaviour tested against many sets of data — I write the steps once with placeholders and supply the rows in an Examples table, like a rates rebate across several income and dependant values. A Background is for shared context — Given steps that are true before every scenario in a feature, so I state them once instead of repeating them. The rule I hold to: an Examples row may only change the data, not the behaviour; and a Background holds only Givens, never a When or Then. If either starts doing more than that, I split it out.

“People say BDD gives you ‘living documentation’. What does that actually mean?”

It means documentation that cannot quietly go out of date, because it is executed as tests. A declarative feature file describes the system’s behaviour in business language — “a $150 payment reduces a $1,200 rates bill to $1,050” — and that same scenario runs as an automated test. If the behaviour changes and the scenario is not updated, the test fails and someone has to reconcile them, so the docs and the system stay chained together. A Word spec in a shared drive has no such force and decays from day one. The catch is that it only works if the Gherkin is declarative and readable; an imperative click-script is automated testing but documents nothing a business owner can use.
