Writing Good Gherkin
The agreed examples from a Three Amigos session are only useful if they are written down precisely. Gherkin is the plain-language grammar for doing that — readable enough for a business owner, exact enough to automate. This lesson teaches the craft of writing it well.
1 The Hook
A team at a fictional council, Manawa District, automated their rates-payment scenarios in Gherkin. A year later a tester opened the feature file to understand how a part-payment was meant to behave. This is what greeted her:
Given I navigate to "https://rates.manawa.govt.nz/login"
And I type "ratepayer01" into the field with id "txtUser"
And I type "Passw0rd!" into the field with id "txtPass"
And I click the button with id "btnLogin"
And I click the link with text "Pay rates"
And I type "150" into the field with id "txtAmount"
And I click "btnSubmit"
Then I see "Thank you"
She still had no idea what the rule was. Was $150 a part-payment of a larger bill? What was the bill? What should happen to the balance? The scenario described clicking a website, not the behaviour of part-payments. It was tied so tightly to the screen that when the login page was redesigned six months later, every scenario in the file broke at once — not because the rates logic changed, but because a button id did. The feature file documented the UI of a moment, and nothing about the rules.
Good Gherkin is the opposite of this. It describes what the system does in the language of the business, not how a user clicks through a screen. Written well, the same scenario survives a UI redesign untouched, reads like a sentence a ratepayer would understand, and tells the next tester exactly what the rule is. This lesson is about writing that kind of Gherkin — and recognising the kind above so you never ship it.
2 The Rule
Gherkin describes behaviour, not clicks. A scenario states a context (Given), an action (When), and an outcome (Then) in the language of the business — never the language of the user interface. If a scenario mentions buttons, fields, or URLs, it is documenting the screen, and it will break when the screen changes while telling you nothing about the rule.
3 The Analogy
A recipe versus a video of someone’s hands.
A good recipe says “cream the butter and sugar until pale.” It tells you the intent, so you can do it with any bowl, any whisk, in any kitchen. A bad recipe would be a silent video of one particular person’s hands in one particular kitchen — “move the yellow handle 14 times, then reach to the left drawer.” The moment you stand in a different kitchen, the instructions are useless, even though the cake is identical.
Imperative, UI-coupled Gherkin is the video of the hands — click this exact button, in this exact field. Declarative Gherkin is the recipe — “the ratepayer makes a part-payment of $150.” The recipe survives a new kitchen the way good Gherkin survives a redesigned screen. Write the recipe, not the video.
4 The Anatomy of a Scenario
Gherkin has a small, fixed grammar. A Feature groups related scenarios. Each Scenario is one concrete example, built from steps that start with keywords:
- Given — the context that is already true before the action. The starting state of the world. (“Given a ratepayer with an outstanding rates bill of $1,200”.)
- When — the single action under test. The event that triggers the behaviour. (“When the ratepayer pays $150”.)
- Then — the expected outcome. What must be observably true afterwards. (“Then the outstanding balance is $1,050”.)
- And / But — extra Given, When, or Then steps, so the scenario reads naturally instead of repeating a keyword.
Here is the Manawa part-payment scenario from the Hook, rewritten as good Gherkin:
Scenario: A part-payment reduces the outstanding balance
Given a ratepayer with an outstanding rates bill of $1,200
When the ratepayer pays $150
Then the outstanding balance is $1,050
And the payment is recorded as a part-payment
Read it aloud — it is a sentence a ratepayer would understand, and it states the rule (a $150 payment reduces a $1,200 bill to $1,050) with no mention of a button, a field, or a URL. One scenario should test one behaviour: one Given context, one When action, one or a few Then outcomes. If you find yourself writing a second When in the middle, you almost certainly have two scenarios.
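The three keywords line up with the arrange, act, assert shape of an ordinary automated test. The sketch below shows that mapping in plain Python. `RatesAccount` and its `pay` method are hypothetical stand-ins invented for illustration, not part of any real rates system, and overpayments are deliberately out of scope.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class RatesAccount:
    """Hypothetical stand-in for the rates system, for illustration only."""
    outstanding: Decimal
    payments: list = field(default_factory=list)

    def pay(self, amount: Decimal) -> None:
        # A payment smaller than the bill is recorded as a part-payment.
        kind = "part-payment" if amount < self.outstanding else "full payment"
        self.outstanding -= amount
        self.payments.append((amount, kind))


def test_part_payment_reduces_outstanding_balance() -> None:
    # Given a ratepayer with an outstanding rates bill of $1,200
    account = RatesAccount(outstanding=Decimal("1200"))
    # When the ratepayer pays $150
    account.pay(Decimal("150"))
    # Then the outstanding balance is $1,050
    assert account.outstanding == Decimal("1050")
    # And the payment is recorded as a part-payment
    assert account.payments[-1] == (Decimal("150"), "part-payment")


test_part_payment_reduces_outstanding_balance()
```

Each Gherkin step becomes one comment and one line of code, which is a useful smell test: a scenario that cannot be mapped this cleanly is probably testing more than one behaviour.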
5 Declarative vs Imperative
This is the single most important distinction in writing Gherkin, and the one the Hook turned on.
Imperative steps describe how a user operates the interface — the clicks, the fields, the keystrokes. Declarative steps describe what the user is trying to do, in business language, leaving the “how” to the automation underneath.
Imperative (avoid):
When I type "0800123456" into the field "txtPhone"
And I select "Auckland" from the dropdown "ddlRegion"
And I click the button "btnContinue"
Declarative (prefer):
When the applicant submits their contact details for Auckland
The declarative version is shorter, reads like business, survives a UI change (the dropdown can become radio buttons and the scenario is untouched), and tells the reader the intent rather than the mechanics. The imperative version breaks the moment a field id changes, buries the rule under clicks, and is unreadable to anyone who is not looking at the screen. The “how” — which field, which button — belongs in the automation step definitions, hidden beneath the Gherkin, never in the scenario itself.
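To make "hidden beneath the Gherkin" concrete, here is a minimal sketch of a step definition that owns the UI mechanics. Tools such as Cucumber or behave supply their own step matching; the small registry here (`STEPS`, `step`, `run_step`) is hand-rolled so the example runs standalone, and `FakeContactPage` is a hypothetical stand-in that records actions instead of driving a browser. The control names are taken from the imperative example above.

```python
import re
from typing import Callable

# Hypothetical registry: pairs a step pattern with the function that performs it.
STEPS: list[tuple[re.Pattern, Callable]] = []


def step(pattern: str) -> Callable:
    """Register a function as the implementation of a Gherkin step."""
    def register(func: Callable) -> Callable:
        STEPS.append((re.compile(pattern), func))
        return func
    return register


class FakeContactPage:
    """Stand-in for a UI driver; records actions instead of driving a browser."""

    def __init__(self) -> None:
        self.actions: list[str] = []

    def select(self, control: str, value: str) -> None:
        self.actions.append(f"select {value!r} in {control}")

    def click(self, control: str) -> None:
        self.actions.append(f"click {control}")


@step(r"the applicant submits their contact details for (?P<region>.+)")
def submit_contact_details(page: FakeContactPage, region: str) -> None:
    # The UI mechanics live here. If ddlRegion becomes radio buttons,
    # only this function changes; the scenario text does not.
    page.select("ddlRegion", region)
    page.click("btnContinue")


def run_step(text: str, page: FakeContactPage) -> None:
    """Strip the keyword, find the first matching pattern, and call its function."""
    body = re.sub(r"^(Given|When|Then|And|But)\s+", "", text.strip())
    for pattern, func in STEPS:
        match = pattern.fullmatch(body)
        if match:
            func(page, **match.groupdict())
            return
    raise LookupError(f"No step definition matches: {text!r}")


page = FakeContactPage()
run_step("When the applicant submits their contact details for Auckland", page)
print(page.actions)
```

The scenario line carries only the intent and the data (`Auckland`); every field id sits in one function that can be rewritten after a redesign without touching a feature file.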
6 Backgrounds
When every scenario in a feature shares the same setup, repeating it in each one is noise. A Background is a block of Given steps that runs before every scenario in the feature, so the common context is stated once.
Background:
Given a member with an active KiwiSaver account
And the member is currently contributing at 3%
Scenario: Member raises their rate to a valid value
When the member changes their rate to 8%
Then the new rate of 8% applies from the next pay run
Scenario: Member enters an invalid rate
When the member changes their rate to 5%
Then the change is rejected with the list of valid rates
The shared context — an active account contributing at 3% — is declared once in the Background and applies to both scenarios. Keep Backgrounds short and made only of Given steps that genuinely apply to every scenario. The moment a Background grows long, or you find yourself adding a Given that only some scenarios need, it has become a liability — it hides important context from the reader, who now has to scroll up to understand any single scenario.
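One way to picture what a Background does is as a simple expansion: the runner places the Background steps in front of each scenario's own steps before executing it. The sketch below assumes that model; `expand_background` is a hypothetical helper written for this illustration, not a function from any BDD tool.

```python
def expand_background(
    background: list[str], scenarios: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Return each scenario with the Background steps placed in front of its own."""
    return {name: background + steps for name, steps in scenarios.items()}


background = [
    "Given a member with an active KiwiSaver account",
    "And the member is currently contributing at 3%",
]
scenarios = {
    "Member raises their rate to a valid value": [
        "When the member changes their rate to 8%",
        "Then the new rate of 8% applies from the next pay run",
    ],
    "Member enters an invalid rate": [
        "When the member changes their rate to 5%",
        "Then the change is rejected with the list of valid rates",
    ],
}

# Print each scenario as the runner would effectively see it.
for name, steps in expand_background(background, scenarios).items():
    print(f"Scenario: {name}")
    for line in steps:
        print(f"  {line}")
```

Seen this way, the cost of a long Background is obvious: every extra line is silently added to every scenario, whether that scenario needs it or not.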
7 Scenario Outlines
When the same behaviour needs checking against several sets of values, writing one scenario per value is repetitive. A Scenario Outline states the behaviour once with placeholders, and an Examples table supplies the rows. It is Specification by Example from Lesson 1, written as runnable Gherkin.
Here is the Auckland Council rates-rebate rule from Lesson 1, including the boundary case the Three Amigos had to decide:
Scenario Outline: Rebate depends on income and dependants
Given a ratepayer with an income of <income> and <dependants> dependants
When their rebate is assessed
Then the rebate is <rebate>
Examples:
| income | dependants | rebate |
| 32000 | 0 | $290 |
| 32000 | 2 | $410 |
| 45000 | 0 | $0 |
| 39000 | 0 | $290 |
The outline runs once per row, substituting each <placeholder>. Four data cases, one readable scenario. This is where boundary and negative rows from a Three Amigos session land naturally — the row at exactly the threshold, the row that produces $0. A scenario outline is the right tool when one behaviour varies by data; it is the wrong tool when you are tempted to cram several different behaviours into one table, which makes it unreadable.
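The substitution itself is mechanical, and a rough sketch makes the "runs once per row" behaviour visible. The helpers `parse_table` and `expand` are hypothetical names written for this illustration; real Gherkin runners do the equivalent work internally. The steps and rows are the ones from the rebate outline above.

```python
import re

OUTLINE = [
    "Given a ratepayer with an income of <income> and <dependants> dependants",
    "When their rebate is assessed",
    "Then the rebate is <rebate>",
]

EXAMPLES = """
| income | dependants | rebate |
| 32000  | 0          | $290   |
| 32000  | 2          | $410   |
| 45000  | 0          | $0     |
| 39000  | 0          | $290   |
"""


def parse_table(text: str) -> list[dict[str, str]]:
    """Turn a pipe-delimited Examples table into one dict per data row."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
    ]
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def expand(outline: list[str], row: dict[str, str]) -> list[str]:
    """Substitute every <placeholder> in the outline with the row's value."""
    return [re.sub(r"<(\w+)>", lambda m: row[m.group(1)], step) for step in outline]


# One concrete scenario is produced per Examples row.
for row in parse_table(EXAMPLES):
    print("\n".join(expand(OUTLINE, row)), end="\n\n")
```

Because only values are swapped into fixed step text, a row cannot change what the scenario does, which is why a table that seems to need different steps per row is a sign the behaviours belong in separate scenarios.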
8 Gherkin as Living Documentation
The hidden payoff of good Gherkin is documentation that cannot go stale. Because the scenarios are executed as automated tests, they must match the system’s real behaviour or the build fails. That makes the feature files a description of the system that is always true — living documentation.
Contrast that with a Word specification in a shared drive. It is accurate the day it is written and decays from there, because nothing forces it to stay in step with the code. Six months on, no one trusts it. A Gherkin feature file cannot drift the same way: if the behaviour changes and the scenario is not updated, the automated test breaks and someone has to reconcile them. The documentation and the system are chained together.
This is only true if the Gherkin is declarative and business-readable. The imperative click-script from the Hook documents nothing a business owner can use. Declarative scenarios — “a $150 payment reduces a $1,200 bill to $1,050” — can be read by a BA, a product owner, or an auditor, and they double as the test suite. One artefact serves as the requirement, the test, and the documentation at once. That is the whole promise of BDD, and it stands or falls on how you write the Gherkin.
9 Common Mistakes
🚫 Writing imperative, UI-coupled steps
Why it happens: Recording or transcribing clicks feels concrete and is easy to automate first time.
The fix: Steps full of fields, buttons, and URLs break the moment the screen changes and hide the rule. Write declarative steps in business language — “the ratepayer makes a part-payment of $150” — and keep the click mechanics down in the step definitions, out of the scenario.
🚫 Putting more than one behaviour in a scenario
Why it happens: It feels efficient to test login, then payment, then a receipt all in one scenario.
The fix: One scenario, one behaviour, ideally one When. If you have two Whens, you have two behaviours — split them, so a failure points at exactly one rule and the scenario stays readable.
🚫 Overloading the Background
Why it happens: Anything shared by two scenarios gets pushed up into the Background to avoid repetition.
The fix: A long Background, or one holding context only some scenarios need, hides important state from the reader. Keep Backgrounds to a few Given steps that truly apply to every scenario, and never put a When or Then there.
🚫 Writing Gherkin alone, after the code is built
Why it happens: The team treats Gherkin as a test-automation format rather than the record of a Three Amigos conversation.
The fix: Gherkin written solo after the fact just describes what was built — it can never catch a requirement gap, because the requirements are already set in code. The scenarios should come from the agreed examples in the Three Amigos session, before the build.
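Some of these mistakes can be caught mechanically before a reviewer ever reads the file. The sketch below is a rough, heuristic linter for a single scenario's steps. The function name `lint_scenario` and the regular expression are assumptions made for this example and would need tuning for a real codebase; it cannot judge whether a scenario was written before or after the build.

```python
import re

# Hypothetical heuristic: URLs, CSS-style #selectors, or UI control words.
UI_COUPLING = re.compile(
    r"https?://|#\w+|\b(?:click|button|field|dropdown)\b", re.IGNORECASE
)


def lint_scenario(steps: list[str]) -> list[str]:
    """Return a list of warnings for one scenario's steps."""
    warnings = []
    if any(UI_COUPLING.search(s) for s in steps):
        warnings.append("UI-coupled: steps mention URLs, selectors, or controls")
    when_count = sum(1 for s in steps if s.strip().lower().startswith("when "))
    if when_count == 0:
        warnings.append("no When: the action under test is never stated")
    elif when_count > 1:
        warnings.append(f"{when_count} Whens: probably more than one behaviour")
    if not any(s.strip().lower().startswith("then ") for s in steps):
        warnings.append("no Then: nothing is asserted")
    return warnings


hook = [
    'Given I navigate to "https://rates.manawa.govt.nz/login"',
    'And I type "ratepayer01" into the field with id "txtUser"',
    'And I click the button with id "btnLogin"',
    'And I type "150" into the field with id "txtAmount"',
    'And I click "btnSubmit"',
    'Then I see "Thank you"',
]
good = [
    "Given a ratepayer with an outstanding rates bill of $1,200",
    "When the ratepayer pays $150",
    "Then the outstanding balance is $1,050",
    "And the payment is recorded as a part-payment",
]

print(lint_scenario(hook))
print(lint_scenario(good))
```

Run against the abbreviated Hook scenario, the linter reports UI coupling and a missing When; the rewritten part-payment scenario produces no warnings. A check like this is a safety net, not a substitute for reading the scenario aloud.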
10 Now You Try
Three graded exercises: spot the anti-patterns in bad Gherkin, rewrite it declaratively, then build a scenario outline with edge and negative cases. Write your answer, then compare it to the model answer.
Exercise 1. The scenario below, for a fictional ANZ online transfer, is riddled with Gherkin anti-patterns. Identify at least 4 distinct problems and name each one (for example: imperative/UI-coupled, multiple behaviours, missing When, vague Then, UI selectors).
Given I go to "https://anz.co.nz/login"
And I type "user99" in "#username" and "Pass123" in "#pwd"
And I click "#loginBtn"
And I click the "Transfer" tab
And I type "500" in "#amount" and click "#submit"
Then it works
And I log out
Model answer:
There are at least six problems; any four correctly named earn full marks.
1. Imperative / UI-coupled steps — the scenario describes typing into "#username", "#pwd", "#amount" and clicking "#loginBtn", "#submit". These break when the UI changes and hide the rule. They belong in step definitions, not the scenario.
2. URLs and selectors in steps — "https://anz.co.nz/login" and CSS ids (#username) couple the scenario to one exact screen.
3. Login noise / multiple behaviours: logging in is not the behaviour under test (a transfer is). The login should be a declarative Given ("Given a logged-in customer") or a Background, not three UI steps. Logging out at the end is a second unrelated behaviour.
4. Vague Then — "Then it works" asserts nothing checkable. A good Then states the observable outcome: the balance decreased by $500, the payee was credited, a confirmation reference was issued.
5. No clear single When — the action is buried in "type 500 and click submit"; the real When ("the customer transfers $500 to a saved payee") is never stated cleanly.
6. Unclear context — there is no Given establishing the starting balance or the destination account, so the outcome cannot be verified.
Strong answers name the imperative/UI-coupling and the vague "it works" Then as the two most damaging. The fix is Exercise 2.
Exercise 2. Rewrite the broken ANZ transfer scenario from Exercise 1 as good declarative Gherkin. Use a clear Given/When/Then, business language with no UI selectors, exactly one behaviour (the transfer), and a checkable Then. Add a second scenario for a negative case: an attempted transfer that exceeds the available balance.
Model answer:
Feature: Transfer between accounts
Scenario: A successful transfer reduces the source balance
Given a logged-in customer with $2,000 in their everyday account
And a saved payee "Power Bill"
When the customer transfers $500 to "Power Bill"
Then the everyday account balance is $1,500
And the payee is credited with $500
And a confirmation reference is issued
Scenario: A transfer that exceeds the available balance is declined
Given a logged-in customer with $300 in their everyday account
And a saved payee "Power Bill"
When the customer attempts to transfer $500 to "Power Bill"
Then the transfer is declined with an insufficient-funds message
And the everyday account balance is unchanged at $300
What makes this strong: no URLs or selectors (the login is a single declarative Given, not a run of UI steps); exactly one behaviour per scenario with one clear When; a checkable Then that states the balance and the outcome rather than "it works"; and a real negative case that asserts the balance is unchanged, which is a common gap. Each scenario also states its own context, including the saved payee it transfers to. The "how" of clicking is left to the step definitions underneath, where it belongs.
Exercise 3. Write a Scenario Outline with an Examples table for a fictional KiwiSaver contribution-rate change, where valid rates are 3%, 4%, 6%, 8%, and 10%. Use placeholders for the entered rate and the expected result. Include at least 5 example rows, and make sure at least two are negative or boundary cases (for example an invalid rate, or a value just outside the allowed set). Use a Background for the shared member context.
Model answer:
Feature: KiwiSaver contribution rate change
Background:
Given a member with an active KiwiSaver account
And the member is currently contributing at 3%
Scenario Outline: Changing the contribution rate
When the member changes their rate to <rate>
Then <result>
Examples:
| rate | result |
| 4% | the new rate of 4% applies from the next pay run |
| 8% | the new rate of 8% applies from the next pay run |
| 10% | the new rate of 10% applies from the next pay run |
| 5% | the change is rejected with the list of valid rates |
| 0% | the change is rejected with the list of valid rates |
| -2% | the change is rejected as an invalid value |
What makes this strong: the Background holds the shared context (active account, currently at 3%) once; the outline tests one behaviour (changing the rate) across data; valid rows (4%, 8%, 10%) confirm the happy path; and the negative/boundary rows (5% just outside the set, 0%, a negative value) force the rejection path. A weak answer lists only valid rates, so it never exercises the rule that rejects invalid ones, which is exactly where the defect hides. Note each row keeps the same behaviour, with only the data and expected result changing.
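For readers who want to see the rule those rows are probing, here is a minimal sketch of it in Python. The function `change_rate` is hypothetical; the valid rates come from the exercise text and the outcome messages are copied from the Examples table, so the table doubles as the test data.

```python
VALID_RATES = (3, 4, 6, 8, 10)  # percentages, taken from the exercise text


def change_rate(requested: int) -> str:
    """Return the outcome message for a requested contribution rate (in percent)."""
    if requested < 0:
        return "the change is rejected as an invalid value"
    if requested not in VALID_RATES:
        return "the change is rejected with the list of valid rates"
    return f"the new rate of {requested}% applies from the next pay run"


# The Examples rows from the model answer, as (rate, expected result) pairs.
EXAMPLE_ROWS = [
    (4, "the new rate of 4% applies from the next pay run"),
    (8, "the new rate of 8% applies from the next pay run"),
    (10, "the new rate of 10% applies from the next pay run"),
    (5, "the change is rejected with the list of valid rates"),
    (0, "the change is rejected with the list of valid rates"),
    (-2, "the change is rejected as an invalid value"),
]

for rate, expected in EXAMPLE_ROWS:
    assert change_rate(rate) == expected, f"row {rate}% failed"
```

Delete the three rejection rows and the two `if` branches could be removed without any check failing, which is the defect a valid-rates-only table would let through.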
11 Self-Check
Q1: What is the difference between imperative and declarative Gherkin, and which should you write?
Imperative steps describe how a user operates the screen — the clicks, fields, and selectors. Declarative steps describe what the user is doing in business language. Write declarative: it is readable, survives a UI redesign, and states the rule. The “how” belongs in the step definitions underneath, never in the scenario.
Q2: What do Given, When, and Then each represent?
Given is the context already true before the action — the starting state. When is the single action under test — the trigger. Then is the expected, observable outcome. A scenario should have one When; two Whens means two behaviours.
Q3: When should you use a Scenario Outline, and what may an Examples row not do?
Use a Scenario Outline when the same behaviour needs checking against several sets of data — one outline, many rows. An Examples row may not change the behaviour: if a row needs a different Then because it is really a different rule, give it its own scenario instead of bending the outline.
Q4: What may a Background contain, and what must it never contain?
A Background may contain only Given steps — shared context that genuinely applies to every scenario in the feature, stated once. It must never contain a When or a Then, because those are actions and outcomes that belong in the individual scenarios under test. Keep it short, or it hides context from the reader.
Q5: Why is declarative Gherkin called “living documentation” when a Word spec is not?
Because the scenarios are executed as automated tests, so they must match the real behaviour or the build fails — the documentation cannot silently drift. A Word spec decays from the day it is written because nothing forces it to stay in step with the code. This only holds if the Gherkin is declarative and business-readable; an imperative click-script documents nothing useful.
12 Interview Prep
Real questions asked in NZ QA interviews for BDD and automation roles. Read the model answers, then practise your own version.
“A teammate writes Gherkin full of ‘click the button with id btnSubmit’. What is wrong with that, and how would you fix it?”
It is imperative and coupled to the UI. Two problems follow: it breaks the moment a developer renames a button or redesigns the screen, even though no business rule changed; and it hides the actual rule under a list of clicks, so no business reader can use it. I would rewrite the steps declaratively — “When the customer submits the payment” instead of the click sequence — and push the ‘how’ (which button, which field) down into the step definitions where it can change without touching the scenario. The scenario should read like a sentence about the behaviour, not a script for operating a screen.
“When would you reach for a Scenario Outline versus a Background?”
They solve different problems. A Scenario Outline is for the same behaviour tested against many sets of data — I write the steps once with placeholders and supply the rows in an Examples table, like a rates rebate across several income and dependant values. A Background is for shared context — Given steps that are true before every scenario in a feature, so I state them once instead of repeating them. The rule I hold to: an Examples row may only change the data, not the behaviour; and a Background holds only Givens, never a When or Then. If either starts doing more than that, I split it out.
“People say BDD gives you ‘living documentation’. What does that actually mean?”
It means documentation that cannot quietly go out of date, because it is executed as tests. A declarative feature file describes the system’s behaviour in business language — “a $150 payment reduces a $1,200 rates bill to $1,050” — and that same scenario runs as an automated test. If the behaviour changes and the scenario is not updated, the test fails and someone has to reconcile them, so the docs and the system stay chained together. A Word spec in a shared drive has no such force and decays from day one. The catch is that it only works if the Gherkin is declarative and readable; an imperative click-script is automated testing but documents nothing a business owner can use.