Source: https://youtu.be/n5vzuQAToZE?si=k_QNDoInAuKSahLI
Date: 2026-03-04
---
## Key Takeaways
### Approval Testing vs Acceptance Testing
- **Acceptance testing** is a broad term: testing from the customer's perspective to determine if software meets requirements — everyone has done this
- **Approval testing** is a specific, niche technique — a particular method of verifying output by diffing against a previously approved result
- The two terms are frequently confused but are not synonymous
### The Approval Testing Technique
- Changes the traditional **arrange-act-assert** pattern to **arrange-act-print-diff**
- Instead of asserting an expected value, you **print the result** and diff it against a previously **approved** version
- Produces **three-state output**: no change (confident) / expected change (approve it) / unexpected change (likely a bug)
- Contrast with unit tests which produce **two-state output**: pass or fail — if an assertion fails, something is definitely wrong
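The arrange-act-print-diff loop described above can be sketched in a few lines. This is a minimal illustration, not any particular framework's API; the file-naming convention (`*.approved.txt` / `*.received.txt`) and the `verify` helper are assumptions made for the example.

```python
# Minimal approval-testing sketch: print the result, diff it against a
# previously approved version, and force explicit human approval.
# The verify() helper and file naming are illustrative, not a real API.
import difflib
from pathlib import Path

def verify(name: str, received: str, approved_dir: Path = Path("approved")) -> None:
    """Compare received output against the previously approved version.

    Three outcomes:
      - matches the approved file: no change, full confidence (test passes)
      - differs: show a diff; a human decides whether to approve the new
        output (rename received -> approved) or treat the change as a bug
      - no approved file yet: fail and ask for an explicit first-time review
    """
    approved_dir.mkdir(exist_ok=True)
    approved_path = approved_dir / f"{name}.approved.txt"
    received_path = approved_dir / f"{name}.received.txt"

    if not approved_path.exists():
        received_path.write_text(received)
        raise AssertionError(
            f"No approved result yet: review {received_path} and rename it "
            f"to {approved_path} to approve."
        )

    approved = approved_path.read_text()
    if received != approved:
        received_path.write_text(received)
        diff = "\n".join(difflib.unified_diff(
            approved.splitlines(), received.splitlines(),
            fromfile="approved", tofile="received", lineterm=""))
        raise AssertionError(f"Output changed:\n{diff}")
```

Note that "approve" is just copying the received file over the approved one, which is why bulk-approving a hundred intentionally changed tests is one operation.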
### Real-World Applications
- Works with PDFs, HTML, screenshots/pixels, GUI widget trees, and any text-representable output
- Converting HTML to ASCII before diffing reduces noise from markup churn
- Has been used in practice for 25+ years in large, complex systems (life insurance platforms, Android screenshot testing, etc.)
### Benefits
- **Diagnosability**: failing tests show a diff with full context — what stayed the same and what changed — making root cause analysis easier
- Easy to **bulk-update tests** when behavior intentionally changes: approve the new output and all related tests go green instantly
- A single logical change causing many failures is often one approve-all operation, not many individual bugs
### Maintenance Costs and Challenges
- New tests go through a **stabilization phase** — irrelevant output must be scrubbed or filtered over time before tests are reliably stable
- **Social pressure is a risk**: under stress, developers may approve diffs without careful review, silently accepting bugs
- Cascading failures (100 tests failing for one reason) can cause panic until you recognize it's a single root cause
### Tool Support is Critical
- **Use an established approval testing framework** rather than building your own — the tooling to interpret results is the hard part
- Good tools should: group related failures together, warn when approving output containing stack traces, and distinguish multiple distinct failure causes
- Filtering/scrubbing irrelevant output is essential — without it, tests never stabilize
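The scrubbing step in the list above is usually a set of regex substitutions run over the output before comparison, replacing volatile values with stable placeholders. The specific patterns below (dates, UUIDs, durations) are illustrative assumptions; real suites accumulate their own list during the stabilization phase.

```python
# Sketch of output scrubbing: replace volatile values (timestamps, ids,
# timings) with stable placeholders so tests stop failing for
# irrelevant reasons. The patterns here are examples, not a standard set.
import re

SCRUBBERS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "<DATE>"),
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-"
                r"[0-9a-f]{4}-[0-9a-f]{12}\b"), "<UUID>"),
    (re.compile(r"took \d+ ?ms"), "took <DURATION>"),
]

def scrub(text: str) -> str:
    """Apply every scrubber before diffing against the approved output."""
    for pattern, placeholder in SCRUBBERS:
        text = pattern.sub(placeholder, text)
    return text
```

Scrubbing sits between "print" and "diff": the approved file stores the scrubbed form, so a new run date or request id produces no diff at all.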
### Terminology Clarification
- **Golden master**: implies the approved result is fixed and sacred — misleading since you update it regularly
- **Snapshot testing**: implies ephemeral, throwaway data you don't need to review — misleading since you should always inspect what you approve
- **Approval testing**: emphasizes the deliberate act of reviewing and explicitly approving output
- Key distinction: approval testing requires explicit human review on first run; some snapshot tools auto-record without showing you anything
### AI and the Future (2026)
- LLMs are well-suited to help **interpret approval test failures** — comparing and analyzing text diffs is something they do well
- AI tooling could assist in grouping failures, flagging obviously wrong approvals, and distinguishing noise from real changes
- Too early for established best practices — experimentation is encouraged and cheap
## Introduction and Defining the Core Concepts
Hello, and welcome to One Big Question. Today we are discussing approval tests versus acceptance tests: what is the difference? This is the Modern Software Engineering channel. I'm Emily Bache, and I'm delighted to be discussing this question today with Kent Beck. Hello, Kent. >> Hello, Emily. How are you doing? >> I'm good. It's really good to get to chat with you about this. Okay, approval tests versus acceptance tests, then. >> Yeah. In a recent video you made a strong distinction here, and I want to make sure that I understand what you mean by approval tests and how it's similar to or different from things that I may have done in the past, or taught about in the past. Later on we can figure out where that fits into the constellation of all testing, but right now there seems to be a distinction, and I'd love to understand more about it. >> Yeah. In a recent video I did actually imply that I didn't think you had tried approval testing, and I realized afterwards that a lot of people confuse approval testing with acceptance testing. I don't want to imply that you haven't done acceptance testing, because that's a really important technique and I'm sure you've done it. Everyone has done that: testing from the customer perspective, from the perspective of the person who's going to accept the software or not. So that's a very general term. Approval testing is a technique that I've been using for many years, but it's quite niche in comparison. >> My suspicion, at this point in my aging career, is that I've probably done something similar but used different words. So I want to make sure that I understand what you're doing. Now, I wanted to set all of this up by saying this is a conversation that's been going on for 25 years. >> Something like that, yeah.
So I remember you showing up in Sardinia at the early XP conference, and all the XPers were walking around talking about how amazing their unit tests were and how you could catch almost all bugs with unit tests, and you said, "No, no, no. There's this other thing." So it's not as if this is a new distinction or a new conversation. >> No, it's true. I remember meeting you back in 2002 and talking about testing. It's a technique that I've been using for a long time, and the name "approval testing" actually came later; we only started calling it that in about 2012. When we first talked I may have been calling it text-based testing, which is a term that never caught on. Most people who have used this technique might have called it golden master testing or snapshot testing, which are similar, though a little different, I think. And people often use an approval testing framework to do characterization testing. >> Okay. >> That's also a closely related technique.
## What is Approval Testing
>> So, Emily, can you give me an example of approval testing? >> Yeah. At the simplest level, the name comes from the way you approve a result rather than make an assertion. In a normal unit test you would have arrange-act-assert, or given-when-then. An approval test changes the assert part: I would normally explain it as arrange-act-print-diff. Instead of asserting something about the result, you print it to a string, or some other format that you can then diff against a previously approved version of that test result. >> So is screenshot testing an example of this? >> It could be, if you are looking at those screenshots, checking them, and approving them explicitly. >> Yeah. I had a student at Facebook, Arnold Noronha, who has an Android screenshot tester that will show you the difference between the screens before a change and after a change. Sometimes you'll get changes that aren't errors: something moved by one pixel, and that's fine, don't worry about it. Sometimes you'll get no change when you expect no change, and that's good. And sometimes you'll get a change that is an error. So you have three states to the output, and when you see a difference you have to decide: did the behavior actually change or not? In contrast to unit testing, where if you make an assertion and it fails, something's definitely wrong. >> Yes.
## Real-World Examples and Applications
>> Okay. So I have done this, in multiple forms of output: PDFs, for example. For those same 25 years I've worked on a life insurance system that's quite large and has a gazillion tests, many of which are of the form: now print the annual report for this contract. Is it the same as the annual report was before, or does it match the expected values? So I've done it with PDFs. I've done it with HTML, which isn't a good format because things change so much. I've done it with pixels, the screenshot kind of thing. The first one of these that I did was at MasPar, ancient history, but I had the whole user interface: every widget knew how to render itself as a string, so you get this big tree structure, and then you diff that. So yes, I've done this. The key distinction for me seems to be that you have this three-state output instead of a two-state output. >> Interesting. >> What else is going on that you see? >> I'm very happy to hear you say that this is something you've done; it's not something that I've noticed from your writings about testing, at least. I've also done it with PDFs, and I've done it with GUIs. And actually with HTML, I've got a printer that converts HTML to ASCII, and it's the ASCII that I use in the test, because HTML has got all this markup noise. So that's a revelation, that you've done this. And you're right: the thing with an assertion is, as you say, it passes or fails, and the test passes or fails. But with an approval test, you've got to work out whether you can approve the new result; or whether you don't care and you should be filtering it, so you need to change your printer to scrub that or otherwise not print it; or whether it actually is a bug that you need to go and fix.
So yeah, you need more tools in approval testing to help you interpret the results, which is one of the reasons I usually say: use an approval testing tool that someone else has built. Don't build your own.
## Maintenance Costs and Tool Support
>> Oh, interesting. I love building tools, so I would say exactly the opposite. Here's why I don't use it all the time: it has this long-term maintenance cost where you can't just run it and, if it passes, go into production. You run it, it fails, and now it's asking for your attention once again: let me look at it. And sometimes you'll make one little change and a hundred tests will fail. With the first one that I did, I would come in in the morning and there would be this stack of printed failure reports, and the first time I'd just panic: oh, I must have broken everything. And no, it was just some date thing, something stupid; they weren't actually failures. So I prefer to move validation into whatever we're going to call it. Okay, so approval testing has this three-state output, and then there are these other tests that have a two-state output, which really is red and green. And then there's observability, which is a whole separate kind of thing; there aren't binary results to it, but that's a topic for another day. So that's my main reluctance to use approval testing. Something that you've added to the conversation is giving this name to diffing PDFs and screenshots and custom generated text and whatever other format. >> I wanted to pick up on what you just said: you come in and you find you've got a pile of a hundred failing tests, and actually it's just one logical change. Seeing a hundred failing tests is kind of stressful, until you realize it is just one logical change and the fix is just to push the approve button, and all of the tests go green. And you only feel confident to do that if you can ascertain that it's just one logical change and all hundred tests are failing for the same reason.
Which is why you need tools that will help you differentiate those cases, between "I've got a hundred bugs and a hundred failing tests" and "no, actually, I've got one bug"... >> Or not even a bug. >> Yeah. Or it's actually good: the thing changed, and I wanted it to change, so I just hit the approve button and it's green. I love the fact that you can update these tests so easily when the behavior changes in the way you wanted it to change. It is often much easier than updating assertions.
## Benefits: Diagnosability and Diffs
The other thing I really like about approval testing is that when the test fails, you get a diff. So you get the context of all the differences: you can see what's still the same and you can see what's different. I call this diagnosability, and it's usually really good for an approval test, if you've done your job well and designed that print step in the test really well. >> Yeah. So in Test Desiderata terms, we'd say it's specific: from a difference in the output you can guess what the difference in the processing was. >> Yes. "Specific" is one of the desiderata, and it's the one that I wanted to rename to "diagnosable," because I think the word "specific," as you chose it there, is saying I can pinpoint the error. The more important thing that I desire in my failing test is that I can diagnose why it's failing. And it might not be just one thing. >> Yeah. We'll talk about the Test Desiderata, and how it needs to be refined, at a later date. I'm glad we're getting this one sorted out.
## The Social Challenge: Pressure and Approval
So here's a challenge that I see with approval tests, a social challenge: I run them, I get the hundred failures, I look at the first one, and I say, oh, I know what this is, approve all of these; and there's another error in there. I see people do this, and it happens in manual testing too, the same dynamic: the more pressure there is, the more stressed you are, and the more time is ticking away, the less value you get out of the tests, because the more likely you are to just jump to "I think these are all fine." And actually you changed two things, or you changed one thing and somebody else changed something else, and there was something else lurking, and so you just said the system's okay, and the system's not okay. As the pressure ratchets up, that problem gets worse. That's definitely something I've felt using it: "oh yeah, fine, fine," but not fine fine. Not to say you shouldn't use it, but it is a weakness of this compared to "if the unit tests pass, everything's fine." That's a myth, but how close can we get? That's always my question. >> The thing you just said, that with approval tests it is easy to update them all and approve something you didn't mean to approve: that's where you need the tools to tell you, "Actually, are you aware that you just approved something with a stack trace in it? That's probably a bug." It's building that tool support in, so that if you're trying to approve something that is obviously wrong, the tool will notice that for you. Or, if you're trying to approve a lot of tests because you assume they're all failing the same way, it will say, "No, actually, they're not. I'm grouping these tests for you: these all failed this way, and these failed in this other way. You need to look at at least one from this group and one from that group."
But yes, that thing about the pressure ratcheting up, and being more likely to make mistakes when you're under pressure: you need tools that help you.
## Balancing Confidence and Approval Tests
>> Yeah. That's why I want to migrate as much of the confidence-generating property of tests down into things that give me binary results. Now, I've worked on enough big systems to know that red and green is nowhere near the whole spectrum; there's all kinds of stuff in the middle, and there's stuff you can't find out until you get into production. But if I can move some confidence into a boolean test... So we have boolean tests that are red or green, and we've got these approval tests that have a ternary, three-valued result.
## Handling Irrelevant Changes and Test Stabilization
>> So the third state of the approval test is this thing where there's a diff but you don't care. >> Right. The system's fine, but something irrelevant has changed. >> Yeah. And this again is where you need tools that help you start ignoring those kinds of changes. What you find over time is that when the test is new, there are lots of things it specifies that you don't care about, and you gradually start ignoring them; over time the test becomes stable and doesn't fail for irrelevant reasons anymore. The test design phase takes a little longer, maybe, before you're happy that this is a test that is really solid and deserves its place in your test suite. >> I would love to see data on this stabilizing effect. Like, how many accept-alls, or whatever you call that operation, happen? How many of these jackpot outcomes where you get a hundred suspicions? Can we call that result a suspicion? >> You mean that the behavior has actually changed? >> No, no. Something's changed; we don't know if the behavior actually changed. We're just suspicious. >> Okay. >> That's what the approval tester does, right? It either says, "Nothing changed. No suspicion. Totally confident." Or it says, "Something has changed. I'm suspicious." That doesn't mean I'm wagging my finger, but it does mean that somebody needs to take a look at this; I can't. >> Yeah. So red doesn't mean that an assertion has failed and something is definitely wrong. It means something has changed. Yeah.
## The Future: AI and Approval Testing
>> So, this being 2026, I have to ask this question: what's the genie going to do to make this work better? >> Oh, I don't know. This is very early days, because using agentic AI is so new. But my experiments so far tell me that genies love text, and diffing and comparing text is something they're good at. So interpreting test failures is something I think genies are going to be able to help with, and I want to come back to this question when I've got a bit more experience. >> Got it. To listeners out there: try everything you can think of. Experimentation has never been cheaper, and nobody knows what the right answer is, so just give it a try and see what happens. >> Yeah. It's too early for there to be experts who can tell you how to do this. >> Yeah. "Expert" has both a real part and a tone part, and there are plenty of people who still have the tone, but nobody has the experience.
## Clarifying Terminology: Golden Master vs Snapshot vs Approval
>> Cool. So have we had the conversation that we needed to have? >> I think so. I just wanted to make sure it was clear how approval tests differ from golden master and snapshot tests. >> Oh yes, please. >> So "golden master": the term, I think, is from CDs; the original that you made all the copies from was the golden master. I don't like it because it implies that the approved text is somehow golden and special and should never change, and it's not: you're updating it all the time. It's similar for the term "snapshot testing," for the other reason: people say, "Oh, this is just a snapshot. I don't actually care. I expect it to change all the time. I'm not even going to really look at it. I'm just going to record the snapshot of what my system does today." >> In which case, why are you even doing that? >> Well, yeah. This is the default behavior of Jest, as far as I can make out: the first time you run the test, it just records the snapshot. It doesn't even show it to you. With approval testing, you have to look at the thing and approve it when you're designing the test. So I don't like "snapshot" because it implies that it's here today, gone tomorrow, and you don't care what's in it. I do care what's in it. >> The snapshot testing as my friend Arnold uses it is "snapshot" in the sense that it's a picture, it's pixels, not in the sense that it's ephemeral. We're working out the words for all of this, which comes after the experience. And the strange thing is that, given how many decades, even before my time, people have been working on testing, we still don't have anything like a common vocabulary for this stuff.
## Closing Remarks
>> Thank you so much for having this conversation with me, so that we could clear up some of the vocabulary and discover that we do actually have experience of the same technique; we just haven't been using the same words. >> Yeah, thank you so much, Emily. >> This has been great, Kent, and I look forward to having further discussions with you about testing. And welcome, everyone, to the Modern Software Engineering channel.