What do we talk about when we talk about evidence?

I’ve got a teaching technique to sell. You buying?

Jun 22, 2026

Before we begin, a quick quiz. Which of these counts as educational evidence?

1. A randomised controlled trial carried out by a large ‘what works’ organisation with 2,000 students across 32 schools trialling a new mindset intervention.

2. Your colleague telling you about a session on ways of using formative assessment they saw at a recent conference given by a well-known education writer.

3. Interviews with 15 teachers carried out by a master’s student in the school they teach at about workload pressures.

4. A neuroscience study conducted on 18 undergrads.

5. A technique I thought of myself and used for the first time before break today.

We’ll come back to this later, but to release a little of the suspense: no, I’m not about to tell you a hunch is worth as much as an RCT.

In some cases, it might be worth more.

If I’m presenting education policy, I don’t want to be driven by my own school experience. If I’m about to teach, researching RCT mindset interventions might not be the best use of my time.

How do we tell one situation from the other? And if we’re listening to someone telling us what to do in the classroom, what standards should we hold them to?

Not all educational evidence is of the same quality and relevance, but if the intended end user is a teacher, they should be in a position to make an informed judgement and weigh up these factors.

The role of an ‘education influencer’

I write on the internet giving suggestions about what teachers might do in their classrooms. So do a lot of other people, and their ideas end up being used by hundreds, sometimes thousands, of teachers. Most of what I use in lessons started life this way, and I’m indebted to the people I borrowed it from.

But there’s a danger, which I discussed in my last couple of posts. An interesting idea, built on a single piece of research, can harden into ‘evidence-based practice’, and then into an essential feature of ‘good teaching’ that has to be visible in every lesson.

Bloggers aren’t the only ones translating research. Schools have teaching-and-learning leads. Academics sometimes write for teachers. Ofsted publishes reviews of what it takes as ‘evidence’. What standards should these ‘education influencers’ be held to? What counts as evidence? And is it ever justified to mandate one of these ideas across a school, a trust, or a country?

If we want an evidence-based profession, these are questions we have to answer.

To give a rough idea of how it might work in practice, I’m going to sell you a technique – and I’ll show my working as I go. Let’s see if you’re interested in buying it by the time I’m done.

The 6Ps

Three-panel comic strip of a grinning, shifty salesman in a checked suit, drawn like a vintage advert. Panel one: he points at the reader — "Psst. You. Yeah, you. After a new teaching technique? Keep it on the down-low, but I've got one going cheap. Tried it myself three times. Went great twice." Panel two: he gestures at a cardboard box stamped "Mystery Method V1.0 — Top Secret," next to a rubber stamp reading "Evidence?" and a label saying "No evidence in here" — "Third time? I've already told you it worked twice. Do you want it or not?" Panel three: he leans in beside a door marked "SLT meeting in progress," pitching a "Special Offer" — "No, wait, don't go. I can do you a whole-school discount if you can get SLT to mandate it across every classroom. Just don't start asking questions — especially about where the 'evidence' comes from."

First, begin with the problem, not the technique. Too many ideas are used like the man with a hammer – they’re tried everywhere, regardless of whether they address any real need.

My problem is usually anything I think might kill my students’ curiosity. Top of the ‘wanted’ list are lessons that feel repetitive to students – nothing embodies this more than revision. My doctoral research – and teaching experience – tell me that when they think they’ve seen a topic before, students switch off.

Here’s what I’ve been trying. It’s a way of using recall questions to bookend a revision lesson, and of enabling you and your students to judge their progress. I’m calling it the 6Ps, because I haven’t thought of anything catchier:

Hand-drawn diagram titled "The 6Ps: a curiosity-fuelled revision lesson," showing six numbered steps left to right, joined by arrows. 1, Pose — ask a set of recall questions. 2, Predict — students predict whether their answers are correct (tick, cross or question mark) and score themselves. 3, Pause — don't give the answers yet. 4, Present — teach the content as normal. 5, Pose again — ask the same questions again. 6, Proof-check — mark together, check scores and reflect. A banner along the bottom reads: "Goal: curiosity, accurate self-assessment, and visible progress."

This is based on some influential neuroscience: people remember more when they’re curious; they get more curious about an outcome once they’ve predicted it; and the memory boost is larger when the prediction turns out to be wrong.

But every one of those is a small-N fMRI or lab study, most of them on undergraduates answering trivia questions. None of it has faced the ultimate test: 30 teenagers reluctantly revising physics in an English classroom on a sweltering June afternoon.

Putting the 6Ps to the test

My Year 10 have their mock exams next week. Today, we’re revising mains electricity. I come up with a set of 10 questions on the topic. Or, rather, I tell the AI of my choice:

Come up with 10 questions on unit P5 (mains electricity) from AQA Trilogy combined science GCSE physics, so I can paste them straight onto a slide. Put the answers underneath.

I check them, obviously – but I know the spec, I’ve taught it for years, and that takes a minute or two instead of the half hour writing them from scratch would.

Now for the 6Ps:

1. I start the lesson with the 10 recall questions (pose).

2. Students answer them then predict how they’ve got on. Tick, cross or question mark next to each. Tick = 1 mark, cross = 0, ? = ½ mark. They give themselves a score out of ten (predict).

3. I hold back the answers (pause).

4. I do my normal revision lesson – go over key ideas, worksheets, the usual. Some of the answers will come up, but I don’t deliberately hand them over (present).

5. Same questions, same scoring format (pose again).

6. This time, they actually mark them (proof-check).

Then, on mini-whiteboards, they show me their ‘before’ and ‘after’ scores – both sets of questions, marked against the real answers. I also ask them to compare those marks with the scores they predicted, to see how accurate their self-assessment was.
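If you’d rather do the sums off the whiteboard, here’s a minimal sketch of the scoring. It assumes the marks described above (tick = 1, cross = 0, ? = ½); the function names and the example student are made up for illustration, not anything I actually ran in class.

```python
# Marks a student awards themselves for each prediction symbol
# (tick = 1, cross = 0, ? = 1/2, as in the lesson).
PREDICTION_MARKS = {"tick": 1.0, "cross": 0.0, "?": 0.5}


def predicted_score(predictions):
    """Total the student's own tick / cross / ? predictions."""
    return sum(PREDICTION_MARKS[p] for p in predictions)


def actual_score(correct):
    """Count the answers that were right when marked against the real answers."""
    return sum(1 for is_correct in correct if is_correct)


def calibration_gap(predictions, correct):
    """Positive = the student was overconfident; negative = underconfident."""
    return predicted_score(predictions) - actual_score(correct)


# Hypothetical student: ten predictions and ten marked answers.
predictions = ["tick", "tick", "?", "cross", "tick", "?", "cross", "tick", "?", "tick"]
correct = [True, False, True, False, True, False, False, True, False, True]

print(predicted_score(predictions))           # 6.5
print(actual_score(correct))                  # 5
print(calibration_gap(predictions, correct))  # 1.5
```

A gap near zero suggests the student is judging their own understanding fairly well; what counts as ‘near’ is a judgement call, not something the sketch can settle.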

The point of all this is:

a) To show them they don’t know everything before we begin so they see the value of the lesson.

b) To improve how accurately they can assess their own understanding of a topic, and boost the efficiency of their revision.

c) For me to find out how much their knowledge changed from start to end and see how well I’ve pitched the lesson.
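For point (c), a rough way to summarise a class’s ‘before’ and ‘after’ marks might look like the sketch below. The function names and the four scores are invented for illustration; with no control group, a positive number here shows a change, not what caused it.

```python
from statistics import mean


def gains(before, after):
    """Per-student change in marks between the first and second attempt."""
    if len(before) != len(after):
        raise ValueError("Each student needs both a 'before' and an 'after' score")
    return [a - b for b, a in zip(before, after)]


def mean_gain(before, after):
    """Average change in marks across the class."""
    return mean(gains(before, after))


# Hypothetical marks out of ten for four students.
before = [4, 5, 3, 6]
after = [7, 7, 5, 8]

print(gains(before, after))      # [3, 2, 2, 2]
print(mean_gain(before, after))  # 2.25
```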

And… did it work?

The first two times I tried this, it went well. Students gained two or three marks on average, from a baseline of four or five out of ten. If I stopped there, I’d be writing this as a recommendation.

I tried it a third time and it was messier. The lesson overran, so I had to push the final pose-predict-proof-check into the next lesson; the gain came out at about a mark above where they’d started.

Maybe the effect is short-lived. Maybe those questions were harder. Maybe it was the time of day (or temperature of the room).

I asked the students what they made of the technique, on whiteboards, and the verdict was lukewarm: some wanted longer questions, some wanted more explanation, some couldn’t see how they were meant to answer questions on things they hadn’t revised yet.

This is solid science transformed into a method you could use in the classroom tomorrow, so what’s the problem?

As ever, I blame it on the children.

But at least there’s an upside. While it might have failed in the classroom, I can always try flogging it to an ed-tech company as a fallback.

There’s a bigger issue here, though. This is evidence for me, as the classroom teacher. I know the class and can judge the effect based on my sense of what they already know. But that doesn’t license me to start selling the technique on the basis that ‘it works’.

I had no control group. There’s little evidence that it would translate to another class in the school, let alone another school, subject or age group.

‘Try it,’ I can say.

‘Do it’ – less so.

Three rules for selling something

As the blogger trying to talk you into it, what am I honestly obliged to tell you? And if you’re the teaching-and-learning lead who’s just found it, what should you be asking before you put it in front of fifty colleagues on an INSET morning?

Here are some rules of thumb I think anyone trying to influence educational practice should bear in mind, applied to my own method:

Solid evidential basis. The links between curiosity, prediction and memory are well demonstrated and have been replicated. I’ve read the papers and their limitations; I’m not a neuroscientist, but neuroscientists have cited and reproduced this work, which is about as much as I can honestly claim. So, I can say the science seems solid – but I can’t say it works in the classroom.

The size of the leap. The bigger the change you’re asking for, the stronger the evidence and the more transparent the claim has to be. If I told you the best way to learn about mains electricity was through pure discovery learning – to hand your class a set of screwdrivers and then let them loose with the sockets – I’d need to be very sure indeed. If I told you that break times should be fully structured – scaffolded use of the climbing frame, teacher-mediated discussion with your mates – same again. The 6Ps asks for almost nothing, so the cost of trying it and finding it wanting is almost nothing. That’s the whole reason I can suggest you try it while admitting I can’t prove it works.

Little danger of harm. Providing you already do some retrieval practice, you can probably judge the outcome of doing this before you start. A small leap means it’s easier to judge potential harm, and here there isn’t much room for anything to go wrong. There is a risk of leaving errors uncorrected until the end of the lesson, but we’re talking about material students have met already, with responsive re-teaching at the heart of the lesson and collective marking at the end.

What counts as evidence in education?

You’ve been patient, so here are the five again:

1. The 2,000-student RCT on a mindset intervention.
2. Your colleague’s conference session on formative assessment.
3. The master’s student’s 15 interviews about workload.
4. The neuroscience study on 18 undergraduates.
5. The thing I tried before break this morning.

They can all count – but they’re not all equal. In fact, if you’re planning your lesson for after lunch, the RCT might be the least useful item on the list. It can tell you whether a mindset programme tends to work on average across 32 schools. It can’t tell you whether this exact programme will land with your Year 11 class in an hour’s time. What worked – or didn’t – last lesson is probably a more valuable guide. It’s not about creating an evidence ladder or hierarchy, but fitting the tool to the problem in front of you.

The questions to ask next INSET day

Every teacher has sat through a September training session announcing a new whole-school policy. Plenty of them are improvements. Fewer would survive the obvious questions: Why this, and why now? What’s the evidence? Has anyone trialled it here, in this school – ideally the person standing at the front explaining it – before every teacher in the school was told to do it?

I’m not against change; we need it. I’m against change that arrives with all the transparency of a street-corner magician.

I’ve written before about the dangers of ‘total instruction’ – the slow removal of teacher judgement in the name of consistency – but I still think some practices are rightly mandated. Where a student has a special educational need or a disability, adjustments have to be made on the best available evidence; we owe them no less.

I’m not going to tell you what that evidence is – that’s not my area. What I can tell you about is how it feels to follow procedures I don’t understand. I used to have to photocopy worksheets onto coloured paper – up to four different colours for some classes, which is a particular challenge when you’re colourblind, like I am. Then, one year, I didn’t have to any longer.

Where did the evidence for coloured paper come from? Was it always flawed, or did more robust evidence come along that countered it? Nobody ever put either case to me. One year there, next gone. What impact did this have on those who apparently needed the coloured paper?

If we’re going to have whole-school practices, the evidence for them has to be weighed, made transparent, and shown to the people asked to carry them out. Children need good, human teachers more than ever, and the job is arguably harder than it’s ever been. If there was ever a moment to trust teachers with the evidence and let them make their own informed decisions, it’s now.

So, give the 6Ps a go, if you like the look of it – and report back.

But if your students tell you they want longer-answer questions, or more explanation first, or that they couldn’t know the answers because they hadn’t revised the topic yet, don’t say I didn’t warn you.

The Curiosity Gap
