How do you interpret a p-value? — Lisa Chapman Consulting

P-values are everywhere in scientific papers, but what does a p-value even mean, and why
is it so special if it's less than 0.05? In this video we'll explain how p-values
help scientists make conclusions.

For example, in this 2006 paper, scientists were studying weight maintenance in mice
and the impact of the microbiome on weight gain. The microbiome is the collection of
microbes that live in or on an organism's body. The scientists had shown that the gut
microbiome of obese mice had a different composition of bacteria than the microbiome of
lean mice, but they wanted to know whether the bacteria in the microbiome were the result
of the difference in weight or whether the microbiome caused the difference in weight.
To test that, they took mice that had no microbiome of their own and did a fecal
transplant. Yeah, you heard that right: they took feces from lean or obese mice and
placed the fecal bacteria inside mice that had no bacteria of their own. Two weeks later,
both sets of mice had gained weight! The bacteria helped the mice get more energy from
their food. So the question is: can we conclude that the microbiome from obese mice
makes the mice gain more weight?

Yes, we make that conclusion. The bar is higher; they gained more weight. What's the
alternative explanation? That they're not actually different. What do you mean, they're
not different? The bar is higher! Of course they're different! Let me give you another
example. I had this idea once that people's height depended on their last name. Why? So
I thought maybe people whose last name starts in the second half of the alphabet, maybe
those people are on average a little bit taller… I know, I know, I know, but it's a
testable hypothesis. So I did an experiment: I asked people on Twitter to tell me their
height and the first letter of their last name. And look: according to the data, people
whose last name starts with a letter in the second half of the alphabet are on average
taller! But that's ridiculous! You only had two people in each category! Like, you just
happened to get two tall people in the N-Z category! Well, how do you know that, though?
How do you know it's chance? I mean, we know there's a difference in their last name!
What did the data look like when you had more people? OK, OK, so when I looked at more
data the difference between the groups got smaller, but it was still a difference! But,
I mean, still, maybe you just had some tall people in the N-Z category just by chance.
Like, that's possible, right? Exactly! Exactly! That's what p-values help us do: they
help us tell the difference between random chance and a real difference between the
groups. So, going back to the mice, how do we know whether this difference is due to
random chance or due to the different fecal transplants? I mean, if you took 100 mice
and weighed 10 of them, their average weight would be different than if you weighed
another 10 mice.

But that's just because of random chance. Now, if I told you that these two groups
received different fecal transplants, then you might think that the difference in
weights is due to the fecal transplants. But what if the fecal transplants have no
effect? I mean, we know that if you take two different groups and weigh them you'll get
slightly different averages, right? So how do we decide whether this difference in
weights is due to random chance or due to different fecal transplants? The p-value helps
us decide between these two. In statistics these two possibilities have names. The null
hypothesis is that the two groups are actually the same, but you see a difference in
your measurement just because of random chance. The alternative hypothesis is that the
difference in weight is not due to random chance but rather due to some real difference
in the characteristics of the two groups, like which fecal transplant the mice got. In
statistics we call these two possibilities the null hypothesis and the alternative
hypothesis, but they are different from the scientific hypothesis! In this case the
scientific hypothesis is that the microbiome of obese mice contributes to mouse obesity.
Just because they're both called a hypothesis doesn't mean they're the same, so don't
mix them up! So, to answer the scientific question, we need to know whether this
difference is real or just due to random chance.

To distinguish between these two possibilities we use a p-value. A p-value is a number
that is calculated using a statistical test. There are lots of different statistical
tests that can be used on different types of data, but each statistical test has an
output, which is the p-value. The p-value is a probability (that's what the p stands
for): it's the probability that a difference at least as large as the observed one could
have happened IF THE NULL HYPOTHESIS WERE TRUE. For example, imagine that we have a lot
of mice and we weigh two groups of them. Let's assume the two groups are not actually
different. Remember, that's the null hypothesis. So when you measure their weights, the
averages are not exactly the same, just because of random chance in which mice got
measured.
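
That thought experiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 weights, the 25 g average, the 2 g spread, and the helper name `group_mean_difference` are all made up here and come from neither the video nor the paper. Both groups are drawn from the same population, so the null hypothesis is true by construction.

```python
import random
import statistics


def group_mean_difference(population, group_size, rng):
    """Draw two non-overlapping groups from ONE population and
    return the difference between their average weights."""
    drawn = rng.sample(population, 2 * group_size)
    group_a = drawn[:group_size]
    group_b = drawn[group_size:]
    return statistics.mean(group_a) - statistics.mean(group_b)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 mouse weights in grams (invented numbers).
population = [rng.gauss(25.0, 2.0) for _ in range(100)]

# Repeat the "weigh two groups of 10" experiment many times.
differences = [group_mean_difference(population, 10, rng) for _ in range(1000)]

# The averages almost never match exactly, even though nothing differs
# between the groups except which mice happened to be picked.
share_at_least_1g = sum(abs(d) >= 1.0 for d in differences) / len(differences)
print("largest difference seen (g):", round(max(abs(d) for d in differences), 2))
print("share of repeats differing by at least 1 g:", share_at_least_1g)
```

The last line is the same kind of number a p-value is: how often a difference that big shows up when only chance is at work.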

But that's normal. In fact, the probability of getting a tiny difference is actually
quite high even if there is no real difference between the two groups. That probability
is the p-value. It's usually expressed as a decimal fraction instead of a percentage.
Now, if you measured two new groups, the averages would be different and you'd have a
different p-value. Now let's say you measure two groups and the p-value is small. That
means getting that result is pretty unlikely IF THE NULL HYPOTHESIS WERE TRUE. So that's
when you say: maybe in this case the null hypothesis is not true, so these data are
better described by the alternative hypothesis, that the groups really are different.
But the p-value can be anything, right? From low to high? So how do you know when it's
low enough to reject the null hypothesis? That's a great question! Scientists tend to
reject the null hypothesis when the p-value is less than 0.05. To explain why p-values
less than 0.05 are so special, let's play a card game. Have a seat.

Okay… Here's how it works: I'm going to flip over one card at a time. Every time I flip
over a black card, you get a bar of chocolate. Every time I flip over a red card, you
owe me a dollar. Okay… I mean, I like chocolate… Alright, let's play! Okay… a dollar,
huh? Alright, here's a dollar. Alright, another dollar. Come on, come on, come on, come
on. Oh, come on! Uh-uh, no! There's something wrong with your deck. It's rigged! Your
deck is rigged! You're right, it is! When you started the game, you assumed that the
deck I was using was a normal deck. That was your null hypothesis.

The first time you got a red card, that didn't tip you off, because if the deck I'm
using is the same as a normal deck there's a 50% probability that you'll get a red card.
That's a pretty high chance, so you weren't suspicious yet. The second time you got a
red card you still weren't suspicious, because if the deck I'm using is the same as a
normal deck the probability of getting two red cards in a row is about 25%. That's a one
in four chance, which isn't that unusual. As I kept flipping over cards, the probability
of getting that many red cards in a row, if the deck I'm using is the same as a normal
deck, kept going down.
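
Here is a small Python sketch of how that probability shrinks with each red card. It is an illustration, not something shown in the video: the function names are made up, and it computes the number two ways, once treating every flip as an independent 50/50 event and once drawing without replacement from a standard 52-card deck with 26 red cards.

```python
from fractions import Fraction


def p_all_red_independent(n):
    """Probability of n reds in a row if every flip is an independent 50/50."""
    return Fraction(1, 2) ** n


def p_all_red_from_deck(n, red=26, total=52):
    """Probability that the first n cards dealt from a shuffled deck are all
    red, drawing without replacement (assumes a standard 52-card deck)."""
    p = Fraction(1)
    for i in range(n):
        # After i red cards are gone, (red - i) reds remain among (total - i) cards.
        p *= Fraction(red - i, total - i)
    return p


for n in range(1, 6):
    independent = float(p_all_red_independent(n))
    from_deck = float(p_all_red_from_deck(n))
    print(f"{n} red in a row: {independent:.4f} (50/50 flips), {from_deck:.4f} (real deck)")
```

Either way of counting gives the same picture: each extra red card makes the run less likely under the "normal deck" assumption, and by the fifth card the probability has dropped below 0.05.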

You started getting suspicious, but you weren't sure yet that it was a rigged deck.
Finally, when I flipped over the fifth red card, that's when you said, "There's
something wrong with your deck!", because if the deck I'm using is the same as a normal
deck the probability of getting five red cards in a row is about three percent. And
because that's so unlikely, I thought: I don't think the deck you're using is a normal
deck? Yes, that's right, because getting that result if the null hypothesis is true is
so intuitively unlikely that that's when you rejected the null hypothesis. So if the
p-value is the probability of getting a result if the null hypothesis is true, then if
the p-value goes below five percent (for p-values we use decimal fractions, so 0.05) we
reject the null hypothesis and conclude the alternative? Yes, that's right! And let's be
really clear: it's not like there's something magical that happens when you pass the
0.05 threshold. 0.049 is really similar to 0.051.

Because it's just a probability? Exactly! Remember the mice? The mice that received the
obese fecal transplants gained more weight. But was the difference in weight just
random, because of slight differences in the mice they chose, or was it the result of
the fecal transplants? The scientists used a statistical test called a t-test to
calculate the p-value, and in this case the p-value was less than 0.05, so the
scientists rejected the null hypothesis and concluded that the difference between the
two groups of mice was real and likely due to the fecal transplants. I mean, isn't that
cool? The bacteria inside a mouse can affect how much weight it gains, which might be
true for humans too.
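
For readers who want to see a p-value come out of actual numbers, here is a rough Python sketch. Two loud caveats: the scientists used a t-test, while this sketch swaps in a permutation test, a different technique chosen because it needs only the standard library and follows the same "what if the groups are really the same?" logic. And the weight-gain numbers below are invented for illustration; they are not data from the paper. The function name `permutation_p_value` is also just a label for this example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often a difference in averages at least as large as the
    observed one appears when group labels are assigned purely at random."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)  # copy, so the inputs are not modified
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the labels mean nothing (null hypothesis)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


# Made-up weight gains in arbitrary units (NOT the paper's measurements).
lean_donor_gain = [2.1, 2.8, 1.9, 3.0, 2.4, 2.6, 2.2, 2.9]
obese_donor_gain = [3.4, 4.1, 3.0, 4.4, 3.7, 3.9, 3.2, 4.0]

p_value = permutation_p_value(lean_donor_gain, obese_donor_gain)
print("estimated p-value:", p_value)
print("reject the null hypothesis at 0.05?", p_value < 0.05)
```

If the two lists were the same mice measured twice, the estimated p-value would come out at 1, and nobody would reject the null hypothesis.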

But don't try a fecal transplant at home. Besides being able to transmit diseases, poo
is a lot harder to work with than p… values! No! No! Such a bad joke….

As found on YouTube