So the NATURE commentary from Joe Bast got Stan Young going on one of his favorite topics.
I told you in the most recent post on the Beisner discussion that I didn’t want to get statistician genius Stan Young going so late at night; well, he was already going on the NATURE article about p-values.
So he also got to talk about the strengths of randomized controlled studies and why observational studies are unreliable.
Stan Young commented on Nature Articles
Joe Bast sent me this summary of Nature articles with his brief comments. I liked it. …
‘“Statistical Errors,” by Regina Nuzzo, in the February 13th issue of NATURE, is a fascinating article about the value of P-values: “P values, the ‘gold standard’ of statistical validity, are not as reliable as many scientists assume.”’
There is new research on the reliability of p-values, and it indicates they are much less repeatable than previously thought. STILL, they are relatively reliable so long as the researcher
a. posts the analysis protocol before examining the data (as in pool, call your shot);
b. asks only one question, OR corrects the analysis for asking multiple questions;
c. spells out in the protocol exactly how the analysis will be adjusted, AND allows the analyst no flexibility in the adjustment process.
Most of the unreliability of reported p-values comes from how the data are treated before analysis, from multiple testing (point b above), and from multiple modeling (point c).
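To make the multiple-testing point concrete, here is a minimal simulation of my own (not Stan Young’s analysis): ask twenty “questions” of pure noise, so every null hypothesis is true by construction, and watch how often at least one of them comes out nominally “significant.”

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests, n_per_group = 20, 30

# Every "question" is asked of pure noise: both groups are drawn from
# the same distribution, so all 20 null hypotheses are true.
p_values = [
    stats.ttest_ind(rng.normal(size=n_per_group),
                    rng.normal(size=n_per_group)).pvalue
    for _ in range(n_tests)
]

print(f"smallest raw p-value: {min(p_values):.3f}")
# With 20 independent true nulls, P(at least one p < 0.05) = 1 - 0.95^20:
print(f"expected chance of a false 'discovery': {1 - 0.95**n_tests:.0%}")  # ~64%
# Correcting for multiple testing (here, Bonferroni) restores the error rate:
print("significant after Bonferroni:", sum(p < 0.05 / n_tests for p in p_values))
```

Run it and roughly two out of three such studies will report a “finding” from data that contain none, which is exactly why point b demands either one question or a correction for asking many.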
Simple rule of thumb: Randomized clinical trials are relatively reliable. Observational studies, as currently conducted, are so unreliable as to be essentially worthless.
See how much I have learned from Stan, who is at the National Institute of Statistical Sciences in Research Triangle Park, NC.
The problem with P values is not in the methodology, but in human nature. Metrics create reality. If journal referees decide on an arbitrary threshold to reject the null hypothesis, they no longer have to delve deep into the woods to accept or reject the research. Once the threshold becomes public, researchers now have a target to aim for. When unscrupulous researchers know what end number they have to achieve, the rest is just algebra.
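Just how little algebra is needed is worth seeing. The sketch below is my own illustration, not anything from the post: once the threshold (say 0.05) is public and the sample size is known, the smallest correlation that will count as “significant” can be solved for before a single data point exists.

```python
from math import sqrt
from scipy import stats

alpha = 0.05  # the public threshold the referees have agreed on

# For a Pearson correlation on n pairs, t = r*sqrt(n-2)/sqrt(1-r^2)
# with n-2 degrees of freedom. Inverting that gives the smallest r
# that clears the two-sided threshold -- the "end number" is known
# in advance.
for n in (20, 100, 1000, 10_000):
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    r_min = t_crit / sqrt(t_crit**2 + n - 2)
    print(f"n = {n:6d}: any correlation above r = {r_min:.3f} is 'significant'")
```

At n = 10,000 a correlation of about 0.02, utterly negligible in practical terms, clears the bar, which is the arithmetic behind the “spurious, barely correlated results” mentioned below.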
Unfortunately, too many lay people have no understanding of what these numbers mean or how we arrive at them (I’ll save the public school criticism for later). All they know is that “95% confidence” sounds really convincing, and that if the P-value is below the magic number, we get to use the word “cause”. To the uninitiated, “cause” means that when A happens, B happens as a direct result, not that when A happens, B happens slightly more often than it would in the absence of A.
It’s important to remember that rules, regulations, and standards don’t exist for honest people. The concept of P-values is sound when considered as one tool among many used by honest researchers as a check against unintentional bias. Taken as a gold standard by themselves, however, they have the potential to lend undue intellectual and even legal credence to spurious, barely correlated results.
It is important to educate the general public that the P-value means one thing and one thing only: the probability of getting the results you did (or more extreme results) given that the null hypothesis is true. A small P-value may make the null hypothesis hard to believe, but unlikely things do happen: someone wins the lottery every once in a while. How much money are you willing to bet that the numbers 1-5 won’t come up on a 100-sided die? How many times in a row would you be willing to make that bet? For me, repeatability is the most important aspect of the scientific method. Randomization and the double-blind protocol go a long way toward preventing bias, but only replication, and lots of it, can truly elevate a claim to the status of “proven” for my money.
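To put my own numbers on that die question (the arithmetic is mine, not the post’s): rolling a 1 through 5 on a fair 100-sided die has probability 0.05, the exact analogue of a single true-null test clearing p < 0.05 by luck, and the chance of losing at least once compounds quickly over repeated bets.

```python
# One bet: the die lands on 1-5 with probability 5/100 = 0.05, the
# analogue of a single true-null test clearing p < 0.05 by chance.
p_single = 5 / 100

for n_bets in (1, 5, 14, 50):
    p_lose_at_least_once = 1 - (1 - p_single) ** n_bets
    print(f"{n_bets:3d} bets in a row: {p_lose_at_least_once:5.1%} "
          "chance of losing at least once")
```

By the 14th bet you are more likely than not to have lost at least once, which is the same arithmetic that drives the multiple-testing problem above, and the same reason a single “significant” result is no substitute for replication.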