Sometimes the term “bad science” gets thrown out the moment new results make the news and a group of people happens to disagree with the findings. Sometimes they are wholly justified in making this claim, but not always. As someone who aims to disseminate research, it’s important to make a distinction between what is “bad” research and what is something altogether different. I will be the first to admit that I have likely blurred this distinction myself, and I hope to be better at it going forward because I truly believe it matters.
What Makes Something “Bad Science”?
Before I delve into the explanation, let me say that I base my criteria on my work running quality analyses on studies, both as a graduate student and during my time working for the Canadian Council on Learning, where my primary job (along with some statistical work) was to decide what counted as quality research. The criteria used are pretty standard, so this isn’t anything new, though there are considerations that don’t always make it into reviews that I believe should, based on my training in psychometrics and statistics.
When it comes to the actual research, the three main criteria are:
- Did the researchers actually measure the variables they wanted, or claimed, to assess?
- Did the researchers use appropriate statistics to analyze the data?
- Did the researchers use the right sample for the question of interest?
If the researchers included appropriate variables in their analyses, used the right statistics, and used the right sample, the study should not be filed under “bad science”. If any of these three criteria is less than optimal, we run into problems with how to interpret the data and thus the conclusions. The further off the research is on these criteria, the worse it becomes.
Taking recent research on breastfeeding as an example, many studies fail at criteria 1 and/or 3 – the variables and the sample. If researchers only have access to “breastfed” versus “not breastfed”, the only question they can answer is whether a small amount of breast milk makes a difference in outcomes, yet this is often not what should be analyzed. If we want to know whether not breastfeeding results in changes to human development, we need to consider how breastfeeding is defined; the reference point should be biological breastfeeding, meaning exclusive breastfeeding for a period followed by complementary breastfeeding to 2 years or beyond. If we don’t have that group, or at least details on breastfeeding exclusivity and duration, then the answers we get aren’t really addressing the questions of interest.
Another example is sleep training. When a study hit the media claiming no long-term effects of extinction sleep training, those of us who understood statistics immediately saw huge problems with the research as it was conducted. In this case, the problems were with the first and third criteria. First, the outcome variables were far from ideal: they were both parent-report and outside the areas where one might expect negative consequences (like sleep behaviours and stress responsivity). Furthermore, temperament, which is known to interact with parenting methods, was not assessed at all. Second, the analyses used an intent-to-treat model, which meant that the groups were ill-formed. Half of the intervention group (the supposed sleep training group) refused to engage in controlled crying, while nothing was measured for the control group even though other research suggests approximately half of families attempt controlled crying on their own. The only real difference, then, was in the information given to the intervention group on “normal” sleep (which was also not quite accurate). These problems are what led many, including myself, to call this “bad research”.
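To see why that kind of crossover guts an intent-to-treat comparison, here is a toy simulation – entirely made-up numbers, not the study’s actual data. If roughly half of each arm ends up doing the same thing, the measured difference between arms collapses toward zero even when the technique itself has a real effect:

```python
import random

random.seed(0)

def simulate_itt(n=10_000, true_effect=1.0, crossover=0.5):
    """Toy intent-to-treat sketch with made-up numbers.

    Families are randomized to 'intervention' or 'control', but a
    `crossover` fraction of the intervention arm never uses the
    technique and the same fraction of the control arm tries it on
    their own. The outcome shifts by `true_effect` only for families
    who actually used the technique.
    """
    intervention, control = [], []
    for _ in range(n):
        # Intervention arm: only (1 - crossover) actually comply.
        complied = random.random() > crossover
        intervention.append(random.gauss(0, 1) + (true_effect if complied else 0.0))
        # Control arm: a crossover fraction uses the technique anyway.
        crossed = random.random() < crossover
        control.append(random.gauss(0, 1) + (true_effect if crossed else 0.0))
    return sum(intervention) / n - sum(control) / n

# With no crossover the arms differ by about the true effect (1.0);
# with 50% crossover in both directions the difference collapses.
print(round(simulate_itt(crossover=0.0), 2))
print(round(simulate_itt(crossover=0.5), 2))
```

The point is not the specific numbers (the effect size and crossover rates here are invented) but the shape of the problem: intent-to-treat compares groups as randomized, so heavy non-compliance in both directions leaves almost nothing for the analysis to detect.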
Even more comprehensive research techniques, like meta-analyses, are not immune to this problem. When Dr. Carpenter and colleagues conducted a meta-analysis on the risk of bedsharing and concluded that it was risky even in the absence of smoking, many researchers in the field of infant sleep, and specifically bedsharing, were rightfully upset and confused. Why? Because of the choice of which data sets to include in the meta-analyses. The ones chosen had some rather severe flaws, as they were collected years ago, before we understood more about the factors that create an unsafe sleeping environment. Some didn’t include alcohol consumption, and those that did had very poorly defined variables (e.g., “did you have anything to drink the day before?”); most didn’t include bedding type either, another factor that has been found to be as important as smoking to the risk of suffocation or SIDS. Even the analyses were flawed, as some of the methods used – like imputation to make up for missing data – were applied when their underlying assumptions weren’t met.
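To illustrate why violated imputation assumptions matter, here’s a toy sketch (invented numbers, nothing to do with the actual meta-analysis data). Simple approaches like mean imputation assume data are missing at random; if, say, heavier drinkers are more likely to skip an alcohol question, the filled-in estimate is systematically biased:

```python
import random

random.seed(1)

# Toy illustration with made-up numbers: mean imputation assumes the
# data are missing at random. Here the chance of skipping the question
# rises with the value itself ("missing not at random").
true_values = [random.gauss(5, 2) for _ in range(50_000)]  # e.g., drinks/week

# Heavier drinkers are more likely to leave the question blank.
observed = [v for v in true_values if random.random() > min(0.9, max(0.0, v / 10))]

true_mean = sum(true_values) / len(true_values)
# Filling every gap with the observed mean leaves the estimate equal
# to the observed mean, which sits well below the truth.
imputed_mean = sum(observed) / len(observed)

print(round(true_mean, 2), round(imputed_mean, 2))
```

No amount of extra data fixes this on its own, because the bias comes from who is missing, not from how many.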
Now, it’s never quite so simple as to say research is all “good” or all “bad”; there are varying degrees of each, and most research falls in between. Take the current focus on the role of SES in breastfeeding research. Early research predated this knowledge and so can’t be faulted for not including SES as a confounding variable, and wouldn’t be considered “bad”, but it does mean those findings must be taken with a grain of salt. However, new research that fails to include it ought to fall under “bad”, because we now know this is an important construct. Yet even research that includes it won’t be perfect, because statistical controls can only do so much (though they do more than many give them credit for). Is the research “bad” because it can’t create randomly assigned groups? No. Sometimes there are just limitations in research, and we simply have to acknowledge them until we find better ways to run our studies.
Bad Outcomes versus Bad Conclusions
Another problem arises when the researchers themselves draw conclusions that are not warranted from otherwise good (or good enough) research. This often happens when researchers attempt to fit their data to some political end or hot topic in the media when their findings don’t actually support what they are saying. I saw this regularly in educational research, and sadly it also occurs in research on more prominent parenting topics.
Most recently, research finding certain chemicals in breast milk led one of the researchers to make outlandish claims about the dangers of breast milk and to suggest babies should be weaned after only 3-4 months. Nothing in the research suggested anything of the sort, as it never actually addressed long-term health outcomes at these particular levels. But was the research “bad”? For the question of interest in the paper itself, no, it was actually a decent study. Not perfect, as the sample had some biases (like high whale-meat consumption and the timing of data collection), but for the question of bioabsorption of certain chemicals as measured in breast milk, it did a decent job of answering that (though questions and gaps remained).
Another example is when sleep training was promoted in the media based on research by Dr. Weinraub and colleagues. Here, again, certain researchers decided to speak about sleep training when the research didn’t look at sleep training at all. It looked at night waking patterns in a group of children and simply provided normative data from which we could see that night waking is a biologically normal act, even in children as old as 3 years. The research as it was – looking at normal sleep patterns – was very good, but the use of this research to promote particular parenting techniques was not.
This type of research then becomes difficult to promote because people can see it cited for one reason, read it, and take away a totally different message. Unfortunately, the conclusion in the abstract is often one that isn’t supported by the actual research. I wish peer review would eliminate this, but one doesn’t always get a peer reviewer who is well aware of the nuances in data collection; a reviewer may also share a similar bias and thus overlook the fact that the conclusions don’t match the data. Of course, part of the problem is that all researchers want their research noticed, and nuanced data doesn’t get headlines – only strong claims of a political nature do.
The Good, the Bad, and the Preliminary
I’ve covered the issues surrounding bad research and bad reporting or conclusions, but there’s a type of research that is often thrown into the “bad” category when it shouldn’t be: preliminary research. I believe our society’s lack of understanding of how the research process works has led to this particular problem (and to the tendency to label more research as “bad” than is warranted). People expect huge, conclusive studies, and when a particular piece of research falls short, they often claim it’s “bad research”. Sometimes, as covered above, it is (at least for the question at hand); sometimes it’s not – it’s simply preliminary research.
If there is no prior research on a particular question of interest, getting funding for a large study is nearly impossible. It’s also impossible to design such a study well, because we don’t have any research to base very important elements on, like which potential confounds to measure. This is why early studies on breastfeeding didn’t include SES, and early studies on bedsharing didn’t include alcohol consumption or smoking. Preliminary studies, then, are often of a much smaller scale and don’t include variables that may turn out to be important.
In the parenting realm, the aforementioned study on the presence of certain chemicals in breast milk qualifies as preliminary in nature. There was previously nothing on this topic, so it is difficult to know how to extrapolate the findings or what their long-term implications are. As mentioned, one of the researchers was truly remiss in trying to make this a political issue based on personal views on breastfeeding, but the research itself – for what it was – wasn’t “bad”.
A more notable example comes from Dr. Middlemiss and colleagues, who studied physiological responses to extinction sleep training. This study has been crucified by people who are pro-cry-it-out and lauded by those who are against it (something I myself was guilty of when it came out, though I was thorough in explaining why I sided with the view that these findings mattered), when in fact it’s neither the be-all and end-all nor is it bad. It’s preliminary. Until this study, we had no idea how infants responded physiologically to the cry-it-out method. None. What the researchers found was fascinating, yet the study was far from perfect, as preliminary studies tend to be. The group was well chosen, the measurement of key variables was sound, and the statistics were appropriate; however, the sample was small with large variability in physiology (though the researchers still obtained significant results, which are much harder to obtain with large variability and a small sample), the study was of short duration, and there were no long-term outcomes to associate with the physiological findings. With respect to the conclusions drawn, the focus by others ended up being on the fact that infants showed high cortisol levels after their crying had extinguished, but the focus of the paper was on the loss of physiological synchrony between mother and child. Notably, I have spoken to Dr. Middlemiss personally and heard a co-author speak about the piece at a conference, and both of them focus on the synchrony issue, with only a brief discussion of the heightened cortisol, as the findings were clearer for the former. Their research shouldn’t be maligned because others focus on other elements.
What preliminary research gives us is an idea of what we might expect – an idea that has to be tempered with the knowledge that there is so much more to learn. Some preliminary research can be “bad”, though it is often harder to make it “good” because there is rarely much funding for exploratory ideas, so we must be at once more forgiving of its limitations and more skeptical of its conclusions. Looking, for example, at Dr. Middlemiss’ study, one of the first missing variables that jumps out at me is the lack of assessment of infant temperament. This may influence findings and explain some of the variability found in physiological responses. Hopefully it will be included in follow-up research (along with later outcomes, longer durations, and so on).
So… What Kind of Research Is It?
This is really the question, isn’t it? How do we properly assess the research and its conclusions, and determine what should have been feasible for the researchers to do? I’ve mentioned the questions to ask of the research itself and the need to check that the researchers’ conclusions are congruent with their findings. The other area to focus on is the limitations section, for it’s here we can often tell not only what kind of research it is, but how the researchers view their own product.
When researchers are aware of the preliminary nature of their research or of pitfalls in how their data were collected, the limitations section will address this, offering up areas that need to be explored or explaining why some variables weren’t assessed as well as they could or should have been. If the researchers offer you this, there is no need to malign the research as “bad”, because research is never perfect. When researchers try to ignore some of the issues facing their research, you get to be harsher: they either aren’t aware of their own limitations when they should be, or they are actively trying to hide them in favour of erroneous conclusions (as I was with the recent breastfeeding and IQ research that didn’t even acknowledge the issue of how breastfeeding was assessed).
The more the public is able to discern good from bad from preliminary research, the better off we all will be, because it will enable more critical examination of what the media is trying to sell us. It also allows us to judge the science journalists whose job it is to provide us with this information and these distinctions; sadly, too many of them lack the science background or understanding to do so. Hopefully this little bit helps people better understand the science and take it to heart when making their own evidence-based parenting decisions.
Fascinating post, thank you. Too often I find myself labeling studies as “bad research” without taking into account the factors you list here. Just this morning I was reading an article about a study that claims co-sleeping is bad for mom’s sleep, http://www.sleep-journal.com/article/S1389-9457(15)00918-1/abstract?cc=y= and reading your post made me wonder where this study would fall in the “good” or “bad” research categories.
I need to do a response to this study – it’s not “bad” but there are a lot of flaws, most notably in the interpretation of the findings and the variable definitions. It’s a mixed bag though!!!