I have also written a formal reply, which has been published as a response on the Pediatrics site, where it can be read for free.
When articles are submitted for publication, they go through what is called the peer-review process. During this process, two or three reviewers, hopefully in your field and hopefully familiar with the topic at hand (though I can tell you from experience that is not always the case), read the manuscript, comment on what needs revising, and give their verdict on whether or not it should be published. Recently, an article by Anna Price and colleagues has been covered in the media for supposedly showing that infant sleep interventions, or crying-it-out (CIO), have no long-term effects. I got my hands on the article (which is still in early release through Pediatrics) and decided to write my own peer review. I can only imagine that, this being a medical journal, there were no psychologists on hand to assess this research, or that our standards for research have fallen dramatically. For the record, I have written reviews for journals (including some of the top journals in psychology) and received many a review, so I am not doing this without experience. And this is what I would say (though I omit comments on grammar, which I always get plenty of myself… *sigh*).
So here goes…
The article presents findings from the Kids Sleep Study, a follow-up to the Infant Sleep Study, which took place in 2003 in Australia. The current findings pertain to child and parent outcomes at 6 years of age for children who had sleep problems in infancy and who were placed in either an intervention group or a control group. The topic itself is very interesting, and much research is needed in this field. The authors rightly note that there is debate around infant sleep interventions, with no clear research to date demonstrating long-term harms or benefits. However, the current study suffers from several flaws that make the conclusions drawn by the authors quite premature. Given the data collected and the analyses performed, the conclusions are unfounded.
The introduction does a decent job of covering the literature but fails to mention the very real shortcomings of the preexisting reviews (e.g., Mindell et al., 2006): namely, that many of the studies included in those reviews lack an appropriate control group. The failure to adequately control for natural changes in infant sleep, maternal depression, and the subsequent changes in parent-child relationships invalidates many of the conclusions reached in previous research. In fact, one of the strengths of the current study and of the earlier articles by Hiscock and colleagues (2003; 2005) is their randomized design. I suggest the authors look at the Infant Sleep Information Source run by Dr. Helen Ball and colleagues for more information.
A second concern is the statement that “no studies… have reported detrimental effects”. This is a rather disingenuous statement given the paucity of research looking at the outcomes that are hypothesized to be related to detrimental effects. In fact, of the studies in the cited Mindell et al. review, only three examined secondary outcomes of sleep training, and only one of those outcomes is relevant to the concerns regarding sleep training: attachment. Even attachment, however, was not measured appropriately; in all three cases it was assessed with a non-validated self-report version of a measure that is meant to be administered by a trained professional. None of the studies measured child-parent attachment objectively; they relied solely on parent-reported attachment. (Of note, the other main sleep review, by Owens, France, & Wiggs, 1999, suffers from the same problems.)
Third, the statement that “teaching parents to regulate their children’s sleep behaviour is a form of limit setting that… constitutes the optimal, authoritative, parenting style for child outcomes” requires much more backing than articles simply arguing that authoritative parenting styles are optimal (which is not in debate here). How does regulating a child’s sleep pertain to limit setting? How does a parent who does not “regulate” sleep fail in this regard? You fail to define “regulate” and “limit setting” and fail to cite research supporting this assertion, which raises questions about its veracity.
Turning to the methods, I must first commend the developers of the Sleep Study for actually using a control group and a randomized design. This is rarely seen in sleep research, and, as mentioned previously, its absence raises concerns about the conclusions that have been drawn to date. The inclusion of a comparable control group allows for meaningful comparisons, provided the methods and other controls are appropriate. However, I have strong concerns about the methodology used here. Below are specific comments.
- First, the presence or absence of a “sleep problem” in infancy was parent-reported and not, so far as can be told, verified objectively. It is quite possible that the self-efficacy parents gained from doing something (the intervention) improved their perception of their child, both early on and later. The intervention and results do not seem to speak to fixing sleep problems so much as fixing the parental belief about a sleep problem (which can sometimes be fixed simply by giving parents information about normative sleep patterns).
- There is no discussion of the training of the intervention nurses or of their adherence to the study protocol. Can you report how many followed the script during random checks, especially during the later months of the study, when standardization tends to suffer most?
- You mention that the control nurses were free to give advice, but were not trained to give the standardized response. Did you control for what information they gave? Given that infant behavioural interventions are quite common, is it not possible they provided some of the same information as the intervention nurses?
- Families in the intervention group attended an average of 1.52 visits (presumably out of 2, given the data on mean length of visit). What were the reasons for not attending a second visit? Are there qualitative differences between families who attended one visit and those who attended two on any of the variables of interest? What material was covered at each meeting, and therefore what material was missed by those who attended only one session?
- The authors have written that “each family chose which (if any) type or mix of strategies they would use” and that only 100 of 174 families selected a strategy and attended the meetings. This is crucial and poses a large problem for the later analyses. First, there is no test for differences between the two strategies included in the intervention, yet there is no prior research to suggest that they are equivalent in their possible long-term effects. Second, there is no mention of how the families that selected an intervention differed from those that did not. Why did the other families opt out? Did they have problems with the interventions? Did they believe their child’s sleep problems were not severe enough to warrant intervention? It is also unclear whether all intervention families received the information about positive bedtime routines or only those who also chose to use a behavioural management strategy.
- The saliva testing is a great addition to the measures; however, it is questionable whether a single day’s worth of saliva (two samples) provides an accurate assessment of chronic stress. A pattern over a few days would have been preferable. I realize it would be nearly impossible to obtain this now, so it should be discussed as a limitation of the current results.
- With the exception of health-related quality of life and stress, all child measures are parent-reported. This poses a large problem, particularly when parents selected whether or not to use the interventions. As mentioned previously with respect to fixing sleep problems, parents’ perceptions may be coloured by their choice, and by the feeling of having done something, rather than by the intervention itself. Without objective measures, it is impossible to rule this out, especially when parents know the follow-up concerns their earlier action (or inaction). This is particularly problematic for the child and child-parent measures. For example, the article cannot claim to have measured the parent-child relationship, only the parents’ perception of their relationship with their child. And if the effects of sleep training are presumed to fall on the child, these measures fail to capture that construct.
Overall, the analyses are well handled. Accounting for clustering and including confounders are recommended practices and reflect solid statistics. I am pleased to see the inclusion of research-based controls, including gender, temperament, depression, and SES; however, you mention that these were omitted from some analyses to avoid instability. Have you considered running a structural equation model (SEM) with all of your variables included in order to avoid this potential problem?
The largest problem is that this is one of the cases in which applying the intention-to-treat principle is unwarranted. Yes, this is generally something we want to consider, especially when groups self-select (e.g., in the case of home birth versus hospital birth); however, you are drawing conclusions about outcomes in a trial in which nearly half of the intervention group declined the intervention to which they were randomly assigned. Notably, you fail to report how many of the followed-up families came from the group that did not use any intervention (though clearly some of them did, as your final intervention n=122 while only n=100 agreed to take part in one of the behavioural intervention strategies, and perhaps more from the latter group dropped out; we have no way of knowing). When drawing conclusions about the long-term effects of a particular strategy in a randomized trial, the intention-to-treat principle muddies the results: it does not isolate outcomes attributable to the strategies at hand, and those outcomes may be influenced (positively or negatively) by families who chose not to take part in the intervention to which they were assigned. Please redo your analyses including only the families who agreed to treatment.
Additionally, why did you expect intracluster correlations to fade over 5 years? If families remain in the same area, many variables would be expected to remain correlated. If you ran the analysis allowing for these correlations and the results were similar, this should be acknowledged.
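For readers less familiar with why the intracluster correlation (ICC) matters, here is a minimal Python sketch of the standard Kish design effect for equal-sized clusters, DEFF = 1 + (m - 1) × ICC, where m is the cluster size. A positive ICC inflates the variance of estimates relative to simple random sampling, so assuming the correlation has faded to zero when it has not makes results look more precise than they are. The sample size, cluster size, and ICC values below are hypothetical and are not taken from the study; real clusters are rarely equal in size.

```python
def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Roughly how many independent observations a clustered sample is 'worth'."""
    return n / design_effect(cluster_size, icc)


# Hypothetical numbers only: 200 families recruited in clusters of 10.
for icc in (0.0, 0.02, 0.05):
    print(
        f"ICC={icc:.2f}  design effect={design_effect(10, icc):.2f}  "
        f"effective n={effective_sample_size(200, 10, icc):.0f}"
    )
```

With an ICC of 0.05, for example, the design effect is 1 + 9 × 0.05 = 1.45, so 200 clustered families carry roughly the information of 138 independent ones.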
The results are presented in a cohesive and clear manner. A couple of comments:
- How do the underrepresented groups tend to score on these types of outcome measures? This type of information would be useful to frame the current results.
- You mention that there are no real differences between the two groups (rightly so, based on the statistical analyses, though not given the methods used); however, looking at the confidence intervals given, some of the results seem to reflect high variability within the measures themselves. Notably, both the percentage of sleep problems and the percentage of abnormal cortisol show very large 95% CIs, which seem abnormally large given the sample sizes. I am particularly concerned because, while the CI includes 1 in each case, the lower end is much closer to 1 than the upper, suggesting a potential trend towards long-term effects (both for sleep problems and for chronic stress in the intervention group) that is not being addressed in the current research. What were the effects of the control variables? What were the results without them? (I realize that some of the other large CIs are due to control variables even though the means are nearly identical; I acknowledge these, but am not concerned by them.)
Overall, the discussion is the weakest section. Given the methods and analyses, your results do not support the conclusions drawn, and the limitations are overlooked. The authors have not provided evidence that there are no long-term effects. In fact, given the use of intention-to-treat for a treatment that had a 58% agreement rate under randomized allocation, the authors can draw very few conclusions at all. Add to that the use of measures that do not objectively capture the constructs of interest (sleep problems, child well-being, parent-child relationships, etc.), and the results become even less conclusive.
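To make the intention-to-treat concern concrete, here is a minimal Python sketch using entirely hypothetical numbers (nothing here comes from the Price et al. data). It assumes that a strategy shifts some outcome score by a fixed amount only in families who actually used it, that families who declined and control families are unaffected, and that takers and decliners are otherwise alike. Under those assumptions, the intention-to-treat difference shrinks roughly in proportion to uptake; the uptake fraction is set to the 100 of 174 families reported in the article.

```python
import random
from statistics import fmean

# Hypothetical illustration only: no numbers here come from the Price et al. study.
UPTAKE = 100 / 174  # share of the intervention arm that selected a strategy (100 of 174)
EFFECT = 5.0        # assumed shift in an outcome score, applied ONLY to families who used a strategy


def simulate(n_per_arm, uptake, effect, seed=2012):
    """Return (ITT difference, users-only difference) in mean outcome scores.

    Assumptions: every family's score is drawn from Normal(50, 10); the effect
    applies only to intervention-arm families who took up a strategy; takers and
    decliners are otherwise identical (no self-selection).
    """
    rng = random.Random(seed)
    control = [rng.gauss(50, 10) for _ in range(n_per_arm)]
    intervention, users = [], []
    for _ in range(n_per_arm):
        took_up = rng.random() < uptake
        score = rng.gauss(50, 10) + (effect if took_up else 0.0)
        intervention.append(score)  # ITT: every family analysed as randomized
        if took_up:
            users.append(score)     # users only: families who took up a strategy
    itt = fmean(intervention) - fmean(control)
    users_only = fmean(users) - fmean(control)
    return itt, users_only


if __name__ == "__main__":
    itt, users_only = simulate(n_per_arm=100_000, uptake=UPTAKE, effect=EFFECT)
    # Expected values under these assumptions: ITT near EFFECT * UPTAKE (about 2.87),
    # users-only near EFFECT.
    print(f"ITT difference:        {itt:.2f}")
    print(f"Users-only difference: {users_only:.2f}")
```

The sketch leans heavily on its last assumption: if families who chose a strategy differed from those who did not, as the questions about why families opted out suggest they might, the users-only difference would also pick up those differences rather than the effect of the strategy alone.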
While I appreciate the authors’ attempt to address a question that is much needed, the fact remains that the current methodology and analyses do not address said question. If the authors are willing to completely redo their analyses, they can report on the effects of these interventions on the parental perception of certain variables as well as on chronic stress (if the current method is a valid way to assess stress levels).
I would not recommend this manuscript for publication in its current form. In fact, I question whether the changes highlighted above would result in findings that have any real meaning given the methodology employed; however, I would be willing to entertain another draft with the appropriate analyses and results in place.
Price AMH, Wake M, Ukoumunne OC, Hiscock H. Five-year follow-up of harms and benefits of behavioral infant sleep intervention: randomized trial. Pediatrics 2012; DOI: 10.1542/peds.2011-3467.