New research from Australia is making international headlines, with parents being told that researchers have “proven” that one form of extinction sleep training – graduated extinction or “controlled crying” – is totally, utterly safe for children.  The research, out in advance release from the journal Pediatrics, claims to have found that children who undergo controlled crying do not suffer any cortisol spikes and do not show any behavioural or attachment problems later on.  Unfortunately, this isn’t quite the truth, and a thorough look at the study is warranted.  So let’s get on with this, shall we?

Who took part and what were the groups?

The participants were 43 families of children aged 6 months to 16 months.  Families were recruited if they reported that they believed their children had a “sleep problem”.  Though this limits applicability, it is also relevant, as most people will not attempt sleep training if they do not believe their child has a sleep problem.  However, it is also worth noting that there were no objective measures of sleep problems.  We have some socio-economic status data for the families, but no information on any other family factors (I will get to this below).

The families were randomly assigned to one of three groups:

  • Graduated extinction (controlled crying) (n=14): The families in this group were taught how to follow a strict schedule of when to respond to a baby’s cry, starting at intervals of 2 to 6 minutes on Night 1 and building up to 25 (minimum) to 35 minutes on Night 7 and thereafter. Parents were allowed to shush babies, pat their backs, but were not to pick up the baby.  Arguably, a child who stops calling after night 2 has a very different experience than one who is still crying on night 7.  Unfortunately, there is no information on how long each child took to stop calling out during graduated extinction.
  • Bedtime fading (n=15): The bedtime fading group was told to slowly extend bedtime to later depending on how long sleep onset latency was the previous night, under the pretence that the child would be more tired and fall asleep faster, but they were also told to respond to night wakings as usual.  So, for example, if a baby took over 30 minutes to fall asleep one night, the next night, bedtime was 15 minutes later.
  • The control group (n=14):  The last group received education about normal sleep.  It is unclear exactly what this information was, but presumably the effect was to educate families about normal night wakings.

Families were aware this was a random assignment and two families switched groups (ethically they had to be allowed to do so): one from graduated extinction to control and one from control to graduated extinction.  It is believed that the remaining families stayed true to their group given the nature of the study they agreed to.  Generally this method is sound.  Researchers can’t randomly assign people and force them to stay put.  However, the moment families asked to switch, they likely should have been removed from the study, as the request suggests a bias of some sort towards one method over another.

Isn’t 43 a pretty small sample?

Yes, 43 is small, especially when split into three groups.  Small samples are not inherently a problem, but they are an issue here if we want to draw any definitive conclusions.  Small samples will only detect large effects.  In the case of long-term effects of extinction methods, most people do not argue that every child will suffer negative effects, only that some likely will.  Small samples will not allow for any nuance or small effects to be detected and would not allow for any analysis of interacting variables, such as temperament or health factors.  Thus, the authors only have the power to detect a very large effect between graduated extinction and the control group.  If that effect is not detected, it does not mean that there are no differences; it simply means there aren’t incredibly large differences.  To put it in perspective, this is the type of sample size that is used when looking for language differences – findings that are incredibly robust because each child is expected to behave the same way in learning a language.
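
To put a rough number on “very large”, here is a minimal Python sketch of the smallest standardized difference (Cohen’s d) that a two-group comparison could reliably detect at the group sizes discussed in this post.  The normal approximation, the two-sided 5% significance level and the 80% power target are my own conventional assumptions, not values reported by the study, and the function name is purely illustrative.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Approximate smallest Cohen's d detectable when comparing two equal-sized groups.

    Uses the normal approximation d = (z_alpha + z_power) * sqrt(2 / n).
    alpha and power are conventional defaults (an assumption, not study values).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


# 14 per group is the study's group size; 7 per group is roughly what remains
# for the sleep data after the drop-out; 100 is a hypothetical larger study.
for n in (7, 14, 100):
    print(n, round(min_detectable_d(n), 2))
```

Under these assumptions, 14 families per group can only pick up a difference of a bit more than one full standard deviation, which is a far larger effect than anyone is claiming for sleep training.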

Did every family remain in the study?

For the initial test and the final follow-up 12 months later, the retention rate was perfect, at 100%, though data are missing for other reasons at that 12-month follow-up.  However, for the follow-ups in between – in which cortisol assessments and sleep data were recorded – there was a loss of nearly 50% of participants and no information is given as to why.  This is a problem, as it means the data on cortisol and infant sleep are based on the tiniest of samples, and the authors do not explain why data are missing for nearly 50% of participants.

What variables were assessed and were they assessed appropriately?

There were several outcome variables of interest and definitely some questions about some of the validity of assessment:

  • Infant sleep. There were four separate infant sleep assessments: sleep onset latency (i.e., the length of time taken to fall asleep), number of night wakings (that woke parents up), wake time after sleep onset (i.e., how long the infant was awake after falling asleep), and total sleep time.  All four of these measures were assessed subjectively using parent report and two of them – wake time after sleep onset and total sleep time – were also assessed objectively using an actigraph.  Whereas the parental report provides information about the parental perception of infant sleep, the actigraph provides the reality.
  • Infant cortisol. This was measured for two days during morning and afternoon at pretreatment and then at each of the follow-ups at 1 week, 1 month, and 3 months.  Thus, there is no cortisol data for nighttime during the sleep training process (unless an infant is still undergoing sleep training at 1 week) and no nighttime cortisol data to tell us whether there are changes there.  The authors argue that this data speaks to whether or not there are sustained, elevated cortisol levels indicative of long-term, chronic stress.  It does not, however, speak to stress during sleep training or to any associations with stress formed during the nighttime period.
  • Maternal stress and maternal mood. Mothers completed validated questionnaires on mood and stress at pretreatment and the 1-week, 1-month, and 3-month follow-ups.
  • Infant behaviour. At the 12-month follow-up, parents reported on infant behaviours including internalizing and externalizing problems as well as “total problems” on a well-validated measure of child behaviour.
  • Infant-caregiver attachment. At the 12-month follow-up, infant-caregiver attachment was measured using the Strange Situation, the gold-standard measure of attachment.  Although I’m sure the methods were followed to a tee, there is the issue that the Strange Situation is validated for children aged 9 months to 18 months.  At the time that this was undertaken during this study, children’s ages ranged from 16 months to 26 months, potentially limiting the validity of the measure as one of attachment.

In terms of what is assessed here as outcome variables, this is pretty darn good, except for a couple of things.  First, there’s no baseline for attachment or child behaviour against which we can see any change.  This means we have no idea if sleep training or not sleep training affects anything.  Second, we are missing a ton of relevant data that would better inform us on whether there are interactions or nuances to any findings.  This includes, but is not limited to, feeding method, how responsive and sensitive parents were outside of nighttime, child temperament, sleeping arrangements, and daycare use (which is known to influence daytime cortisol levels).  This is a lot, and it is why one must add “preliminary” to the research and findings.

What were the authors’ hypotheses and how did the data fit them?

The authors of this paper – and the press picking up on it – seem to have a story to be told.  Namely, children who are exposed to graduated extinction will not suffer any stress as a result, will learn to sleep better, moms are happier, and children will suffer no long-term consequences of this technique.  The authors argue their data conclusively shows this, but even if we ignore the small sample size (where sleep is being assessed by a total of 7 participants in each group), does it?

  • “Children don’t suffer stress”. The data cannot really tell us anything about this.  The cortisol measures do seem to show that, as a whole, children are not showing sustained, elevated cortisol during the daytime.  But there is no individual data to tell us if any child actually showed an increase, and the data says nothing about any cortisol rise during the sleep training process.  Although we need to be very worried about long-term chronic stress, we also need to worry about acute stress on a developing brain.  This tells us nothing about acute stress and nothing of individual differences.
  • “Children learn to sleep better”. When we look at what parents report, yes, there were improvements in infant sleep across all four categories.  However, when the objective actigraph data was included, there was no improvement in the two measures it provided – total sleep time and wake time after sleep onset.  In fact, according to actigraph data, infants in the graduated extinction group were awake longer than those in any other group at the 3-month follow-up, and their wake time increased over this period.  In the graduated extinction group, the number of minutes the infant spent awake increased across the three-month period, going from an average of 105.8 to 121.1 (this is in comparison to the parent report of 57.4 to 13.3).  Bedtime fading increased then decreased, going from 88.4 to 108.7 and back to 97.2, while the education control didn’t change too much, with a slight decrease from 113.5 to 106 minutes.  Now, this is not actually as much of a smoking gun as we might think because of these low sample sizes.  What we need is the actual data for these groups to be able to see if a larger sample would have led to a significant difference.  Unfortunately, I have no individual data or standard deviations with which to run any comparison of these changes at different sample sizes, so there is nothing more I can analyze.  All I can say is that an increase in total wake time of 15.3 minutes (graduated extinction) versus a decrease of 7.5 minutes (control group) is likely to yield significant results with a larger sample, but may not be practically significant.  It’s more likely that we can say that there isn’t a significant change in infant sleep based on the intervention.  Thus, what we can say is that parental perception of infant sleep improved, but infant sleep itself did not.
  • “Moms are happier”. Well, yes, the mothers in the graduated extinction group did have lower stress and better mood at the end of the 3-month period over which this was assessed.  The problem is that the changes in this group were the smallest of the three groups.  The greatest change was seen in the bedtime fading group, followed by the control group and finally the graduated extinction group.  If we’re concerned about mom’s well-being, we’d be better off looking to other methods.
  • “There are no long-term consequences”. Here we can’t say much of anything because we have no pretest data to compare the behavioural and attachment outcomes to.  However, let’s assume that the families’ random assignment was such that we didn’t have to worry about this (of course, one should always check this when doing random assignment).  Given the small sample size, it’s incredibly difficult to take the non-significant results with anything less than a spoonful of salt.  What we do have are small effect sizes suggesting fewer total problems (parent report) in the graduated extinction group than the control group.  This could suggest better behaviour.  Of course, we also have a small effect suggesting greater risk of insecure attachment in the graduated extinction group relative to the control group (and a slightly smaller effect size suggesting greater risk of insecure attachment relative to the bedtime fading group).  That is, only 54% of children were securely attached in the graduated extinction group compared to 60% in the bedtime fading group and 62% in the control group.  So we could say that if this were extended to a larger sample, we might see better parent reports of behaviour, but we’d also see more children insecurely attached.  Even I’m skeptical of saying this, however, given the problems with the data collection.
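
For anyone who wants to check the arithmetic in the points above, here is a small Python sketch.  It recomputes the changes in actigraph wake time from the group means quoted above, and then gives a very rough idea of how many children per group it would take to reliably detect an attachment gap of 54% versus 62%.  The second part treats the observed percentages as if they were the true rates and uses Cohen’s h with a normal approximation, a 5% two-sided significance level and 80% power; all of these are my assumptions rather than anything from the paper, and the function names are illustrative only.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist

# Mean minutes awake after sleep onset (actigraph): first and last values quoted above.
wake_minutes = {
    "graduated extinction": (105.8, 121.1),
    "bedtime fading": (88.4, 97.2),
    "control": (113.5, 106.0),
}


def change(first, last):
    """Change in mean wake time in minutes (positive means more time awake)."""
    return round(last - first, 1)


def n_per_group_for_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate children needed per group to detect p1 vs p2 (two-sided test).

    Uses Cohen's h (arcsine-transformed difference) and the normal
    approximation n = 2 * ((z_alpha + z_power) / h) ** 2.
    """
    h = abs(2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2)))
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * (z_total / h) ** 2)


for group, (first, last) in wake_minutes.items():
    print(group, change(first, last))

# Secure attachment: 54% (graduated extinction) versus 62% (control).
print(n_per_group_for_proportions(0.54, 0.62))
```

Under these assumptions the answer comes out at several hundred children per group, compared with 14 or 15 here, which is another way of seeing why the non-significant attachment result cannot be read as evidence of safety.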

As you can see, the data that the authors present doesn’t really support the conclusions they and the media are spouting about graduated extinction.  The data does reinforce the difference between parent report sleep data and objective data, something that is showing up in other studies as well, but does it tell us anything else?

What can be taken from this research if not the authors’ conclusions?

I think the big highlight of this study is the potential effectiveness of the bedtime fading strategy in combating maternal negative emotions and perceived infant sleep problems.  However, as there were many issues with the data at hand, more research is clearly needed to draw any real conclusions about this method.  Still, this is a gentle method that, in this preliminary sense, provided improvement in maternal mood and lowered sleep onset latency times in line with that of graduated extinction, but without any risk of stress and no potential effect on attachment.  I truly hope to see more research on this, and to see whether this becomes the go-to for families instead of extinction methods.

Final Thoughts

I actually think this methodology provides great ideas as a starting point for future research.  This was a preliminary study and shouldn’t have gotten the type of press it has, but that doesn’t mean the methods are all bad, just lacking.  I do, however, disagree strongly with the conclusions of the authors and the way in which they have presented their research to the press and public.  I find that highly irresponsible and it does a huge disservice to both science and, more importantly, the babies that are left to cry.  Whether an individual baby suffers long-term or not, they suffer acutely and for this to be ignored in order to promote the idea of self-soothing (which the authors erroneously do) or even for maternal well-being now that other methods seem better in that regard is just plain wrong.  Families look to the press and the researchers to accurately represent research and this has not happened in this case.  For that, shame on them.

However, I also want to mention something else that should be spoken of that is highlighted by this study: the surprising and disturbing bit of information regarding the low rates of secure attachment.  Across the entire sample that provided attachment data, only 58% were securely attached.  This is not okay, and if this doesn’t signal that something is going very wrong in our society – regardless of what sleep training we take part in – then I don’t know what will.  Most people assume that secure attachment comes so long as there’s no neglect or abuse.  What this is telling us (or reminding us, as there has been other research showing a similar number) is that this is not the case.  Our parenting practices as a whole have shifted towards such disengagement that children who are not being maltreated in any blatantly noticeable way are still insecurely attached.  Given the myriad of negative outcomes associated with insecure attachment, I would hope this serves as another wake-up call to all of us to figure out a better way of raising our children.

With that, I hope for better societal support for families and a return to parenting practices that focus on the connection between parent and child.  Only then do I think we can turn this around and help avoid a society run by individuals who struggle because of their upbringings.