New research from Australia is making international headlines, telling parents that researchers have “proven” that one form of extinction sleep training – graduated extinction or “controlled crying” – is totally, utterly safe for children. The research, out in advance release from the journal Pediatrics, claims to have found that children who undergo controlled crying do not suffer any cortisol spikes and do not show any behavioural or attachment problems later on. Unfortunately, this isn’t quite the truth, and a thorough look at the study is warranted. So let’s get on with it, shall we?
Who took part and what were the groups?
The participants were 43 families of children aged 6 months to 16 months. Families were recruited if they reported that they believed their children had a “sleep problem”. Though this limits applicability, it is also relevant, as most people will not attempt sleep training if they do not believe their child has a sleep problem. However, it is also worth noting that there were no objective measures of sleep problems. We have some socio-economic status data for the families, but no information on any other family factors (I will get to this below).
The families were randomly assigned to one of three groups:
- Graduated extinction (controlled crying) (n=14): The families in this group were taught how to follow a strict schedule of when to respond to a baby’s cry, starting at intervals of 2 to 6 minutes on Night 1 and building up to 25 (minimum) to 35 minutes on Night 7 and thereafter. Parents were allowed to shush babies, pat their backs, but were not to pick up the baby. Arguably, a child who stops calling after night 2 has a very different experience than one who is still crying on night 7. Unfortunately, there is no information on how long each child took to stop calling out during graduated extinction.
- Bedtime fading (n=15): The bedtime fading group was told to gradually move bedtime later depending on how long sleep onset latency was the previous night, on the premise that the child would be more tired and fall asleep faster. They were also told to respond to night wakings as usual. So, for example, if a baby took over 30 minutes to fall asleep one night, bedtime the next night was 15 minutes later.
- The control group (n=14): The last group received education about normal sleep. It is unclear exactly what this information was, but presumably the effect was to educate families about normal night wakings.
Families were aware this was a random assignment, and two families switched groups (ethically they had to be allowed to do so): one from graduated extinction to control and one from control to graduated extinction. It is believed that the remaining families stayed true to their group given the nature of the study they agreed to. Generally this method is sound. Researchers can’t randomly assign people and force them to stay put, though families who asked to switch likely should have been removed from the study, as the request suggests a bias of some sort towards one method over another.
Isn’t 43 a pretty small sample?
Yes, 43 is small, especially when split into three groups. Small samples are not inherently a problem, but they are an issue here if we want to draw any definitive conclusions. Small samples will only detect large effects. In the case of long-term effects of extinction methods, most people do not argue that every child will suffer negative effects, only that some likely will. Small samples will not allow for any nuance or small effects to be detected and would not allow for any analysis of interacting variables, such as temperament or health factors. Thus, the authors only have the power to detect a very large effect between graduated extinction and the control group. If that effect is not detected, it does not mean that there are no differences; it simply means there aren’t incredibly large differences. To put it in perspective, this is the type of sample size that is used when looking for language differences – findings that are incredibly robust because each child is expected to behave the same way in learning a language.
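To make the point about small samples concrete, here is a rough sketch of how large a difference between two groups has to be before a study of a given size can reliably detect it. It uses a standard normal approximation for a two-group comparison. The 5% significance level and 80% power are conventional defaults that I am assuming, not figures from the study, and the function names are made up for illustration. The authors' actual analyses were likely different, so treat this as a ballpark only.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized difference (Cohen's d) between two equal-sized
    groups that a two-sided test would detect with the given power.
    Normal approximation; alpha and power are assumed conventions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate chance of detecting a true standardized difference d
    (ignores the negligible opposite tail)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(d * sqrt(n_per_group / 2) - z_alpha)


if __name__ == "__main__":
    # 14 per group mirrors the graduated extinction and control groups;
    # 50 and 200 are hypothetical larger studies for comparison.
    for n in (14, 50, 200):
        print(n, round(min_detectable_d(n), 2), round(approx_power(0.5, n), 2))
```

With 14 families per group, this approximation puts the smallest reliably detectable difference at a Cohen's d of about 1.06, well above the 0.8 that is conventionally called a large effect. A moderate difference of d = 0.5 would be picked up only around a quarter of the time.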
Did every family remain in the study?
For the initial test and the final follow-up 12 months later, the retention rate was perfect, at 100%, though data are missing for other reasons at that 12-month follow-up. However, for the follow-ups in between – in which cortisol assessments and sleep data were recorded – there was a loss of nearly 50% of participants and no information is given as to why. This is a problem: it means the data on cortisol and infant sleep are based on the tiniest of samples, and we have no explanation for why data are missing from nearly half of the participants.
What variables were assessed and were they assessed appropriately?
There were several outcome variables of interest, and there are definitely some questions about the validity of some of the assessments:
- Infant sleep. There were four separate infant sleep assessments: sleep onset latency (i.e., the length of time taken to fall asleep), number of night wakings (that woke parents up), wake time after sleep onset (i.e., how long the infant was awake after falling asleep), and total sleep time. All four of these measures were assessed subjectively using parent report, and two of them – wake time after sleep onset and total sleep time – were also assessed objectively using actigraphy. Whereas the parental report provides information about the parental perception of infant sleep, the actigraph provides the reality.
- Infant cortisol. This was measured for two days during morning and afternoon at pretreatment and then at each of the follow-ups at 1 week, 1 month, and 3 months. Thus, there is no cortisol data for nighttime during the sleep training process (unless an infant is still undergoing sleep training at 1 week) and there is no nighttime cortisol data to inform if there are changes there. The authors argue that this data speaks to whether or not there are sustained, elevated cortisol levels indicative of long-term, chronic stress. It does not, however, speak to stress during sleep training or any associations of stress made during the nighttime period.
- Maternal stress and maternal mood. Mothers completed validated questionnaires on mood and stress at pretreatment and the 1-week, 1-month, and 3-month follow-ups.
- Infant behaviour. At the 12-month follow-up, parents reported on infant behaviours including internalizing and externalizing problems as well as “total problems” on a well-validated measure of child behaviour.
- Infant-caregiver attachment. At the 12-month follow-up, infant-caregiver attachment was measured using the Strange Situation, the gold-standard measure of attachment. Although I’m sure the methods were followed to a tee, there is the issue that the Strange Situation is validated for children aged 9 months to 18 months. At the time it was administered in this study, children’s ages ranged from 16 months to 26 months, potentially limiting the validity of the measure as one of attachment.
In terms of what is assessed here as outcome variables, this is pretty darn good, except for a couple of things. First, there’s no baseline for attachment or child behaviour by which we can see any change. This means we have no idea if sleep training or not sleep training affects anything. Second, we are missing a ton of relevant data that would better inform on whether there are interactions or nuances to any findings. This includes, but is not limited to, feeding method, how responsive and sensitive parents were outside of nighttime, child temperament, sleeping arrangements, and daycare use (which is known to influence daytime cortisol levels). That is a lot to be missing, and it is why one must add “preliminary” to the research and findings.
What were the authors’ hypotheses and how did the data fit them?