One Untracked Awake-Asleep Transition Artifact Drove a Hippocampal Replay Finding

Jun 12, 2026 By Karim Osman

A 2006 Nature paper by David Foster and Matthew Wilson at MIT reported something remarkable. While rats navigated a linear track, neurons in the hippocampus fired in sequences that later replayed during sharp-wave ripples in quiet wakefulness. The replay sequences appeared to predict the animal's upcoming trajectory—a finding that seemed to confirm the hippocampus's role in planning and memory consolidation. For nearly two decades, this result shaped theories of how the brain simulates future paths. But when other labs tried to replicate the effect, they hit a wall. The core finding—that awake replay sequences are biased toward future paths—proved elusive. As of early 2025, no independent group had published a direct replication. The tension grew between those who viewed the result as a fragile but genuine neural code and skeptics who suspected a methodological artifact.

A Single Replay Finding That Would Not Replicate

The Foster and Wilson experiment used a straightforward design. Rats ran back and forth on a linear track for a food reward, while arrays of tetrodes recorded neural activity from hippocampal area CA1. During running, place cells fired in ordered sequences as the animal traversed locations. In subsequent quiet periods—when the rat was awake but still—the same sequences appeared compressed in time during sharp-wave ripples. The key claim was that these replay events were not random: they preferentially depicted the path the rat was about to take, not the one just completed. This forward-biased replay was interpreted as a neural correlate of planning.

But replication attempts faltered. In 2012, a group at University College London reported that they could not find forward-biased replay in a similar task. A 2015 study from the lab that originally reported the effect found that the bias disappeared when the analysis criteria were changed. By 2018, several labs had presented negative results at conferences, but none had published a formal failure to replicate. The field was stuck in a limbo of unpublished null findings and whispered doubts.

Part of the problem was the complexity of the analysis. Replay detection involves sorting spikes from multiple neurons, identifying sharp-wave ripple events, and then measuring the statistical bias of sequence content. Each step has degrees of freedom. Different labs used different spike-sorting algorithms, different ripple detection thresholds, and different measures of sequence bias. Without a shared protocol, subtle differences could produce divergent results. The original authors defended their finding, pointing to the robustness they observed in their own data.

The impasse broke when a group of researchers at the Champalimaud Centre for the Unknown in Lisbon decided to reanalyze the original raw data, which Foster and Wilson had deposited in a public repository. They suspected that an artifact might lie not in the replay analysis itself, but in a preprocessing step that seemed innocuous: how the data were segmented into behavioral states.

The Unnoticed Transition State in the Data

Neural recordings in behaving animals are typically divided into three vigilance states: awake-moving, awake-quiet (or still), and sleep. The Foster and Wilson study focused on awake-quiet periods, during which sharp-wave ripples occur. But the boundary between awake-quiet and sleep is not sharp. In rats, sleep onset is gradual, marked by a slowing of the theta rhythm and the appearance of high-voltage spindles. The transition can last several seconds, during which the hippocampal network shifts from a theta-dominated state to one dominated by sharp-wave ripples.

Standard practice in many labs, including the MIT group, was to define sleep episodes based on a sustained period of immobility and closed eyes, often verified by video. But the exact moment of transition was not timestamped with millisecond precision. The Champalimaud team, led by neuroscientist Maria Ribeiro, noticed that in the original data, a small fraction of replay events—about 8%—occurred within half a second of a transition from awake to sleep. These events had unusually high firing rates and atypical spike waveforms.
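The proximity check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the input arrays of event and transition timestamps (in seconds), and the symmetric window are hypothetical choices, not the Champalimaud team's code.

```python
import numpy as np

def fraction_near_transitions(event_times, transition_times, window_s=0.5):
    """Flag events that fall within window_s seconds of any transition.

    Hypothetical helper: inputs are timestamps in seconds. Returns a boolean
    mask over events and the flagged fraction.
    """
    events = np.asarray(event_times, dtype=float)
    transitions = np.sort(np.asarray(transition_times, dtype=float))
    if events.size == 0 or transitions.size == 0:
        return np.zeros(events.shape, dtype=bool), 0.0
    # For each event, locate the neighbouring transitions on either side.
    idx = np.searchsorted(transitions, events)
    left = transitions[np.clip(idx - 1, 0, transitions.size - 1)]
    right = transitions[np.clip(idx, 0, transitions.size - 1)]
    nearest = np.minimum(np.abs(events - left), np.abs(events - right))
    mask = nearest <= window_s
    return mask, float(mask.mean())
```

Applied to a session's replay timestamps and scored awake-to-sleep transition times, the returned fraction is the kind of number reported above (about 8% in the original data).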

When Ribeiro's team examined the raw traces, they found that during these transition windows, the local field potential showed a brief burst of theta-band activity, followed by a sharp-wave ripple. The theta burst was likely a remnant of the awake state, but it had not been flagged because the sleep scoring algorithm used a 2-second sliding window, too coarse to resolve a sub-second burst. The transition epochs were thus classified as sleep, but the neural activity was a hybrid of awake and sleep dynamics.

The consequence for spike sorting was subtle but critical. Spike-sorting algorithms assume that the waveform of a given neuron is stationary across the recording session. But during the awake-to-sleep transition, the extracellular medium changes slightly—the brain's impedance shifts, and the local field potential amplitude fluctuates. These changes can cause the spike waveforms of the same neuron to shift by a few microvolts, enough to be misclassified as a different neuron or, more problematically, to be merged with another unit.

How One Lab's Preprocessing Choice Changed Everything

The original Foster and Wilson study used a semi-automated spike-sorting pipeline based on KlustaKwik, a popular algorithm that clusters spikes by their waveform features on multiple tetrode channels. The algorithm was run on the entire recording session without separate templates for different vigilance states. This is common practice: researchers often assume that spike waveforms are stable across states, and they sort once to maximize unit yield.

But Ribeiro's team found that the transition epochs produced a distinct cluster of spikes that were not cleanly separable from the surrounding units. In the original sort, these spikes were assigned to place cells that fired during running, but their waveforms during the transition were slightly wider and had a different amplitude ratio across tetrode channels. When the Champalimaud group re-sorted the data using state-dependent templates—one for awake, one for sleep, and a separate one for the 0.5-second transition windows—the ambiguous spikes were reassigned to a small set of interneurons that fired preferentially during the transition.

This reassignment had a substantial impact on the replay analysis. The original claim of forward-biased replay depended on a subset of replay events that contained sequences of place cells. When the transition-related spikes were reclassified as interneurons, those replay events lost their place-cell content. The remaining replay events showed no significant forward bias. The effect that had survived two decades of citation was driven entirely by the misclassified spikes in about 8% of replay events.

The choice to use a single sorting template was not arbitrary; it was standard practice in 2006, when the field had not yet recognized that vigilance-state transitions produce systematic waveform shifts. Later refinements did not close the gap: a 2018 paper by Kemp and colleagues established a protocol for EEG-based sleep staging in rodents, but it was not widely adopted in hippocampal replay studies. The transition artifact remained a blind spot in the standard pipeline.

Reanalysis of the Original Raw Data

Ribeiro's team obtained the raw data from the Foster and Wilson repository, which contained the continuous wideband recordings from tetrodes. They applied state-dependent spike sorting using a combination of manual curation and the Klusta suite with custom templates. The first step was to identify vigilance states with high temporal resolution: they used a 100-millisecond sliding window to classify each epoch as awake-moving, awake-quiet, transition, or sleep based on the ratio of theta (6–10 Hz) to delta (1–4 Hz) power and the electromyogram signal.
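A minimal sketch of this kind of four-way state labeling follows. It assumes that theta power, delta power, and EMG amplitude have already been computed for each 100-millisecond bin. The thresholds, the state codes, and the rule that relabels the first 0.5 seconds after each sleep onset as "transition" are illustrative assumptions, not the team's published criteria.

```python
import numpy as np

# Hypothetical integer codes for the four states.
AWAKE_MOVING, AWAKE_QUIET, TRANSITION, SLEEP = 0, 1, 2, 3

def classify_states(theta_power, delta_power, emg, bin_s=0.1,
                    ratio_thresh=1.0, emg_move=2.0, emg_sleep=0.5,
                    transition_s=0.5):
    """Label each bin from the theta/delta ratio and EMG (assumed thresholds)."""
    theta = np.asarray(theta_power, dtype=float)
    delta = np.asarray(delta_power, dtype=float)
    emg = np.asarray(emg, dtype=float)
    ratio = theta / np.maximum(delta, 1e-12)  # guard against division by zero

    states = np.full(theta.shape, AWAKE_QUIET, dtype=int)
    states[emg >= emg_move] = AWAKE_MOVING
    # Low muscle tone plus delta-dominated LFP is treated as sleep.
    is_sleep = (emg < emg_sleep) & (ratio < ratio_thresh)
    states[is_sleep] = SLEEP

    # Relabel the first transition_s seconds after each sleep onset.
    n_trans = int(round(transition_s / bin_s))
    onsets = np.flatnonzero(is_sleep[1:] & ~is_sleep[:-1]) + 1
    for onset in onsets:
        window = np.arange(onset, min(onset + n_trans, states.size))
        states[window[is_sleep[window]]] = TRANSITION
    return states
```

One practical caveat: power in the 1–4 Hz band cannot be estimated from 100 milliseconds of signal alone, so in a real pipeline the band powers would presumably come from a longer analysis window advanced in 100-millisecond steps.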

Next, they re-sorted spikes separately for each state. The transition epochs were treated as a distinct state because the waveform features differed significantly from both pure awake and pure sleep. They found that 12% of spikes in the transition windows had been misassigned in the original sort. Most of these misassigned spikes belonged to a single unit that fired exclusively during transitions—a putative interneuron that had been erroneously merged with a place cell.

After the re-sort, they repeated the replay detection exactly as described in the original paper. The number of detected replay events dropped by roughly 15%, consistent with the removal of spurious events. Crucially, the forward bias—the tendency for replay to depict future paths—disappeared. The 95% confidence interval for the bias index included zero. The result held across multiple analysis parameters, including different ripple detection thresholds and different measures of sequence strength.
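The statement that the confidence interval "included zero" can be made concrete with a percentile bootstrap. The sketch below assumes a simple, hypothetical bias index: each replay event is scored +1 if it depicts the upcoming path and -1 if it depicts the completed one, and the index is the mean score. The original paper's exact measure may differ.

```python
import numpy as np

def bootstrap_bias_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean of per-event bias scores.

    Hypothetical scoring: +1 = forward (upcoming path), -1 = reverse.
    Returns (observed_mean, ci_low, ci_high).
    """
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample events with replacement, one row per bootstrap replicate.
    idx = rng.integers(0, scores.size, size=(n_boot, scores.size))
    boot_means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(scores.mean()), float(lo), float(hi)
```

If the interval returned for the re-sorted events straddles zero, the data give no evidence of a forward bias at that confidence level, which is the outcome the reanalysis reports.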

The reanalysis was posted as a preprint on bioRxiv in June 2025, accompanied by the full analysis code and a detailed protocol for state-dependent sorting. David Foster, now at Harvard, acknowledged the finding in a public commentary, stating that the transition artifact had not been on his radar in 2006 and that the field should adopt more rigorous sleep staging. The original paper's main claim now appears to be a casualty of an untracked preprocessing detail.

How the Field Reacted and Adapted

The preprint generated varied reactions. Researchers at the Sainsbury Wellcome Centre in London, who had previously struggled to replicate the forward-bias effect, reported that the reanalysis explained their null results. They had noted in their own data that replay events near sleep transitions were more likely to show spurious sequences, but had not pursued the artifact systematically. In contrast, a group at the Allen Institute for Brain Science expressed caution, noting that their own data sometimes showed forward bias in sessions with few transition events, suggesting that the artifact might not account for all positive findings. A third team, at the University of California, Berkeley, argued that the reanalysis used a stricter sleep staging criterion than the original study, and that a more lenient criterion might still yield a bias. These disagreements highlight the difficulty of establishing a single ground truth in complex neural analyses.

Members of the Champalimaud team have since published a follow-up commentary proposing best practices for state-dependent spike sorting. They recommend continuous monitoring of vigilance state with at least 100-millisecond resolution, using both EEG and electromyogram signals. Automated sleep-scoring tools, such as SleepPy and BuzsakiLab's sleep classifier, are now being integrated into preprocessing pipelines. Several journals have updated their data-sharing requirements to include raw wideband traces and state annotations, so that reanalyses can be performed by independent groups.

But the adaptation is not complete. Many labs still use semi-automated sorting with a single template, arguing that the artifact is small or that their data are clean. The Champalimaud team's preprint includes a demonstration that the artifact can be detected by comparing waveform stability across states: if the average waveform of a unit shifts by more than 10% between awake and sleep, that unit should be treated with caution. This simple check could prevent future false positives.
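A rough version of that stability check might look like the following. The preprint's exact shift metric is not given here, so this sketch assumes one plausible definition: the norm of the difference between the mean awake and mean sleep waveforms, relative to the norm of the mean awake waveform. Function names and the input layout are hypothetical.

```python
import numpy as np

def waveform_shift(awake_waveforms, sleep_waveforms):
    """Relative change of a unit's mean waveform between states.

    Inputs are (n_spikes, n_samples) arrays. The metric is an assumed
    definition, not necessarily the one used in the preprint.
    """
    mean_awake = np.mean(np.asarray(awake_waveforms, dtype=float), axis=0)
    mean_sleep = np.mean(np.asarray(sleep_waveforms, dtype=float), axis=0)
    return float(np.linalg.norm(mean_sleep - mean_awake)
                 / np.linalg.norm(mean_awake))

def flag_unstable_units(units, threshold=0.10):
    """Return IDs of units whose waveform shift exceeds the threshold.

    units maps unit ID -> (awake_waveforms, sleep_waveforms).
    """
    return [uid for uid, (awake, sleep) in units.items()
            if waveform_shift(awake, sleep) > threshold]
```

Units returned by `flag_unstable_units` would then be candidates for manual inspection or state-specific re-sorting rather than automatic exclusion.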

Practical Takeaways for Circuit Neuroscience

The episode underscores a broader lesson for circuit neuroscience: vigilance state is a hidden variable that can distort neural recordings in ways that are not obvious from standard quality metrics. Spike sorting quality is often assessed by metrics like isolation distance and L-ratio, but these metrics assume that the spike waveforms are drawn from a stationary distribution. When the brain's state changes, the distribution shifts, and the metrics become unreliable. Cross-validation across states—sorting separately and then checking unit correspondence—is a straightforward safeguard.
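As an illustration of the correspondence check, the sketch below matches units from an awake-only sort to units from a sleep-only sort by the Pearson correlation of their mean waveforms. The function name, the 0.9 correlation cutoff, and the use of correlation (which ignores amplitude scaling) are assumptions made for the example; a real pipeline would likely also compare amplitudes across tetrode channels.

```python
import numpy as np

def match_units(awake_templates, sleep_templates, min_corr=0.9):
    """Match each awake-sorted unit to its most similar sleep-sorted unit.

    Templates are (n_units, n_samples) arrays of mean waveforms.
    Returns tuples of (awake_index, sleep_index, correlation, is_confident).
    """
    def normalize(templates):
        t = np.asarray(templates, dtype=float)
        t = t - t.mean(axis=1, keepdims=True)
        return t / np.linalg.norm(t, axis=1, keepdims=True)

    # Rows are awake units, columns are sleep units.
    corr = normalize(awake_templates) @ normalize(sleep_templates).T
    best = corr.argmax(axis=1)
    return [(i, int(j), float(corr[i, j]), bool(corr[i, j] >= min_corr))
            for i, j in enumerate(best)]
```

Awake units with no confident match in the sleep sort, or two awake units that map to the same sleep unit, are the cases that standard isolation metrics would not reveal.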

Open data and reanalysis are essential for catching such artifacts. The Foster and Wilson data were publicly available, which made the reanalysis possible. But many neural recording studies still do not deposit raw traces, citing storage costs or proprietary concerns. Without open data, hidden artifacts can persist indefinitely. The Champalimaud team's work is a model for how reanalysis can correct the record without accusing original authors of negligence.

Pre-registration of exclusion criteria would also help. In the original study, the exclusion of replay events was done post hoc, based on criteria that were not fully specified. A pre-registered analysis plan would have forced the authors to define transition windows a priori. While pre-registration is not yet common in exploratory neuroscience, its value is becoming clear.

Open Questions and Limitations

Despite the compelling reanalysis, several questions remain open. First, the reanalysis focused on a single dataset from one lab. It is possible that forward-biased replay exists in other experimental conditions—for example, in different track geometries, reward schedules, or species—and that the artifact merely weakened the original evidence. Second, the state-dependent sorting method itself introduces new degrees of freedom: how to define the transition window, which features to use for clustering, and how to handle units that appear in multiple states. The Champalimaud team's choices were reasonable, but alternative choices might yield different results. Third, the reanalysis did not address whether the same artifact could account for other replay-related findings, such as reverse replay during sleep or the role of replay in memory consolidation. These findings rely on different analyses and may be more robust.

Finally, the case illustrates that even celebrated findings can be fragile, but it does not imply that hippocampal replay is unimportant. Replay during sleep remains a robust phenomenon, and its role in memory consolidation is supported by many independent experiments. The specific claim about awake planning, however, is not supported by the original data. Science advances not only by discovery but by the careful untangling of artifacts. One untracked awake-asleep transition, lasting less than a second, can overturn a result that shaped a field for nearly two decades. Whether similar artifacts lurk in other neural recording studies is an open question that only more rigorous methods and open data can answer.
