Corticohippocampal recurrent loops for episode learning and generalization
Recent research increasingly supports the view that the brain does not represent concepts solely through individual engrams during continuous episodic learning3,4,17. Instead, the brain processes episodic information at multiple levels of specificity, enabling both the formation of generalized knowledge across related episodes and the retention of specific episodic details18. The complementary learning systems theory offers an explanation for the distinct yet complementary roles of the cortex and hippocampus in memory processing19. In this framework, the cortex, particularly the mPFC18,20 and the entorhinal cortex (EC)21, is implicated in representing generalized regularities across related experiences, a process referred to as memory integration. This generalized information is subsequently conveyed through the medial temporal lobe (MTL)2,22 to the hippocampus. Within the hippocampus, the CA1 region is thought to mediate interactions between these cortical areas and the hippocampal subregions responsible for specific memory representations, such as the DG and CA3. These neural pathways are believed to facilitate the transfer of generalized information, thereby enhancing the learning of new, related concepts23.
To streamline the process of memory integration and specific memory learning, we simplify the neural pathways representing generalized episodic information—likely involving the mPFC, MTL, EC, and CA1 regions—into a direct mPFC-CA1 pathway (depicted in pink in Fig. 1b). Concurrently, the circuits associated with specific memory representations within the hippocampus are refined to focus on the DG-CA3 pathways (shown in green in Fig. 1b). This refinement results in a recurrent loop, wherein the mPFC-CA1 pathway facilitates the efficient acquisition of novel, specific memories in the DG-CA3 pathways. In turn, the DG-CA3 circuits transfer these newly embedded memories back to the mPFC-CA1 circuits, thereby promoting the integration of related memories4. We anticipate that these simplified neural mechanisms underlying continual memory learning could inspire novel computational strategies to enhance continual learning in artificial systems.
Hybrid neural networks designed to emulate the corticohippocampal recurrent loop
Based on the corticohippocampal recurrent loop, we designed a hybrid model to simulate the bidirectional facilitation between the mPFC-CA1 and DG-CA3 circuits.
To emulate the function of memory integration in the mPFC-CA1 circuits, we leveraged the ANN’s proficiency in processing high spatial complexity6,7,24 and developed an ANN that learns the similarities among different episodes or concepts, generating a modulation signal aimed at facilitating the learning of new episodes or concepts. Specifically, the modulation signals generated by the ANN are constrained to reflect the similarity between coarse-grained input features from different tasks or classes, with the goal of guiding new concept learning.
To simulate the function of novel learning in the DG-CA3 circuits, we utilize SNNs to learn new concepts associated with tasks or classes, taking advantage of their sparse firing rates and consequently lower power consumption7,24.
During the learning process, the ANNs generate modulation signals in response to each visual input. These modulation signals serve as masks, selectively activating neurons in the hidden layers of the SNNs and thereby altering the neural synchrony state across different episodes, as illustrated in Fig. 2a. Because the modulation signals vary significantly for dissimilar inputs, the SNNs are automatically partitioned into distinct sub-networks under the guidance of the ANNs. As a result, the ANNs take on the role of episode inference, assisting the SNNs in selecting episode-related neurons for each task or class. This design enhances resource utilization within the SNNs, reduces interference between different episodes, and thereby improves overall learning efficiency.
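To make the gating concrete, the following sketch (in PyTorch) shows one way an ANN-generated modulation signal could select which hidden spiking neurons participate in a given episode. The layer sizes, the 0.5 threshold, and the class name MaskedSpikingLayer are illustrative assumptions rather than the exact CH-HNN implementation.

```python
import torch
import torch.nn as nn

class MaskedSpikingLayer(nn.Module):
    """Hidden SNN layer whose neurons are gated by an ANN-generated mask (illustrative sketch)."""

    def __init__(self, in_features, out_features, v_th=1.0, tau=2.0):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        self.v_th, self.tau = v_th, tau

    def forward(self, spikes_in, mask, v_mem):
        # Leaky integration of the weighted input current
        v_mem = v_mem + (self.fc(spikes_in) - v_mem) / self.tau
        spikes_out = (v_mem >= self.v_th).float()
        v_mem = v_mem * (1.0 - spikes_out)          # hard reset for neurons that fired
        # Episode-related modulation: only mask-selected neurons are allowed to fire
        return spikes_out * mask, v_mem

# Illustrative usage: the ANN maps a raw input to a binary mask over the hidden neurons,
# so dissimilar inputs activate largely disjoint sub-networks of the SNN.
ann = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 512), nn.Sigmoid())
layer = MaskedSpikingLayer(784, 512)

x = torch.rand(1, 784)                              # one flattened visual sample
mask = (ann(x) > 0.5).float()                       # hypothetical threshold for neuron selection
v = torch.zeros(1, 512)
for _ in range(10):                                 # run the spiking layer for a few timesteps
    spikes, v = layer((x > 0.5).float(), mask, v)
```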
Notably, the ANNs within CH-HNN can be trained offline or over longer time scales than SNNs, aligning with the neural mechanisms underlying the slower formation of regularities in the mPFC-CA1 circuits during processes such as sleep or gradual learning25,26.
Introducing the metaplasticity mechanism into CH-HNN
In the corticohippocampal loops, research indicates that the modulation signals from the mPFC-CA1 circuit may lead to an increase in false alarms among episodes with high similarity10,27,28, potentially due to the highly similar neural synchrony in downstream circuits. To mitigate this effect and enhance the performance of our hybrid neural networks, we introduce a metaplasticity mechanism8, which allows synapses to exhibit variable learning capabilities. Typically, metaplasticity at each synapse is modulated by chemical neuromodulatory signals, such as dopamine and serotonin9, which can manifest as changes in the size of synaptic spines, as illustrated in Fig. 2b. In this study, we propose that the LPC11, particularly the angular gyrus (ANG), and the lateral prefrontal cortex (lPFC)29, which are involved in representing recalled content-specific memories, may play a role in modulating synaptic metaplasticity in the DG-CA3 circuit (Fig. 1a).
To implement the metaplasticity mechanism in SNNs, we adopt an exponential meta-function, as proposed in ref. 30, to simulate the plasticity dynamics of biological synapses. As synaptic weights increase in magnitude, the meta-function output decreases from 1 to 0, as illustrated in Fig. 2c. Integrating the meta-function into the optimization process during SNN training gradually diminishes each synapse’s learning capacity as knowledge accumulates (details in Methods). This approach has proven effective in alleviating catastrophic forgetting in binary neural networks31 and SNNs9,30.
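As a rough illustration, the sketch below scales each synapse's gradient step by an exponential meta-function; the specific form exp(-m|w|) and the strength parameter m are assumptions chosen to reproduce the qualitative 1-to-0 decay described above, not the exact formulation of ref. 30.

```python
import torch

def meta_factor(weight, m=2.0):
    """Exponential meta-function: close to 1 for small |w| and decaying toward 0 as |w|
    grows; the functional form and the strength m are illustrative assumptions."""
    return torch.exp(-m * weight.abs())

def metaplastic_step(weight, grad, lr=1e-2, m=2.0):
    """Scale the plain gradient update by the meta-factor, so synapses that already carry
    large (consolidated) weights become progressively harder to modify."""
    return weight - lr * meta_factor(weight, m) * grad

# Toy usage: for the same gradient, a large consolidated weight barely moves.
w = torch.tensor([0.05, 1.50])
g = torch.ones_like(w)
print(metaplastic_step(w, g))   # the first entry changes roughly 18x more than the second
```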
Thus far, we have outlined the development of the CH-HNN framework. Moving forward, we will assess its performance and adaptability across both task-incremental and class-incremental learning scenarios using a range of datasets.
CH-HNN demonstrates superior performance in task-incremental learning scenarios
In the task-incremental learning scenario, tasks with different classes are learned sequentially, requiring the model to identify each task after learning multiple tasks, as illustrated in Fig. 3a. To evaluate our model’s performance, we conducted task-incremental learning experiments using various datasets, including sMNIST, pMNIST, and sCIFAR-100. We compared our approach against several established methods on both ANNs and SNNs, including elastic weight consolidation (EWC)32, synaptic intelligence (SI)33, and context-dependent gating (XdG)34. Additionally, we utilized fine-tuned SNN and ANN models as baselines for comparison.

a Schematic of the training protocol for task-incremental learning scenarios. b Correlation matrix of visual samples across the 40 tasks in the pMNIST dataset. Average test accuracy for incrementally learned tasks on c sMNIST, d pMNIST, and e sCIFAR-100 datasets. The results are presented as means over five random seeds, with shaded areas representing ± SEM. f Violin plot showing the distribution of test accuracy scores for each task after learning all tasks in the pMNIST dataset, with width representing probability density and overlaid scatter points indicating individual data points. g Test accuracy for individual tasks after completing all tasks in the sCIFAR-100 dataset, with results presented as means over five random seeds and shaded areas indicating ± SEM. h Correlation matrix illustrating the modulation signals generated by the ANN across 40 tasks in the pMNIST dataset.
In the CH-HNN model, the ANN is optimized by enforcing consistency between the similarity of the generated modulation signals and the similarity of the corresponding samples in the prior knowledge, rather than relying on direct supervised labels for the output modulation signals. This approach circumvents the challenge of constructing labels for the ANN’s training datasets and enhances the model’s adaptability to different tasks.
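A minimal sketch of this similarity-consistency objective is given below, assuming cosine similarity between modulation signals and a mean-squared-error match to a target similarity matrix derived from the samples; the exact measures used in CH-HNN may differ.

```python
import torch
import torch.nn.functional as F

def similarity_consistency_loss(mod_signals, target_similarity):
    """Align the pairwise similarity of ANN-generated modulation signals with a target
    similarity matrix computed from the corresponding prior-knowledge samples (sketch only)."""
    z = F.normalize(mod_signals, dim=1)      # (batch, mask_dim)
    signal_similarity = z @ z.t()            # cosine similarity between modulation signals
    return F.mse_loss(signal_similarity, target_similarity)

# Illustrative usage: samples 0 and 1 come from one episode, samples 2 and 3 from another,
# so the target similarity matrix is block diagonal.
mod = torch.randn(4, 512, requires_grad=True)
target = torch.tensor([[1., 1., 0., 0.],
                       [1., 1., 0., 0.],
                       [0., 0., 1., 1.],
                       [0., 0., 1., 1.]])
similarity_consistency_loss(mod, target).backward()
```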
For the pMNIST dataset, which consists of 784! (the factorial of 28 × 28) possible permutations, we randomly selected 40 permutations to serve as tasks that are learned incrementally. The remaining permutations were used as prior knowledge to train the ANN to generate task-related modulation signals. To establish similarities between tasks, we grouped the permutations into clusters, with each cluster comprising four similar permutations, enabling the ANN to learn the relationships among tasks through the training samples. For the sMNIST and sCIFAR-100 datasets, which lack natural task relationships, we manually specified task similarities, assigning a value of 1 within the same task and 0 between different tasks. This setup allows the ANN to perform episode inference based on the input samples from the test dataset.
To assess the effectiveness of the ANN-generated modulation signals in capturing relationships between various tasks, we computed correlation matrices among these signals, which were generated from visual samples in a test dataset. Using the pMNIST dataset as an example—where 40 tasks are grouped into clusters, with each cluster comprising four similar permutations—the correlation matrix (Fig. 3h) closely mirrors the patterns observed among visual samples of the permutations (Fig. 3b). This alignment suggests that ANNs can effectively generate task-related regularities in response to novel stimuli, thereby enabling dynamic episode inference.
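For reference, such a correlation matrix can be obtained by averaging the modulation signals per task and correlating the task means, as in the sketch below; the data layout is assumed for illustration.

```python
import numpy as np

def task_correlation_matrix(mod_signals, task_ids, n_tasks=40):
    """Average the modulation signals per task and compute task-by-task Pearson correlations,
    mirroring the construction of the sample-level correlation matrix."""
    means = np.stack([mod_signals[task_ids == t].mean(axis=0) for t in range(n_tasks)])
    return np.corrcoef(means)                 # (n_tasks, n_tasks)

# Illustrative usage with random stand-in data
mod_signals = np.random.rand(4000, 512)       # one modulation signal per test sample
task_ids = np.repeat(np.arange(40), 100)
corr = task_correlation_matrix(mod_signals, task_ids)
```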
With the architecture remaining unchanged in the continual learning framework, all algorithms were carefully tuned. The experimental results indicate that, as the number of tasks increases, the CH-HNN model exhibits a progressively greater performance advantage over other methods, as demonstrated in Fig. 3c–e.
At the final incremental stage, the CH-HNN model demonstrates a significant performance advantage over EWC, SI, and the fine-tuned baseline. On both the pMNIST and sCIFAR-100 datasets, CH-HNN substantially outperforms the XdG method. Moreover, CH-HNN maintains consistent performance across tasks, achieving the lowest inter-episode disparity—defined as the difference between the highest and lowest accuracy at the final stage. For example, on the sCIFAR-100 dataset, CH-HNN achieves an inter-episode disparity of 17.32%, markedly lower than XdG’s 48.76%. These results highlight CH-HNN’s superior balance between stability and plasticity, a key metric in continual learning, as illustrated in Fig. 3f, g (with further details in Supplementary Table 3).
Additionally, although the XdG method performs comparably to CH-HNN on the sMNIST dataset, it requires explicit task identification (ID) during both the training and inference phases, which constrains its applicability in real-world scenarios. In contrast, the task-agnostic CH-HNN method not only achieves strong performance across diverse datasets in task-incremental settings but also eliminates the need for task IDs, indicating its potential for real-world implementation.
CH-HNN demonstrates superior performance in class-incremental learning scenarios
To explore more complex applications, we extended our investigation to class-incremental learning using the sMNIST, sCIFAR-100, and sTiny-ImageNet datasets. In these scenarios, the model incrementally learns multiple classes and must ultimately recognize all previously learned classes, as illustrated in Fig. 4a.

a Training protocol diagram for class-incremental learning scenarios. b Correlation matrix of visual samples across 200 classes of the sTiny-ImageNet dataset. Average test accuracy for incrementally learned classes on c sMNIST, d sTiny-ImageNet, and e sCIFAR-100 datasets, presented as means over five random seeds with shaded areas indicating ±SEM. f Test accuracy for each set of five classes after completing all classes of the sTiny-ImageNet dataset. g Violin plot showing the distribution of test accuracy scores for each set of five classes after learning all tasks in the sCIFAR-100 dataset, with overlaid scatter points highlighting individual data points. h Correlation matrix of the modulation signals generated by ANN across 200 classes in the sTiny-ImageNet dataset.
To facilitate this process, we employed a masking method that selectively activates output neurons corresponding to the current classes while suppressing those of other classes, ensuring efficient learning and minimizing interference among classes. Unlike task-incremental scenarios, which require constructing relationships among tasks that encompass various classes, the challenge here lies in training the ANN to develop relationships among individual classes that have natural similarities. To address this, we used cosine similarity to compute the similarity between the statistics of feature maps from different categories during ANN training (see details in Methods). This approach enables the ANN to automatically generate modulation signals in response to each visual sample. Taking the sTiny-ImageNet dataset as an example, we demonstrate the successful construction of an ANN capable of generating related-episode information across different classes by comparing the correlation matrix of the modulation signals (Fig. 4h) with the correlation matrix of visual samples across classes (Fig. 4b).
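The sketch below illustrates both ingredients under simple assumptions: the class relationship is taken as the cosine similarity between mean feature vectors (one possible choice of feature-map statistic), and output neurons of inactive classes are suppressed with a mask; the exact statistics and masking used in CH-HNN may differ.

```python
import torch
import torch.nn.functional as F

def class_relationship(features_a, features_b):
    """Cosine similarity between summary statistics of two classes' feature maps; the mean
    feature vector is used here as an illustrative choice of statistic."""
    return F.cosine_similarity(features_a.mean(dim=0), features_b.mean(dim=0), dim=0)

def mask_inactive_classes(logits, active_class_ids):
    """Suppress output neurons of classes outside the current learning stage so that only
    the currently learned classes contribute to training."""
    masked = torch.full_like(logits, float('-inf'))
    masked[:, active_class_ids] = logits[:, active_class_ids]
    return masked

# Illustrative usage with hypothetical backbone features for two classes
feats_a = torch.randn(100, 256)
feats_b = torch.randn(100, 256)
sim = class_relationship(feats_a, feats_b)     # target similarity used when training the ANN
logits = torch.randn(8, 200)
stage_logits = mask_inactive_classes(logits, torch.tensor([10, 11, 12, 13, 14]))
```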
In addition to the EWC, SI, XdG, and baseline methods employed in task-incremental learning, we further incorporate state-of-the-art methods such as iCaRL35 and FOSTER36 for class-incremental scenarios. These methods, widely regarded as benchmarks in recent years, are better suited for class-incremental learning compared to EWC and SI, enabling a more comprehensive evaluation of CH-HNN.
For the experiments with iCaRL and FOSTER, we follow the parameter settings and utilize ResNet32 as specified in their respective publications. The experiments with EWC and SI are conducted using ANNs, which align more closely with their methodologies. For CH-HNN and XdG, we evaluate various spiking neuron models, including exponential integrate-and-fire (EIF)37, leaky integrate-and-fire (LIF)38, and integrate-and-fire (IF)39 models, applied within SNNs to assess their performance.
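For readers unfamiliar with these neuron models, the sketch below shows a discrete-time LIF update with assumed constants; the IF and EIF variants differ only in the leak and spike-initiation terms, as noted in the comments.

```python
import torch

def lif_step(v, input_current, tau=2.0, v_th=1.0, v_reset=0.0):
    """One discrete-time update of a leaky integrate-and-fire neuron (constants are assumed).
    A plain IF neuron would drop the leak term (v = v + input_current), while an EIF neuron
    would add an exponential spike-initiation current before the threshold comparison."""
    v = v + (input_current - (v - v_reset)) / tau     # leaky integration toward v_reset
    spikes = (v >= v_th).float()                      # a spike is emitted on threshold crossing
    v = v * (1.0 - spikes) + v_reset * spikes         # hard reset of neurons that fired
    return spikes, v

# Illustrative usage: run a small population for a few timesteps
v = torch.zeros(8)
for _ in range(10):
    spikes, v = lif_step(v, torch.rand(8))
```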
With the architecture unchanged in the continual learning framework, all algorithms are optimally tuned. The experimental results show that both EWC and SI perform poorly in class-incremental learning, consistent with previous findings40. Our CH-HNN model, regardless of the neuron model used, outperforms all other state-of-the-art task-agnostic methods, including iCaRL and FOSTER, as well as metaplasticity approaches (Fig. 4c, d, e). Interestingly, as the complexity of the neuron models increases, CH-HNN demonstrates progressively better performance, likely attributed to the enhanced non-linearity of the spiking models.
Notably, while XdG with the LIF neuron model performs comparably in the sMNIST dataset and even exceeds the performance of CH-HNN in the sCIFAR-100 dataset, its performance declines in the sTiny-ImageNet dataset as the number of tasks increases. This decline may result from increased neuron overlap across tasks due to XdG’s random neuron allocation strategy. Additionally, at the final stage of incremental learning, the inter-episode disparity of CH-HNN is 44.34% in the sTiny-ImageNet dataset and 21.47% in the sCIFAR-100 dataset, both of which are lower than or comparable to those of other methods (see Supplementary Table 5 for further details), as illustrated in Fig. 4f, g.
Furthermore, CH-HNN dynamically generates episode-related regularities based on visual input during both training and testing phases, enabling task-agnostic learning. In contrast, XdG relies on explicit task identification during both training and inference, highlighting CH-HNN’s superior adaptability and suitability for real-world applications.
Knowledge transfer from prior knowledge to new concept learning
With the hypothesis that the mPFC-CA1 circuits learn regularities that summarize related information from prior knowledge, it is crucial to explore whether the ANNs in our CH-HNN model can effectively transfer related-episode knowledge across different datasets, as illustrated in Fig. 5a. Therefore, we conducted experiments where ANNs were pre-trained on prior knowledge derived from the ImageNet dataset and then assessed their performance on the sCIFAR-100 and sTiny-ImageNet datasets. To ensure the priors were distinct, we followed the methodology of ref. 41 to exclude classes overlapping with CIFAR-100 and Tiny-ImageNet from ImageNet. These experiments utilized the EIF neuron model, which demonstrated the highest performance in class-incremental scenarios for both datasets within the CH-HNN framework.

a Diagram illustrating knowledge transfer for episode-related information. Average test accuracy for incrementally learned classes on b sTiny-ImageNet and c sCIFAR-100 datasets. d Diagram depicting feedback loops within corticohippocampal circuits. Average test accuracy for incrementally learned classes with different ANN training designs on e sTiny-ImageNet and f sCIFAR-100 datasets. g Average test accuracy in lesion experiments on the pMNIST dataset. Results are presented as means over five random seeds, with shaded areas indicating ±SEM. Violin plots showing the distribution of test accuracy scores for each set of five classes after learning all tasks in h the sTiny-ImageNet dataset and i the ImageNet-as-Priors condition in sTiny-ImageNet dataset.
By incorporating an ANN pre-trained on prior knowledge, the CH-HNN model continues to significantly outperform other state-of-the-art methods on both sCIFAR-100 and sTiny-ImageNet, demonstrating its ability to transfer knowledge across datasets. This success stems from the ANN component, which effectively learns to extract regularities from prior experiences. The strong alignment between the correlation matrix of modulation signals and sample representations (see Supplementary Fig. 6c, d) further supports this capability.
Evaluation of the feedback loop within the corticohippocampal circuits
In evaluating the efficacy of episode-related information in task-incremental and class-incremental learning, we have validated the role of episode-related regularities in enhancing the learning of novel concepts, thus supporting the function of the feed-forward pathway from the mPFC-CA1 to the DG-CA3 circuits. To further investigate the functional role of the feedback loop from the DG-CA3 to the mPFC-CA1 circuits, which is believed to transmit novel embeddings to promote generalization across related episodes4, we designed experiments where the ANN incrementally learns the classes in the sCIFAR-100 and sTiny-ImageNet datasets.
In the ANN’s incremental learning process, we employed the metaplasticity mechanism to mitigate forgetting of previously learned regularities. This approach enables the ANN to continuously learn new embeddings, enhancing its ability to extract episode-related regularities. As the ANN incrementally learns classes, CH-HNN demonstrates improved efficiency, as illustrated in Fig. 5e, f. The correlation matrix, which assesses the consistency of the regularities with the sample representations, also improved after learning all classes, as exemplified by the sTiny-ImageNet dataset in Supplementary Fig. 6e, f. These results indicate that as the ANN in CH-HNN accumulates prior knowledge, its ability to generalize across episodes improves.
Collectively, these findings validate the efficacy of the feedback loop (DG-CA3 to mPFC-CA1) in transmitting novel embeddings to promote generalization across related episodes, contributing to a deeper understanding of the corticohippocampal neural mechanisms that support lifelong learning.
Lesion experiments
To dissect the contributions of episode inference from ANN’s modulation signals and metaplasticity mechanisms within our CH-HNN framework, we conducted a series of ablation studies targeting these core mechanisms.
For the pMNIST dataset, both mechanisms play a substantial role in enhancing continual learning. Metaplasticity, in particular, enhances stability by balancing the retention of old knowledge with the integration of new information, resulting in a lower inter-episode disparity (12.53%) compared to episode inference alone (29.77%). Episode inference, meanwhile, enhances overall performance by improving average accuracy, reaching a mean of 70.41% (Fig. 5g).
In class-incremental experiments on sTiny-ImageNet, metaplasticity has a limited effect, while episode inference plays a critical role in enhancing the CH-HNN model’s performance, achieving 70.70%, which is comparable to the full CH-HNN model’s performance of 70.72%. However, when ANN guidance is based on priors from less-relevant datasets—thus decreasing guidance accuracy—metaplasticity becomes particularly beneficial, increasing the average accuracy from 42.89% to 47.23% (see Fig. 5h, i, and Supplementary Table 6).
In summary, both episode inference and metaplasticity are essential to our CH-HNN model: episode inference provides the primary boost to overall performance, while metaplasticity offers crucial support under conditions of inaccurate guidance by balancing the retention of old and new knowledge through the preservation of synaptic weights from prior episodes.
Applicability and robustness of CH-HNN in real-world implementation
Most high-performing continual learning algorithms, including XdG methods and the recently proposed channel-wise lightweight reprogramming methods42, rely on a perfect task oracle during the inference phase to accurately identify the task for each test image. This dependence complicates their deployment in dynamic real-world environments. In contrast, our CH-HNN model is designed for task-agnostic learning, enabling straightforward implementation across diverse real-world scenarios.
The applicability of CH-HNN is well-aligned with the growing adoption of hybrid ANN-SNN architectures in neuromorphic hardware43,44, such as PAICORE45 and the “Tianjic” chip46, which support configurable cores capable of operating as either ANN or SNN components. Considering the precision constraints of most neuromorphic hardware, we reduced the CH-HNN model’s precision from float32 to int8 and observed minimal performance loss (Supplementary Fig. 4c). Furthermore, simulation results from a cycle-accurate simulator, validated by refs. 47,48, show that the SNN component reduces power consumption by 60.82% compared to ANNs in new concept learning (Fig. 6e). These findings underscore the suitability of CH-HNN for low-power neuromorphic hardware applications.
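As an illustration of this precision reduction, the sketch below performs symmetric per-tensor post-training quantization of float32 weights to int8; the actual quantization scheme used in the hardware evaluation may differ.

```python
import torch

def quantize_int8(weight):
    """Symmetric per-tensor post-training quantization of float32 weights to int8 (a sketch
    of the precision reduction; the hardware-specific mapping may differ)."""
    scale = weight.abs().max() / 127.0
    q = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover an approximate float32 weight for accuracy evaluation in simulation."""
    return q.to(torch.float32) * scale

# Illustrative usage: the round-trip error stays small relative to the weight scale
w = torch.randn(512, 784) * 0.1
q, scale = quantize_int8(w)
max_err = (dequantize_int8(q, scale) - w).abs().max()
```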

a The quadruped robot performs actions guided by MNIST code recognition using CH-HNN. b The robotic arm identifies and grasps a specific object (apple) based on CH-HNN’s decision-making. c Average accuracy of various methods in real-world applications with different object positions and angles. The box plot displays the interquartile range (IQR) with Q3 (upper quartile) and Q1 (lower quartile), and outliers are shown as individual points. d Performance comparison under varying Gaussian noise (GN) levels for class-incremental learning on the sCIFAR-100 dataset. Results represent the distribution across five random seeds. e Power consumption analysis, comparing contributions from fully-connected layers (FC-L1 or L2), hybrid layers (Mask-L1 or L2), the output layer, and total consumption.
To validate the robustness of our CH-HNN model in real-world applications, we implemented it in two practical settings. First, we applied CH-HNN to a pMNIST recognition task using a quadruped robot equipped with a real-time camera. The robot uses OpenCV49 to crop MNIST images, which are processed by the ANNs within CH-HNN to generate modulation signals for episode inference, guiding the SNNs for accurate recognition. Recognized images trigger actions such as nodding or looking upwards (Fig. 6a, Supplementary Movie 1).
Second, we applied CH-HNN, trained in a class-incremental manner on sCIFAR-100 data, to an object grasping task using YOLO detection50. CH-HNN identified objects (e.g., distinguishing “Apple” and “Not Apple”) within the camera’s field of view, enabling precise robotic arm grasping (Fig. 6b, Supplementary Movie 2). In a robustness evaluation involving sCIFAR-100 objects under varied positions and angles, CH-HNN achieved an average accuracy of 82% (±7.25%) over 30 trials, demonstrating its robustness under diverse conditions. The experiment included objects from both early and late learning stages, with CH-HNN outperforming methods like EWC in addressing the stability-plasticity dilemma (Fig. 6c, and Supplementary Fig. 4a). Additionally, CH-HNN shows resilience under Gaussian noise, maintaining acceptable performance despite some degradation (Fig. 6d).
Consequently, our CH-HNN method demonstrates both applicability and robustness in realistic scenarios. Furthermore, with integrated spiking unit structures, CH-HNN offers the added advantage of low power consumption.