
AI Models Store Memories and Reasoning in Distinct Neural Regions

Distinct Neural Zones for Memory and Logic

Recent research reveals that AI language models store memorized facts and reasoning capabilities in different neural regions. In other words, the weights a model uses to recall specific pieces of information are distinct from the mechanisms it relies on to perform logical inference.

Understanding the Loss Landscape

The investigators used the concept of a “loss landscape” to visualize how a model's errors change as its internal settings, or weights, are adjusted. In this metaphor, high loss corresponds to many mistakes, while low loss indicates accurate predictions. The landscape’s shape, comprising sharp peaks, deep valleys, and flat plains, reflects how sensitive the model is to small weight changes. During training, models move downhill in this landscape, seeking valleys where loss is minimized. By examining the curvature of the landscape, the researchers could differentiate between memorization and reasoning processes.
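To make the metaphor concrete, here is a minimal PyTorch sketch (an illustration with an assumed toy model, not code from the study) that traces one one-dimensional slice of a loss landscape by perturbing the weights along a random direction:

```python
# Illustrative sketch: sample the loss of a toy model along one random
# direction in weight space to trace a 1-D slice of its loss landscape.
import torch

torch.manual_seed(0)

# A tiny linear model and random data stand in for a language model
# and its training corpus.
model = torch.nn.Linear(8, 1)
x, y = torch.randn(64, 8), torch.randn(64, 1)
loss_fn = torch.nn.MSELoss()

# Flatten the current weights and pick a random unit direction.
base = torch.nn.utils.parameters_to_vector(model.parameters()).detach()
direction = torch.randn_like(base)
direction /= direction.norm()

# Evaluate the loss at base + alpha * direction for a range of alphas;
# how quickly the values rise away from alpha = 0 is the slice's sharpness.
for alpha in torch.linspace(-1.0, 1.0, 9):
    torch.nn.utils.vector_to_parameters(base + alpha * direction, model.parameters())
    with torch.no_grad():
        print(f"alpha={alpha.item():+.2f}  loss={loss_fn(model(x), y).item():.4f}")
```

A sharp spike in such a plot means tiny weight changes swing the loss dramatically; a flat stretch means the model barely notices them.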

Memorization Creates Sharp Spikes

Using a technique called Kronecker‑Factored Approximate Curvature (K‑FAC), the team measured how sharply the loss changes in response to weight adjustments. They found that each memorized fact generates a sharp spike in a unique direction. When many such spikes are averaged together, they produce an overall flat profile, indicating that memorized items are isolated and do not interfere with each other.
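K-FAC approximates this curvature with Kronecker-factored blocks so it scales to large networks; as a simpler stand-in, the sketch below (illustrative, using a hypothetical toy loss rather than the authors' setup) measures sharpness exactly with an autograd Hessian-vector product, where the quadratic form vᵀHv gives the curvature of the loss along direction v:

```python
# Illustrative sketch: probe loss-landscape curvature along a direction v
# via a Hessian-vector product. K-FAC approximates this curvature with
# Kronecker-factored blocks; here we use exact autograd HVPs on a toy loss.
import torch

torch.manual_seed(0)
w0 = torch.randn(10)                        # toy "weights"
x, y = torch.randn(32, 10), torch.randn(32) # toy data

def loss(w):
    # Simple least-squares loss standing in for a language-model loss.
    return ((x @ w - y) ** 2).mean()

v = torch.randn(10)
v /= v.norm()

# hvp returns (loss(w0), H @ v); the quadratic form v.T @ (H @ v) is the
# curvature along v: a large value is a sharp spike, a small one a flat plain.
_, hv = torch.autograd.functional.hvp(loss, w0, v)
print("curvature along v:", torch.dot(v, hv).item())
```

Repeating this probe over many directions and averaging is, in spirit, how isolated per-fact spikes wash out into a flat overall profile.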

Reasoning Produces Smoother Curves

In contrast, reasoning abilities rely on shared neural pathways that affect many inputs. This results in moderate, consistent curvature across the loss landscape—akin to rolling hills that maintain a similar shape regardless of the direction of approach. The smoother profile suggests that reasoning is distributed more broadly throughout the network.
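In standard loss-landscape terms (our framing, not spelled out in the article), the contrast corresponds to different spectra of the Hessian of the loss:

```latex
% Curvature of the loss L at weights w along a unit direction v:
\[
  \kappa(v) = v^{\top} \nabla^{2} L(w)\, v .
\]
% Memorization: kappa(v) is very large along a few fact-specific directions
% and near zero elsewhere, so averaging over directions looks flat.
% Reasoning: kappa(v) is moderate and similar across many directions,
% the "rolling hills" profile.
```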

Early Attempts to Remove Specific Data

The study also explored early methods for excising particular content from trained models. While these techniques show promise for eliminating copyrighted, private, or harmful text, the researchers caution that neural networks store information in a distributed manner that is not yet fully understood. Consequently, they cannot guarantee complete removal of sensitive data without affecting the model’s overall performance.

Implications for Future AI Development

Understanding how memory and logic are compartmentalized within AI systems offers a roadmap for developing tools that manage and protect data. As techniques improve, it may become possible to delete specific information selectively while preserving a model's broader capabilities. However, the current findings underscore the complexity of neural representations and the need for further research before reliable, fine-grained data removal can be achieved.


Source: Ars Technica
