Short Report by David Zaha: AlphaFold SAE: Understanding the Hidden Layers of AlphaFold for Protein Conformation Generation (June-August)

Anett Albrecht/ January 3, 2025/ News, Research

Predicting protein structures is a cornerstone of computational biology, with significant implications for drug discovery and bioengineering. AlphaFold has revolutionized this field, achieving near-experimental accuracy in structure prediction. However, its ability to generate conformational ensembles—critical for understanding protein flexibility—remains limited. This challenge arises because AlphaFold is essentially a black-box model, making it difficult to extract insights about alternative protein conformations.

Sparse autoencoders (SAEs) have shown promise in interpreting and steering outputs of deep learning models, especially in large language models. Applying SAE techniques to AlphaFold could enhance its ability to generate diverse structural ensembles, providing deeper insights into protein dynamics and function.

Over the summer, I explored methods to encourage AlphaFold to generate conformational ensembles. I analyzed its distograms, which represent the probability distribution of distances between residue pairs, to understand how structural variability is encoded. Additionally, I worked with AlphaFlow, a modified AlphaFold variant incorporating probability flow techniques to produce ensemble predictions. I retrained AlphaFlow on datasets containing known conformational ensembles to improve its flexibility.
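As a rough illustration of how a distogram can be read for structural variability, the sketch below turns per-pair distance-bin logits into an expected distance and an entropy per residue pair. High entropy for a pair suggests the model is spreading probability over several distances. The logits here are random placeholders, not AlphaFold output, and the bin layout (64 bins spanning 2 to 22 Å) and the helper name `distogram_summary` are assumptions made for this example.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def distogram_summary(logits, bin_centers):
    """Return expected distance and entropy for every residue pair.

    logits: array of shape (n_res, n_res, n_bins)
    bin_centers: array of shape (n_bins,), distances in Angstrom
    """
    probs = softmax(logits, axis=-1)
    expected = (probs * bin_centers).sum(axis=-1)
    # Small epsilon avoids log(0) for bins with zero probability.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)
    return expected, entropy

# Placeholder input: random logits stand in for a real distogram head.
rng = np.random.default_rng(0)
n_res, n_bins = 8, 64
bin_centers = np.linspace(2.0, 22.0, n_bins)  # assumed bin layout
logits = rng.normal(size=(n_res, n_res, n_bins))

expected, entropy = distogram_summary(logits, bin_centers)
```

Residue pairs whose entropy stands out from the rest would be natural candidates for closer inspection when looking for alternative conformations.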

Further, I investigated AlphaFold’s hidden layers, systematically ablating different states to identify how they contribute to protein structure predictions. This led to training an SAE to extract meaningful latent features. Preliminary results indicated that certain hidden units corresponded to specific amino acids and interaction types, suggesting that targeted interventions could make AlphaFold more interpretable and steerable.
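To make the idea of a sparse autoencoder concrete, here is a minimal NumPy sketch: a ReLU encoder, a linear decoder, and a loss that combines reconstruction error with an L1 penalty on the feature activations. It is not the setup used in this project. The activations are random placeholders rather than AlphaFold hidden states, and the dimensions, learning rate, L1 coefficient, and function names are all assumptions chosen for illustration.

```python
import numpy as np

def init_sae(d_model, n_features, rng, scale=0.1):
    return {
        "W_enc": rng.normal(scale=scale, size=(d_model, n_features)),
        "b_enc": np.zeros(n_features),
        "W_dec": rng.normal(scale=scale, size=(n_features, d_model)),
        "b_dec": np.zeros(d_model),
    }

def sae_forward(params, x):
    pre = x @ params["W_enc"] + params["b_enc"]
    features = np.maximum(pre, 0.0)  # ReLU keeps activations non-negative
    recon = features @ params["W_dec"] + params["b_dec"]
    return pre, features, recon

def sae_loss(params, x, l1_coeff):
    _, features, recon = sae_forward(params, x)
    mse = ((recon - x) ** 2).sum(axis=1).mean()
    sparsity = np.abs(features).sum(axis=1).mean()
    return mse + l1_coeff * sparsity

def sae_grads(params, x, l1_coeff):
    # Hand-written backward pass for the loss above.
    pre, features, recon = sae_forward(params, x)
    batch = x.shape[0]
    d_recon = 2.0 * (recon - x) / batch
    d_features = d_recon @ params["W_dec"].T + l1_coeff * (features > 0) / batch
    d_pre = d_features * (pre > 0)
    return {
        "W_enc": x.T @ d_pre,
        "b_enc": d_pre.sum(axis=0),
        "W_dec": features.T @ d_recon,
        "b_dec": d_recon.sum(axis=0),
    }

def train_sae(x, n_features, l1_coeff=1e-3, lr=1e-2, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    params = init_sae(x.shape[1], n_features, rng)
    losses = []
    for _ in range(steps):
        losses.append(sae_loss(params, x, l1_coeff))
        grads = sae_grads(params, x, l1_coeff)
        for name in params:
            params[name] -= lr * grads[name]  # plain full-batch gradient descent
    losses.append(sae_loss(params, x, l1_coeff))
    return params, losses

# Placeholder data: random vectors stand in for hidden-layer activations.
rng = np.random.default_rng(1)
activations = rng.normal(size=(256, 16))
params, losses = train_sae(activations, n_features=64)
_, features, _ = sae_forward(params, activations)
```

After training on real activations, each column of `features` can be inspected to see which residues or residue pairs activate it, which is the kind of analysis that links individual features to amino acids or interaction types.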

Beyond the research, my time in Leipzig was an incredible cultural experience. I immersed myself in the German language, traditions, and cuisine, enjoying everything from local specialties like Sauerbraten to lively evenings at open-air music festivals. My lab colleagues welcomed me warmly, showing me around Leipzig and introducing me to everyday German life. I also took the opportunity to travel across Germany and other parts of Europe, gaining a broader appreciation for the region’s history and cultural diversity.

This fellowship was not only a scientific journey but also a deeply enriching personal experience, blending cutting-edge research with cultural discovery. I am incredibly grateful for the opportunity provided by the Max Kade Foundation.

Figure 1: The sparse autoencoder revealed evidence of conformational knowledge within AlphaFold when applied to RFAH, a metamorphic protein with two conformations, alpha (center) and beta (right). The left subpanel shows a plot of all residues in RFAH, with the two points showing the result of a specific feature activating for a given residue pair. Each pair is shown in red and blue in the two panels on the right.