Papers
Parameterized Synthetic Text Generation with SimpleStories
A synthetic dataset containing diverse but simple stories.
Compositionality Unlocks Deep Interpretable Models
Scaling up weight-based interpretability to multi-layer models.
Bilinear MLPs enable weight-based mechanistic interpretability
Using bilinear MLPs to reverse-engineer image and language models from their weights.
Weight-based Decomposition: A Case for Bilinear MLPs
Introducing bilinear MLPs as a new approach to weight-based interpretability.
Tokenized SAEs: Disentangling SAE Reconstructions
We propose using a per-token bias in SAEs to separate token reconstructions from interesting semantic features.
The Trifecta: Three techniques for deeper Forward-Forward networks
Three techniques that significantly improve the Forward-Forward algorithm. We achieve 84% accuracy on CIFAR-10.
Posts
Tokenized SAEs
An informal write-up of the Tokenized SAEs paper.
Presentations
Weight-based interpretability
A quick primer on weight-based interpretability.
Introduction to Transformers
An informal introduction to the Transformer architecture.
Embodied Language Models
A presentation on the integration of LLMs into robotics.
Diffusion Models
A brief overview of diffusion models and the current state of the art.