Papers
Finding manifolds with bilinear autoencoders
Autoencoder architecture that decomposes activations into polynomial manifolds
Parameterized synthetic text generation with SimpleStories
A synthetic dataset containing diverse but simple stories
Compositionality unlocks deep interpretable models
Scaling up weight-based interpretability to multi-layer models
Bilinear MLPs enable weight-based mechanistic interpretability
Using bilinear MLPs to reverse-engineer image and language models from their weights
Tokenized SAEs: disentangling SAE reconstructions
Using a per-token bias in SAEs to separate token reconstructions from interesting features
The trifecta: three techniques for deeper forward-forward networks
Three techniques that significantly improve the forward-forward algorithm, achieving 84% on CIFAR-10
Presentations
A compositional approach to interpretability
A deep-dive session on defining interpretability
Weight-based interpretability
A quick primer on weight-based interpretability
Introduction to transformers
An informal introduction to the transformer architecture
Embodied language models
A presentation on the integration of LLMs into robotics
Diffusion models
A brief overview of diffusion models and the current state of the art (SOTA)