Papers

Finding manifolds with bilinear autoencoders

Autoencoder architecture that decomposes activations into polynomial manifolds

NeurIPS'25 workshop spotlight

Parameterized synthetic text generation with SimpleStories

A synthetic dataset containing diverse but simple stories

NeurIPS'25

Compositionality unlocks deep interpretable models

Scaling up weight-based interpretability to multi-layer models

AAAI'25 workshop

Bilinear MLPs enable weight-based mechanistic interpretability

Using bilinear MLPs to reverse-engineer image and language models from their weights

ICLR'25 spotlight

Tokenized SAEs: disentangling SAE reconstructions

Using a per-token bias in SAEs to separate token reconstructions from interesting features

ICML'24 workshop

The trifecta: three techniques for deeper forward-forward networks

Three techniques that significantly improve the forward-forward algorithm, reaching 84% accuracy on CIFAR-10

TMLR

Presentations