Papers
Parameterized Synthetic Text Generation with SimpleStories
ICLR'25 SynthData workshop (Mar 2025)

A synthetic dataset containing diverse but simple stories.

Compositionality Unlocks Deep Interpretable Models
AAAI'25 CoLoRAI workshop (Feb 2025)

Scaling up weight-based interpretability to multi-layer models.

Bilinear MLPs enable weight-based mechanistic interpretability
ICLR'25 spotlight (Oct 2024)

Using bilinear MLPs to reverse-engineer image and language models from their weights.

Weight-based Decomposition: A Case for Bilinear MLPs
ICML'24 MI workshop (Jun 2024)

Introducing bilinear MLPs as a new approach to weight-based interpretability.

Tokenized SAEs: Disentangling SAE Reconstructions
ICML'24 MI workshop (Jun 2024)

We propose a per-token bias in SAEs that separates token reconstruction from the more interesting semantic features.

The Trifecta: Three techniques for deeper Forward-Forward networks
TMLR (Nov 2023)

Three techniques that significantly improve the Forward-Forward algorithm, reaching 84% accuracy on CIFAR-10.

Posts
Tokenized SAEs
LessWrong (Aug 2024)

An informal write-up of the Tokenized SAEs paper.

Presentations
Weight-based interpretability
Flanders AI Day (Oct 2024)

A quick primer on weight-based interpretability.

Introduction to Transformers
Lab meeting (Jan 2024)

An informal introduction to the Transformer architecture.

Embodied Language Models
University lecture (Jun 2023)

A presentation on the integration of LLMs into robotics.

Diffusion Models
University lecture (Jan 2023)

A brief overview of diffusion models and the state of the art at the time.