Bilinear Autoencoders

Thomas Dooms, Ward Gauderis, Geraint A. Wiggins, Jose Oramas

Technique

Sparse autoencoders are the standard tool for finding interpretable features within model activations in an unsupervised fashion. Yet, due to their architecture, analysis is constrained to the exact feature basis they extract. Consequently, it is difficult to combine these features into higher-order structures like manifolds.

This paper introduces a sparse autoencoder variant that can extract superposed features, while remaining linearly analyzable. We achieve this by linearly reconstructing the quadratic input space $(x_0^2, x_0 x_1, \dots, x_n^2)$, effectively decomposing the activations into polynomial factors. These polynomial factors can be analyzed, for instance using SVD, to find further structure within the autoencoder. We use this toward automated extraction of subspaces that are likely to contain interesting manifolds.
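
To make the quadratic input space concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the helper names `quadratic_features` and `weights_to_matrix` and the toy readout `w` are assumptions made for illustration. It shows that a linear readout on the quadratic features is the same thing as a quadratic form $x^\top Q x$, and that the SVD of $Q$ can then be inspected directly.

```python
import numpy as np

def quadratic_features(x):
    """Map x of shape (..., n) to all monomials x_i * x_j with i <= j."""
    i, j = np.triu_indices(x.shape[-1])
    return x[..., i] * x[..., j]

def weights_to_matrix(w, n):
    """Fold weights over quadratic features into a symmetric Q
    such that w @ phi(x) == x @ Q @ x."""
    i, j = np.triu_indices(n)
    upper = np.zeros((n, n))
    upper[i, j] = w
    return (upper + upper.T) / 2

n = 3
x = np.array([1.0, 2.0, 3.0])
phi = quadratic_features(x)  # order: x0^2, x0*x1, x0*x2, x1^2, x1*x2, x2^2

# Hypothetical readout that responds only to the product x0 * x1.
w = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
q = weights_to_matrix(w, n)

# The linear readout on phi(x) equals the quadratic form on x.
assert np.isclose(w @ phi, x @ q @ x)

# The SVD of Q exposes the input directions this readout depends on.
u, s, vt = np.linalg.svd(q)
rank = int(np.sum(s > 1e-9))
```

In this toy case `rank` is 2: the readout depends only on the two-dimensional subspace spanned by $x_0$ and $x_1$, which is the kind of structure a decomposition of the polynomial factors can surface.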

Explore the manifolds

The full latent atlas and the 3D manifolds of every composite live in the interactive explorer. Browse the UMAP of all composites, drill into individual latents, and rotate the manifolds for any of them.

Extensions

We also propose three extensions to bilinear autoencoders that impact the structure and properties of the extracted features.

First, we use a scale-invariant measure of sparsity which avoids dead features altogether. This regularization forces features to be both specific and robust to noise, making it a better-suited sparsity measure than $L_1$.
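
The text above does not name the measure, so the following sketch uses the ratio of the $L_1$ norm to the $L_2$ norm purely as a familiar example of a scale-invariant sparsity measure. Treat the choice of measure and the function name `l1_over_l2` as assumptions, not as the paper's regularizer. The point is only to show what scale invariance means: rescaling the activations changes the $L_1$ norm but leaves the ratio untouched.

```python
import numpy as np

def l1_over_l2(z, eps=1e-12):
    """Ratio of L1 to L2 norm along the last axis (assumed example measure).

    Ranges from 1 (a single active entry) to sqrt(n) (all entries equal).
    """
    z = np.abs(z)
    return z.sum(-1) / (np.sqrt((z ** 2).sum(-1)) + eps)

sparse = np.array([0.0, 0.0, 5.0, 0.0])  # one active latent
dense = np.array([1.0, 1.0, 1.0, 1.0])   # all latents equally active

ratio_sparse = l1_over_l2(sparse)
ratio_dense = l1_over_l2(dense)

# Shrinking the activations lowers L1 but not the ratio.
l1_before, l1_after = np.abs(dense).sum(), np.abs(0.1 * dense).sum()
ratio_after = l1_over_l2(0.1 * dense)
```

Under an $L_1$ penalty, a model can lower the loss simply by shrinking activations toward zero; a scale-invariant measure removes that shortcut, since only the relative pattern of activations matters.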

Second, we introduce an analytic way to impose a complete importance ordering on features. This means that using only a prefix of the autoencoder's features still yields good reconstructions.
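
As a rough illustration of what the prefix property means, the sketch below measures reconstruction error when only the first $k$ latents are kept. It uses a plain linear autoencoder built from a truncated SVD as a stand-in, because its components are ordered by importance by construction. It does not reproduce the analytic ordering proposed in the paper, and `prefix_errors` is a hypothetical helper.

```python
import numpy as np

def prefix_errors(x, latents, decoder):
    """Mean squared reconstruction error using only the first k latents,
    for k = 1..m."""
    errors = []
    for k in range(1, latents.shape[1] + 1):
        recon = latents[:, :k] @ decoder[:k]
        errors.append(float(np.mean((x - recon) ** 2)))
    return errors

rng = np.random.default_rng(0)
# Toy data whose columns have decreasing variance.
x = rng.normal(size=(256, 8)) * np.linspace(3.0, 0.5, 8)

# Stand-in ordered autoencoder: latents and decoder from the SVD of x.
u, s, vt = np.linalg.svd(x, full_matrices=False)
latents, decoder = u * s, vt

errors = prefix_errors(x, latents, decoder)
```

With an importance ordering in place, `errors` never increases as latents are added, so any prefix gives the best reconstruction available at that size in this linear setting.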

Last, we discuss how to add a further bottleneck to extracted features, akin to toy models of superposition, which helps understand which features overlap and mix.

Future work

This paper proposes solutions to several open problems, but much work remains to verify and extend the approach.