Bilinear Autoencoders
Thomas Dooms, Ward Gauderis, Geraint A. Wiggins, Jose Oramas
Technique
Sparse autoencoders are the standard tool for finding interpretable features in model activations in an unsupervised fashion. However, their architecture constrains analysis to the exact feature basis they extract, which makes it difficult to combine these features into higher-order structures such as manifolds.
This paper introduces a sparse autoencoder variant that can extract superposed features while remaining linearly analyzable. We achieve this by linearly reconstructing the quadratic input space (the outer product of the activations with themselves), effectively decomposing the activations into polynomial factors. These factors can be analyzed, for instance using SVD, to find further structure within the autoencoder. We use this to automatically extract subspaces that are likely to contain interesting manifolds.
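To make "linearly analyzable" concrete, the following is a minimal sketch of one way a quadratic latent can be read as a linear function of the quadratic input space. The encoder form (a product of two linear projections), the weight names `l` and `r`, and the dimension `d` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6  # hypothetical activation dimension (assumption)

# Hypothetical weights for a single bilinear latent (assumption, not the paper's).
l = rng.normal(size=d)
r = rng.normal(size=d)
x = rng.normal(size=d)  # a toy activation vector

# Bilinear latent: the product of two linear projections of x.
z_bilinear = (l @ x) * (r @ x)

# The same value written as a quadratic form x^T Q x with a symmetric matrix Q.
Q = 0.5 * (np.outer(l, r) + np.outer(r, l))
z_quadratic = x @ Q @ x

# The same value again, as a linear functional on the flattened outer product of x with itself.
z_linear = Q.reshape(-1) @ np.outer(x, x).reshape(-1)

# Because Q is an ordinary matrix, standard linear algebra such as SVD applies to it.
singular_values = np.linalg.svd(Q, compute_uv=False)
rank = int((singular_values > 1e-10).sum())
```

In this toy setting all three expressions agree, and `Q` has at most two non-zero singular values, so the latent only reads from a two-dimensional subspace of the activations. Whether the paper's factors have this exact shape is not stated here.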
Explore the manifolds
The full latent atlas and the 3D manifolds of every composite live in the interactive explorer. Browse the UMAP of all composites, drill into individual latents, and rotate the manifolds for any of them.
Extensions
We also propose three extensions to bilinear autoencoders that impact the structure and properties of the extracted features.
First, we use a scale-invariant measure of sparsity which avoids dead features altogether. This regularization forces features to be both specific and robust to noise, making it a better-behaved metric than conventional sparsity penalties.
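The text does not say which measure is used. As an illustration of what scale invariance means for a sparsity measure, the sketch below uses the ratio of the L1 norm to the L2 norm, a well-known scale-invariant choice. The function name `l1_over_l2` and the toy vectors are assumptions, and the paper's actual measure may differ.

```python
import numpy as np

def l1_over_l2(z, eps=1e-12):
    """Ratio of L1 to L2 norm: lower means sparser. Illustrative choice only."""
    z = np.asarray(z, dtype=float)
    return np.abs(z).sum() / (np.sqrt((z ** 2).sum()) + eps)

sparse = np.array([0.0, 0.0, 3.0, 0.0])  # one active entry
dense = np.array([1.0, 1.0, 1.0, 1.0])   # all entries active

ratio_sparse = l1_over_l2(sparse)
ratio_dense = l1_over_l2(dense)

# Rescaling the vector leaves the ratio (almost exactly) unchanged...
ratio_sparse_scaled = l1_over_l2(10.0 * sparse)

# ...whereas a plain L1 norm grows with the scale, so it can be lowered by shrinking activations.
l1_sparse = np.abs(sparse).sum()
l1_sparse_scaled = np.abs(10.0 * sparse).sum()
```

A penalty of this kind cannot be reduced by uniformly shrinking the latents, which is the property the sketch is meant to show. It makes no claim about dead features.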
Second, we introduce an analytic way to impose a complete importance ordering on features. As a result, using only a prefix of the autoencoder's features still yields good reconstructions.
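The ordering method itself is not described here. To illustrate only what the prefix property means, the sketch below uses a truncated SVD as a familiar stand-in in which features are ordered by importance and any prefix gives the best reconstruction for its size. The toy data, the names `Z` and `D`, and the helper `prefix_error` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data whose columns have decreasing scale (assumption, for illustration only).
X = rng.normal(size=(200, 8)) * np.linspace(3.0, 0.2, 8)

# Stand-in for an ordered autoencoder: SVD sorts components by singular value.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Z = U * S  # latents, most important first
D = Vt     # decoder rows, in the same order

def prefix_error(k):
    """Mean squared reconstruction error using only the first k latents."""
    X_hat = Z[:, :k] @ D[:k]
    return float(np.mean((X - X_hat) ** 2))

errors = [prefix_error(k) for k in range(0, 9)]
```

Here `errors` never increases as `k` grows and reaches numerical zero at `k = 8`. An ordered autoencoder aims for the same qualitative behavior on its own features.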
Last, we discuss how to add a further bottleneck to extracted features, akin to toy models of superposition, which helps understand which features overlap and mix.
Future work
This paper proposes solutions to several open problems, but much work remains to verify and extend the approach.