sparse autoencoder for mechanistic interpretability