Decoding gears, revealing minds, and pushing for safer AI systems.
Tracing how information flows through a model by swapping activations at specific sites.
Studying how individual neurons respond simultaneously to multiple unrelated concepts.
Finding the small subnetworks inside large models that are responsible for specific capabilities.
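To make the first of these ideas concrete, here is a minimal sketch of activation patching on a toy two-layer network. The weights, inputs, and the choice of "site" are all illustrative, not taken from any real model: we cache the hidden activation from a clean run, swap it into a corrupted run, and check whether the clean output is restored.

```python
import numpy as np

# Toy two-layer network; weights are random stand-ins for a real model.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 2))

def forward(x, patch=None):
    """Run the network; optionally overwrite the hidden layer (the 'site')."""
    h = np.tanh(x @ W1)   # activation site we may intervene on
    if patch is not None:
        h = patch         # swap in a cached activation
    return h @ W2

x_clean = rng.normal(size=4)
x_corrupt = rng.normal(size=4)

h_clean = np.tanh(x_clean @ W1)                 # cache the clean activation
y_patched = forward(x_corrupt, patch=h_clean)   # patch it into the corrupt run

# If patching restores the clean output, this site carries the
# information that distinguishes the two inputs.
print(np.allclose(y_patched, forward(x_clean)))  # → True
```

In this toy case the site trivially carries all the information; in a real transformer one patches a single head or layer at a time to localize where the relevant information flows.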
Architectures, representations, and the boundary between learned and symbolic structure.
Exploring how assigning energy scores to data can lead to richer, more structured representations.
Building internal representations that let models predict how the world changes in response to actions.
Finding mathematical expressions that describe patterns in data, without fixing the formula in advance.
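The last idea can be sketched as a search over candidate expressions scored against data. A real symbolic regression system grows and mutates expressions (e.g. genetically) rather than scoring a fixed list; the candidate set below is purely illustrative.

```python
import numpy as np

x = np.linspace(-2, 2, 50)
y = x**2 + x   # hidden "ground truth" pattern the search should recover

# Illustrative candidate expressions; a real system would generate these.
candidates = {
    "x":        lambda x: x,
    "x^2":      lambda x: x**2,
    "x^2 + x":  lambda x: x**2 + x,
    "sin(x)":   lambda x: np.sin(x),
}

# Score each expression by mean squared error and keep the best fit.
best = min(candidates, key=lambda name: np.mean((candidates[name](x) - y) ** 2))
print(best)  # → x^2 + x
```

The formula is discovered by comparison against data, not fixed in advance, which is the essence of the approach.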
Truly caring about correctness.
Turning informal mathematical arguments into fully verified, machine-checkable proofs.
Connecting language models to the Lean proof assistant for interactive, verified reasoning.
Developing smarter algorithms for navigating the vast space of possible proof strategies.
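As a small taste of the target artifact, here is what a fully machine-checkable statement looks like in Lean 4 (a minimal illustration, not drawn from any particular formalization project):

```lean
-- A concrete claim, checked by the kernel via computation.
example : 1 + 2 = 2 + 1 := by
  rfl

-- A general informal claim ("addition of naturals is commutative"),
-- discharged by an existing library lemma.
example (n m : Nat) : n + m = m + n := Nat.add_comm n m
```

Interactive use pairs a language model proposing tactic steps like these with Lean verifying each one, so only machine-checked reasoning survives.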