Edward Stevinson

Hi, I’m Ed, a research scientist and PhD candidate at the CIRCLE group at Imperial College London, where I work on mechanistic interpretability and adversarial robustness. My research centres on understanding the representation geometry of neural networks and how this shapes adversarial vulnerability.

Find me on Twitter, Google Scholar, GitHub and LinkedIn. Please reach out by email if you want to talk about any research!

2026

  1. ICML
    Attacking the Representation Manifold: A Mechanistic Study of Adversarial Robustness in Modular Addition
    Edward Stevinson and Lucas Prieto
    In Mechanistic Interpretability Workshop, International Conference on Machine Learning (ICML), 2026
  2. ICML
    Innocuous-Seeming Data, Latent Ideology: Ideological Generalisation in Finetuned LLMs
    Robert Graham, Edward Stevinson, and Yariv Barsheshat
    In Pluralistic Alignment and Trustworthy AI for Good Workshops, International Conference on Machine Learning (ICML), 2026
  3. ICML
    Adversarial Vulnerability from Interference Between Features in Superposition
    Edward Stevinson, Lucas Prieto, Melih Barsbey, and 1 more author
    In International Conference on Machine Learning (ICML), 2026
  4. ICLR
    ContextBench: Modifying Contexts for Targeted Latent Activation and Behaviour Elicitation
    Edward Stevinson, Robert Graham, Leo Richter, and 3 more authors
    In International Conference on Learning Representations (ICLR), 2026
  5. ICLR
    From Data Statistics to Feature Geometry: How Correlations Shape Superposition
    Lucas Prieto, Edward Stevinson, Melih Barsbey, and 2 more authors
    In International Conference on Learning Representations (ICLR), 2026

2025

  1. NeSy
    A Scalable Approach to Probabilistic Neuro-Symbolic Robustness Verification
    Vasileios Manginas, Nikolaos Manginas, Edward Stevinson, and 4 more authors
    In International Conference on Neurosymbolic Learning and Reasoning (NeSy), 2025
  2. CVPR
    Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video
    Sonia Joseph, Praneet Suresh, Lorenz Hufe, and 7 more authors
    In Mechanistic Interpretability for Vision Workshop, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

2024

  1. ICAIF
    Reducing Return Volatility in Neural Network-Based Asset Allocation
    Edward Stevinson and Alessio Lomuscio
    In Proceedings of the 5th ACM International Conference on AI in Finance (ICAIF), 2024

2022

  1. LSI
    Leveraging Knowledge Graphs to Update Scientific Word Embeddings Using Latent Semantic Imputation
    Jason Hoelscher-Obermaier, Edward Stevinson, Valentin Stauber, and 4 more authors
    2022