Edward Stevinson

Hi, I’m Ed, a research scientist and PhD candidate at the CIRCLE group at Imperial College London, where I work on mechanistic interpretability and adversarial robustness. My research centres on understanding the representation geometry of neural networks and how this shapes adversarial vulnerability.

Find me on Twitter, Google Scholar, GitHub and LinkedIn. Please reach out by email if you want to talk about any research!

news

Jun 11, 2026 Our paper on how mechanistic knowledge can predict vulnerabilities was accepted to the Mechanistic Interpretability workshop at ICML 2026.
May 23, 2026 Our paper was accepted to the Pluralistic Alignment and Trustworthy AI for Good workshops at ICML 2026.
May 08, 2026 Recognised as a Gold Reviewer at ICML 2026 :medal_sports:
Apr 30, 2026 Our paper on how superposition creates adversarial vulnerability was accepted at ICML 2026 – read it on arXiv :page_facing_up:
Jan 26, 2026 Our paper on feature geometry was accepted at ICLR 2026.
Jan 26, 2026 Our LASR Labs paper, ContextBench, was accepted at ICLR 2026.
Dec 12, 2025 Two spotlight papers at the Mechanistic Interpretability workshop, NeurIPS 2025 :sparkles:
Sep 15, 2025 Outstanding paper award at NeSy 2025 for our paper on probabilistic neuro-symbolic robustness verification :trophy:
Mar 30, 2025 Our paper on the ViT-Prisma toolkit, an open-source mechanistic interpretability library for vision models, was accepted at the MIV workshop at CVPR 2025.
Nov 25, 2024 2nd place in the Apart AI safety hackathon for work on detecting adversarial prompt vulnerabilities :hammer:
May 22, 2024 Argued for the opposition in an EA debate at Imperial entitled "Is AI an Existential Risk?"

selected publications

  1. ICML
    Adversarial Vulnerability from Interference Between Features in Superposition
    Edward Stevinson, Lucas Prieto, Melih Barsbey, and 1 more author
    In International Conference on Machine Learning (ICML), 2026
  2. ICLR
    ContextBench: Modifying Contexts for Targeted Latent Activation and Behaviour Elicitation
    Edward Stevinson, Robert Graham, Leo Richter, and 3 more authors
    In International Conference on Learning Representations (ICLR), 2026
  3. ICLR
    From Data Statistics to Feature Geometry: How Correlations Shape Superposition
    Lucas Prieto, Edward Stevinson, Melih Barsbey, and 2 more authors
    In International Conference on Learning Representations (ICLR), 2026