<aside> 📌 Plan for This Round

Date/Time: Every Friday at 10:00 AM, July–August 2025 (subject to change)
Location: Hanyang University, IT/BT Building, Room 506
Presenters: 윤예진, 서동건, 이정연, 김지수, 신영우, 서기정, 손유리, 이휘영, 김민서, 김승희, 임혜림, 황의지, 소남영 (one presentation each)
Format: materials written in English; presentations may be given in Korean or English
Topics: the assigned papers below (pages)

</aside>

Papers to be presented (presentations must follow the order below): https://transformer-circuits.pub

  1. A Mathematical Framework for Transformer Circuits
  2. In-context Learning and Induction Heads
  3. Softmax Linear Units
  4. Toy Models of Superposition
  5. Superposition, Memorization, and Double Descent
  6. Privileged Bases in the Transformer Residual Stream
  7. Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
  8. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
  9. Circuit Tracing: Revealing Computational Graphs in Language Models
  10. On the Biology of a Large Language Model
  11. Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition
  12. HeadMap: Locating and Enhancing Knowledge Circuits in LLMs
  13. Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
  14. Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control

Schedule