<aside> 📌 Plan for This Round

Date/Time: Every Friday at 10:00 AM, July–August 2025 (subject to change)
Location: Hanyang University, IT/BT Building, Room 506
Presenters: 윤예진, 서동건, 이정연, 김지수, 신영우, 서기정, 손유리, 이휘영, 김민서, 김승희, 임혜림, 황의지, 소남영 (one presentation each)
Format: materials written in English; presentations may be given in Korean or English
Topics: the assigned papers below (pages)

</aside>

Papers to be presented (presentations must follow the order below): https://transformer-circuits.pub

  1. A Mathematical Framework for Transformer Circuits
  2. In-context Learning and Induction Heads
  3. Softmax Linear Units
  4. Toy Models of Superposition
  5. Superposition, Memorization, and Double Descent
  6. Privileged Bases in the Transformer Residual Stream
  7. Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
  8. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
  9. Circuit Tracing: Revealing Computational Graphs in Language Models
  10. On the Biology of a Large Language Model
  11. Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition
  12. HeadMap: Locating and Enhancing Knowledge Circuits in LLMs
  13. Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
  14. Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control

Schedule