transformerlens

Here are 24 public repositories matching this topic...

yash-srivastava19 / arrakis

Star 31

Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.

python pypi transformer garcon interpretability explainable-ai mechanistic-interpretability anthropic transformerlens research-tooling

Updated Apr 14, 2026
Jupyter Notebook

FarnoushRJ / RelP

Star 29

[NeurIPS 2025 MechInterp Workshop - Spotlight] Official implementation of the paper "RelP: Faithful and Efficient Circuit Discovery in Language Models via Relevance Patching"

language-model circuit-analysis interpretability explainable-ai interpretable-machine-learning explainability llms mechanistic-interpretability transformerlens neurips-2025

Updated Nov 3, 2025
Python

krnel-ai / krnel-graph

Star 22

Lightweight representation engineering dataflow operations for agent developers.

transformers pytorch dataflow parquet huggingface huggingface-transformers duckdb pylance mechanistic-interpretability lancedb transformerlens representation-engineering pragmatic-interpretability

Updated May 27, 2026
Python

09Catho / axon

Star 18

Real-time 3D visualisation of SAE feature activations inside GPT-2, token by token

python threejs machine-learning deep-learning websocket 3d-visualization sparse-autoencoder fastapi gpt2 mechanistic-interpretability transformerlens llm-interpretability

Updated May 19, 2026
JavaScript

stchakwdev / Pinocchio-Vector-Test

Star 3

Investigating whether language models encode anticipated social consequences in their activations. Uses a 2x2 factorial design crossing truth × social valence to show that models are more sensitive to expected approval/disapproval than to truth itself.

language-models ai-safety interpretability deception-detection mechanistic-interpretability transformerlens

Updated Dec 18, 2025
Python

zilaeric / othello-gpt-probing

Star 2

Training and exploration of linear probes into Othello-GPT by Li et al. (2022)

probe othello gpt interpretability explainability transformerlens

Updated Jun 29, 2023
Jupyter Notebook

designer-coderajay / glassbox-mech

Star 2

Open-source EU AI Act Annex IV documentation toolkit. Mechanistic interpretability + circuit discovery for transformers. One function call generates a structured, hash-chained evidence package.

mcp pytorch alignment sparse-autoencoders sae black-box-testing explainability fastapi gpt2 regulatory-compliance mechanistic-interpretability transformerlens eu-ai-act compliance-audit llm-compliance transformer-circuits logit-lens attribution-patching circuit-discovery annex-iv

Updated Jun 15, 2026
Python

ashioyajotham / exploring_saes

Star 1

Implementation and analysis of Sparse Autoencoders for neural network interpretability research. Features interactive visualization dashboard and W&B integration.

sparse-autoencoders interpretability activation-functions neuron-activity wandb transformerlens mech-interp

Updated Nov 21, 2025
Python

lciric / does-quantization-kill-interpretability

Star 1

Does Quantization Kill Interpretability? Scaling study across 5 models (124M-2.8B): RTN destroys induction heads in small models, GPTQ preserves them at all scales.

pythia quantization ai-safety sparse-autoencoder mechanistic-interpretability gptq transformerlens transformer-circuits induction-heads scaling-study

Updated Mar 11, 2026
Python

mduffster / epistemic_status

Star 1

Evaluating how a model 'knowing what it knows' changes from base to instruct

pytorch llm mechanistic-interpretability transformerlens

Updated Jan 21, 2026
Python

mduffster / self-referent-test

Star 1

Testing role-based pathways on small LLMs

research transformers pytorch ai-safety interpretability attention-mechanisms ai-alignment llm mechanistic-interpretability transformerlens

Updated Dec 11, 2025
Python

RithvikReddy0-0 / KAMUI

Star 0

Knowledge Activation Mapping & Understanding Interface (KAMUI) — A Transformer Interpretability Framework Built From Scratch in PyTorch.

nlp deep-learning transformers pytorch artificial-intelligence gpt llm mechanistic-interpretability transformerlens

Updated Jun 8, 2026
Python

DipinDevSaji / mechinterp-probe

Star 0

Mechanistic interpretability toolkit for comparing transformer activations, token shifts, and activation patching behaviour.

pytorch ai-safety gpt-2 streamlit mechanistic-interpretability transformerlens activation-patching llm-interpretability

Updated May 23, 2026
Python

azrabano23 / steering-audit

Star 0

When does activation steering actually work? A reliability audit of steering vectors on GPT-2-small.

pytorch ai-safety interpretability ai-alignment gpt-2 llm mechanistic-interpretability transformerlens representation-engineering activation-steering

Updated Jun 8, 2026
Python

ashioyajotham / greater-than-circuit

Star 0

Reverse engineering the circuit responsible for the "greater than" capability in a language model

attention-mechanism ablation-studies mechanistic-interpretability transformerlens activation-patterns gpt-2-small

Updated May 7, 2026
HTML

alexjackson1 / tx

Star 0

A Flax-based library for examining transformers, based on TransformerLens.

deep-learning transformers flax jax transformerlens

Updated Feb 11, 2024
Python

msmichellesamson / residual-stream-sycophancy

Star 0

Probing where in Pythia's residual stream the decision to be sycophantic is already 'decided', using linear classifiers on per-layer activations against a small labeled sycophancy dataset.

python scikit-learn pytorch matplotlib transformerlens interpretability-experiments

Updated May 4, 2026
Python

anki079 / refusal-in-reasoning-models

Star 0

Mechanistic study of the refusal direction across base, instruction-tuned, and reasoning-distilled Qwen2.5-1.5B variants: extraction, ablation, transplant, and phase-aware analysis.

jailbreak language-models ai-safety llm mechanistic-interpretability transformerlens qwen safety-alignment reasoning-language-models reasoning-models deepseek-r1 refusal-ablation refusal-direction

Updated May 8, 2026
Python

78Spinoza / LLMDeHallucinator

Star 0

Automated detection, visualization and suppression of hallucination-associated neurons in open-source LLMs — LLM mechanistic interpretability research tool

ai-safety pacmap model-editing mechanistic-interpretability transformerlens llm-hallucination llm-alignment h-neurons sparse-probing interpretability-research

Updated Mar 19, 2026

ydvlalit03 / Transformer--From-Scratch

Star 0

Hands-on exploration of GPT-2 and transformer internals for text generation using TransformerLens — attention, mechanistic interpretability and sampling, explained step by step.

python nlp deep-learning transformers interpretability gpt-2 transformerlens

Updated Jun 4, 2026
Python