logit-lens

Here are 12 public repositories matching this topic...

Languages: Python (10), Jupyter Notebook (2)

Sorted by: Most stars

OpenInterpretability / decision-locator

Star 2

Find the layer where a language model commits a decision — and steer it. Any open-weight HF model. (WANDERING arc paper #6)

transformers ai-safety interpretability steering mechanistic-interpretability llm-agents activation-patching logit-lens

Updated Jun 7, 2026
Python

designer-coderajay / glassbox-mech

Star 2

Open-source EU AI Act Annex IV documentation toolkit. Mechanistic interpretability + circuit discovery for transformers. One function call generates a structured, hash-chained evidence package.

mcp pytorch alignment sparse-autoencoders sae black-box-testing explainability fastapi gpt2 regulatory-compliance mechanistic-interpretability transformerlens eu-ai-act compliance-audit llm-compliance transformer-circuits logit-lens attribution-patching circuit-discovery annex-iv

Updated Jun 15, 2026
Python

skyline-GTRr32 / OKI-TRACE

Star 1

OKI TRACE: Local LLM observability. See step-by-step, layer-by-layer what your AI thinks. Logit Lens & Attention for HuggingFace models.

python open-source ai transformers developer-tools attention-mechanism blackbox huggingface ai-tools mechanistic-interpretability local-llm ai-interpretability llm-observability ai-transparency glass-box-ai llm-debugging logit-lens

Updated May 17, 2026
Python

jakomycat / logit-lens-vs-tuned-lens

Star 1

Decoding the black box of LLMs: A comparative analysis of Logit Lens vs. Tuned Lens to interpret intermediate Transformer layers in GPT-2.

ia mechanistic-interpretability logit-lens tuned-lens

Updated Apr 2, 2026
Jupyter Notebook

fabthebest / champollion-protocol

Star 1

🏛️ Champollion cracked hieroglyphs in 1822. I applied the same logic to LLM internals. 95% accuracy, $0 cost, fully reproducible. Contributors welcome.

transformer ai-safety gpt2 mechanistic-interpretability activation-patching linear-probes logit-lens

Updated May 20, 2026
Jupyter Notebook

adeelahmad / mlx-lm-lens

Star 0

Mechanistic interpretability CLI for transformer models on Apple Silicon. Analyze per-layer predictions, monitor activation drift, compare models, discover circuits. MLX-based, no GPU needed.

python nlp machine-learning transformers lora quantization mlx model-analysis interpretability fine-tuning apple-silicon activation-analysis mechanistic-interpretability logit-lens circuit-discovery

Updated Mar 30, 2026
Python

tomaszwi66 / TinyInterp

Star 0

Local Streamlit app for mechanistic interpretability of transformer models.

transformers pytorch neural-networks interpretability sparse-autoencoder streamlit llm mechanistic-interpretability activation-patching logit-lens

Updated May 6, 2026
Python

hematteo / sparse-readout-prism

Star 0

Sparse Readout Prism: a sparse LM-head basis for logit-lens readouts — companion code for the paper. Pretrained dictionaries: hf.co/hematteo/sparse-readout-prism

language-models sparse-autoencoders interpretability mechanistic-interpretability logit-lens unembedding

Updated Jun 13, 2026
Python

aragorn-w / tuned-lens

Star 0

From-scratch PyTorch implementation of the Tuned Lens (Belrose et al., 2023) — learned per-layer affine probes that sharpen intermediate transformer predictions beyond the raw logit lens.

transformers pytorch gpt-2 mechanistic-interpretability logit-lens tuned-lens

Updated Apr 1, 2026
Python

Seqev / latent-scratchpad-search

Star 0

We optimize a compact latent state (frozen weights) to force failed multi-hop chains to output the missing answer D. Five pre-registered controls show that it simply injects D: the latent carries D without the code-fact, leaves intermediates invisible, is inert to hop corruption, and does not transfer. No latent composition at 3B (Llama-3.2-3B, Qwen2.5-3B).

transformers llama multi-hop-reasoning prompt-tuning knowledge-injection llm falsification mechanistic-interpretability qwen latent-reasoning soft-prompts logit-lens matched-controls

Updated Jun 4, 2026
Python

gallam-research-dev / pc-transformer-interpretability

Star 0

Empirical evidence for predictive coding tendencies in the GPT-2 family: residual stream convergence, activation patching, MLP transform analysis, zero-ablation, and logit lens across 7 languages.

deep-learning transformers predictive-coding gpt2 mechanistic-interpretability residual-stream logit-lens zero-ablation

Updated May 28, 2026
Python

aragorn-w / logit-lens

Star 0

Logit Lens terminal visualizer (nostalgebraist, 2020) — decodes GPT-2's intermediate layer predictions using the unembedding matrix, built with TransformerLens and Rich.

interpretability gpt-2 llm mechanistic-interpretability transformerlens logit-lens

Updated Mar 31, 2026
Python
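
For readers new to the topic, the technique these repositories share can be sketched in a few lines: take the residual-stream vector at each layer, normalize it, and project it through the unembedding matrix to see which token that layer would predict. The snippet below is a minimal pure-Python illustration on made-up toy numbers, not code from any listed repository. The function names, the 3-token vocabulary, and the matrices are all hypothetical, and the layer norm omits the learned gain and bias that a real model such as GPT-2 applies.

```python
import math

def layer_norm(x, eps=1e-5):
    # Simplified layer norm: no learned gain/bias (an assumption of this sketch).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_lens(hidden_states, unembed, vocab):
    """Decode each layer's hidden state for one token position.

    hidden_states: list of d_model-sized vectors, one per layer.
    unembed: d_model x vocab_size matrix (list of rows).
    Returns a list of (top_token, probability) pairs, one per layer.
    """
    readout = []
    for h in hidden_states:
        normed = layer_norm(h)
        logits = [
            sum(normed[i] * unembed[i][j] for i in range(len(normed)))
            for j in range(len(vocab))
        ]
        probs = softmax(logits)
        best = max(range(len(vocab)), key=probs.__getitem__)
        readout.append((vocab[best], probs[best]))
    return readout

# Toy data (hypothetical): 3 layers, d_model = 3, vocabulary of 3 tokens.
vocab = ["cat", "dog", "fish"]
unembed = [
    [1.0, 0.0, -1.0],
    [0.0, 1.0, -1.0],
    [-1.0, -1.0, 2.0],
]
hidden_states = [
    [2.0, 0.0, -2.0],
    [0.0, 2.0, -2.0],
    [-1.0, -1.0, 2.0],
]

readout = logit_lens(hidden_states, unembed, vocab)
for layer, (token, prob) in enumerate(readout):
    print(f"layer {layer}: {token} ({prob:.2f})")
```

With a real model, the hidden states would come from the model's per-layer outputs and the unembedding matrix from its LM head. A tuned lens would additionally apply a learned per-layer affine map before the projection.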
