
Releases: FOR-sight-ai/interpreto

v0.5.0 Refacto Attributions and Concepts

22 Jun 23:57
@github-actions github-actions


Summary

💥 Breaking changes (concepts simplification)

  • Attributions do not support BatchEncodings of multiple inputs anymore.
  • ModelWithSplitPoints(split_points=[3]) is deprecated, not breaking. Use ModelWithSplitPoints(split_point=3) instead.
  • Accessing .split_point or passing split_point=... on any other object (AnythingElse.split_point, AnythingElse(split_point=...)) now raises an error. Only the splitters manage the split point.
  • get_activations() output is not a dict but a tuple now.
  • splitters do not have a get_split_activations() method anymore. It is not necessary; activations are the first element of the tuple.
```python
# Old (no longer works)
activations_dict = splitter.get_activations(inputs)
activations = splitter.get_split_activations(activations_dict)
# New
activations, _ = splitter.get_activations(inputs)
```
  • from interpreto.model_wrapping import ModelWithSplitPoints is not the correct path anymore. One should use from interpreto import ModelWithSplitPoints which was already the recommended way.
  • Similarly, from interpreto.model_wrapping.llm_interface import OpenAILLM is not the correct path anymore. One should use from interpreto.commons.llm_interface import OpenAILLM.
  • The encode_activations() and decode_concepts() methods of concept explainers have been renamed to activations_to_concepts() and concepts_to_activations(). These methods are mainly internal to the library, though.
  • Interpretation methods no longer take a concept_model_device argument. This should change little in practice, since the concept model is usually already on the right device.
```python
# Old (no longer works)
interpretation_method = TopKInputs(concept_explainer, concept_model_device="cuda")
# New
concept_explainer.to("cuda")  # only moves the `.concept_model`, not the `.splitter`
interpretation_method = TopKInputs(concept_explainer)
```
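The difference between "deprecated" and "breaking" for split_points can be illustrated with a small sketch. ToySplitter below is a hypothetical stand-in, not interpreto's ModelWithSplitPoints; it only mirrors the behavior described above: the old list form still works but emits a DeprecationWarning, while the single-value split_point is the supported form. Rejecting lists with more than one entry is an assumption based on the single-split-point limit.

```python
import warnings
from typing import Optional, Sequence


class ToySplitter:
    """Hypothetical stand-in illustrating a deprecation path (not interpreto code)."""

    def __init__(
        self,
        split_point: Optional[int] = None,
        split_points: Optional[Sequence[int]] = None,
    ) -> None:
        if split_points is not None:
            # Old list-based argument: still accepted, but the caller is warned.
            warnings.warn(
                "`split_points` is deprecated; use `split_point` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Assumption: only a single split point is supported.
            if len(split_points) != 1:
                raise ValueError("Only a single split point is supported.")
            if split_point is not None and split_point != split_points[0]:
                raise ValueError("`split_point` and `split_points` disagree.")
            split_point = split_points[0]
        if split_point is None:
            raise ValueError("`split_point` is required.")
        self.split_point = split_point


# Old call: still works, but warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = ToySplitter(split_points=[3])

# New call: the recommended form.
new_style = ToySplitter(split_point=3)
```

Both calls end up with the same split_point; only the old form produces a warning.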

For the new concepts API, refer to the details and examples in the rest of this release note, or to the notebooks.

Details and Examples

Attributions inference wrapping #142

This is mainly an internal update without impact on the API. However, it impacts the behavior slightly:

  • Better device management between perturbation, inference, and aggregation
  • Simplification of the inference architecture, which solves issues #96, #108, and #127

Simplify the concepts API #156

ModelWithSplitPoints was versatile but complex, so we simplified it, and simplified the new splitters even further.

  • Limit to a single split point, so only the splitter has and needs this information.
  • Change the get_activations output to an (activations, predictions) tuple, removing the need for get_split_activations.
  • Rename all model_with_split_points attributes to splitter, which is shorter and aligns with the new splitters.
  • Force nnsight>=0.7,<0.8
  • Introduce the inputs_to_activations, inputs_to_concepts, activations_to_concepts, and concepts_to_activations naming convention (some of these are renamings). These functions do not include batching and can easily be used externally.
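To make the naming convention concrete, here is a toy sketch in NumPy. ToyConceptExplainer is hypothetical and is not interpreto's concept explainer: a fixed linear map plays the model, and a fixed dictionary with a least-squares encoder plays the concept model. It shows how inputs_to_concepts is the composition of the two lower-level steps, and how a caller can batch externally since the functions themselves do not batch.

```python
import numpy as np


class ToyConceptExplainer:
    """Hypothetical illustration of the naming convention (not interpreto code).

    A fixed linear "model" maps inputs to activations, and a fixed dictionary
    maps activations to concepts and back.
    """

    def __init__(self, model_weights: np.ndarray, dictionary: np.ndarray) -> None:
        self.model_weights = model_weights  # (n_features, d_model)
        self.dictionary = dictionary  # (n_concepts, d_model)

    def inputs_to_activations(self, inputs: np.ndarray) -> np.ndarray:
        return inputs @ self.model_weights

    def activations_to_concepts(self, activations: np.ndarray) -> np.ndarray:
        # Least-squares encoding: concepts such that concepts @ dictionary ≈ activations.
        return activations @ np.linalg.pinv(self.dictionary)

    def concepts_to_activations(self, concepts: np.ndarray) -> np.ndarray:
        return concepts @ self.dictionary

    def inputs_to_concepts(self, inputs: np.ndarray) -> np.ndarray:
        # The composite is simply the two steps chained.
        return self.activations_to_concepts(self.inputs_to_activations(inputs))


rng = np.random.default_rng(0)
explainer = ToyConceptExplainer(rng.normal(size=(5, 8)), rng.normal(size=(3, 8)))
inputs = rng.normal(size=(10, 5))

# No batching inside the methods, so the caller batches externally.
batch_size = 4
concepts = np.concatenate(
    [
        explainer.inputs_to_concepts(inputs[start : start + batch_size])
        for start in range(0, len(inputs), batch_size)
    ]
)
```

Because each function is stateless with respect to batching, concatenating per-batch results gives the same array as a single call on all inputs.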

New splitters #150 & #156

  • Introduce SplitterForClassification, which autodetects the classification head and splits between the model body and the head. It therefore effectively forces CLS_TOKEN granularity.
```python
import datasets
from interpreto import SplitterForClassification

splitter = SplitterForClassification(
    "nateraw/bert-base-uncased-emotion",
    batch_size=32,
    device_map="cuda",
)
dataset = datasets.load_dataset("dair-ai/emotion", "split")["train"]["text"]
activations, predictions = splitter.get_activations(dataset)
```
  • Introduce SplitterForGeneration, which requires a split_point but forces TOKEN granularity (though you can still include special tokens that are not padding).
```python
from interpreto import SplitterForGeneration

splitter = SplitterForGeneration(
    "gpt2",
    split_point=10,
    batch_size=8,
    device_map="auto",
)
activations, _ = splitter.get_activations(
    ["Hello world!", "Interpreto is magic"],
)
```
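With TOKEN granularity, every non-padding token (special tokens included) contributes one activation vector to the concept dataset. A minimal NumPy sketch of that selection follows, assuming Hugging Face-style attention masks (1 for real tokens, 0 for padding); flatten_non_padding is a hypothetical helper and not how SplitterForGeneration is implemented internally.

```python
import numpy as np


def flatten_non_padding(activations: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Return one activation row per non-padding token.

    activations: (batch, seq_len, d_model)
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    Returns: (n_real_tokens, d_model)
    """
    # Boolean indexing with a 2-D mask selects along the first two axes.
    return activations[attention_mask.astype(bool)]


# Two right-padded sequences with 3 and 2 real tokens, hidden size 4.
activations = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
attention_mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
token_activations = flatten_non_padding(activations, attention_mask)
```

The result keeps tokens in row-major order, so the fourth row is the first token of the second sequence.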

Inputs to concepts attributions #150

Thanks to the new SplitterForClassification, attribution methods can now show which input elements are relevant to given concepts. (This does not work for, and is less relevant to, the token-level concepts used in generation.)

To make it work, pass concept_explainer.get_inputs_to_concepts_model() to your preferred attribution method.
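Conceptually, the outputs being explained are then concept activations instead of class logits. The pure-Python sketch below shows the occlusion idea on a hypothetical toy inputs-to-concepts model, standing in for what get_inputs_to_concepts_model() returns. interpreto's Occlusion takes a model and a tokenizer and handles batching and granularity, all of which this sketch skips.

```python
from typing import Callable, List, Sequence


def occlusion(
    tokens: Sequence[str],
    model: Callable[[List[str]], List[float]],
    mask_token: str = "[MASK]",
) -> List[List[float]]:
    """Occlusion attribution: output drop when each token is replaced by a mask.

    Returns a (n_tokens, n_outputs) nested list.
    """
    base = model(list(tokens))
    attributions = []
    for i in range(len(tokens)):
        perturbed = list(tokens)
        perturbed[i] = mask_token
        masked = model(perturbed)
        attributions.append([b - m for b, m in zip(base, masked)])
    return attributions


# Hypothetical inputs-to-concepts model with two toy concepts: "joy" and "anger".
JOY_WORDS = {"happy", "great"}
ANGER_WORDS = {"furious", "hate"}


def toy_inputs_to_concepts(tokens: List[str]) -> List[float]:
    return [
        float(sum(t in JOY_WORDS for t in tokens)),
        float(sum(t in ANGER_WORDS for t in tokens)),
    ]


attributions = occlusion(["i", "am", "happy", "not", "furious"], toy_inputs_to_concepts)
```

Each row holds one token's attribution to every concept, so "happy" only affects the joy concept and "furious" only the anger concept.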

Example running in less than 40 s on an L40S (46 GB):

```python
import datasets
from interpreto import Occlusion, SplitterForClassification, plot_concepts
from interpreto.concepts import SemiNMFConcepts
from interpreto.concepts.interpretations import TopKInputs

# --------------------------------------------------------------------------------------------------
# 1. Split model and get activations
splitter = SplitterForClassification(
    "nateraw/bert-base-uncased-emotion",
    batch_size=256,
    device_map="cuda",
)
train = datasets.load_dataset("dair-ai/emotion", "split")["train"]
inputs = train["text"]
classes_names = train.features["label"].names
activations, predictions = splitter.get_activations(inputs)

# --------------------------------------------------------------------------------------------------
# 2. Extract concepts, interpret them, and measure their importance
concept_explainer = SemiNMFConcepts(splitter, nb_concepts=20)
# extract concepts from activations
concept_explainer.fit(activations)
# interpret concepts with top-k inputs
interpretation_method = TopKInputs(
    concept_explainer=concept_explainer,
    use_unique_words=3,  # n-grams up to 3 words
    unique_words_kwargs={"count_min_threshold": round(len(inputs) * 0.003)},  # appears in at least 0.3% of the dataset
)
interpretations = interpretation_method.interpret(
    inputs=inputs,
    concepts_indices="all",
)
interpretations = [list(words.keys()) for concept_id, words in interpretations.items()]
# estimate concepts importance
gradients = concept_explainer.concept_output_gradient(
    inputs=activations,  # skips the inputs-to-activations part
    targets=None,  # all classes
    batch_size=64,
)

# --------------------------------------------------------------------------------------------------
# 3. Inputs-to-concepts attribution
sample_id = 0
attributions_explainer = Occlusion(
    concept_explainer.get_inputs_to_concepts_model(),
    splitter.tokenizer,
    batch_size=256,
)
results = attributions_explainer.explain(
    inputs[sample_id],
    targets=None,  # explain all concepts
)[0]

# --------------------------------------------------------------------------------------------------
# 4. Visualize the whole thing
plot_concepts(
    sample=results.elements,
    classes_names=classes_names,
    concepts_activations=results.attributions.T,
    concepts_importances=gradients[sample_id].squeeze()[predictions[sample_id]],  # (num_classes, num_concepts)
    concepts_labels=interpretations,
)
```
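The concept-importance step relies on gradients of the outputs with respect to the concepts. In the simplest case, a linear concept decoder followed by a linear head, that gradient is analytic, which makes the idea easy to check. The sketch assumes logits = (concepts @ dictionary) @ head_weights + bias; all names are hypothetical, and this is not how concept_output_gradient is implemented. It verifies the analytic result against central finite differences.

```python
import numpy as np


def linear_concept_gradients(dictionary: np.ndarray, head_weights: np.ndarray) -> np.ndarray:
    """Gradient of logits w.r.t. concepts when logits = (concepts @ dictionary) @ head_weights + bias.

    dictionary: (n_concepts, d_model), head_weights: (d_model, n_classes)
    Returns: (n_classes, n_concepts)
    """
    return (dictionary @ head_weights).T


rng = np.random.default_rng(0)
dictionary = rng.normal(size=(4, 6))
head_weights = rng.normal(size=(6, 3))
bias = rng.normal(size=3)


def logits(concepts: np.ndarray) -> np.ndarray:
    return (concepts @ dictionary) @ head_weights + bias


concepts = rng.random(4)
analytic = linear_concept_gradients(dictionary, head_weights)

# Central finite differences, one concept at a time, as a sanity check.
eps = 1e-6
numeric = np.stack(
    [
        (logits(concepts + eps * unit) - logits(concepts - eps * unit)) / (2 * eps)
        for unit in np.eye(4)
    ],
    axis=1,
)
```

With a non-linear head the gradient depends on the sample, which is consistent with indexing gradients[sample_id] in the example above.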

Probing (supervised post-hoc concepts) #153

This release also includes probes, a.k.a. CAVs, a.k.a. post-hoc supervised concept-based explanations. They are simple classification models trained to predict whether a concept is present in the model's activations; therefore, they require concept labels. They answer two questions:

  • Is the concept present in the model? (measured by probe performance)
  • Is the concept present in a sample? (probe prediction on the sample's activations)

They use the same splitter and fit API as their unsupervised counterparts. However, they do not require interpretations (even though interpreting them is possible), and concepts_to_outputs does not work for them (this would correspond to testing with CAVs, i.e. TCAV).
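A probe's two answers can be sketched with a least-squares linear probe in NumPy: held-out accuracy measures whether the concept is encoded in the activations, and the prediction on one sample says whether the concept is present there. This is an illustration under stated assumptions (regression on 0/1 labels, thresholded at 0.5, synthetic activations), not interpreto's LinearRegressionProbe implementation.

```python
import numpy as np


def fit_linear_probe(activations: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Least-squares linear probe with a bias term: labels ≈ [activations, 1] @ weights."""
    design = np.hstack([activations, np.ones((len(activations), 1))])
    weights, *_ = np.linalg.lstsq(design, labels.astype(float), rcond=None)
    return weights


def probe_predict(weights: np.ndarray, activations: np.ndarray) -> np.ndarray:
    """Concept present (1) when the regression output exceeds 0.5."""
    design = np.hstack([activations, np.ones((len(activations), 1))])
    return (design @ weights > 0.5).astype(int)


# Synthetic activations: the concept shifts dimension 0 when present.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400)
activations = rng.normal(size=(400, 16))
activations[:, 0] += 4.0 * labels

weights = fit_linear_probe(activations[:300], labels[:300])
# Question 1: is the concept encoded in the model? -> held-out probe accuracy.
accuracy = float((probe_predict(weights, activations[300:]) == labels[300:]).mean())
# Question 2: is the concept present in a given sample? -> prediction on that sample.
sample_prediction = int(probe_predict(weights, activations[300:301])[0])
```

A probe that stays near chance on held-out data suggests the concept is not linearly readable at that split point.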

A wide variety of probes is available:

```python
from interpreto.concepts import LinearRegressionProbe, ProbeExplainer

# Choose a probe and its parameters
probe = LinearRegressionProbe()
# Wrap it to link wi...
```

Contributors

AntoninPoche and HugoDeBosschere

v0.4.20 - Fixes, ngrams, and sanity checks

20 Mar 16:22
@github-actions github-actions


What’s Changed

0.4.18

0.4.19

  • We can now interpret concepts via top-k ngrams and not just top-k words. Just set use_unique_words=3 for top-k 3-grams. (#134) @camillebrl
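The n-gram option can be pictured as first building a vocabulary of word n-grams (up to use_unique_words words, as the comment in the v0.5.0 example suggests) and dropping rare ones, similar to count_min_threshold in that example. The sketch below only builds that candidate vocabulary with lowercase whitespace tokenization, which is an assumption; ranking the n-grams by concept activation, which is what TopKInputs reports, is not shown. frequent_ngrams is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def frequent_ngrams(texts: Iterable[str], max_n: int = 3, count_min_threshold: int = 1) -> Counter:
    """Count word n-grams (1..max_n) across texts and keep those seen often enough."""
    counts: Counter = Counter()
    for text in texts:
        words = text.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i : i + n])] += 1
    return Counter({gram: c for gram, c in counts.items() if c >= count_min_threshold})


texts = ["i feel so happy", "so happy today", "feel so tired"]
vocabulary = frequent_ngrams(texts, max_n=3, count_min_threshold=2)
```

Here every trigram appears only once, so only the recurring unigrams and bigrams survive the threshold.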

0.4.20

👥 List of contributors

@AntoninPoche, @camillebrl, @fanny-jourdan

Welcome to our new contributor @camillebrl 🤗

Contributors

camillebrl, AntoninPoche, and fanny-jourdan

v0.4.17 - Update Granularity

03 Mar 16:57
@github-actions github-actions
b90adec


What’s Changed

This includes:

  • Modified sentence granularity to remove the dependency on SciPy.
  • Added sentence-part granularity, which splits the input into parts of a sentence separated by ".", "?", "!", ",", or ":".
  • Added more complex tests to verify granularity robustness.
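Sentence-part granularity can be approximated with a single regular expression. The sketch below is an assumption about the splitting rule: it only breaks where one of the listed punctuation marks is followed by whitespace (so "3.14" stays intact) and keeps each mark attached to the part it ends. interpreto's implementation, which must also align parts with tokenizer offsets, may differ.

```python
import re
from typing import List

# Break after ".", "?", "!", ",", or ":" when followed by whitespace;
# the punctuation stays attached to the part it ends.
_PART_BOUNDARY = re.compile(r"(?<=[.?!,:])\s+")


def sentence_parts(text: str) -> List[str]:
    return [part for part in _PART_BOUNDARY.split(text.strip()) if part]


parts = sentence_parts("Well, I liked it. Did you? Yes: a lot!")
```

A lookbehind is used so that the punctuation is not consumed by the split.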

👥 List of contributors

@fanny-jourdan

Contributors

fanny-jourdan

v0.4.16 - New visualization and website

16 Feb 17:57
@github-actions github-actions


What’s Changed

👥 List of contributors

@AntoninPoche and @gfouilhe

Contributors

gfouilhe and AntoninPoche

v0.4.15 Interpreto official release

20 Jan 09:12
@AntoninPoche AntoninPoche


Interpreto is officially released

From this version onward, release notes will describe the changes made to the library. For now, this release note briefly describes what is included in interpreto, but it is best to check the documentation and tutorials for more details.

Position

This library provides interpretability tools for language models from HuggingFace, for both sequence classification and causal generation.

There are two main modules, along with metrics and visualization tools:

Attributions

interpreto implements both perturbation-based and gradient-based methods. Users can set the granularity of the attribution, from special tokens to sentences, including normal tokens and words.

There are two metrics: insertion and deletion.
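The deletion metric can be sketched as follows, in the common formulation: mask elements from most to least attributed, record the model score after each step, and take the normalized area under that curve (lower is better, since good attributions make the score drop fast). Insertion is the mirror image: start fully masked, unmask in the same order, and higher area is better. The toy scorer and helper names are hypothetical, and interpreto's exact normalization may differ.

```python
from typing import Callable, List, Sequence


def deletion_curve(
    tokens: Sequence[str],
    attributions: Sequence[float],
    score: Callable[[List[str]], float],
    mask_token: str = "[MASK]",
) -> List[float]:
    """Mask tokens from most to least important and record the score after each step."""
    order = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    current = list(tokens)
    curve = [score(current)]
    for i in order:
        current[i] = mask_token
        curve.append(score(current))
    return curve


def area_under_curve(curve: Sequence[float]) -> float:
    """Trapezoidal area with the x-axis normalized to [0, 1]."""
    if len(curve) < 2:
        return float(curve[0]) if curve else 0.0
    steps = len(curve) - 1
    return sum((a + b) / 2 for a, b in zip(curve, curve[1:])) / steps


POSITIVE = {"good", "great"}


def toy_score(tokens: List[str]) -> float:
    # Hypothetical model: fraction of the two positive words still visible.
    return sum(t in POSITIVE for t in tokens) / 2


tokens = ["good", "movie", "great"]
good_curve = deletion_curve(tokens, [1.0, 0.0, 1.0], toy_score)
bad_curve = deletion_curve(tokens, [0.0, 1.0, 0.0], toy_score)
```

The attribution that ranks the two positive words first yields the lower deletion area, as expected.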

Concept-based

To obtain concept-based explanations (post-hoc, unsupervised), there are several steps. Interpreto decomposes its pipeline according to these steps:

  1. Split a model in two and compute a dataset of activations with ModelWithSplitPoints, based on nnsight.
  2. Find patterns in these activations via dictionary learning; we implement ~15 methods by wrapping overcomplete.
  3. Interpret the concepts, from simple top-k vocabulary tokens to LLM labeling of concepts.
  4. Estimate the contribution of each concept to the prediction.
  5. Evaluate the previous steps with diverse metrics.
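Step 2 can be illustrated with the simplest dictionary-learning method. The sketch below is plain NMF with Lee-Seung multiplicative updates written in NumPy; it is a swapped-in technique, not one of the overcomplete-backed implementations, and unlike Semi-NMF (used as SemiNMFConcepts in the v0.5.0 example) it requires non-negative activations. It factorizes activations ≈ concepts @ dictionary, so each row of the dictionary is a concept direction and each row of concepts gives one sample's concept activations.

```python
import numpy as np


def nmf(activations: np.ndarray, n_concepts: int, n_iter: int = 200, seed: int = 0):
    """Plain NMF with multiplicative updates: activations ≈ concepts @ dictionary.

    activations must be non-negative, shape (n_samples, d_model).
    Returns concepts (n_samples, n_concepts) and dictionary (n_concepts, d_model).
    """
    rng = np.random.default_rng(seed)
    n_samples, d_model = activations.shape
    concepts = rng.random((n_samples, n_concepts)) + 1e-3
    dictionary = rng.random((n_concepts, d_model)) + 1e-3
    eps = 1e-10  # avoids division by zero
    for _ in range(n_iter):
        concepts *= (activations @ dictionary.T) / (concepts @ dictionary @ dictionary.T + eps)
        dictionary *= (concepts.T @ activations) / (concepts.T @ concepts @ dictionary + eps)
    return concepts, dictionary


rng = np.random.default_rng(1)
activations = rng.random((30, 3)) @ rng.random((3, 10))  # exactly rank 3, non-negative


def relative_error(concepts: np.ndarray, dictionary: np.ndarray) -> float:
    return float(np.linalg.norm(activations - concepts @ dictionary) / np.linalg.norm(activations))


start_error = relative_error(*nmf(activations, n_concepts=3, n_iter=0))
concepts, dictionary = nmf(activations, n_concepts=3, n_iter=500)
final_error = relative_error(concepts, dictionary)
```

Multiplicative updates keep both factors non-negative by construction, which is what makes the resulting concepts additive parts of the activations.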