Releases: FOR-sight-ai/interpreto

v0.5.0 Refacto Attributions and Concepts

22 Jun 23:57

@github-actions

v0.5.0

b8228f9

Summary

Fix issue #147 (#148) @HugoDeBosschere
InferenceWrapper rewrite and attribution device optimization (#142) @AntoninPoche
Concepts API simplification (#156) @AntoninPoche
New, simpler splitters: SplitterForClassification (#150) and SplitterForGeneration (#156) @AntoninPoche
Inputs to concepts attributions for classification (#150) @AntoninPoche
Probes (supervised post-hoc concept-based explanations) (#153) @AntoninPoche

💥 Breaking changes (concepts simplification)

Attributions no longer support BatchEncodings containing multiple inputs.
ModelWithSplitPoints(split_points=[3]) is deprecated, not breaking. Use ModelWithSplitPoints(split_point=3) instead.
Only the splitters manage the split point: AnythingElse.split_point or AnythingElse(split_point=...) now raises an error.
get_activations() now returns a tuple instead of a dict.
Splitters no longer have a get_split_activations() method. It is no longer needed, since the activations are the first element of the tuple.

# Old and broken
activations_dict = splitter.get_activations(inputs)
activations = splitter.get_split_activations(activations_dict)
# New
activations, _ = splitter.get_activations(inputs)
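The split_points deprecation above is a warning rather than an error. A minimal sketch of how a single-element list can still be accepted and mapped to one split point, using a hypothetical resolve_split_point helper; this is not interpreto's code, and the error cases are assumptions.

```python
import warnings
from typing import Optional, Sequence


def resolve_split_point(
    split_point: Optional[int] = None,
    split_points: Optional[Sequence[int]] = None,
) -> int:
    """Hypothetical helper: accept the deprecated list form, return a single split point."""
    if split_points is not None:
        if split_point is not None:
            raise ValueError("Pass either split_point or split_points, not both.")
        if len(split_points) != 1:
            # v0.5.0 limits splitters to a single split point.
            raise ValueError("Only a single split point is supported.")
        warnings.warn(
            "split_points=[...] is deprecated; use split_point=... instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return split_points[0]
    if split_point is None:
        raise ValueError("A split point is required.")
    return split_point


# Old style still works but emits a DeprecationWarning; new style is silent.
print(resolve_split_point(split_points=[3]))  # 3
print(resolve_split_point(split_point=3))  # 3
```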

from interpreto.model_wrapping import ModelWithSplitPoints no longer works. Use from interpreto import ModelWithSplitPoints instead, which was already the recommended import.
Similarly, from interpreto.model_wrapping.llm_interface import OpenAILLM no longer works. Use from interpreto.commons.llm_interface import OpenAILLM instead.
The encode_activations() and decode_concepts() methods of concept explainers have been renamed to activations_to_concepts() and concepts_to_activations(), respectively. These are mainly used internally by the library.
Interpretation methods no longer accept a concept_model_device argument. This should have little impact, since the concept model is usually already on the right device; if not, move it with concept_explainer.to(...).

# Old and broken
interpretation_method = TopKInputs(concept_explainer, concept_model_device="cuda")
# New
concept_explainer.to("cuda") # only moves the `.concept_model`, not the `.splitter`
interpretation_method = TopKInputs(concept_explainer)

Please refer to the details and examples below, or to the notebooks, for the new concepts API.

Details and Examples

Attributions inference wrapping #142

This is mainly an internal update with no impact on the API. However, it slightly changes the behavior:

Better device management between perturbation, inference, and aggregation
Simplification of the inference architecture, which solves issues #96, #108, and #127

Simplify the concepts API #156

The ModelWithSplitPoints was versatile but complex, so we simplified it somewhat, and went even further with the new splitters:

Limit to a single split point, so that only the splitter holds (and needs) this information.
Change the get_activations output to an (activations, predictions) tuple, removing the need for get_split_activations.
Rename all model_with_split_points attributes to splitter, which is shorter and aligns with the new splitters.
Pin nnsight>=0.7,<0.8.
Introduce the inputs_to_activations, inputs_to_concepts, activations_to_concepts, and concepts_to_activations naming convention (some of these are renamings). These functions do not handle batching and can easily be used externally.
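To make the naming convention and the "no batching" point concrete, here is a toy sketch in plain Python. The function names mirror the convention, but the bodies are hypothetical stand-ins (a character/word-count "model" and an identity concept dictionary), not interpreto's methods; run_batched is likewise a hypothetical example of batching added by the caller.

```python
from typing import Callable, List, Sequence

Vector = List[float]

# Toy concept dictionary (hypothetical): 2 concepts x 2 activation dimensions.
DICTIONARY: List[Vector] = [[1.0, 0.0], [0.0, 1.0]]


def inputs_to_activations(texts: Sequence[str]) -> List[Vector]:
    # Toy stand-in for the model part before the split point:
    # each text becomes a 2-d "activation" (character count, word count).
    return [[float(len(t)), float(len(t.split()))] for t in texts]


def activations_to_concepts(acts: Sequence[Vector]) -> List[Vector]:
    # Project each activation onto each concept direction (dot products).
    return [[sum(a * d for a, d in zip(act, atom)) for atom in DICTIONARY] for act in acts]


def concepts_to_activations(concepts: Sequence[Vector]) -> List[Vector]:
    # Linear reconstruction: weighted sum of the concept directions.
    # With this orthonormal toy dictionary, decoding exactly inverts encoding.
    dim = len(DICTIONARY[0])
    return [
        [sum(c * atom[j] for c, atom in zip(cs, DICTIONARY)) for j in range(dim)]
        for cs in concepts
    ]


def inputs_to_concepts(texts: Sequence[str]) -> List[Vector]:
    # Composition of the two steps above, still without batching.
    return activations_to_concepts(inputs_to_activations(texts))


def run_batched(
    fn: Callable[[Sequence[str]], List[Vector]], items: Sequence[str], batch_size: int
) -> List[Vector]:
    # Batching is left to the caller: apply fn slice by slice and concatenate.
    out: List[Vector] = []
    for start in range(0, len(items), batch_size):
        out.extend(fn(items[start : start + batch_size]))
    return out


texts = ["hello world", "a", "interpreto is magic"]
print(run_batched(inputs_to_concepts, texts, batch_size=2))  # [[11.0, 2.0], [1.0, 1.0], [19.0, 3.0]]
```

Keeping batching out of these functions means they compose freely, and the caller picks the batch size that fits the device.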

New splitters #150 & #156

Introduce SplitterForClassification, which autodetects the classification head and splits the model between its body and the head. As a result, it effectively forces the CLS_TOKEN granularity.

import datasets
from interpreto import SplitterForClassification
splitter = SplitterForClassification(
 "nateraw/bert-base-uncased-emotion",
 batch_size=32,
 device_map="cuda",
)
dataset = datasets.load_dataset("dair-ai/emotion", "split")["train"]["text"]
activations, predictions = splitter.get_activations(dataset)
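How the head might be autodetected is not detailed here. One plausible approach, sketched below under that assumption, is to look up the attribute names Hugging Face sequence-classification models use for their head (BERT and RoBERTa expose classifier, GPT-2 and Llama expose score). find_classification_head is hypothetical and stand-in objects replace real models.

```python
from types import SimpleNamespace
from typing import Any, Optional, Tuple

# Attribute names of the head in common Hugging Face *ForSequenceClassification
# models: BERT/RoBERTa use `classifier`, GPT-2/Llama use `score`.
CANDIDATE_HEAD_NAMES = ("classifier", "score")


def find_classification_head(model: Any) -> Optional[Tuple[str, Any]]:
    """Return (attribute_name, module) for the first known head attribute, else None."""
    for name in CANDIDATE_HEAD_NAMES:
        head = getattr(model, name, None)
        if head is not None:
            return name, head
    return None


# Stand-in objects instead of real models, so no download is needed.
bert_like = SimpleNamespace(bert="encoder", classifier="linear head")
gpt2_like = SimpleNamespace(transformer="decoder", score="linear head")
print(find_classification_head(bert_like))  # ('classifier', 'linear head')
print(find_classification_head(gpt2_like))  # ('score', 'linear head')
```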

Introduce SplitterForGeneration, which requires a split_point but forces TOKEN granularity (though you can still include special tokens that are not padding).

from interpreto import SplitterForGeneration
splitter = SplitterForGeneration(
 "gpt2",
 split_point=10,
 batch_size=8,
 device_map="auto",
)
activations, _ = splitter.get_activations(
 ["Hello world!", "Interpreto is magic"],
)
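At TOKEN granularity, every non-padding token contributes one activation vector. A minimal sketch of that flattening, assuming activations shaped (batch, seq_len, hidden) and a Hugging Face-style attention_mask where 0 marks padding; since only padding is filtered, special tokens are kept. flatten_token_activations is a hypothetical helper, not interpreto's code.

```python
from typing import List, Sequence


def flatten_token_activations(
    activations: Sequence[Sequence[List[float]]],  # (batch, seq_len, hidden)
    attention_mask: Sequence[Sequence[int]],  # (batch, seq_len), 0 marks padding
) -> List[List[float]]:
    """Keep one activation vector per non-padding token, across the whole batch."""
    flat: List[List[float]] = []
    for seq_acts, seq_mask in zip(activations, attention_mask):
        for token_act, keep in zip(seq_acts, seq_mask):
            if keep:
                flat.append(list(token_act))
    return flat


# Two sequences of length 3; the second one ends with a padding token.
acts = [[[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], [[4.0, 4.0], [5.0, 5.0], [0.0, 0.0]]]
mask = [[1, 1, 1], [1, 1, 0]]
print(len(flatten_token_activations(acts, mask)))  # 5
```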

Inputs to concepts attributions #150

Thanks to the new SplitterForClassification, we can now use attribution methods to see which input elements were relevant to given concepts. (This does not work for, and is less relevant to, the token-level concepts used in generation.)

To do so, pass concept_explainer.get_inputs_to_concepts_model() to your preferred attribution method.
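The principle behind occlusion applied to concepts can be shown without the library: replace one token at a time and measure how much each concept activation drops. In this sketch, toy_concepts is a hypothetical stand-in for the inputs-to-concepts model, and replacing tokens with "[MASK]" is an assumption; the library's Occlusion works on tokenized, batched inputs and may use a different replacement.

```python
from typing import Callable, List, Sequence


def occlusion_attributions(
    tokens: Sequence[str],
    inputs_to_concepts: Callable[[List[str]], List[float]],
    mask_token: str = "[MASK]",
) -> List[List[float]]:
    """attributions[t][c] = drop of concept c when token t is replaced by mask_token."""
    reference = inputs_to_concepts(list(tokens))
    attributions = []
    for t in range(len(tokens)):
        occluded = list(tokens)
        occluded[t] = mask_token  # occlude one token at a time
        perturbed = inputs_to_concepts(occluded)
        attributions.append([ref - pert for ref, pert in zip(reference, perturbed)])
    return attributions


def toy_concepts(tokens: List[str]) -> List[float]:
    # Hypothetical stand-in for the inputs-to-concepts model:
    # concept 0 counts joy words, concept 1 counts sadness words.
    joy, sadness = {"happy", "glad"}, {"sad", "gloomy"}
    return [float(sum(t in joy for t in tokens)), float(sum(t in sadness for t in tokens))]


attr = occlusion_attributions(["i", "am", "happy", "not", "sad"], toy_concepts)
print(attr[2])  # [1.0, 0.0]: "happy" only drives the joy concept
```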

Example running in under 40 s on an L40S (46 GB):

import datasets
from interpreto import Occlusion, SplitterForClassification, plot_concepts
from interpreto.concepts import SemiNMFConcepts
from interpreto.concepts.interpretations import TopKInputs
# --------------------------------------------------------------------------------------------------
# 1. Split model and get activations
splitter = SplitterForClassification(
 "nateraw/bert-base-uncased-emotion",
 batch_size=256,
 device_map="cuda",
)
train = datasets.load_dataset("dair-ai/emotion", "split")["train"]
inputs = train["text"]
classes_names = train.features["label"].names
activations, predictions = splitter.get_activations(inputs)
# --------------------------------------------------------------------------------------------------
# 2. Extract concepts, interpret them, and measure their importance
concept_explainer = SemiNMFConcepts(splitter, nb_concepts=20)
# extract concepts from activations
concept_explainer.fit(activations)
# interpret concepts with topk inputs
interpretation_method = TopKInputs(
 concept_explainer=concept_explainer,
 use_unique_words=3, # ngrams up to 3 words
 unique_words_kwargs={"count_min_threshold": round(len(inputs) * 0.003)}, # appears in at least 0.3% of the dataset
)
interpretations = interpretation_method.interpret(
 inputs=inputs,
 concepts_indices="all",
)
interpretations = [list(words.keys()) for concept_id, words in interpretations.items()]
# estimate concepts importance
gradients = concept_explainer.concept_output_gradient(
 inputs=activations, # skips the inputs to activations part
 targets=None, # all classes
 batch_size=64,
)
# --------------------------------------------------------------------------------------------------
# 3. Inputs to concepts attribution
sample_id = 0
attributions_explainer = Occlusion(
 concept_explainer.get_inputs_to_concepts_model(),
 splitter.tokenizer,
 batch_size=256,
)
results = attributions_explainer.explain(
 inputs[sample_id],
 targets=None # explain all concepts
)[0]
# --------------------------------------------------------------------------------------------------
# 4. Visualize the whole thing
plot_concepts(
 sample=results.elements,
 classes_names=classes_names,
 concepts_activations=results.attributions.T,
 concepts_importances=gradients[sample_id].squeeze()[predictions[sample_id]], # (num_classes, num_concepts)
 concepts_labels=interpretations,
)
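Step 2 estimates concept importance with concept_output_gradient. In the special case where concepts are decoded linearly (activations = concepts @ D) and the head is a single linear layer W, these gradients reduce to the constant matrix W Dᵀ of shape (num_classes, num_concepts), matching the shape noted in the plot call. The sketch below shows that closed form with hypothetical names; with nonlinear heads the gradient depends on the sample and has to come from automatic differentiation.

```python
from typing import List

Matrix = List[List[float]]


def concept_gradients_linear(head_weight: Matrix, decoder: Matrix) -> Matrix:
    """Gradient of logits = W @ (c @ D) + b with respect to the concepts c.

    head_weight: W, shape (num_classes, hidden), the weight of a linear head.
    decoder: D, shape (num_concepts, hidden), maps concepts back to activations.
    Returns G, shape (num_classes, num_concepts), with G[j][i] = sum_h W[j][h] * D[i][h].
    """
    return [[sum(w * d for w, d in zip(w_row, d_row)) for d_row in decoder] for w_row in head_weight]


def logits(concepts: List[float], head_weight: Matrix, decoder: Matrix, bias: List[float]) -> List[float]:
    # Forward pass used to check the closed form: decode, then apply the linear head.
    hidden = len(decoder[0])
    acts = [sum(c * decoder[i][h] for i, c in enumerate(concepts)) for h in range(hidden)]
    return [sum(w * a for w, a in zip(w_row, acts)) + b for w_row, b in zip(head_weight, bias)]


W = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # 3 classes, hidden size 2
D = [[1.0, 1.0], [0.0, 1.0]]  # 2 concepts
print(concept_gradients_linear(W, D))  # [[1.0, 0.0], [2.0, 2.0], [2.0, 1.0]]
```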


Probing (supervised post-hoc concepts) #153

This release also includes probes, a.k.a. CAVs, a.k.a. post-hoc supervised concept-based explanations. They are simple classification models trained to predict whether a concept is present in the model's activations. Therefore, they require concept labels. They answer two questions:

Is the concept present in the model? (probe performance)
Is the concept present in a sample? (probe prediction on the sample's activations)

They use the same splitter and fit API as their unsupervised counterparts. However, they do not require interpretations (even though interpreting them is possible), and concepts_to_outputs does not work for them (this would correspond to Testing with CAVs, i.e., TCAV).

Many variants are available:

Linear probes: LinearRegressionProbe, LogisticRegressionProbe, LinearSVMProbe, MeansDiffProbe
Centroid-based probes: CosineCentroidProbe, DotProductCentroidProbe, SqL2CentroidProbe, SVDDCentroidProbe, DiagonalMahalanobisCentroidProbe
Normalizations: Standardization, Whitening
Bias calibrators: bce_bias, fpr_bias, prevalence_bias, lda_shared_var_bias, midpoint_bias
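MeansDiffProbe and midpoint_bias in the list above point to the simplest linear probe: the difference of the class means as direction, with the decision threshold halfway between the two means. Below is a plain-Python sketch of that classic recipe on toy 2-d activations, showing how a probe answers both questions; the function names are hypothetical and this is not interpreto's implementation, which may for instance normalize activations first.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def fit_means_diff_probe(positives: Sequence[Vector], negatives: Sequence[Vector]) -> Tuple[Vector, float]:
    """Direction = mean(pos) - mean(neg); the bias puts the threshold at the midpoint of the means."""
    mu_pos, mu_neg = mean(positives), mean(negatives)
    direction = [p - n for p, n in zip(mu_pos, mu_neg)]
    midpoint = [(p + n) / 2 for p, n in zip(mu_pos, mu_neg)]
    return direction, -dot(direction, midpoint)


def predict(direction: Vector, bias: float, activation: Vector) -> bool:
    """Question 2: is the concept present in this sample?"""
    return dot(direction, activation) + bias > 0


def accuracy(direction: Vector, bias: float, samples: Sequence[Vector], labels: Sequence[bool]) -> float:
    """Question 1: is the concept present in the model? (probe performance)"""
    hits = sum(predict(direction, bias, x) == y for x, y in zip(samples, labels))
    return hits / len(labels)


pos = [[2.0, 1.0], [3.0, 1.0]]  # activations labeled "concept present"
neg = [[0.0, 1.0], [-1.0, 1.0]]  # activations labeled "concept absent"
w, b = fit_means_diff_probe(pos, neg)
print(w, b)  # [3.0, 0.0] -3.0
print(accuracy(w, b, pos + neg, [True, True, False, False]))  # 1.0
```

In practice, accuracy should be measured on held-out samples rather than on the training activations used here.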

from interpreto.concepts import LinearRegressionProbe, ProbeExplainer
# Choose a probe and its parameter
probe = LinearRegressionProbe()
# Wrap it to link wi...

Contributors

AntoninPoche and HugoDeBosschere

v0.4.20 - Fixes, ngrams, and sanity checks

20 Mar 16:22

@github-actions

v0.4.20

563614f

What’s Changed

0.4.18

Require nnsight<0.6.0 to prevent compatibility issues (#135) @AntoninPoche

0.4.19

We can now interpret concepts via top-k ngrams and not just top-k words. Just set use_unique_words=3 for top-k 3-grams. (#134) @camillebrl
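A sketch of the candidate-vocabulary side of this feature, assuming n-grams of length 1 to n are counted (the v0.5.0 example comments use_unique_words=3 as "ngrams up to 3 words") and rare ones are dropped with a minimum count, similar to count_min_threshold in that example. ngram_vocabulary is a hypothetical helper; ranking the n-grams by concept activation, as TopKInputs does, is not shown.

```python
from collections import Counter
from typing import Iterable


def ngram_vocabulary(texts: Iterable[str], max_n: int = 3, count_min_threshold: int = 1) -> Counter:
    """Count word n-grams of length 1..max_n; drop those seen fewer than count_min_threshold times."""
    counts: Counter = Counter()
    for text in texts:
        words = text.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i : i + n])] += 1
    return Counter({g: c for g, c in counts.items() if c >= count_min_threshold})


vocab = ngram_vocabulary(["I feel so happy", "so happy today"], max_n=3, count_min_threshold=2)
print(sorted(vocab))  # ['happy', 'so', 'so happy']
```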

0.4.20

Fix bug for word and sentence granularity (#133) @fanny-jourdan
Fix issue #137 by preventing unnecessary model resizing (#138) @AntoninPoche
Add sanity checks and fix Sobol (#138) @AntoninPoche

👥 List of contributors

@AntoninPoche, @camillebrl, @fanny-jourdan

Welcome to our new contributor @camillebrl 🤗

v0.4.17 - Update Granularity

03 Mar 16:57

@github-actions

v0.4.17

b90adec

What’s Changed

Fix sentence granularity (#128) @fanny-jourdan

This includes:

Modified the sentence granularity to remove the dependency on SciPy
Added a sentence-part granularity, which splits the input into sentence parts separated by ".", "?", "!", ",", ":"
Added more complex tests to verify granularity robustness
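A minimal sketch of sentence-part splitting on the delimiters listed above, assuming a delimiter only ends a part when followed by whitespace (so "3.14" stays intact) and that delimiters stay attached to their part. split_sentence_parts is hypothetical; interpreto's granularity works on tokens and may handle edge cases differently.

```python
import re
from typing import List

# Delimiters listed in the release note: ".", "?", "!", ",", ":"
_PART_END = re.compile(r"(?<=[.?!,:])\s+")


def split_sentence_parts(text: str) -> List[str]:
    """Split after each delimiter (keeping it) when it is followed by whitespace."""
    return [part for part in _PART_END.split(text.strip()) if part]


print(split_sentence_parts("Hello, world: this is fine. Is it? Yes!"))
# ['Hello,', 'world:', 'this is fine.', 'Is it?', 'Yes!']
```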

👥 List of contributors

@fanny-jourdan

v0.4.16 - New visualization and website

16 Feb 17:57

@github-actions

v0.4.16

3e33b49

What’s Changed

Developed an explanation gallery website
Update attribution visualizations (#125) @AntoninPoche
Introduce visualizations for concepts (#125) @AntoninPoche
Fix links of tutorials in readme and doc (#126) @gfouilhe
Attribution walkthrough: fix and add metrics (#121) @AntoninPoche

👥 List of contributors

@AntoninPoche and @gfouilhe

v0.4.15 Interpreto official release

20 Jan 09:12

@AntoninPoche

v0.4.15

71ddc62

Interpreto is officially released

From this version onward, release notes will describe the changes made to the library. For now, this release note briefly describes what is included in interpreto, but it is best to check the documentation and tutorials for more details.

Position

This library provides interpretability tools for Hugging Face language models, for both sequence classification and causal generation.

There are two main modules, along with metrics and visualization tools:

Attributions

interpreto implements both perturbation-based and gradient-based methods. Users can set the granularity of the attribution, from special tokens to sentences, including normal tokens and words.
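Coarser granularities group tokens into larger units. For gradient-based scores, one common approach is to aggregate token scores per word; perturbation-based methods can instead perturb whole words directly. The sketch below shows only the aggregation idea, assuming a word_ids list like the one Hugging Face fast tokenizers return from BatchEncoding.word_ids() (None for special tokens); summation and the helper name are assumptions, not interpreto's behavior.

```python
from typing import Dict, List, Optional, Sequence


def aggregate_to_words(token_scores: Sequence[float], word_ids: Sequence[Optional[int]]) -> List[float]:
    """Sum token-level scores per word; tokens whose word_id is None (special tokens) are dropped."""
    totals: Dict[int, float] = {}
    for score, word_id in zip(token_scores, word_ids):
        if word_id is None:
            continue
        totals[word_id] = totals.get(word_id, 0.0) + score
    return [totals[w] for w in sorted(totals)]


# [CLS] inter ##preto rocks [SEP]  ->  words: "interpreto", "rocks"
print(aggregate_to_words([0.5, 0.25, 0.5, 1.0, 0.125], [None, 0, 0, 1, None]))  # [0.75, 1.0]
```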

There are two metrics: insertion and deletion.
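The deletion metric masks tokens from most to least important and tracks the model score: a faithful attribution makes the score drop quickly, giving a low area under the curve. Insertion is the reverse (start fully masked and reveal the most important tokens first), where a high area is better. A plain-Python sketch of deletion, with a hypothetical score function and "[MASK]" replacement; interpreto's metric classes may mask and normalize differently.

```python
from typing import Callable, List, Sequence


def deletion_curve(
    tokens: Sequence[str],
    attributions: Sequence[float],
    score_fn: Callable[[List[str]], float],
    mask_token: str = "[MASK]",
) -> List[float]:
    """Model score after masking the k most important tokens, for k = 0..len(tokens)."""
    order = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    current = list(tokens)
    scores = [score_fn(current)]
    for i in order:
        current[i] = mask_token
        scores.append(score_fn(current))
    return scores


def area_under_curve(scores: Sequence[float]) -> float:
    """Trapezoidal area, with the x-axis normalized to [0, 1]."""
    steps = len(scores) - 1
    return sum((a + b) / 2 for a, b in zip(scores, scores[1:])) / steps


toks = ["a", "great", "fun", "day"]
score = lambda ts: sum(t in {"great", "fun"} for t in ts) / 2  # toy "model" score
good = deletion_curve(toks, [0.1, 0.9, 0.8, 0.0], score)
bad = deletion_curve(toks, [0.9, 0.0, 0.1, 0.8], score)
print(area_under_curve(good), area_under_curve(bad))  # 0.25 0.75
```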

Concept-based

To obtain concept-based explanations (post-hoc, unsupervised), there are several steps. Interpreto decomposes its pipeline according to these steps:

Split a model in two and compute a dataset of activations with ModelWithSplitPoints, based on nnsight.
Find patterns in these activations via dictionary learning; we implement ~15 methods by wrapping overcomplete.
Interpret the concepts, from simple top-k vocabulary tokens to LLM labeling of concepts.
Estimate the contribution of each concept to the prediction.
Evaluate the previous steps with diverse metrics.
