Releases: FOR-sight-ai/interpreto
v0.5.0 Refacto Attributions and Concepts
Summary
- Fix issue #147 (#148) @HugoDeBosschere
- `InferenceWrapper` rewrite and attribution device optimization (#142) @AntoninPoche
- Concepts API simplification (#156) @AntoninPoche
- New simpler splitters `SplitterForClassification` (#150) and `SplitterForGeneration` (#156) @AntoninPoche
- Inputs to concepts attributions for classification (#150) @AntoninPoche
- Probes (supervised post-hoc concept-based explanations) (#153) @AntoninPoche
💥 Breaking changes (concepts simplification)
- Attributions do not support `BatchEncoding`s of multiple inputs anymore.
- `ModelWithSplitPoints(split_points=[3])` is deprecated, not breaking. Use `ModelWithSplitPoints(split_point=3)` instead.
- `AnythingElse.split_point` or `AnythingElse(split_point=...)` will raise an error. The splitters are the only ones managing the split point.
- `get_activations()` output is not a dict but a tuple now.
- Splitters do not have a `get_split_activations()` method anymore. It is not necessary; activations are the first element of the tuple.
```python
# Old and broken
activations_dict = splitter.get_activations(inputs)
activations = splitter.get_split_activations(activations_dict)

# New
activations, _ = splitter.get_activations(inputs)
```
- `from interpreto.model_wrapping import ModelWithSplitPoints` is not the correct path anymore. One should use `from interpreto import ModelWithSplitPoints`, which was already the recommended way.
- Similarly, `from interpreto.model_wrapping.llm_interface import OpenAILLM` is not the correct path anymore. One should use `from interpreto.commons.llm_interface import OpenAILLM`.
- `encode_activations()` and `decode_concepts()` methods from concept explainers have been renamed to `activations_to_concepts()` and `concepts_to_activations()`. But these are mainly internal to the library.
- Interpretation methods do not have a `concept_model_device` anymore. But it should not change much, since the concept model is usually already on the device.
```python
# Old and broken
interpretation_method = TopKInputs(concept_explainer, concept_model_device="cuda")

# New
concept_explainer.to("cuda")  # only moves the `.concept_model`, not the `.splitter`
interpretation_method = TopKInputs(concept_explainer)
```
Please refer to the details and examples of the release note or notebooks for the new concepts API.
Details and Examples
Attributions inference wrapping #142
This is mainly an internal update without impact on the API. However, it impacts the behavior slightly:
- Better device management between perturbation, inference, and aggregation
- Simplification of the inference architecture, which solves issues #96, #108, and #127
Simplify the concepts API #156
`ModelWithSplitPoints` was versatile but complex, so we simplified it a bit and went even further with the new splitters.
- Limit to a single split point, so only the splitter has and needs this information.
- Change the `get_activations` output to a tuple `activations, predictions`, thus removing the need for `get_split_activations`.
- Rename all `model_with_split_points` attributes to `splitter`; it is shorter and aligns with the new splitters.
- Force `nnsight>=0.7,<0.8`
- Introduce the `inputs_to_activations`, `inputs_to_concepts`, `activations_to_concepts`, and `concepts_to_activations` naming convention (some of these are renamings). These functions do not include batching and can be easily used externally.
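Since these conversion functions do not batch, a caller with a large set of inputs may want to chunk it manually. The following is only a sketch of that idea: `in_batches`, `convert_in_batches`, and `toy_convert` are hypothetical names that are not part of interpreto, and `toy_convert` merely stands in for a non-batched function such as `activations_to_concepts`.

```python
from collections.abc import Callable, Iterator, Sequence


def in_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


def convert_in_batches(
    convert: Callable[[Sequence], list], items: Sequence, batch_size: int
) -> list:
    """Apply a non-batched conversion chunk by chunk and concatenate the results."""
    results: list = []
    for batch in in_batches(items, batch_size):
        results.extend(convert(batch))
    return results


def toy_convert(batch: Sequence) -> list:
    """Hypothetical stand-in for a conversion function: doubles every value."""
    return [2 * x for x in batch]


converted = convert_in_batches(toy_convert, [1, 2, 3, 4, 5], batch_size=2)
```

With real tensors, the per-batch results would typically be concatenated with something like `torch.cat` rather than `list.extend`.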
New splitters #150 & #156
- Introduce `SplitterForClassification`, which autodetects the classification head and splits between the model and the head. Thus, we can consider that it forces the `CLS_TOKEN` granularity.
```python
import datasets

from interpreto import SplitterForClassification

splitter = SplitterForClassification(
    "nateraw/bert-base-uncased-emotion",
    batch_size=32,
    device_map="cuda",
)

dataset = datasets.load_dataset("dair-ai/emotion", "split")["train"]["text"]
activations, predictions = splitter.get_activations(dataset)
```
- Introduce `SplitterForGeneration`, which requires a `split_point` but forces `TOKEN` granularity (though you can still include special tokens that are not padding).
```python
from interpreto import SplitterForGeneration

splitter = SplitterForGeneration(
    "gpt2",
    split_point=10,
    batch_size=8,
    device_map="auto",
)

activations, _ = splitter.get_activations(
    ["Hello world!", "Interpreto is magic"],
)
```
Inputs to concepts attributions #150
Thanks to the new SplitterForClassification, we can now use the attribution method to see which input elements were relevant for given concepts. (This does not work and is less relevant for the token-level concepts we have in generation.)
To make it work, just pass `concept_explainer.get_inputs_to_concepts_model()` to your preferred attribution method.
Example running in less than 40 s on an L40S (46 GB):
```python
import datasets

from interpreto import Occlusion, SplitterForClassification, plot_concepts
from interpreto.concepts import SemiNMFConcepts
from interpreto.concepts.interpretations import TopKInputs

# --------------------------------------------------------------------------------------------------
# 1. Split model and get activations
splitter = SplitterForClassification(
    "nateraw/bert-base-uncased-emotion",
    batch_size=256,
    device_map="cuda",
)

train = datasets.load_dataset("dair-ai/emotion", "split")["train"]
inputs = train["text"]
classes_names = train.features["label"].names

activations, predictions = splitter.get_activations(inputs)

# --------------------------------------------------------------------------------------------------
# 2. Extract concepts, interpret them, and measure their importance
concept_explainer = SemiNMFConcepts(splitter, nb_concepts=20)

# extract concepts from activations
concept_explainer.fit(activations)

# interpret concepts with topk inputs
interpretation_method = TopKInputs(
    concept_explainer=concept_explainer,
    use_unique_words=3,  # ngrams up to 3 words
    # appears in at least 0.3% of the dataset
    unique_words_kwargs={"count_min_threshold": round(len(inputs) * 0.003)},
)
interpretations = interpretation_method.interpret(
    inputs=inputs,
    concepts_indices="all",
)
interpretations = [list(words.keys()) for concept_id, words in interpretations.items()]

# estimate concepts importance
gradients = concept_explainer.concept_output_gradient(
    inputs=activations,  # skips the inputs to activations part
    targets=None,  # all classes
    batch_size=64,
)

# --------------------------------------------------------------------------------------------------
# 3. Inputs to concepts attribution
sample_id = 0

attributions_explainer = Occlusion(
    concept_explainer.get_inputs_to_concepts_model(),
    splitter.tokenizer,
    batch_size=256,
)
results = attributions_explainer.explain(
    inputs[sample_id],
    targets=None,  # explain all concepts
)[0]

# --------------------------------------------------------------------------------------------------
# 4. Visualize the whole thing
plot_concepts(
    sample=results.elements,
    classes_names=classes_names,
    concepts_activations=results.attributions.T,
    # gradients[sample_id].squeeze() has shape (num_classes, num_concepts)
    concepts_importances=gradients[sample_id].squeeze()[predictions[sample_id]],
    concepts_labels=interpretations,
)
```
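To make the mechanics of inputs-to-concepts attribution more concrete, here is a toy sketch of the occlusion idea: hide one input element at a time and record how much each concept activation drops. Everything in it is an assumption made for illustration; `toy_concepts` is a hand-written keyword scorer and `occlusion_attributions` is a hypothetical helper, neither of which reflects how interpreto's `Occlusion` is implemented.

```python
def toy_concepts(tokens: list[str]) -> list[float]:
    """Hypothetical concept model: two hand-written 'concepts' scored by keyword counts."""
    joy = sum(tok in {"happy", "great"} for tok in tokens)
    anger = sum(tok in {"angry", "hate"} for tok in tokens)
    return [float(joy), float(anger)]


def occlusion_attributions(tokens, concept_fn, mask_token="[MASK]"):
    """Return a (num_tokens, num_concepts) nested list of activation drops.

    Entry [i][c] is the decrease of concept c when token i is replaced by the mask.
    """
    reference = concept_fn(tokens)
    attributions = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + [mask_token] + tokens[i + 1 :]
        perturbed = concept_fn(occluded)
        attributions.append([ref - pert for ref, pert in zip(reference, perturbed)])
    return attributions


tokens = ["i", "am", "happy", "not", "angry"]
scores = occlusion_attributions(tokens, toy_concepts)
```

In this toy setting, only the tokens that trigger a concept receive a non-zero attribution for it, which is the kind of per-concept signal the plot above displays.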
Probing (supervised post-hoc concepts) #153
This release also includes probes, a.k.a. CAVs, a.k.a. post-hoc supervised concept-based explanations. They are simple classification models trained to predict whether a concept is present in the model's activations. Therefore, they require concept labels. They answer two questions:
- Is the concept present in the model (with probe performance)
- Is the concept present in a sample (probe prediction on the sample's activations)
They use the same splitter and fit API as their unsupervised counterparts. However, they do not require interpretations (even though interpreting them is possible), and `concepts_to_outputs` does not work for them (this would correspond to testing with CAVs).
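As an illustration of what such a probe does, the sketch below fits about the simplest linear probe possible: the direction is the difference between the mean activation of positive and negative samples, with a bias placed at the midpoint between the two means. This is a generic textbook construction written in plain Python; the function names are hypothetical and it should not be read as the implementation of `MeansDiffProbe` or `midpoint_bias`.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_means_diff_probe(activations, labels):
    """Return (direction, bias) separating concept-present (1) from concept-absent (0)."""
    positives = [a for a, y in zip(activations, labels) if y == 1]
    negatives = [a for a, y in zip(activations, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes need at least one sample")
    mu_pos, mu_neg = mean_vector(positives), mean_vector(negatives)
    direction = [p - n for p, n in zip(mu_pos, mu_neg)]
    # Bias chosen so the decision boundary passes through the midpoint of the two means.
    bias = -0.5 * (dot(direction, mu_pos) + dot(direction, mu_neg))
    return direction, bias


def probe_predict(direction, bias, activation):
    """True when the probe considers the concept present in this activation."""
    return dot(direction, activation) + bias > 0


# Toy 2-d activations with concept labels.
activations = [[2.0, 0.1], [1.8, -0.2], [-1.9, 0.0], [-2.1, 0.3]]
labels = [1, 1, 0, 0]

direction, bias = fit_means_diff_probe(activations, labels)
predictions = [probe_predict(direction, bias, a) for a in activations]
accuracy = sum(p == bool(y) for p, y in zip(predictions, labels)) / len(labels)
```

Here `accuracy` plays the role of the first question (is the concept linearly readable from the activations at all), while `probe_predict` on a single activation answers the second (is the concept present in this sample).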
There is a large diversity:
- Linear probes: LinearRegressionProbe, LogisticRegressionProbe, LinearSVMProbe, MeansDiffProbe
- Centroid-based probes: CosineCentroidProbe, DotProductCentroidProbe, SqL2CentroidProbe, SVDDCentroidProbe, DiagonalMahalanobisCentroidProbe
- Normalizations: `Standardization`, `Whitening`
- Bias calibrators: `bce_bias`, `fpr_bias`, `prevalence_bias`, `lda_shared_var_bias`, `midpoint_bias`
```python
from interpreto.concepts import LinearRegressionProbe, ProbeExplainer

# Choose a probe and its parameter
probe = LinearRegressionProbe()

# Wrap it to link wi...
```
v0.4.20 - Fixes, ngrams, and sanity checks
What’s Changed
0.4.18
- Require nnsight<0.6.0, to prevent compatibility issues (#135) @AntoninPoche
0.4.19
- We can now interpret concepts via top-k ngrams and not just top-k words. Just set `use_unique_words=3` for top-k 3-grams. (#134) @camillebrl
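To give an idea of what counting n-grams up to a maximum length involves, here is a small standalone sketch. It assumes lowercase whitespace tokenization and a plain occurrence count; `ngrams_up_to`, `top_k_ngrams`, and the `min_count` parameter are hypothetical names, and the library's own tokenization and thresholding may well differ.

```python
from collections import Counter


def ngrams_up_to(words, max_n):
    """Yield every n-gram of `words` for n = 1..max_n, joined with spaces."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i : i + n])


def top_k_ngrams(texts, max_n=3, k=5, min_count=1):
    """Return up to k (ngram, count) pairs seen at least `min_count` times."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams_up_to(text.lower().split(), max_n))
    frequent = [(gram, c) for gram, c in counts.most_common() if c >= min_count]
    return frequent[:k]


texts = ["i feel so happy", "i feel so sad", "so happy today"]
top = top_k_ngrams(texts, max_n=3, k=3, min_count=2)
all_frequent = dict(top_k_ngrams(texts, max_n=3, k=100, min_count=2))
```

With `max_n=3`, a phrase such as "i feel so" is counted as a single candidate alongside its constituent words, which is what lets multi-word expressions surface as concept labels.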
0.4.20
- Fix bug for word and sentence granularity (#133) @fanny-jourdan
- Fix issue #137 by preventing unnecessary model resizing (#138) @AntoninPoche
- Add sanity checks and fix Sobol (#138) @AntoninPoche
👥 List of contributors
@AntoninPoche, @camillebrl, @fanny-jourdan
Welcome to our new contributor @camillebrl 🤗
v0.4.17 - Update Granularity
What’s Changed
- Fix sentence granularity (#128) @fanny-jourdan
This includes:
- Modification of sentence granularity to remove dependency on Scipy
- Added sentence part granularity, splitting the input into separate parts of the sentence separated by: ".", "?", "!", ",", ":"
- Added more complex test to verify granularity robustness.
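As a rough illustration of the sentence part granularity, the sketch below splits a text after each of the separators listed above using only the standard `re` module. Whether the separator stays attached to the preceding part, and how edge cases such as decimal numbers are handled, are assumptions of this sketch rather than documented behavior; `split_sentence_parts` is a hypothetical name.

```python
import re

SEPARATORS = ".?!,:"


def split_sentence_parts(text: str) -> list[str]:
    """Split `text` after each separator, keeping the separator with its part."""
    sep_class = re.escape(SEPARATORS)
    # A part is a run of non-separator characters followed by any trailing separators.
    pattern = rf"[^{sep_class}]+[{sep_class}]*"
    return [part.strip() for part in re.findall(pattern, text) if part.strip()]


parts = split_sentence_parts("Well, it works: great! Does it? Yes.")
```

A purely character-based rule like this one would also split "3.14" in two, which is the sort of case robustness tests are meant to catch.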
👥 List of contributors
v0.4.16 - New visualization and website
What’s Changed
- Developed an explanation gallery website
- Update attribution visualizations (#125) @AntoninPoche
- Introduce visualizations for concepts (#125) @AntoninPoche
- Fix links of tutorials in readme and doc (#126) @gfouilhe
- Attribution walkthrough: fix and add metrics (#121) @AntoninPoche
👥 List of contributors
v0.4.15 Interpreto official release
Interpreto is officially released
From this version onward, release notes will describe the changes made to the library. For now, this release note briefly describes what is included in interpreto, but it is best to check the documentation and tutorials for more details.
Position
This library provides interpretability tools for language models from HuggingFace, for both sequence classification and causal generation.
There are two main modules, along with metrics and visualization tools:
Attributions
interpreto implements both perturbation-based and gradient-based methods. Users can set the granularity of the attribution, from special tokens to sentences, including normal tokens and words.
There are two metrics: insertion and deletion.
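The general idea of the deletion metric can be sketched as follows: mask input elements in decreasing order of attribution, track the model score after each removal, and summarize the curve by its area (a lower area suggests the attribution found the elements the model relies on). Insertion follows the same logic in reverse, starting from a fully masked input. The code is a toy illustration with a hypothetical keyword-based `toy_score`, not interpreto's implementation.

```python
def deletion_curve(tokens, attributions, score_fn, mask_token="[MASK]"):
    """Scores obtained while masking tokens from most to least attributed."""
    order = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    current = list(tokens)
    curve = [score_fn(current)]
    for i in order:
        current[i] = mask_token
        curve.append(score_fn(current))
    return curve


def area_under_curve(curve):
    """Trapezoidal area with the x-axis normalized to [0, 1]."""
    steps = len(curve) - 1
    if steps < 1:
        raise ValueError("curve needs at least two points")
    return sum((a + b) / 2 for a, b in zip(curve, curve[1:])) / steps


def toy_score(tokens):
    """Hypothetical model score: sum of hand-picked keyword weights."""
    weights = {"great": 0.5, "good": 0.25}
    return sum(weights.get(tok, 0.0) for tok in tokens)


tokens = ["a", "great", "good", "film"]
faithful = [0.0, 0.5, 0.25, 0.0]    # ranks the influential words first
unfaithful = [0.5, 0.0, 0.0, 0.25]  # ranks irrelevant words first

faithful_auc = area_under_curve(deletion_curve(tokens, faithful, toy_score))
unfaithful_auc = area_under_curve(deletion_curve(tokens, unfaithful, toy_score))
```

In this toy case the faithful attribution makes the score collapse immediately, so its deletion area is smaller than that of the unfaithful one.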
Concept-based
To obtain concept-based explanations (post-hoc, unsupervised), there are several steps. Interpreto decomposes its pipeline according to these steps:
- Split a model in two and compute a dataset of activations with `ModelWithSplitPoints`, based on `nnsight`.
- Find patterns in these activations via dictionary learning; we implement ~15 methods by wrapping `overcomplete`.
- Interpret the concepts, from simple top-k vocabulary tokens to LLM labeling of concepts.
- Estimate the contribution of each concept to the prediction.
- Evaluate the previous steps with diverse metrics.