parlai.core.metrics

Provides standard metric evaluations for dialog, as well as an aggregator.

class parlai.core.metrics.MetricDisplayData(title, description)[source]

Bases: NamedTuple

title: str

Alias for field number 0

description: str

Alias for field number 1

class parlai.core.metrics.Metric[source]

Bases: ABC

Base class for storing metrics.

Subclasses should define .value(). Examples are provided for each subclass.

property is_global: bool

Indicates whether this metric should be reported globally or per-task.

property macro_average: bool

Indicates whether this metric should be macro-averaged when globally reported.

abstract value() -> float[source]

Return the value of the metric as a float.

classmethod many(*objs: List[Union[List[Union[int, float, Tensor]], Tensor]]) -> List[Metric][source]

Construct many instances of a Metric from the base parts.

Useful if you separately compute numerators and denominators, etc.
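
As an illustrative sketch (the values are made up, and the exact results are an assumption about standard AverageMetric behavior), many() can build a batch of metrics from parallel lists of numerators and denominators:

    # Sketch: build several AverageMetrics from parallel lists of
    # numerators and denominators (values are illustrative).
    from parlai.core.metrics import AverageMetric

    numers = [1, 2, 3]
    denoms = [4, 4, 4]
    metrics = AverageMetric.many(numers, denoms)
    print([m.value() for m in metrics])  # expected: [0.25, 0.5, 0.75]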

classmethod from_mask(metric_per_token: Tensor, mask: Tensor) -> List[Metric][source]

From token-level metrics, return one aggregate metric (of the calling subclass) per example in the batch.

Parameters
  • metric_per_token – a (batchsize x num_tokens) Tensor

  • mask – a (batchsize x num_tokens) Tensor to mask out tokens that should not be considered in the aggregate metric calculation.

Returns

a list of Metric objects, one per example in the batch
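
A hypothetical sketch of from_mask with token-level losses; the tensor shapes follow the parameter descriptions above, while the specific values and resulting averages are illustrative assumptions:

    import torch
    from parlai.core.metrics import AverageMetric

    # (batchsize x num_tokens) token-level metric and mask, per the parameters above
    loss_per_token = torch.tensor([[0.5, 0.7, 0.0], [0.2, 0.0, 0.0]])
    mask = torch.tensor([[1, 1, 0], [1, 0, 0]])

    per_example = AverageMetric.from_mask(loss_per_token, mask)
    # one AverageMetric per batch row, averaging only the unmasked tokens
    print([m.value() for m in per_example])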

class parlai.core.metrics.FixedMetric(value: Union[int, float, Tensor])[source]

Bases: Metric

Fixed metrics are verified to be the same when combined, or throw an error.

FixedMetric is used for things like total_train_updates, which should not be combined across tasks in a multitask setup or across different workers.

__init__(value: Union[int, float, Tensor])[source]
value() -> float[source]

Return the value of the metric as a float.
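
For illustration (grounded in the docstring above, though the exact failure mode is an assumption), combining equal FixedMetrics succeeds while unequal ones error out:

    from parlai.core.metrics import FixedMetric

    updates = FixedMetric(5) + FixedMetric(5)  # identical values combine fine
    print(updates.value())                     # 5
    # FixedMetric(5) + FixedMetric(6) would raise an error, since fixed metrics
    # are verified to be the same when combined.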

class parlai.core.metrics.SumMetric(sum_: Union[int, float, Tensor] = 0)[source]

Bases: Metric

Class that keeps a running sum of some metric.

Examples of SumMetric include things like "exs", the number of examples seen since the last report, which is tied directly to a particular teacher.

__init__(sum_: Union[int, float, Tensor] = 0)[source]
value() -> float[source]

Return the value of the metric as a float.
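
A minimal sketch of accumulation, assuming the usual metric addition used throughout this module (the + behavior itself is not spelled out above, so treat it as an assumption):

    from parlai.core.metrics import SumMetric

    exs = SumMetric(3) + SumMetric(4)  # e.g. example counts from two batches
    print(exs.value())                 # 7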

class parlai.core.metrics.AverageMetric(numer: Union[int, float, Tensor], denom: Union[int, float, Tensor] = 1)[source]

Bases: Metric

Class that keeps a running average of some metric.

Examples of AverageMetrics include hits@1, F1, accuracy, etc. These metrics all have per-example values that can be directly mapped back to a teacher.

property macro_average: bool

Indicates whether this metric should be macro-averaged when globally reported.

__init__(numer: Union[int, float, Tensor], denom: Union[int, float, Tensor] = 1)[source]
value() -> float[source]

Return the value of the metric as a float.
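
Illustrative sketch: adding two AverageMetrics combines their numerators and denominators, so the result is a micro-average of the underlying counts rather than an average of the two ratios (this combination behavior is an assumption, consistent with the micro_average option described later):

    from parlai.core.metrics import AverageMetric

    acc = AverageMetric(1, 2) + AverageMetric(3, 4)
    print(acc.value())  # (1 + 3) / (2 + 4) = 0.666...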

class parlai.core.metrics.MacroAverageMetric(metrics: Dict[str, Metric])[source]

Bases: Metric

Class that represents the macro average of several numbers.

Used for aggregating task-level metrics. It is only used for metrics that are already AverageMetrics.

__init__(metrics: Dict[str, Metric]) -> None[source]
value() -> float[source]

Return the value of the metric as a float.
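
A small sketch of macro-averaging two task-level AverageMetrics (task names and numbers are made up):

    from parlai.core.metrics import AverageMetric, MacroAverageMetric

    per_task = {
        'taskA': AverageMetric(1, 2),  # 0.5
        'taskB': AverageMetric(3, 4),  # 0.75
    }
    macro = MacroAverageMetric(per_task)
    print(macro.value())  # (0.5 + 0.75) / 2 = 0.625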

class parlai.core.metrics.TimerMetric(value: Union[int, float, Tensor], start_time: Optional[float] = None, end_time: Optional[float] = None)[source]

Bases: Metric

A timer metric keeps track of the first and last times it was used.

__init__(value: Union[int, float, Tensor], start_time: Optional[float] = None, end_time: Optional[float] = None)[source]
value() -> float[source]

Return the value of the metric as a float.

class parlai.core.metrics.GlobalMetric[source]

Bases: object

A global metric is one that should not be aggregated across different tasks.

Examples of global metrics include things like learning rate and updates. These need to be accumulated or averaged over multiple parleys, but cannot be correlated with a single task.

Key to it is the notion that any one worker or any one task already has a global view of the value, so no combination should be done. Note this is different from a FixedMetric, in that a GlobalMetric can still be averaged across multiple parleys(), but a FixedMetric is always fixed.

class parlai.core.metrics.GlobalFixedMetric(value: Union[int, float, Tensor])[source]

Bases: GlobalMetric, FixedMetric

Global fixed metric.

Used for things like total_train_updates.

class parlai.core.metrics.GlobalSumMetric(sum_: Union[int, float, Tensor] = 0)[source]

Bases: GlobalMetric, SumMetric

Global sum metric.

Used for ‘exs’ and ‘updates’.

class parlai.core.metrics.GlobalAverageMetric(numer: Union[int, float, Tensor], denom: Union[int, float, Tensor] = 1)[source]

Bases: GlobalMetric, AverageMetric

Global Average metric.

Used for things like learning rate, and many agent-specific metrics.

class parlai.core.metrics.LegacyMetric(numer: Union[int, float, Tensor], denom: Union[int, float, Tensor] = 1)[source]

Bases: GlobalAverageMetric

Legacy metrics are reported by the agent as floats.

class parlai.core.metrics.GlobalTimerMetric(value: Union[int, float, Tensor], start_time: Optional[float] = None, end_time: Optional[float] = None)[source]

Bases: GlobalMetric, TimerMetric

class parlai.core.metrics.F1Metric(numer: Union[int, float, Tensor], denom: Union[int, float, Tensor] = 1)[source]

Bases: AverageMetric

Helper class which computes token-level F1.
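
The ParlAI source also exposes an F1Metric.compute(guess, answers) helper that is not listed in the signatures above; assuming it is available, a usage sketch:

    from parlai.core.metrics import F1Metric

    # Assumes F1Metric.compute exists as in the ParlAI source (not shown above).
    f1 = F1Metric.compute('the cat sat', ['the cat sat on the mat'])
    print(f1.value())  # token-level F1 against the best-matching answer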

class parlai.core.metrics.ExactMatchMetric(numer: Union[int, float, Tensor], denom: Union[int, float, Tensor] = 1)[source]

Bases: AverageMetric

class parlai.core.metrics.BleuMetric(numer: Union[int, float, Tensor], denom: Union[int, float, Tensor] = 1)[source]

Bases: AverageMetric

static compute(guess: str, answers: List[str], k: int = 4) -> Optional[BleuMetric][source]

Compute approximate BLEU score between guess and a set of answers.
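
A usage sketch; the return type is Optional, and it is an assumption here that None indicates the optional nltk dependency is unavailable:

    from parlai.core.metrics import BleuMetric

    bleu = BleuMetric.compute('the cat sat on the mat',
                              ['the cat sat on the mat'])
    if bleu is not None:  # may be None, e.g. if nltk is not installed (assumption)
        print(bleu.value())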

class parlai.core.metrics.FairseqBleuMetric(pred: Union[Tensor, List[int]], ref: Union[Tensor, List[int]], pad_idx: int, eos_idx: int, unk_idx: int, order: int)[source]

Bases: Metric

Re-implementation of https://github.com/pytorch/fairseq/blob/main/fairseq/scoring/bleu.py.

__init__(pred: Union[Tensor, List[int]], ref: Union[Tensor, List[int]], pad_idx: int, eos_idx: int, unk_idx: int, order: int)[source]
property macro_average: bool

Indicates whether this metric should be macro-averaged when globally reported.

value() -> float[source]

Reimplementation of Fairseq’s score.

static compute_many(guess: Tensor, answers: Tensor, pad_idx, end_idx, unk_idx)[source]

Return BLEU-1 through BLEU-4, computed with fairseq over tokens.

class parlai.core.metrics.RougeMetric(numer: Union[int, float, Tensor], denom: Union[int, float, Tensor] = 1)[source]

Bases: AverageMetric

static compute_many(guess: str, answers: List[str], measure: str = 'r') -> Tuple[Optional[RougeMetric], Optional[RougeMetric], Optional[RougeMetric]][source]

Compute ROUGE score between guess and any answer.

Implemented as compute_many for efficiency, since all three ROUGE variants are computed in one pass.

Returns

(rouge-1, rouge-2, rouge-L)

class parlai.core.metrics.IntraDistinctMetric(numer: Union[int, float, Tensor], denom: Union[int, float, Tensor] = 1)[source]

Bases: AverageMetric

Compute intra-distinct (per-utterance).

classmethod compute(text: str, ngram: int = 1)[source]
Parameters
  • text – The text to compute the metric over

  • ngram – n-gram length
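
An illustrative sketch; the expected ratio assumes the usual distinct-ngrams over total-ngrams definition, which is not spelled out above:

    from parlai.core.metrics import IntraDistinctMetric

    m = IntraDistinctMetric.compute('hello hello world', ngram=1)
    print(m.value())  # roughly 2/3: two distinct unigrams out of three total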

class parlai.core.metrics.InterDistinctMetric(counts: Counter[Tuple])[source]

Bases: Metric

Compute the inter-distinct metric at the corpus level.

__init__(counts: Counter[Tuple])[source]
Parameters

counts – collections.Counter of ngram -> frequency

value()[source]

Return the value of the metric as a float.
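
For illustration, the metric can be constructed directly from an n-gram Counter as described in __init__; the distinct-over-total interpretation of value() is an assumption:

    from collections import Counter
    from parlai.core.metrics import InterDistinctMetric

    counts = Counter({('hello',): 2, ('world',): 1})  # ngram -> frequency
    m = InterDistinctMetric(counts)
    print(m.value())  # roughly 2/3: two distinct unigrams out of three total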

parlai.core.metrics.normalize_answer(s)[source]

Lowercase text and remove punctuation, articles, and extra whitespace.
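
A quick sketch of the normalization (the exact output string is an assumption based on the description above):

    from parlai.core.metrics import normalize_answer

    print(normalize_answer('The  Cat, sat!'))  # expected: 'cat sat'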

parlai.core.metrics.aggregate_named_reports(named_reports: Dict[str, Dict[str, Metric]], micro_average: bool = False) -> Dict[str, Metric][source]

Aggregate metrics from multiple reports.

Parameters
  • named_reports – Dict of task name -> report of metrics.

  • micro_average – If true, top level metrics will be the micro average. By default, we use macro average.

Returns

The aggregated report
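
A sketch of aggregating two hypothetical task reports; the task names and values are made up, and the 'task/metric' key format for per-task entries is an assumption:

    from parlai.core.metrics import AverageMetric, aggregate_named_reports

    named = {
        'taskA': {'accuracy': AverageMetric(8, 10)},  # 0.8
        'taskB': {'accuracy': AverageMetric(5, 10)},  # 0.5
    }
    report = aggregate_named_reports(named)  # macro average by default
    print(report['accuracy'].value())        # expected: (0.8 + 0.5) / 2 = 0.65
    # per-task values are also kept, e.g. report['taskA/accuracy'] (assumed key format)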

parlai.core.metrics.aggregate_unnamed_reports(reports: List[Dict[str, Metric]]) -> Dict[str, Metric][source]

Combine metrics without regard for tracking provenance.

class parlai.core.metrics.Metrics(threadsafe=False, shared=None)[source]

Bases: object

Metrics aggregator.

__init__(threadsafe=False, shared=None)[source]
add(key: str, value: Optional[Metric]) -> None[source]

Record an accumulation to a metric.

report()[source]

Report the metrics over all data seen so far.

clear_recent()[source]

Clear recent metrics (latest example).

report_recent()[source]

Report recent metrics (latest example).

clear()[source]

Clear all the metrics.

add_metrics(other: Metrics) -> None[source]

Aggregate another Metrics object's metrics into this one.

Note that it is assumed that the keys for metrics are disjoint between Metrics objects.
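
A minimal usage sketch of the aggregator (metric names and values are illustrative):

    from parlai.core.metrics import AverageMetric, Metrics, SumMetric

    metrics = Metrics()
    metrics.add('exs', SumMetric(1))
    metrics.add('accuracy', AverageMetric(1, 1))
    metrics.add('exs', SumMetric(1))
    metrics.add('accuracy', AverageMetric(0, 1))
    report = metrics.report()  # dict of metric name -> Metric
    print(report['exs'].value(), report['accuracy'].value())  # 2, 0.5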

class parlai.core.metrics.TeacherMetrics(metrics_list: str = 'default', shared: Optional[Dict[str, Any]] = None)[source]

Bases: Metrics

Helper container which encapsulates standard metrics (F1, BLEU, ...).

__init__(metrics_list: str = 'default', shared: Optional[Dict[str, Any]] = None) -> None[source]
evaluate_response(observation: Message, labels: List[str]) -> None[source]

Compute all required text-based metrics based on an observation and labels.
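
A hypothetical end-to-end sketch; the Message fields and the exact contents of the resulting report are assumptions about standard ParlAI conventions:

    from parlai.core.message import Message
    from parlai.core.metrics import TeacherMetrics

    teacher_metrics = TeacherMetrics(metrics_list='default')
    model_response = Message({'text': 'the cat sat on the mat'})
    teacher_metrics.evaluate_response(model_response, labels=['the cat sat on the mat'])
    print(teacher_metrics.report())  # e.g. accuracy, f1, ... for the default metric list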