MichJoM/PartisanLens

🕵️‍♂️ PartisanLens: A Multilingual Dataset of Hyperpartisan and Conspiratorial Immigration Narratives in European Media

PartisanLens is a dataset focused on hyperpartisanship, stance detection, and Population Replacement Conspiracy Theories (PRCT), featuring human-authored rationales and detailed annotations.


📁 Repository Structure

partisanlens/
│
├── data/                        📦 Dataset, keywords & rationales
├── data_curation/               🧪 Data sampling, statistics, and analysis scripts
│   ├── analysis/                📊 Data analysis scripts
│   └── DPP_extraction.py
├── experiments/                 🧠 Model training, inference, rationale generation
│   ├── build-templated-rationales.py
│   ├── rephrase-rationales.py
│   ├── inference.py
│   └── finetune.py
└── annotation_guidelines.pdf    📄 Annotation schema and instructions

📌 Dataset Overview

PartisanLens includes:

  • 🔴🔵 Hyperpartisan annotations: identifying overtly partisan language
  • 🧭 Stance detection: determining whether the speaker is pro, against, or neutral towards immigration
  • 🧠 PRCT labels: Population Replacement Conspiracy Theories

Each sample contains:

  • A political text segment
  • Task-specific labels (hyperpartisan, stance, PRCT)
  • Span annotations (loaded language, name calling, and appeal to fear)

🔬 Experiments

We provide Python scripts to explore how LLMs and fine-tuned models handle reasoning with rationales.

| Module | Description |
|---|---|
| 🧱 build-templated-rationales.py | Automatically build templated rationales from the span annotations |
| ✍️ rephrase-rationales.py | Rephrase or augment rationales using LLMs for more fluent and natural language explanations |
| 🤖 inference.py | Perform zero-shot or few-shot inference using LLMs |
| 🎯 finetune.py | Fine-tune models with (or without) rationale supervision |
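
As a rough illustration of what building a templated rationale from span annotations could look like, the sketch below fills a fixed sentence pattern with the annotated spans and the gold labels. The `Span` record, the template wording, the function name `build_templated_rationale`, and the example values are all assumptions made for this illustration; they are not taken from build-templated-rationales.py, whose actual schema and templates may differ.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical span record; the real annotation schema may differ."""
    text: str       # the annotated fragment of the headline
    technique: str  # "loaded language", "name calling", or "appeal to fear"


def build_templated_rationale(spans, hyperpartisan, stance, prct):
    """Fill a fixed sentence template from span annotations and gold labels."""
    if spans:
        # Quote each span and name the technique it was annotated with.
        evidence = "; ".join(f'"{s.text}" ({s.technique})' for s in spans)
        first = f"The text contains {evidence}."
    else:
        first = "The text contains no annotated spans."
    return (
        f"{first} It is therefore labelled hyperpartisan={hyperpartisan}, "
        f"stance={stance}, PRCT={prct}."
    )


# Made-up example values, for illustration only.
example = build_templated_rationale(
    [Span("invasion", "loaded language")], "yes", "against", "no"
)
print(example)
```

A fixed template keeps the rationale traceable to the human span annotations, which is presumably why a separate LLM rephrasing step is offered afterwards to make the wording more natural.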

✍️ Rephrasing Rationales: rephrase-rationales.py

This script uses an LLM to rephrase and enrich the templated rationales for each instance in the dataset, while preserving the original task labels. The output is a step-by-step explanation in JSON format for each example.

🔧 How to Run

python3 experiments/rephrase-rationales.py \
 --dataset data/train_templated_rationales.csv \
 --output data/train_rephrased_rationales.csv \
 --hf_token your_huggingface_token

🔧 Arguments

| Argument | Type | Required | Description |
|---|---|---|---|
| --dataset | str | ✅ Yes | Path to the input dataset (.csv or .tsv). Must include columns like id, text, templated_rationales, hyperpartisan_gold_label, prct_gold_label, and stance_gold_label. |
| --output | str | ❌ No | Path to the output file (.csv). Default: rephrased-rationales.csv. |
| --hf_token | str | ❌ No | Hugging Face token (used to access gated models from the unsloth hub). |
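
Before launching a long rephrasing run, it can help to confirm that the input file has the expected header. The following is a minimal sketch using only the standard library; the helper names `missing_columns` and `delimiter_for` are hypothetical, and the column set is simply the one listed above, which the script itself may check differently (or not at all).

```python
import csv
import io

# Columns named in the --dataset description above.
REQUIRED_COLUMNS = {
    "id", "text", "templated_rationales",
    "hyperpartisan_gold_label", "prct_gold_label", "stance_gold_label",
}


def delimiter_for(path):
    """Pick a tab for .tsv files and a comma otherwise."""
    return "\t" if path.lower().endswith(".tsv") else ","


def missing_columns(handle, delimiter=","):
    """Return the required columns absent from the header row, sorted."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return sorted(REQUIRED_COLUMNS - set(reader.fieldnames or []))


# In-memory demo with a made-up header that lacks the three label columns.
sample = io.StringIO("id,text,templated_rationales\n1,some headline,some rationale\n")
print(missing_columns(sample))
```

For a real file, the same check would be `missing_columns(open(path, newline="", encoding="utf-8"), delimiter_for(path))`; an empty list means all listed columns are present.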

🤖 Inference: inference.py

This script performs LLM-based inference using zero-shot or few-shot prompting, either to generate rationales and predict labels or only predict labels. You can select different models and modes depending on your use case.

▶️ How to Run

python3 experiments/inference.py \
 --dataset data/test.csv \
 --model llama3.3-70 \
 --mode rationales \
 --output data/predictions.tsv \
 --hf_token your_huggingface_token

🧩 Modes of Operation

You can choose between two modes when running the script:

| Mode | Description |
|---|---|
| rationales | 🔍 Generates natural language rationales (chain-of-thought explanations) for each input sentence. |
| labels | 🏷️ Directly predicts the classification labels (hyperpartisan, PRCT, and stance) without generating a rationale. |

🔧 Arguments

| Argument | Type | Required | Description |
|---|---|---|---|
| --dataset | str | ✅ Yes | Path to the input dataset (.csv or .tsv). Must include a text column. |
| --model | str | ✅ Yes | Model identifier. Must be one of: llama3.1-8b, llama3.3-70, nemo. |
| --output | str | ❌ No | Path to the output predictions file. Default: rephrased-rationales.csv. |
| --mode | str | ❌ No | Whether to generate "rationales" or "labels". Default: rationales. |
| --hf_token | str | ❌ No | Hugging Face token for accessing gated models (e.g., LLaMA-3). |
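
The argument table above maps naturally onto an `argparse` definition. The sketch below mirrors the documented flags, allowed values, and defaults; it is an illustration of the interface as described here, not a copy of the parser in inference.py, and the `build_parser` name is an assumption.

```python
import argparse

# Model identifiers listed in the --model description above.
MODELS = ["llama3.1-8b", "llama3.3-70", "nemo"]


def build_parser():
    """Build a parser that mirrors the documented inference arguments."""
    parser = argparse.ArgumentParser(description="Sketch of the inference CLI")
    parser.add_argument("--dataset", required=True,
                        help="Input .csv or .tsv file with a text column")
    parser.add_argument("--model", required=True, choices=MODELS,
                        help="Model identifier")
    parser.add_argument("--output", default="rephrased-rationales.csv",
                        help="Path to the output predictions file")
    parser.add_argument("--mode", choices=["rationales", "labels"],
                        default="rationales",
                        help="Generate rationales or predict labels only")
    parser.add_argument("--hf_token", default=None,
                        help="Hugging Face token for gated models")
    return parser


# Parse an explicit argument list instead of sys.argv for the demo.
args = build_parser().parse_args(["--dataset", "data/test.csv", "--model", "nemo"])
print(args.mode, args.output)
```

Declaring `choices` makes the parser reject an unsupported model or mode up front, before any model weights are downloaded.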

🚀 Fine-tuning: finetune.py

Fine-tune a model on the dataset with options for generating either rationales or labels.

python3 experiments/finetune.py \
 --dataset data/train.csv \
 --model llama3.3-70

🔧 Arguments

| Argument | Type | Required | Description |
|---|---|---|---|
| --dataset | str | ✅ Yes | Path to the input dataset (.csv or .tsv) containing the training data. Must include text and label columns. |
| --model | str | ✅ Yes | Model to fine-tune. Must be one of: llama3.1-8b, llama3.3-70, nemo. |
| --new_model_name | str | ❌ No | File name/path for saving the fine-tuned model and tokenizer. Default: new-model. |
| --mode | str | ❌ No | Mode of fine-tuning: "rationales" for explanations or "labels" for only classification labels. |
| --hf_token | str | ❌ No | Hugging Face token for accessing gated models (e.g., LLaMA-3). |
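
To make the difference between the two fine-tuning modes concrete, here is one way the supervision target for a single training row could be assembled. This is only a sketch under stated assumptions: the JSON layout, the `rationale` column, the reuse of the `*_gold_label` column names from the rephrasing section, and the `build_target` helper are all hypothetical and may not match what finetune.py does.

```python
import json


def build_target(row, mode="rationales"):
    """Return the text a model would be trained to generate for one row."""
    labels = {
        "hyperpartisan": row["hyperpartisan_gold_label"],
        "prct": row["prct_gold_label"],
        "stance": row["stance_gold_label"],
    }
    if mode == "labels":
        # Classification only: the target holds just the three labels.
        return json.dumps(labels)
    if mode == "rationales":
        # Explanation supervision: the rationale comes first, then the labels.
        return json.dumps({"rationale": row["rationale"], **labels})
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up row with placeholder values, for illustration only.
row = {
    "text": "some headline",
    "rationale": "some explanation",
    "hyperpartisan_gold_label": "yes",
    "prct_gold_label": "no",
    "stance_gold_label": "against",
}
print(build_target(row, "labels"))
print(build_target(row, "rationales"))
```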

📊 Data Curation

The data_curation/ directory contains:

  • 📈 Scripts for analyzing dataset composition
  • ⚖️ Sampling strategies used to create the dataset
  • 🧮 Statistical reports and visualizations

📚 Annotation Guidelines

Full documentation of tasks, labeling protocols, and rationale-writing instructions is provided in:

📄 annotation_guidelines.pdf


💡 Use Cases

  • 🧠 Interpretability research using rationales
    Use the human-curated / LLM-improved rationales to evaluate and improve model transparency and explainability.

  • 🔍 Political bias and stance analysis
    Study how models detect hyperpartisan language and take stances toward immigration claims.

  • 🤖 Fine-tuning models with explanation supervision
    Train models not only to classify but also to generate or use rationales, improving generalization and trustworthiness.


📝 Citation

@inproceedings{maggini-etal-2026-partisanlens,
    title = "{P}artisan{L}ens: A Multilingual Dataset of Hyperpartisan and Conspiratorial Immigration Narratives in {E}uropean Media",
    author = "Maggini, Michele Joshua and
      Piot, Paloma and
      P{\'e}rez, Anxo and
      Marino, Erik Bran and
      Montesinos, L{\'u}a Santamar{\'i}a and
      Cotovio, Ana Lisboa and
      Abu{\'i}n, Marta V{\'a}zquez and
      Parapar, Javier and
      Gamallo, Pablo",
    editor = "Demberg, Vera and
      Inui, Kentaro and
      Marquez, Llu{\'i}s",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 1: Long Papers)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-long.53/",
    doi = "10.18653/v1/2026.eacl-long.53",
    pages = "1171--1186",
    ISBN = "979-8-89176-380-7",
    abstract = "Detecting hyperpartisan narratives and Population Replacement Conspiracy Theories (PRCT) is essential to addressing the spread of misinformation. These complex narratives pose a significant threat, as hyperpartisanship drives political polarisation and institutional distrust, while PRCTs directly motivate real-world extremist violence, making their identification critical for social cohesion and public safety. However, existing resources are scarce, predominantly English-centric, and often analyse hyperpartisanship, stance, and rhetorical bias in isolation rather than as interrelated aspects of political discourse. To bridge this gap, we introduce PartisanLens, the first multilingual dataset of 1617 hyperpartisan news headlines in Spanish, Italian, and Portuguese, annotated in multiple political discourse aspects. We first evaluate the classification performance of widely used Large Language Models (LLMs) on this dataset, establishing robust baselines for the classification of hyperpartisan and PRCT narratives. In addition, we assess the viability of using LLMs as automatic annotators for this task, analysing their ability to approximate human annotation. Results highlight both their potential and current limitations. Next, moving beyond standard judgments, we explore whether LLMs can emulate human annotation patterns by conditioning them on socio-economic and ideological profiles that simulate annotator perspectives. At last, we provide our resources and evaluation; PartisanLens supports future research on detecting partisan and conspiratorial narratives in European contexts."
}
If you use this resource, please ⭐ star the repo and cite the paper above.

