
rahulsjha/Structured-Sentiment-Analysis

🧠 Structured Sentiment Analysis

IIIT Delhi · Natural Language Processing Project

Python PyTorch HuggingFace License: MIT

A structured approach to opinion mining: extracting who expresses what sentiment toward whom, with a polarity label.



🎯 Problem Statement

Structured Sentiment Analysis (SSA) extends traditional sentiment analysis by extracting complete sentiment graphs from raw text. Instead of simply classifying a text as positive or negative, SSA identifies all the structured opinion tuples present in a sentence.

Formal Definition

Given a text $T = O_1, O_2, \ldots, O_n$, the goal is to predict all opinion tuples of the form:

$$\mathcal{G} = \{(h, b, t, p)\}$$

Where each element is defined as:

| Symbol | Role | Description |
|--------|------|-------------|
| $h$ | Holder | The entity who expresses the opinion |
| $b$ | Expression | The polar expression (words conveying sentiment) |
| $t$ | Target | The entity toward which the opinion is directed |
| $p$ | Polarity | The sentiment label: positive, negative, or neutral |

Illustrative Example

Sentence: "Even though the price is decent for Paris, I would not recommend this hotel."
Opinion Tuple 1:
 Holder → "I"
 Expression → "would not recommend"
 Target → "this hotel"
 Polarity → NEGATIVE
Opinion Tuple 2:
 Holder → (implicit)
 Expression → "decent"
 Target → "the price"
 Polarity → POSITIVE
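
One way to hold these tuples in code is a small record type. The sketch below is illustrative only: the `Opinion` class and its field names are assumptions made for this example and are not part of the repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Opinion:
    """One opinion tuple (h, b, t, p); hypothetical helper type."""
    holder: Optional[str]   # h: None when the holder is implicit
    expression: str         # b: the polar expression
    target: Optional[str]   # t: the entity the opinion is about
    polarity: str           # p: "positive", "negative", or "neutral"


sentence = "Even though the price is decent for Paris, I would not recommend this hotel."

# The two tuples from the example above, forming the sentiment graph G.
graph = {
    Opinion("I", "would not recommend", "this hotel", "negative"),
    Opinion(None, "decent", "the price", "positive"),
}

# Every non-empty span should occur verbatim in the sentence.
for op in graph:
    for span in (op.holder, op.expression, op.target):
        assert span is None or span in sentence

polarities = sorted(op.polarity for op in graph)
print(polarities)
```

Using a set mirrors the definition of $\mathcal{G}$ as a set of tuples: one sentence can carry several opinions with different polarities.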

Unlike standard sentiment analysis, SSA captures complex, overlapping, and multi-target opinions within a single sentence, making it far more expressive and practically useful.


🏗️ Architecture

Our model follows a unified end-to-end architecture built on top of a pretrained BERT-based encoder, enhanced with a CRF decoding layer and a separate polarity head.

┌──────────────────────────────────────────────────┐
│                    Input Text                    │
└────────────────────────┬─────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────┐
│             [CLS] w1 w2 ... wn [SEP]             │
│              Token Embedding Layer               │
└────────────────────────┬─────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────┐
│              Norwegian BERT Encoder              │
│             (NbAiLab / nb-bert-base)             │
│     Contextual Hidden State Representations      │
└───────────┬─────────────────────────┬────────────┘
            │                         │
┌───────────▼───────────┐  ┌──────────▼────────────┐
│  CRF for BIO Tagging  │  │     Polarity Head     │
│                       │  │     ([CLS] token)     │
│  B-Source  I-Source   │  │                       │
│  B-Target  I-Target   │  │     Positive          │
│  B-Polar   I-Polar    │  │     Negative          │
│  O                    │  │     Neutral           │
└───────────┬───────────┘  └──────────┬────────────┘
            │                         │
┌───────────▼─────────────────────────▼────────────┐
│          Structured Opinion Generation           │
│                                                  │
│      Merge CRF spans + polarity predictions      │
│      Apply offset mapping & span filtering       │
│                                                  │
│  Output: (Source, Target, Expression, Polarity)  │
└──────────────────────────────────────────────────┘

Model Components at a Glance

| Component | Description | Technology |
|-----------|-------------|------------|
| Encoder | Contextual representation of input tokens | RoBERTa / BERT (multilingual) |
| BIO Tagger | Sequence labeling for span identification | Linear Projection |
| CRF Decoder | Structured decoding with label constraints | Conditional Random Field |
| Polarity Head | Classifies sentiment of identified spans | Multi-layer Classifier on [CLS] |
| Opinion Builder | Assembles final structured output tuples | Post-processing pipeline |

🔬 Methodology

1. Initial Processing & Tokenization

  • SSA extends basic sentiment analysis by extracting four key components: Holder, Target, Expression, Polarity
  • We use a pre-trained RoBERTa model (NbAiLab/nb-bert-base, [Liu et al., 2019]) as our encoder
  • The encoder transforms raw input text into rich contextual embeddings for each token
  • We benchmarked RoBERTa against bert-base and bert-medium; RoBERTa consistently outperformed both
# Encoder produces hidden states for every token position
hidden_states = roberta_encoder(input_ids, attention_mask)
# Shape: [batch_size, seq_len, hidden_dim]

2. Tag Classification Layer (BIO Tagging)

  • A linear projection layer maps contextual embeddings to BIO tag logits
  • We use the BIO (Beginning-Inside-Outside) tagging scheme to represent the boundaries of each opinion component

BIO Tag Space

O → Outside any opinion component
B-Source → Beginning of a Holder span
I-Source → Inside a Holder span
B-Target → Beginning of a Target span
I-Target → Inside a Target span
B-Polar → Beginning of a Polar Expression span
I-Polar → Inside a Polar Expression span
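
To make the tag scheme concrete, here is a minimal sketch of turning a BIO tag sequence back into labelled token spans. The function name `bio_to_spans` and the example tags are assumptions for illustration, not code from this repository.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) token spans, with end exclusive."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        prefix, _, name = tag.partition("-")
        continues = prefix == "I" and name == label
        if label is not None and not continues:
            spans.append((label, start, i))
            label, start = None, None
        if prefix == "B" or (prefix == "I" and label is None):
            # A stray I- tag with no open span is treated as a span start.
            label, start = name, i
    return spans


tokens = ["I", "would", "not", "recommend", "this", "hotel", "."]
tags = ["B-Source", "B-Polar", "I-Polar", "I-Polar", "B-Target", "I-Target", "O"]

for label, start, end in bio_to_spans(tags):
    print(label, "->", " ".join(tokens[start:end]))
```

Treating a stray `I-` tag as a span start is one lenient choice; a stricter decoder could discard such tags instead.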

Projection Formula

$$Z_i = W z_i + b$$

where $W \in \mathbb{R}^{t \times H}$ is the weight matrix, $b \in \mathbb{R}^{t}$ is the bias, $z_i \in \mathbb{R}^H$ is the hidden state at position $i$, and $t$ is the number of possible BIO tags. Stacking the logits $Z_i$ for all $N$ tokens gives the tag-score matrix:

$$\hat{Y} \in \mathbb{R}^{N \times t}$$
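
As a worked example of the projection, the sketch below applies a hand-set $W$ and $b$ to three toy hidden states with $H = 2$ and $t = 7$. All numbers are made up for illustration; in the model these parameters are learned.

```python
TAGS = ["O", "B-Source", "I-Source", "B-Target", "I-Target", "B-Polar", "I-Polar"]


def project(W, b, z):
    """Return W z + b for one token (W: t x H, b: length t, z: length H)."""
    return [sum(w * x for w, x in zip(row, z)) + bias for row, bias in zip(W, b)]


# Toy parameters: one row of W per tag, two hidden dimensions.
W = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0], [0.0, 1.0],
     [0.0, 0.5], [-1.0, -1.0], [-0.5, -0.5]]
b = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

hidden = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]    # N = 3 token states

Y_hat = [project(W, b, z) for z in hidden]        # N x t matrix of tag logits

# Greedy per-token argmax; this ignores dependencies between neighbouring tags.
greedy = [TAGS[max(range(len(TAGS)), key=row.__getitem__)] for row in Y_hat]
print(greedy)
```

Per-token argmax can produce sequences that violate the BIO scheme, which is the gap a structured decoder is meant to close.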


3. Conditional Random Field (CRF)

After obtaining token-level hidden states from BERT/RoBERTa, a CRF layer is applied to decode BIO tag sequences in a globally-consistent manner.

  • The CRF learns transition weights between adjacent tags
  • This enforces structural constraints (e.g., I-Source cannot follow B-Target)
  • Enables span detection while explicitly modeling tag dependencies

CRF Training Objective

$$\mathcal{L}_{CRF} = -\log\frac{\exp(\text{score}(y))}{\sum_{\hat{y} \in \mathcal{Y}} \exp(\text{score}(\hat{y}))}$$

where $y$ is the gold tag sequence and $\mathcal{Y}$ is the set of all valid tag sequences for the sentence.

Key Insight: CRF outperforms a plain softmax decoder because it captures inter-tag dependencies critical for multi-span structured extraction.
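
The following is a minimal pure-Python sketch of a linear-chain CRF on a toy problem, assuming a reduced three-tag set and hand-set scores (none of these numbers come from the trained model). It shows the negative log-likelihood computed with the forward algorithm, checks it against brute-force enumeration, and decodes with Viterbi.

```python
import math
from itertools import product

TAGS = ["O", "B-Target", "I-Target"]    # reduced tag set to keep the toy small
FORBIDDEN = -1e4                         # large penalty standing in for "not allowed"

# transitions[i][j]: score of moving from tag i to tag j (hand-set, not learned).
transitions = [
    [0.0, 0.0, FORBIDDEN],               # O -> I-Target is structurally invalid
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
# emissions[position][tag]: per-token tag scores, e.g. from the projection layer.
emissions = [[2.0, 1.8, 0.0], [0.0, 0.5, 1.0], [1.0, 0.0, 0.2]]


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(path):
    """Emission scores plus transition scores along one tag path."""
    score = emissions[0][path[0]]
    for pos in range(1, len(path)):
        score += transitions[path[pos - 1]][path[pos]] + emissions[pos][path[pos]]
    return score


def log_partition():
    """Forward algorithm: log-sum-exp of path_score over every tag path."""
    alpha = list(emissions[0])
    for emit in emissions[1:]:
        alpha = [emit[j] + logsumexp([alpha[i] + transitions[i][j]
                                      for i in range(len(TAGS))])
                 for j in range(len(TAGS))]
    return logsumexp(alpha)


def crf_nll(gold):
    """Negative log-likelihood of the gold path: log Z - score(gold)."""
    return log_partition() - path_score(gold)


def viterbi():
    """Highest-scoring tag path under emissions + transitions."""
    best, back = list(emissions[0]), []
    for emit in emissions[1:]:
        pointers = [max(range(len(TAGS)), key=lambda i: best[i] + transitions[i][j])
                    for j in range(len(TAGS))]
        best = [best[pointers[j]] + transitions[pointers[j]][j] + emit[j]
                for j in range(len(TAGS))]
        back.append(pointers)
    path = [max(range(len(TAGS)), key=lambda j: best[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Sanity check: the forward algorithm agrees with explicit enumeration.
brute = logsumexp([path_score(p) for p in product(range(len(TAGS)), repeat=3)])
assert abs(brute - log_partition()) < 1e-9

greedy = [max(range(len(TAGS)), key=row.__getitem__) for row in emissions]
print("greedy :", [TAGS[i] for i in greedy])      # contains O -> I-Target
print("viterbi:", [TAGS[i] for i in viterbi()])
print("NLL of viterbi path:", crf_nll(viterbi()))
```

In this toy, greedy per-token argmax picks `I-Target` directly after `O`, while Viterbi pays the transition penalty into account and returns a well-formed span. A real implementation would learn the transition scores and typically also model start and end transitions, which are omitted here.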


4. Polarity Classification

  • A separate multi-layer classification head takes the [CLS] token representation as input
  • Predicts one of three sentiment labels:
Sentiment Labels:
 ✅ Positive → Favorable / approving sentiment
 ❌ Negative → Critical / disapproving sentiment
 ➖ Neutral → Factual / non-opinionated expression
  • The model struggles with ambiguous sentences like:

    "The food was great, but the service was terrible."

    where polarity is mixed at the span level, a key challenge that motivates future work on span-level polarity.


5. Structured Opinion Generation

In the final stage, we combine CRF-predicted spans with polarity predictions to produce the complete structured opinion quadruple (holder, expression, target, polarity).

Post-processing pipeline:

  1. Convert span boundaries to character offsets using BERT's offset mapping with adjacent token merging
  2. Filter short spans (< 3 characters) to remove noise
  3. Assemble final output as JSON-formatted opinion tuples

⚠️ This filtering step leads to a small loss (~a few percent) of true predictions.
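
The three post-processing steps can be sketched as follows. This is a simplified illustration under stated assumptions: `offsets` stands in for the tokenizer's character offset mapping (word-level here for readability), and the helper names `tagged_tokens_to_spans` and `build_opinion` are hypothetical.

```python
import json

MIN_CHARS = 3  # spans shorter than this are dropped, as in step 2 above


def tagged_tokens_to_spans(text, offsets, tags):
    """Merge adjacent tokens with the same BIO label into character spans."""
    spans, current = [], None            # current = [label, char_start, char_end]
    for (start, end), tag in zip(offsets, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current and current[0] == label:
            current[2] = end             # extend the open span
            continue
        if current:
            spans.append(tuple(current))
        current = [label, start, end] if prefix in ("B", "I") else None
    if current:
        spans.append(tuple(current))
    return [(label, text[s:e], f"{s}:{e}") for label, s, e in spans]


def build_opinion(spans, polarity, min_chars=MIN_CHARS):
    """Filter short spans and assemble one opinion in the dataset's JSON layout."""
    fields = {"Source": [[], []], "Target": [[], []], "Polar_expression": [[], []]}
    key_for = {"Source": "Source", "Target": "Target", "Polar": "Polar_expression"}
    for label, span_text, offset in spans:
        if len(span_text) < min_chars:
            continue                     # noise filter; may also drop true spans
        fields[key_for[label]][0].append(span_text)
        fields[key_for[label]][1].append(offset)
    return {**fields, "Polarity": polarity}


text = "I would not recommend this hotel"
offsets = [(0, 1), (2, 7), (8, 11), (12, 21), (22, 26), (27, 32)]
tags = ["B-Source", "B-Polar", "I-Polar", "I-Polar", "B-Target", "I-Target"]

spans = tagged_tokens_to_spans(text, offsets, tags)
opinion = build_opinion(spans, polarity="negative")
print(json.dumps(opinion, indent=2))
```

Note that the one-character holder "I" is removed by the length filter in this example, which illustrates how the filter can discard correct predictions.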


🔁 Baseline System

As a comparison point, we implement a Sequence Labeling + Relation Classification pipeline using BiLSTM models.

Pipeline Overview

Step 1: Span Extraction
 ├── BiLSTM → Extract Holders
 ├── BiLSTM → Extract Targets
 └── BiLSTM → Extract Polar Expressions
Step 2: Relation Prediction
 └── BiLSTM + Max Pooling
      ├── Full text representation
      ├── Holder / Target representation
      └── Expression representation
           └── Concatenate → Linear → Sigmoid
                → Binary: has_relation? (threshold = 0.5)
Step 3: Assemble tuples → (holder, target, expression, polarity)

The baseline uses GloVe / FastText embeddings and trains three separate BiLSTMs, one per annotation type, followed by a relation prediction model.
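
The relation-prediction step (max pooling, concatenation, linear layer, sigmoid, 0.5 threshold) can be sketched without any deep learning library. The vectors and weights below are made-up numbers standing in for BiLSTM outputs and learned parameters, and the function names are assumptions for this example.

```python
import math


def max_pool(vectors):
    """Element-wise maximum over a list of equal-length token vectors."""
    return [max(column) for column in zip(*vectors)]


def has_relation(full, entity, expression, weights, bias, threshold=0.5):
    """Concatenate pooled representations, apply linear + sigmoid, then threshold."""
    features = max_pool(full) + max_pool(entity) + max_pool(expression)
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, probability >= threshold


# Toy 2-dimensional token states for a three-token sentence.
full = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.5]]
entity = [[0.4, 0.2]]        # tokens of a candidate holder or target
expression = [[0.3, 0.5]]    # tokens of a candidate polar expression

weights = [1.0, -1.0, 2.0, 0.0, 1.0, 1.0]   # one weight per concatenated feature
bias = -0.5

probability, related = has_relation(full, entity, expression, weights, bias)
print(round(probability, 3), related)
```

Each candidate (holder or target, expression) pair is scored independently this way, and pairs above the threshold are kept when assembling tuples.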


✨ Features

| Feature | Description |
|---------|-------------|
| 🌍 Multilingual Support | Works across Norwegian, English, Spanish, Catalan, and Basque |
| 🏷️ BIO Sequence Labeling | Precise span-level identification using structured tagging |
| 🔗 CRF Decoding | Globally-consistent tag sequence prediction with structural constraints |
| 🎭 Polarity Classification | 3-class (Positive / Negative / Neutral) sentiment head |
| 🧩 Quadruple Extraction | Complete (holder, target, expression, polarity) output |
| 📊 Weighted F1 Evaluation | Partial-overlap scoring using token-level Jaccard intersection |
| 🔄 Cross-lingual Transfer | Train on English, evaluate on low-resource target languages |
| 📦 Modular Architecture | Encoder, CRF, and classifier heads are independently configurable |

📦 Datasets

Subtask 1: Monolingual

| Dataset | Language | Domain |
|---------|----------|--------|
| norec | Norwegian | Professional reviews (multi-domain) |
| opener_en | English | Hotel reviews |
| opener_es | Spanish | Hotel reviews |
| multibooked_ca | Catalan | Hotel reviews |
| multibooked_eu | Basque | Hotel reviews |
| darmstadt_unis | English | University reviews (online) |
| mpqa | English | News (opinion annotations) |

Subtask 2: Cross-Lingual

Trained on high-resource English data, evaluated on:

| Test Dataset | Language |
|--------------|----------|
| opener_es | Spanish |
| multibooked_ca | Catalan |
| multibooked_eu | Basque |

Data Format

Each dataset is in JSON format with the following schema:

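{
  "sent_id": "opener/en/hotel/english00164-6",
  "text": "Even though the price is decent for Paris , I would not recommend this hotel .",
  "opinions": [
    {
      "Source": [["I"], ["44:45"]],
      "Target": [["this hotel"], ["66:76"]],
      "Polar_expression": [["would not recommend"], ["46:65"]],
      "Polarity": "negative",
      "Intensity": "average"
    }
  ]
}

Each span field pairs a list of span strings with a list of "start:end" character offsets into "text" (end exclusive). The offsets refer to the tokenized text, in which punctuation is separated by spaces.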
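
A quick consistency check for this format is to confirm that every offset slices the stated span out of the text. The sketch below uses a shortened, hypothetical record (the `sent_id` and sentence are made up) and an assumed helper name `check_offsets`.

```python
import json

record = json.loads("""
{
  "sent_id": "example-1",
  "text": "I would not recommend this hotel",
  "opinions": [
    {
      "Source": [["I"], ["0:1"]],
      "Target": [["this hotel"], ["22:32"]],
      "Polar_expression": [["would not recommend"], ["2:21"]],
      "Polarity": "negative",
      "Intensity": "average"
    }
  ]
}
""")


def check_offsets(record):
    """Return (field, expected, found) for every span whose offset does not match."""
    problems = []
    for opinion in record["opinions"]:
        for field in ("Source", "Target", "Polar_expression"):
            texts, offsets = opinion[field]
            for span_text, offset in zip(texts, offsets):
                start, end = map(int, offset.split(":"))
                found = record["text"][start:end]
                if found != span_text:
                    problems.append((field, span_text, found))
    return problems


print(check_offsets(record))   # an empty list means all offsets line up
```

Running such a check over a dataset or a predictions file catches off-by-one offsets before they reach the evaluation script.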

📊 Results

Monolingual Performance

Average across all 7 datasets

| Metric | Score |
|--------|-------|
| SF1 (Sentiment F1) | 0.46 |
| SP (Sentiment Precision) | 0.62 |
| SR (Sentiment Recall) | 0.65 |

Per-dataset Breakdown:

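| Dataset | SF1 | SP | SR |
|---------|-----|----|----|
| Opener_en | 0.41 | 0.37 | 0.47 |
| Opener_es | 0.35 | 0.33 | 0.38 |
| NoReC | 0.23 | 0.30 | 0.18 |
| Multibooked_ca | 0.57 | 0.53 | 0.63 |
| Multibooked_eu | 0.53 | 0.40 | 0.71 |
| darmstadt_unis | 0.55 | 0.58 | 0.00 |
| MPQA | 0.52 | 0.55 | 0.00 |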

Cross-Lingual Performance

Average across 3 target language datasets

| Metric | Score |
|--------|-------|
| SF1 (Sentiment F1) | 0.35 |
| SP (Sentiment Precision) | 0.85 |
| SR (Sentiment Recall) | 0.63 |

Per-dataset Breakdown:

| Dataset | SF1 | Precision | Recall |
|---------|-----|-----------|--------|
| Opener_es | 0.000 | 0.000 | 0.000 |
| Multibooked_ca | 0.481 | 0.461 | 0.503 |
| Multibooked_eu | 0.671 | 0.618 | 0.733 |

🔍 Key Observations

  • 🟢 BERT significantly improves span extraction results over the CRF baseline alone, especially for English and language-similar corpora
  • 🟡 Multibooked_eu performs best in cross-lingual settings, likely due to the smaller dataset size and consistent hotel-review characteristics
  • 🔴 Complex or ambiguous expressions (different polarity in different contexts) present a challenge across all datasets
  • 📌 Character-level BERT representations outperform word-level representations as a comparison baseline

🚀 Installation

Prerequisites

  • Python ≥ 3.8
  • PyTorch ≥ 1.9
  • CUDA (optional, for GPU acceleration)

1. Clone the Repository

git clone https://github.com/your-username/Structured-Sentiment-Analysis-IIITD-NLP-PROJECT.git
cd Structured-Sentiment-Analysis-IIITD-NLP-PROJECT

2. Install Core Dependencies

pip install torch torchvision transformers
pip install nltk scikit-learn tqdm gensim

3. Install Baseline Dependencies

pip install -r baseline/sequence_labeling/requirements.txt

4. Install Data Processing Dependencies

pip install -r data/requirements.txt

5. Prepare External Datasets

MPQA 2.0: download from the MPQA website and run:

cd data/mpqa && bash process_mpqa.sh

Darmstadt Service Review Corpus: download from TU Darmstadt and run:

cd data/darmstadt_unis && bash process_darmstadt.sh

🛠️ Usage

Evaluation

Run the official evaluation script on model predictions:

python evaluate.py <input_dir> <output_dir>

Where:

  • <input_dir>/res/ contains your predictions.json per dataset
  • <input_dir>/ref/data/ contains the gold test files
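
For intuition about partial-overlap scoring, here is a simplified sketch of token-level Jaccard matching between predicted and gold tuples. It is one possible reading of the idea and is not taken from `evaluate.py`; the function names and the dictionary layout (sets of token indices per role) are assumptions for this example.

```python
def jaccard(pred, gold):
    """Token-level Jaccard overlap between two sets of token indices."""
    pred, gold = set(pred), set(gold)
    if not pred and not gold:
        return 1.0   # both empty (e.g. implicit holder) counts as agreement
    return len(pred & gold) / len(pred | gold)


def tuple_overlap(pred, gold):
    """Average Jaccard over holder, target, expression; 0 if polarity differs."""
    if pred["polarity"] != gold["polarity"]:
        return 0.0
    roles = ("holder", "target", "expression")
    return sum(jaccard(pred[r], gold[r]) for r in roles) / len(roles)


def weighted_prf(preds, golds):
    """Credit each tuple with its best partial match on the other side."""
    p = sum(max((tuple_overlap(x, g) for g in golds), default=0.0) for x in preds)
    r = sum(max((tuple_overlap(x, g) for x in preds), default=0.0) for g in golds)
    precision = p / len(preds) if preds else 0.0
    recall = r / len(golds) if golds else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Token indices for "I would not recommend this hotel".
golds = [{"holder": {0}, "target": {4, 5}, "expression": {1, 2, 3}, "polarity": "negative"}]
preds = [{"holder": set(), "target": {4, 5}, "expression": {2, 3}, "polarity": "negative"}]

precision, recall, f1 = weighted_prf(preds, golds)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Here the prediction misses the holder and clips the expression, so it earns partial rather than full credit. Use the official script for any reported numbers.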

Baseline: Training

# Train all BiLSTM baseline models across datasets
cd baseline/sequence_labeling
bash get_baselines.sh

Baseline: Inference

# Run inference on a specific dataset and split
python baseline/sequence_labeling/inference.py \
 --DATADIR opener_en \
 --FILE dev.json

Output will be saved to:

baseline/sequence_labeling/saved_models/relation_prediction/<DATADIR>/prediction.json

Predictions Format

Prediction files must match the gold data format. Each entry should look like:

{
  "sent_id": "unique-sentence-id",
  "text": "Raw input sentence here.",
  "opinions": [
    {
      "Source": [["holder text"], ["start:end"]],
      "Target": [["target text"], ["start:end"]],
      "Polar_expression": [["expression text"], ["start:end"]],
      "Polarity": "positive"
    }
  ]
}

📁 Project Structure

Structured-Sentiment-Analysis-IIITD-NLP-PROJECT/
│
├── 📄 evaluate.py                          ← Official evaluation script (SF1 / SP / SR)
│
├── 📁 baseline/
│   └── sequence_labeling/
│       ├── extraction_module.py            ← BiLSTM span extractor (Holder / Target / Expr)
│       ├── relation_prediction_module.py   ← BiLSTM relation classifier
│       ├── inference.py                    ← End-to-end inference pipeline
│       ├── convert_to_bio.py               ← Convert JSON data to BIO format
│       ├── convert_to_rels.py              ← Convert predictions to relation pairs
│       ├── utils.py                        ← Data loading & vocabulary utilities
│       ├── WordVecs.py                     ← Pretrained word embedding loader
│       ├── get_baselines.sh                ← Script to train all baseline models
│       └── requirements.txt
│
├── 📁 data/
│   ├── norec/                              ← Norwegian multi-domain reviews
│   │   ├── train.json
│   │   ├── dev.json
│   │   └── test.json
│   ├── opener_en/                          ← English hotel reviews
│   ├── opener_es/                          ← Spanish hotel reviews
│   ├── multibooked_ca/                     ← Catalan hotel reviews
│   ├── multibooked_eu/                     ← Basque hotel reviews
│   ├── mpqa/                               ← MPQA news corpus
│   ├── darmstadt_unis/                     ← English university reviews
│   └── README.md                           ← Data format documentation
│
└── 📁 predictions/
    └── norec/
        └── predictions.json                ← Sample model predictions

🔭 Further Work

We identify several promising directions to build upon this work:

🕸️ Dependency Graph Parsing

In future work, we would likely move to a dependency graph parsing approach for structured sentiment, augmenting each token-level representation with its head in a dependency tree (Kurtz et al., 2020). This would allow richer relational reasoning between opinion components.

🌐 Multi-task & Cross-lingual Learning

  • Exploring multi-task learning across languages to better leverage shared structure in multilingual sentiment graphs
  • Joint training on all monolingual datasets with language-specific adapters

🔗 Advanced Graph Parsers

  • Point Network (Samuel & Straka, 2020): a strong graph parser for SSA
  • PERIN: a permutation-invariant structured prediction framework

📍 Span-level Polarity

  • Currently, polarity is predicted globally per sentence via the [CLS] token
  • Moving to span-level polarity prediction would handle cases like "The food was great, but the service was terrible"

🤖 Serialized Large Language Models

  • Explore whether large pre-trained models (e.g., GPT-4, LLaMA) can directly predict structured opinion tuples via in-context learning or fine-tuning, without an explicit CRF layer

📚 References

Barnes, J. et al. (2022). SemEval-2022 Task 10: Structured Sentiment Analysis.
 Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022).
Liu, Y. et al. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach.
 arXiv:1907.11692.
Kurtz, et al. (2020). Improving Low-Resource NMT through Relevance Based Linguistic Features.
 ACL 2020.
Samuel, D. & Straka, M. (2020). ÚFAL at MRP 2020: Permutation-Invariant Semantic Parsing in PERIN.
 CoNLL 2020 Shared Task.

Made with ❤️ at IIIT Delhi | NLP Course Project

For questions or contributions, please open an issue or pull request.

About

Built a cross-lingual sentiment analysis model over 8+ languages for monolingual and cross-lingual tasks, achieving Sentiment Graph F1 of 0.55 (cross-lingual) and 0.559 (monolingual) using RoBERTa with CRF for span extraction and polarity classification.
