
🧠 Structured Sentiment Analysis

IIIT Delhi · Natural Language Processing Project

A structured approach to opinion mining — extracting who expresses what sentiment toward whom, with a polarity label.

🎯 Problem Statement

Structured Sentiment Analysis (SSA) extends traditional sentiment analysis by extracting complete sentiment graphs from raw text. Instead of simply classifying a text as positive or negative, SSA identifies all the structured opinion tuples present in a sentence.

Formal Definition

Given a text $T$, the goal is to predict all opinion tuples $O_1, O_2, \ldots, O_n$ it contains, each of the form $(h, b, t, p)$:

$$\mathcal{G} = \{O_1, O_2, \ldots, O_n\}, \qquad O_i = (h_i, b_i, t_i, p_i)$$

Where each element is defined as:

Symbol	Role	Description
$h$	Holder	The entity who expresses the opinion
$b$	Expression	The polar expression (words conveying sentiment)
$t$	Target	The entity toward which the opinion is directed
$p$	Polarity	The sentiment label: `positive`, `negative`, or `neutral`

Illustrative Example

Sentence: "Even though the price is decent for Paris, I would not recommend this hotel."
Opinion Tuple 1:
 Holder → "I"
 Expression → "would not recommend"
 Target → "this hotel"
 Polarity → NEGATIVE
Opinion Tuple 2:
 Holder → (implicit)
 Expression → "decent"
 Target → "the price"
 Polarity → POSITIVE
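A minimal sketch of how the two tuples above could be represented in plain Python. The `Opinion` class and its field names are hypothetical and not taken from the repository. An implicit holder is modeled as `None`.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed polarity labels p, as listed in the formal definition
POLARITIES = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class Opinion:
    """One (h, b, t, p) tuple; an implicit or missing element is None."""
    holder: Optional[str]
    expression: str
    target: Optional[str]
    polarity: str

    def __post_init__(self):
        # Reject labels outside the three-way polarity scheme
        if self.polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {self.polarity!r}")


sentence = "Even though the price is decent for Paris, I would not recommend this hotel."

# The sentiment graph G is a set of tuples (frozen dataclasses are hashable)
graph = {
    Opinion(holder="I", expression="would not recommend", target="this hotel", polarity="negative"),
    Opinion(holder=None, expression="decent", target="the price", polarity="positive"),
}

for op in sorted(graph, key=lambda o: o.expression):
    print(op)
```

Freezing the dataclass makes tuples hashable. This lets a prediction be compared with the gold graph as a set.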

Unlike standard sentiment analysis, SSA captures complex, overlapping, and multi-target opinions within a single sentence — making it far more expressive and practically useful.

🏗️ Architecture

Our model follows a unified end-to-end architecture built on top of a pretrained BERT-based encoder, enhanced with a CRF decoding layer and a separate polarity head.

┌──────────────────────────────────────────────────┐
│ Input Text                                       │
└────────────────────────┬─────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────┐
│ [CLS] w1 w2 ... wn [SEP]                         │
│ Token Embedding Layer                            │
└────────────────────────┬─────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────┐
│ Norwegian BERT Encoder                           │
│ (NbAiLab / nb-bert-base)                         │
│ Contextual Hidden State Representations          │
└──────────┬───────────────────────────┬───────────┘
           │                           │
┌──────────▼───────────┐    ┌──────────▼───────────┐
│ CRF for BIO Tagging  │    │ Polarity Head        │
│                      │    │ ([CLS] token)        │
│ B-Source   I-Source  │    │                      │
│ B-Target   I-Target  │    │ Positive             │
│ B-Polar    I-Polar   │    │ Negative             │
│ O                    │    │ Neutral              │
└──────────┬───────────┘    └──────────┬───────────┘
           │                           │
┌──────────▼───────────────────────────▼───────────┐
│ Structured Opinion Generation                    │
│                                                  │
│ Merge CRF spans + polarity predictions           │
│ Apply offset mapping & span filtering            │
│                                                  │
│ Output: (Source, Target, Expression, Polarity)   │
└──────────────────────────────────────────────────┘


Model Components at a Glance

Component	Description	Technology
Encoder	Contextual representation of input tokens	RoBERTa / BERT (multilingual)
BIO Tagger	Sequence labeling for span identification	Linear Projection
CRF Decoder	Structured decoding with label constraints	Conditional Random Field
Polarity Head	Classifies sentiment of identified spans	Multi-layer Classifier on `[CLS]`
Opinion Builder	Assembles final structured output tuples	Post-processing pipeline

🔬 Methodology

1. Initial Processing & Tokenization

SSA extends basic sentiment analysis by extracting four key components: Holder, Target, Expression, and Polarity.
We use a pre-trained transformer encoder. The architecture diagram shows NbAiLab/nb-bert-base, a BERT-base model for Norwegian, and RoBERTa [Liu et al., 2019] is the other encoder we evaluate.
The encoder transforms raw input text into contextual embeddings for each token.
We benchmarked RoBERTa against bert-base and bert-medium, and RoBERTa consistently outperformed both.

# Encoder (e.g. a Hugging Face AutoModel) produces hidden states for every token position
hidden_states = roberta_encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
# Shape: [batch_size, seq_len, hidden_dim]

2. Tag Classification Layer (BIO Tagging)

A linear projection layer maps contextual embeddings to BIO tag logits
We use the BIO (Beginning-Inside-Outside) tagging scheme to represent the boundaries of each opinion component

BIO Tag Space

O → Outside any opinion component
B-Source → Beginning of a Holder span
I-Source → Inside a Holder span
B-Target → Beginning of a Target span
I-Target → Inside a Target span
B-Polar → Beginning of a Polar Expression span
I-Polar → Inside a Polar Expression span
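The sketch below shows one way gold spans could be mapped onto this tag space. It assumes whitespace tokenization with character offsets. The helper `spans_to_bio` is hypothetical, and the repository's own `convert_to_bio.py` may differ in details such as tokenization and overlap handling.

```python
import re


def tokenize_with_offsets(text):
    """Whitespace tokenization returning (token, start, end) character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def spans_to_bio(text, spans):
    """Assign one BIO tag per token.

    spans: list of (label, start, end) with label in {"Source", "Target", "Polar"}
    and end-exclusive character offsets, as in the data files' "start:end" strings.
    A token already tagged by an earlier span keeps its tag (first span wins).
    """
    tokens = tokenize_with_offsets(text)
    tags = ["O"] * len(tokens)
    for label, start, end in spans:
        inside = False
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # A token belongs to the span if it lies fully within [start, end)
            if tok_start >= start and tok_end <= end and tags[i] == "O":
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return [tok for tok, _, _ in tokens], tags


text = "Even though the price is decent for Paris , I would not recommend this hotel ."
spans = [("Source", 44, 45), ("Polar", 46, 65), ("Target", 66, 76),
         ("Target", 12, 21), ("Polar", 25, 31)]
tokens, tags = spans_to_bio(text, spans)
for tok, tag in zip(tokens, tags):
    print(f"{tok:10} {tag}")
```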

Projection Formula

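$$Z_i = W z_i + b$$

where $W \in \mathbb{R}^{t \times H}$ is the weight matrix, $b \in \mathbb{R}^{t}$ is the bias, $z_i \in \mathbb{R}^H$ is the hidden state at position $i$, $H$ is the encoder's hidden size, and $t$ is the number of possible BIO tags. Stacking the logits of all $N$ tokens gives the emission matrix passed to the CRF:

$$\hat{Y} = [Z_1; Z_2; \ldots; Z_N] \in \mathbb{R}^{N \times t}$$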
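Here is a small worked instance of the projection in plain Python, using made-up numbers. It uses H = 2 hidden dimensions and t = 3 tags for readability, whereas the real model uses the encoder's hidden size and all seven BIO tags. The function name `project` is illustrative.

```python
def project(W, z, b):
    """Compute tag logits Z_i = W z_i + b for one token.

    W: t x H weight matrix (t rows of length H); z: hidden state of length H; b: bias of length t.
    """
    return [sum(w * z_k for w, z_k in zip(row, z)) + b_j for row, b_j in zip(W, b)]


tags = ["O", "B-Source", "I-Source"]  # toy subset of the tag space
W = [[0.5, -1.0],
     [2.0, 0.5],
     [-0.5, 1.0]]
b = [0.1, 0.0, -0.2]

# Stack per-token logits into the N x t matrix Y_hat (N = 2 tokens here)
hidden = [[1.0, 2.0], [0.0, -1.0]]
Y_hat = [project(W, z_i, b) for z_i in hidden]

# Per-token argmax, shown only for illustration: the model decodes Y_hat with the CRF instead
best = [tags[max(range(len(row)), key=row.__getitem__)] for row in Y_hat]
print(Y_hat, best)
```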

3. Conditional Random Field (CRF)

After obtaining token-level hidden states from BERT/RoBERTa, a CRF layer is applied to decode BIO tag sequences in a globally-consistent manner.

The CRF learns transition weights between adjacent tags
This enforces structural constraints (e.g., I-Source cannot follow B-Target)
Enables span detection while explicitly modeling tag dependencies
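The plain-Python Viterbi sketch below illustrates how transition constraints shape decoding. Transitions that break the BIO rule, such as `B-Target → I-Target` being allowed but `O → I-Target` or `B-Target → I-Source` not, receive a score of negative infinity, so they can never appear in the decoded path. The emission scores are toy values, and transition weights are left at zero here (a trained CRF learns them), so the example isolates the effect of the constraints.

```python
import math

TAGS = ["O", "B-Source", "I-Source", "B-Target", "I-Target", "B-Polar", "I-Polar"]


def allowed(prev, curr):
    """BIO constraint: I-X may only follow B-X or I-X (prev is None at the sentence start)."""
    if curr.startswith("I-"):
        return prev is not None and prev != "O" and prev[2:] == curr[2:]
    return True


def viterbi(emissions, transitions):
    """Return the highest-scoring valid tag sequence.

    emissions: N x len(TAGS) list of per-token tag scores (e.g. the projected logits)
    transitions: dict mapping (prev_tag, tag) -> weight; missing pairs score 0.0
    """
    def trans(prev, curr):
        # Forbidden transitions get -inf so no path can use them
        if not allowed(prev, curr):
            return -math.inf
        return transitions.get((prev, curr), 0.0)

    # score[j]: best total score of any path ending in TAGS[j] at the current token
    score = [trans(None, tag) + emissions[0][j] for j, tag in enumerate(TAGS)]
    backpointers = []
    for row in emissions[1:]:
        new_score, pointers = [], []
        for j, curr in enumerate(TAGS):
            best_i = max(range(len(TAGS)), key=lambda i: score[i] + trans(TAGS[i], curr))
            new_score.append(score[best_i] + trans(TAGS[best_i], curr) + row[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final tag
    best = max(range(len(TAGS)), key=score.__getitem__)
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return [TAGS[j] for j in reversed(path)]


# Toy scores for "this hotel": a per-token argmax would start with I-Target, which is invalid
emissions = [
    [0.1, 0.0, 0.0, 1.5, 2.0, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.3, 1.8, 0.0, 0.0],
]
print(viterbi(emissions, transitions={}))
```

A softmax decoder picking each token independently would output `I-Target I-Target` here. The constrained search instead returns the well-formed `B-Target I-Target`.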

CRF Training Objective

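$$\mathcal{L}_{CRF} = -\log\frac{\exp\big(\text{score}(x, y^{*})\big)}{\sum_{y' \in \mathcal{Y}(x)} \exp\big(\text{score}(x, y')\big)}, \qquad \text{score}(x, y) = \sum_{i=1}^{N} \big(\hat{Y}_{i, y_i} + A_{y_{i-1}, y_i}\big)$$

where $y^{*}$ is the gold tag sequence, $\mathcal{Y}(x)$ is the set of valid tag sequences for input $x$, $\hat{Y}_{i, y_i}$ is the projected logit of tag $y_i$ at position $i$, and $A$ holds the learned transition weights between adjacent tags ($y_0$ denotes a start state).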

Key Insight: CRF outperforms a plain softmax decoder because it captures inter-tag dependencies critical for multi-span structured extraction.
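The denominator of the CRF objective sums over exponentially many tag sequences, so linear-chain CRFs compute it with the forward algorithm. The plain-Python sketch below computes the loss for one sentence. It uses a toy three-tag space (`O`, `B-Target`, `I-Target`) and hypothetical transition weights. It then checks the forward result against brute-force enumeration.

```python
import itertools
import math

TAGS = ["O", "B-Target", "I-Target"]


def transition(prev, curr, weights):
    """Weight for prev -> curr; -inf forbids I-Target at the start or after O."""
    if curr == "I-Target" and prev in (None, "O"):
        return -math.inf
    return weights.get((prev, curr), 0.0)


def score(emissions, tags, weights):
    """score(x, y) = sum_i (emission of y_i at position i + transition y_{i-1} -> y_i)."""
    total, prev = 0.0, None
    for row, tag in zip(emissions, tags):
        total += row[TAGS.index(tag)] + transition(prev, tag, weights)
        prev = tag
    return total


def logsumexp(values):
    """Numerically stable log(sum(exp(v))); returns -inf if every value is -inf."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emissions, weights):
    """Forward algorithm: log of sum(exp(score)) over all valid tag sequences."""
    alpha = [emissions[0][j] + transition(None, t, weights) for j, t in enumerate(TAGS)]
    for row in emissions[1:]:
        alpha = [
            logsumexp([alpha[i] + transition(p, t, weights) for i, p in enumerate(TAGS)]) + row[j]
            for j, t in enumerate(TAGS)
        ]
    return logsumexp(alpha)


def crf_loss(emissions, gold, weights):
    """Negative log-likelihood of the gold tag sequence."""
    return log_partition(emissions, weights) - score(emissions, gold, weights)


emissions = [[0.2, 1.0, 0.1], [0.3, 0.2, 1.2], [1.1, 0.1, 0.4]]
weights = {("B-Target", "I-Target"): 0.5, ("I-Target", "O"): 0.3}  # hypothetical values
gold = ["B-Target", "I-Target", "O"]
loss = crf_loss(emissions, gold, weights)

# Brute force: enumerate all 3^N sequences; invalid ones score -inf and add exp(-inf) = 0
brute = logsumexp([score(emissions, ys, weights)
                   for ys in itertools.product(TAGS, repeat=len(emissions))])
print(round(loss, 4), abs(brute - log_partition(emissions, weights)) < 1e-9)
```

The forward pass costs O(N·t²) instead of O(tᴺ) for enumeration. For the same reason, CRF libraries use it during training.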

4. Polarity Classification

A separate multi-layer classification head takes the [CLS] token representation as input
Predicts one of three sentiment labels:

Sentiment Labels:
 ✅ Positive → Favorable / approving sentiment
 ❌ Negative → Critical / disapproving sentiment
 ➖ Neutral → Factual / non-opinionated expression

The model struggles with ambiguous sentences like:

"The food was great, but the service was terrible."

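where polarity is mixed at the span level. Because polarity is predicted once per sentence from the [CLS] token, both expressions receive the same label even though "great" is positive and "terrible" is negative. This key challenge motivates future work on span-level polarity.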
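The limitation can be made concrete with a small sketch. It assumes a sentence-level head: a softmax over three [CLS] logits, with made-up values here. Under that head, every extracted expression inherits the single predicted label. The helper names are hypothetical.

```python
import math

LABELS = ["positive", "negative", "neutral"]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assign_sentence_polarity(cls_logits, expressions):
    """Give every extracted expression the one label predicted from [CLS]."""
    probs = softmax(cls_logits)
    label = LABELS[max(range(len(LABELS)), key=probs.__getitem__)]
    return {expr: label for expr in expressions}, probs


# Hypothetical [CLS] logits for "The food was great, but the service was terrible."
labels, probs = assign_sentence_polarity([1.2, 1.0, -0.5], ["great", "terrible"])
print(labels)
```

Whatever the logits are, the two expressions cannot receive opposite labels. A span-level head would score each expression separately.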

5. Structured Opinion Generation

In the final stage, we combine CRF-predicted spans with polarity predictions to produce the complete structured opinion quadruple (holder, expression, target, polarity).

Post-processing pipeline:

Convert span boundaries to character offsets using BERT's offset mapping with adjacent token merging
Filter short spans (< 3 characters) to remove noise
Assemble final output as JSON-formatted opinion tuples

⚠️ This filtering step also discards a small fraction (a few percent) of correct predictions, for example single-character holders such as "I".
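Below is a plain-Python sketch of this post-processing step. It assumes predicted BIO tags aligned with a tokenizer-style offset mapping: one `(start, end)` character pair per token, as returned by Hugging Face fast tokenizers with `return_offsets_mapping=True`, where special tokens get `(0, 0)`. The function name `tags_to_spans` and the toy sub-word split of "recommend" are illustrative.

```python
import json


def tags_to_spans(text, tags, offsets, min_chars=3):
    """Merge adjacent B-/I- tokens into character-level spans.

    tags: one BIO tag per token; offsets: one (start, end) pair per token.
    Tokens with empty offsets (special tokens such as [CLS]/[SEP]) are skipped.
    An I- tag without a matching open span starts a new span (lenient decoding).
    Spans shorter than `min_chars` characters are dropped as noise.
    """
    spans, current = [], None  # current = [label, start, end]
    for tag, (start, end) in zip(tags, offsets):
        if start == end:
            continue
        if tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            current[2] = end  # extend the open span over this (sub-)token
            continue
        if current is not None:
            spans.append(current)
        current = [tag[2:], start, end] if tag != "O" else None
    if current is not None:
        spans.append(current)
    return [
        {"label": label, "text": text[s:e], "offset": f"{s}:{e}"}
        for label, s, e in spans
        if e - s >= min_chars
    ]


text = "I would not recommend this hotel ."
# [CLS] I would not recom ##mend this hotel . [SEP]
offsets = [(0, 0), (0, 1), (2, 7), (8, 11), (12, 17), (17, 21), (22, 26), (27, 32), (33, 34), (0, 0)]
tags = ["O", "B-Source", "B-Polar", "I-Polar", "I-Polar", "I-Polar", "B-Target", "I-Target", "O", "O"]

result = tags_to_spans(text, tags, offsets)
print(json.dumps(result, indent=2))
```

With the default threshold, the one-character holder "I" is dropped. This is exactly the kind of loss noted above.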

🔁 Baseline System

As a comparison point, we implement a Sequence Labeling + Relation Classification pipeline using BiLSTM models.

Pipeline Overview

Step 1: Span Extraction
  ├── BiLSTM → Extract Holders
  ├── BiLSTM → Extract Targets
  └── BiLSTM → Extract Polar Expressions
Step 2: Relation Prediction
  └── BiLSTM + Max Pooling over:
        ├── Full text representation
        ├── Holder / Target representation
        └── Expression representation
      → Concatenate → Linear → Sigmoid
      → Binary: has_relation? (threshold = 0.5)
Step 3: Assemble tuples → (holder, target, expression, polarity)

The baseline uses GloVe / FastText embeddings and trains three separate BiLSTMs, one per annotation type, followed by a relation prediction model.
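The final scoring step of the relation model can be sketched without a deep-learning framework. In this sketch, small hand-written vectors stand in for the BiLSTM outputs, and the weights are made-up values. Only the max pooling, concatenation, linear layer, sigmoid, and 0.5 threshold follow the pipeline above. The helper names are hypothetical.

```python
import math


def max_pool(vectors):
    """Element-wise max over a sequence of token vectors (BiLSTM outputs in the baseline)."""
    return [max(column) for column in zip(*vectors)]


def has_relation(text_vecs, arg_vecs, expr_vecs, weights, bias, threshold=0.5):
    """Concatenate pooled text/argument/expression vectors, apply Linear + Sigmoid, threshold."""
    features = max_pool(text_vecs) + max_pool(arg_vecs) + max_pool(expr_vecs)
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob >= threshold, prob


text_vecs = [[0.1, 0.4], [0.3, -0.2], [0.0, 0.9]]  # whole sentence
holder_vecs = [[0.5, 0.1]]                         # candidate holder span
expr_vecs = [[0.2, 0.7], [0.6, 0.3]]               # candidate polar expression span
weights = [0.5, -0.2, 1.0, 0.3, 0.8, 0.4]          # hypothetical linear-layer weights
bias = -0.6

linked, prob = has_relation(text_vecs, holder_vecs, expr_vecs, weights, bias)
print(linked, round(prob, 3))
```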

✨ Features

Feature	Description
🌍 Multilingual Support	Works across Norwegian, English, Spanish, Catalan, and Basque
🏷️ BIO Sequence Labeling	Precise span-level identification using structured tagging
🔗 CRF Decoding	Globally-consistent tag sequence prediction with structural constraints
🎭 Polarity Classification	3-class (Positive / Negative / Neutral) sentiment head
🧩 Quadruple Extraction	Complete `(holder, target, expression, polarity)` output
📊 Weighted F1 Evaluation	Partial-overlap scoring using token-level Jaccard intersection
🔄 Cross-lingual Transfer	Train on English, evaluate on low-resource target languages
📦 Modular Architecture	Encoder, CRF, and classifier heads are independently configurable

📦 Datasets

Subtask 1 — Monolingual

Dataset	Language	Domain
`norec`	Norwegian	Professional reviews (multi-domain)
`opener_en`	English	Hotel reviews
`opener_es`	Spanish	Hotel reviews
`multibooked_ca`	Catalan	Hotel reviews
`multibooked_eu`	Basque	Hotel reviews
`darmstadt_unis`	English	University reviews (online)
`mpqa`	English	News (opinion annotations)

Subtask 2 — Cross-Lingual

Trained on high-resource English data, evaluated on:

Test Dataset	Language
`opener_es`	Spanish
`multibooked_ca`	Catalan
`multibooked_eu`	Basque

Data Format

Each dataset is in JSON format with the following schema:

{
 "sent_id": "opener/en/hotel/english00164-6",
 "text": "Even though the price is decent for Paris , I would not recommend this hotel .",
 "opinions": [
 {
 "Source": [["I"], ["44:45"]],
 "Target": [["this hotel"], ["66:76"]],
 "Polar_expression": [["would not recommend"], ["46:65"]],
 "Polarity": "negative",
 "Intensity": "average"
 }
 ]
}
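Every span is stored as text plus `"start:end"` character offsets. A quick consistency check is therefore to slice the sentence with each offset and compare the result. Below is a minimal sketch with a hypothetical helper `check_offsets`. The entry uses whitespace-tokenized text, with spaces before "," and ".", which is what these offsets index into.

```python
import json


def check_offsets(entry):
    """Return (field, expected, found) for every span whose offsets do not match its text."""
    problems = []
    text = entry["text"]
    for opinion in entry["opinions"]:
        for field in ("Source", "Target", "Polar_expression"):
            strings, offsets = opinion[field]  # e.g. [["I"], ["44:45"]]; empty lists if absent
            for expected, offset in zip(strings, offsets):
                start, end = map(int, offset.split(":"))
                if text[start:end] != expected:
                    problems.append((field, expected, text[start:end]))
    return problems


entry = json.loads("""{
  "sent_id": "opener/en/hotel/english00164-6",
  "text": "Even though the price is decent for Paris , I would not recommend this hotel .",
  "opinions": [{"Source": [["I"], ["44:45"]],
                "Target": [["this hotel"], ["66:76"]],
                "Polar_expression": [["would not recommend"], ["46:65"]],
                "Polarity": "negative", "Intensity": "average"}]
}""")
print(check_offsets(entry))  # an empty list means every offset matches its text
```

Running the same check on the untokenized sentence ("Paris, I ...") reports all three spans as shifted by one character.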

📊 Results

Monolingual Performance

Average across all 7 datasets

Metric	Score
SF1 (Sentiment F1)	0.46
SP (Sentiment Precision)	0.62
SR (Sentiment Recall)	0.65

Per-dataset Breakdown:

Dataset	SF1	SP	SR
Opener_en	0.41	0.37	0.47
Opener_es	0.35	0.33	0.38
NoReC	0.23	0.30	0.18
Multibooked_ca	0.57	0.53	0.63
Multibooked_eu	0.53	0.40	0.71
darmstadt_unis	0.55	0.58	0.00
MPQA	0.52	0.55	0.00
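SF1 is the harmonic mean of SP and SR. As a worked check on the Opener_en row:

$$SF1 = \frac{2 \cdot SP \cdot SR}{SP + SR} = \frac{2 \cdot 0.37 \cdot 0.47}{0.37 + 0.47} = \frac{0.3478}{0.84} \approx 0.41$$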

Cross-Lingual Performance

Average across 3 target language datasets

Metric	Score
SF1 (Sentiment F1)	0.35
SP (Sentiment Precision)	0.85
SR (Sentiment Recall)	0.63

Per-dataset Breakdown:

Dataset	SF1	SP	SR
Opener_es	0.000	0.000	0.000
Multibooked_ca	0.481	0.461	0.503
Multibooked_eu	0.671	0.618	0.733

🔍 Key Observations

🟢 BERT significantly improves span extraction results over CRF baseline alone — especially for English and language-similar corpora
🟡 Multibooked_eu performs best in cross-lingual settings — likely due to the smaller dataset size and consistent hotel-review characteristics
🔴 Complex/ambiguous expressions (different polarity in different contexts) present a challenge across all datasets
📌 Character-level BERT representations outperform word-level representations as a comparison baseline

🚀 Installation

Prerequisites

Python ≥ 3.8
PyTorch ≥ 1.9
CUDA (optional, for GPU acceleration)

1. Clone the Repository

git clone https://github.com/your-username/Structured-Sentiment-Analysis-IIITD-NLP-PROJECT.git
cd Structured-Sentiment-Analysis-IIITD-NLP-PROJECT

2. Install Core Dependencies

pip install torch torchvision transformers
pip install nltk scikit-learn tqdm gensim

3. Install Baseline Dependencies

pip install -r baseline/sequence_labeling/requirements.txt

4. Install Data Processing Dependencies

pip install -r data/requirements.txt

5. Prepare External Datasets

MPQA 2.0 — Download from the MPQA website and run:

cd data/mpqa && bash process_mpqa.sh

Darmstadt Service Review Corpus — Download from TU Darmstadt and run:

cd data/darmstadt_unis && bash process_darmstadt.sh

🛠️ Usage

Evaluation

Run the official evaluation script on model predictions:

python evaluate.py <input_dir> <output_dir>

Where:

<input_dir>/res/ contains your predictions.json per dataset
<input_dir>/ref/data/ contains the gold test files

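The evaluation rewards partial span matches rather than requiring exact boundaries. The sketch below is a simplified illustration of that idea for a single predicted/gold span pair, represented as sets of token indices. It is not the scoring code in `evaluate.py`, which additionally matches whole tuples and polarity. The helper name `span_overlap_scores` is hypothetical.

```python
def span_overlap_scores(pred_tokens, gold_tokens):
    """Partial-credit precision and recall for one span pair given token-index sets."""
    pred, gold = set(pred_tokens), set(gold_tokens)
    if not pred or not gold:
        return 0.0, 0.0
    overlap = len(pred & gold)
    # Precision normalizes by the predicted span, recall by the gold span
    return overlap / len(pred), overlap / len(gold)


# Gold expression "would not recommend" = tokens {10, 11, 12}; the model found only "not recommend"
p, r = span_overlap_scores({11, 12}, {10, 11, 12})
f1 = 2 * p * r / (p + r) if p + r else 0.0
print(p, round(r, 3), round(f1, 3))
```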
Baseline — Training

# Train all BiLSTM baseline models across datasets
cd baseline/sequence_labeling
bash get_baselines.sh

Baseline — Inference

# Run inference on a specific dataset and split
python baseline/sequence_labeling/inference.py \
 --DATADIR opener_en \
 --FILE dev.json

Output will be saved to:

baseline/sequence_labeling/saved_models/relation_prediction/<DATADIR>/prediction.json

Predictions Format

Prediction files must match the gold data format. Each entry should look like:

{
 "sent_id": "unique-sentence-id",
 "text": "Raw input sentence here.",
 "opinions": [
 {
 "Source": [["holder text"], ["start:end"]],
 "Target": [["target text"], ["start:end"]],
 "Polar_expression": [["expression text"], ["start:end"]],
 "Polarity": "positive"
 }
 ]
}
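Here is a minimal sketch for writing predictions in this format. It assumes one subfolder per dataset, e.g. `<input_dir>/res/norec/predictions.json`, containing a JSON list of entries. This mirrors `predictions/norec/predictions.json` in this repository, but check `evaluate.py` for the exact layout it expects. The helper name `write_predictions` is hypothetical, and a temporary directory stands in for `<input_dir>`.

```python
import json
import tempfile
from pathlib import Path


def write_predictions(input_dir, dataset, entries):
    """Write a list of prediction entries to <input_dir>/res/<dataset>/predictions.json."""
    out_path = Path(input_dir) / "res" / dataset / "predictions.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(entries, ensure_ascii=False, indent=2), encoding="utf-8")
    return out_path


entry = {
    "sent_id": "unique-sentence-id",
    "text": "Raw input sentence here.",
    "opinions": [],  # a sentence with no predicted opinions keeps an empty list
}

with tempfile.TemporaryDirectory() as tmp:
    path = write_predictions(tmp, "norec", [entry])
    reloaded = json.loads(path.read_text(encoding="utf-8"))
    print(path.relative_to(tmp), reloaded[0]["sent_id"])
```

`ensure_ascii=False` keeps Norwegian, Catalan, and Basque characters readable in the output file.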

📁 Project Structure

Structured-Sentiment-Analysis-IIITD-NLP-PROJECT/
│
├── 📄 evaluate.py                          ← Official evaluation script (SF1 / SP / SR)
│
├── 📁 baseline/
│   └── sequence_labeling/
│       ├── extraction_module.py            ← BiLSTM span extractor (Holder / Target / Expr)
│       ├── relation_prediction_module.py   ← BiLSTM relation classifier
│       ├── inference.py                    ← End-to-end inference pipeline
│       ├── convert_to_bio.py               ← Convert JSON data to BIO format
│       ├── convert_to_rels.py              ← Convert predictions to relation pairs
│       ├── utils.py                        ← Data loading & vocabulary utilities
│       ├── WordVecs.py                     ← Pretrained word embedding loader
│       ├── get_baselines.sh                ← Script to train all baseline models
│       └── requirements.txt
│
├── 📁 data/
│   ├── norec/                              ← Norwegian multi-domain reviews
│   │   ├── train.json
│   │   ├── dev.json
│   │   └── test.json
│   ├── opener_en/                          ← English hotel reviews
│   ├── opener_es/                          ← Spanish hotel reviews
│   ├── multibooked_ca/                     ← Catalan hotel reviews
│   ├── multibooked_eu/                     ← Basque hotel reviews
│   ├── mpqa/                               ← MPQA news corpus
│   ├── darmstadt_unis/                     ← English university reviews
│   └── README.md                           ← Data format documentation
│
└── 📁 predictions/
    └── norec/
        └── predictions.json                ← Sample model predictions

🔭 Further Work

We identify several promising directions to build upon this work:

🕸️ Dependency Graph Parsing

In future work, we plan to move to a dependency-graph parsing approach to structured sentiment, augmenting each token's representation with its head in a dependency tree (Kurtz et al., 2020). This would allow richer relational reasoning between opinion components.

🌐 Multi-task & Cross-lingual Learning

Exploring multi-task learning across languages to better leverage shared structure in multilingual sentiment graphs
Joint training on all monolingual datasets with language-specific adapters

🔗 Advanced Graph Parsers

Point Network (Samuel & Straka, 2020) — a strong graph parser for SSA
PERIN — a permutation-invariant structured prediction framework

📐 Span-level Polarity

Currently, polarity is predicted globally per sentence via the [CLS] token
Moving to span-level polarity prediction would handle cases like "The food was great, but the service was terrible"

🤖 Serialized Large Language Models

Explore whether large pre-trained models (e.g., GPT-4, LLaMA) can directly predict structured opinion tuples via in-context learning or fine-tuning, without an explicit CRF layer

📚 References

Barnes, J. et al. (2022). SemEval-2022 Task 10: Structured Sentiment Analysis.
 Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022).
Liu, Y. et al. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach.
 arXiv:1907.11692.
Kurtz et al. (2020). Improving Low-Resource NMT through Relevance Based Linguistic Features.
 ACL 2020.
Samuel, D. & Straka, M. (2020). ÚFAL at MRP 2020: Permutation-Invariant Semantic Parsing.
 CoNLL 2020 Shared Task.

Made with ❤️ at IIIT Delhi | NLP Course Project

For questions or contributions, please open an issue or pull request.
