dhlab-epfl/LinkedBooksDeepReferenceParsing


Deep Reference Parsing

This repository contains the code for the following article:

@article{alves_deep_2018,
 author = {Rodrigues Alves, Danny and Colavizza, Giovanni and Kaplan, Frédéric},
 title = {{Deep Reference Mining from Scholarly Literature in the Arts and Humanities}},
 journal = {{Frontiers in Research Metrics and Analytics}},
 volume = 3,
 number = 21,
 year = 2018,
 doi = {10.3389/frma.2018.00021}
 }

Task definition

We focus on reference mining, divided into three tasks: reference component detection (task 1), reference typology detection (task 2), and reference span detection (task 3). Each task assigns one tag to every token of a sequence:

  • Sequence: G. Ostrogorsky, History of the Byzantine State, Rutgers University Press, 1986.
  • Task 1: author author title title title title title publisher publisher publisher year
  • Task 2: b-secondary i-secondary ... e-secondary
  • Task 3: b-r i-r ... e-r
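The alignment between a sequence and its three tag layers can be sketched in Python as follows. This is only an illustration: whitespace tokenization is an assumption made to match the example above (the dataset itself splits punctuation into separate tokens), and the helper `span_tags` is hypothetical, not part of this repository.

```python
def span_tags(n_tokens, label):
    """Return b-/i-/e- tags covering one span of n_tokens tokens.

    Assumption: how single-token spans are tagged is not described in
    this README, so this sketch requires at least two tokens.
    """
    if n_tokens < 2:
        raise ValueError("this sketch only covers spans of two or more tokens")
    return [f"b-{label}"] + [f"i-{label}"] * (n_tokens - 2) + [f"e-{label}"]


sequence = ("G. Ostrogorsky, History of the Byzantine State, "
            "Rutgers University Press, 1986.")
tokens = sequence.split()  # assumption: whitespace tokenization, for illustration

# Task 1: one component tag per token (copied from the example above).
task1 = ["author"] * 2 + ["title"] * 5 + ["publisher"] * 3 + ["year"]
# Task 2: reference typology, one span covering the whole reference.
task2 = span_tags(len(tokens), "secondary")
# Task 3: reference span, again covering the whole reference.
task3 = span_tags(len(tokens), "r")

for row in zip(tokens, task1, task2, task3):
    print(" ".join(row))
```

Each printed row has the same four-column shape as a dataset line: the token followed by its task 1, task 2, and task 3 tags.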

Contents

  • LICENSE: MIT.
  • README.md: this file.
  • dataset/
    • train: train split, CoNLL format.
    • test: test split, CoNLL format.
    • validation: validation split, CoNLL format.
  • compressed dataset: the dataset in compressed form.
  • data facts: a Python notebook to explore the dataset (number of references, tag distributions).
  • crf_baseline: CRF baseline implementation details.
  • keras: Keras implementation details.
  • tensorflow: TensorFlow implementation details.

Dataset

Example of a dataset entry (beginning of the validation set, first sequence), with one token per line in the format `Token Task1tag Task2tag Task3tag`:

-DOCSTART- -X- -X- o
C author b-secondary b-r
. author i-secondary i-r
Agnoletti author i-secondary i-r
, author i-secondary i-r
Treviso title i-secondary i-r
e title i-secondary i-r
le title i-secondary i-r
sue title i-secondary i-r
pievi title i-secondary i-r
. title i-secondary i-r
Illustrazione title i-secondary i-r
storica title i-secondary i-r
, title i-secondary i-r
Treviso publicationplace i-secondary i-r
1898 year i-secondary i-r
, year i-secondary i-r
2 publicationspecifications i-secondary i-r
v publicationspecifications i-secondary i-r
. publicationspecifications e-secondary e-r
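A minimal Python sketch for reading this format is given below. It is not the loader used by the implementations in this repository. It assumes that the four columns are whitespace-separated, that blank lines separate sequences, and that `-DOCSTART-` lines are document markers to skip. The function name `read_sequences` is hypothetical, and the sample is a shortened excerpt of the entry above.

```python
from collections import Counter


def read_sequences(lines):
    """Group 'Token Task1tag Task2tag Task3tag' lines into sequences.

    Assumptions (not confirmed by this README): columns are separated by
    whitespace, blank lines end a sequence, -DOCSTART- lines are skipped.
    """
    sequences, current = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            # A boundary: close the sequence collected so far, if any.
            if current:
                sequences.append(current)
                current = []
            continue
        parts = line.split()
        if len(parts) != 4:
            raise ValueError(f"expected 4 columns, got {len(parts)}: {line!r}")
        current.append(tuple(parts))
    if current:
        sequences.append(current)
    return sequences


# Shortened excerpt of the dataset entry shown above.
sample = """\
-DOCSTART- -X- -X- o
C author b-secondary b-r
. author i-secondary i-r
Agnoletti author i-secondary i-r
, author i-secondary i-r
"""

sequences = read_sequences(sample.splitlines())

# Task 1 tag distribution, similar in spirit to the dataset exploration notebook.
task1_counts = Counter(tag for seq in sequences for _, tag, _, _ in seq)
print(len(sequences), dict(task1_counts))
```

To read a full split, pass an open file handle instead of `sample.splitlines()`; the function iterates over it line by line.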

Pre-trained word vectors can be downloaded from Zenodo: DOI

Implementations

CRF baseline

See internal readme for details.

Keras

See internal readme for details.

TensorFlow

See internal readme for details.

This implementation borrows from Guillaume Genthial's Sequence Tagging with Tensorflow.
