Data

LRG data

LRG data is available on the LRG FTP site in XML format.
The LRG genomic, transcript and protein sequences are also available in FASTA format on the LRG FTP site.

All public and pending LRGs can be downloaded as zipped archives:
Status    Formats (zipped)
Public    XML, FASTA
Pending   XML, FASTA
If you can't access the LRG FTP site through the FTP protocol, you can reach it over HTTP instead: LRG FTP site - HTTP.

Summary data

Each entry below gives the data type(s), a description of the file, its format and the file locations by assembly (GRCh37 and GRCh38).

LRG genes and LRG transcripts
The file contains 4 tracks:
  • Public LRG genes
  • Public LRG transcripts, with their exon coordinates
  • Pending LRG genes
  • Pending LRG transcripts, with their exon coordinates
Format: BED (12 columns)
Files location by assembly: BED (GRCh37), BED (GRCh38)
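
As a rough illustration of how a multi-track, 12-column BED file such as the one described above could be read, the sketch below groups records by track and expands the standard BED12 block columns into exon intervals (0-based, half-open, as in the general BED convention). The function name `parse_bed12`, the track name and the sample record are hypothetical and are not taken from the LRG files; the sketch also assumes tab-separated records and `track` lines carrying a `name` attribute.

```python
import re


def parse_bed12(lines):
    """Group BED12 records by track and expand their blocks into exons."""
    tracks = {}
    current = "untitled"  # used if records appear before any track line
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "browser")):
            continue
        if line.startswith("track"):
            # Accept both name="quoted name" and name=unquoted
            match = re.search(r'name="([^"]*)"|name=(\S+)', line)
            if match:
                current = match.group(1) if match.group(1) is not None else match.group(2)
            else:
                current = "untitled"
            tracks.setdefault(current, [])
            continue
        fields = line.split("\t")
        chrom, chrom_start, name, strand = fields[0], int(fields[1]), fields[3], fields[5]
        # Columns 11 and 12: comma-separated block sizes and block starts
        sizes = [int(x) for x in fields[10].split(",") if x]
        starts = [int(x) for x in fields[11].split(",") if x]
        # Block starts are relative to chromStart
        exons = [(chrom_start + s, chrom_start + s + size) for s, size in zip(starts, sizes)]
        tracks.setdefault(current, []).append(
            {"name": name, "chrom": chrom, "strand": strand, "exons": exons}
        )
    return tracks


# Hypothetical sample, not real LRG data
SAMPLE = (
    'track name="Hypothetical track"\n'
    "chrN\t1000\t2000\texample_t1\t0\t+\t1100\t1900\t0\t2\t100,200,\t0,800,\n"
)
tracks = parse_bed12(SAMPLE.splitlines())
print(tracks["Hypothetical track"][0]["exons"])
```

Keeping the records keyed by track name makes it straightforward to treat the public and pending tracks separately.
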

LRG genes with genomic coordinates
The file lists the LRG genes in genomic coordinates. The columns are:
  • LRG identifier
  • HGNC symbol
  • Status (public/pending)
  • Chromosome
  • Start
  • End
  • Strand (1 = forward, -1 = reverse)
File content example:
LRG_ID  HGNC_SYMBOL  LRG_STATUS  CHROMOSOME  START     STOP      STRAND
LRG_1   COL1A1       public      17          50182096  50206639  -1
LRG_2   COL1A2       public      7           94389561  94433232  1
Format: Tabulated
Files location by assembly: TXT (GRCh37), TXT (GRCh38)
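
The seven columns listed above map naturally onto a small record type. The following is a minimal sketch, assuming the file is tab-separated and that any header or comment line either starts with `#` or begins with `LRG_ID`; the names `LrgGene` and `parse_lrg_genes` are illustrative only. The sample rows are the two shown in the file content example.

```python
import csv
import io
from typing import NamedTuple


class LrgGene(NamedTuple):
    lrg_id: str
    hgnc_symbol: str
    status: str      # "public" or "pending"
    chromosome: str  # kept as text so that non-numeric names also fit
    start: int
    end: int
    strand: int      # 1 = forward, -1 = reverse


def parse_lrg_genes(handle):
    """Read tab-separated gene rows, skipping header and comment lines."""
    genes = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#") or row[0] == "LRG_ID":
            continue
        lrg_id, symbol, status, chrom, start, end, strand = row[:7]
        genes.append(
            LrgGene(lrg_id, symbol, status, chrom, int(start), int(end), int(strand))
        )
    return genes


SAMPLE = (
    "LRG_ID\tHGNC_SYMBOL\tLRG_STATUS\tCHROMOSOME\tSTART\tSTOP\tSTRAND\n"
    "LRG_1\tCOL1A1\tpublic\t17\t50182096\t50206639\t-1\n"
    "LRG_2\tCOL1A2\tpublic\t7\t94389561\t94433232\t1\n"
)
genes = parse_lrg_genes(io.StringIO(SAMPLE))
reverse_strand = [g.lrg_id for g in genes if g.strand == -1]
print(reverse_strand)
```
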

LRG transcripts with exon coordinates
The file lists the LRG transcripts, exons and proteins in genomic coordinates. The columns are:
  • LRG transcript identifier
  • HGNC symbol
  • Chromosome
  • Strand (1 = forward, -1 = reverse)
  • Transcript start
  • Transcript end
  • List of exon coordinates ("start-end" separated by a comma)
  • LRG protein identifier
  • Protein start
  • Protein end
File content example:
LRG_TRANSCRIPT  HGNC_SYMBOL  CHROMOSOME  STRAND  TRANSCRIPT_START  TRANSCRIPT_STOP  EXONS_COORDS  LRG_PROTEIN  CDS_START  CDS_STOP
LRG_1t1  COL1A1  17  -1  50184096  50201639  50184096-50185648,50185778-50186020,...,50201411-50201639  LRG_1p1  50185502  50201513
LRG_2t1  COL1A2  7   1   94394561  94431232  94394561-94395101,94397748-94397758,...,94430247-94431232  LRG_2p1  94395032  94430393
Format: Tabulated
Files location by assembly: TXT (GRCh37), TXT (GRCh38)
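
The exon column packs several "start-end" pairs into one comma-separated field, so it usually needs a second parsing step. The sketch below splits such a field into integer pairs and checks that every exon lies inside the transcript bounds. The helper names `parse_exons` and `exons_within` are hypothetical, and the sample field contains only the exons visible in the example for LRG_1t1 (the elided ones are left out, so this is not the full exon list).

```python
def parse_exons(field):
    """Turn 'start-end,start-end,...' into a list of (start, end) tuples."""
    exons = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue
        start, end = item.split("-")
        exons.append((int(start), int(end)))
    return exons


def exons_within(exons, tx_start, tx_end):
    """Return True if every exon lies inside the transcript interval."""
    return all(tx_start <= start <= end <= tx_end for start, end in exons)


# Abridged: only the exons shown in the file content example for LRG_1t1
FIELD = "50184096-50185648,50185778-50186020,50201411-50201639"
exons = parse_exons(FIELD)
print(exons)
print(exons_within(exons, 50184096, 50201639))
```
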

LRG transcripts with external references
The file lists the LRG transcripts and their external references. The columns are:
  • LRG ID
  • HGNC symbol
  • RefSeqGene ID
  • LRG transcript
  • RefSeq transcript ID with a sequence identical to the LRG transcript
  • Ensembl transcript ID with a sequence identical to the LRG transcript
  • CCDS ID
File content example:
LRG    HGNC_SYMBOL  REFSEQ_GENOMIC  LRG_TRANSCRIPT  REFSEQ_TRANSCRIPT  ENSEMBL_TRANSCRIPT  CCDS
LRG_1  COL1A1       NG_007400.1     t1              NM_000088.3        -                   CCDS11561.1
LRG_2  COL1A2       NG_007405.1     t1              NM_000089.3        ENST00000297268.10  CCDS34682.1
Format: Tabulated
Files location (GRCh37, GRCh38): -, TXT
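
One common use of this file is to look up the LRG transcript that matches a given RefSeq transcript. The sketch below builds such an index, assuming tab-separated columns, a header that starts with `LRG` or `#`, and that a lone `-` marks a missing reference (as in the ENSEMBL_TRANSCRIPT column for LRG_1 in the example). Joining the LRG ID and the transcript name into `LRG_1t1` follows the identifier style used elsewhere on this page; `build_refseq_index` is an illustrative name.

```python
import csv
import io


def build_refseq_index(handle):
    """Map each RefSeq transcript ID to its LRG transcript and cross-references."""
    index = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#") or row[0] == "LRG":
            continue
        lrg_id, symbol, refseq_gene, transcript, refseq_tx, ensembl_tx, ccds = row[:7]
        if refseq_tx == "-":
            continue  # no RefSeq transcript to index on
        index[refseq_tx] = {
            "lrg_transcript": lrg_id + transcript,  # e.g. LRG_1 + t1
            "hgnc_symbol": symbol,
            "refseq_gene": refseq_gene,
            # Assumption: "-" stands for "no identical transcript known"
            "ensembl_transcript": None if ensembl_tx == "-" else ensembl_tx,
            "ccds": None if ccds == "-" else ccds,
        }
    return index


SAMPLE = (
    "LRG\tHGNC_SYMBOL\tREFSEQ_GENOMIC\tLRG_TRANSCRIPT\tREFSEQ_TRANSCRIPT\tENSEMBL_TRANSCRIPT\tCCDS\n"
    "LRG_1\tCOL1A1\tNG_007400.1\tt1\tNM_000088.3\t-\tCCDS11561.1\n"
    "LRG_2\tCOL1A2\tNG_007405.1\tt1\tNM_000089.3\tENST00000297268.10\tCCDS34682.1\n"
)
index = build_refseq_index(io.StringIO(SAMPLE))
print(index["NM_000089.3"]["lrg_transcript"])
```
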

LRG proteins with RefSeq proteins
The file lists the LRG proteins and their corresponding RefSeq proteins and transcripts. The columns are:
  • LRG protein ID
  • RefSeq protein ID
  • LRG ID
  • LRG transcript
  • RefSeq transcript ID
File content example:
LRG_PROTEIN  REFSEQ_PROTEIN  LRG    LRG_TRANSCRIPT  REFSEQ_TRANSCRIPT
LRG_1p1      NP_000079.2     LRG_1  LRG_1t1         NM_000088.3
LRG_2p1      NP_000080.2     LRG_2  LRG_2t1         NM_000089.3
Format: Tabulated
Files location (GRCh37, GRCh38): -, TXT
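
Because this file has named columns, it can also be read by header name rather than by position. The sketch below does so with `csv.DictReader`, assuming the file is tab-separated and that its first line is a header spelled exactly as in the file content example; `read_protein_map` is a hypothetical helper name.

```python
import csv
import io


def read_protein_map(handle):
    """Map RefSeq protein IDs to LRG protein IDs using the header names."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["REFSEQ_PROTEIN"]: row["LRG_PROTEIN"] for row in reader}


SAMPLE = (
    "LRG_PROTEIN\tREFSEQ_PROTEIN\tLRG\tLRG_TRANSCRIPT\tREFSEQ_TRANSCRIPT\n"
    "LRG_1p1\tNP_000079.2\tLRG_1\tLRG_1t1\tNM_000088.3\n"
    "LRG_2p1\tNP_000080.2\tLRG_2\tLRG_2t1\tNM_000089.3\n"
)
protein_map = read_protein_map(io.StringIO(SAMPLE))
print(protein_map["NP_000079.2"])
```

If the real header differs (for example a leading `#`), the column names used as dictionary keys would need adjusting.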

LRG in Ensembl

The list of LRGs already imported into Ensembl is available in this text file:


LRG XML schema

The LRG XML schema documentation is downloadable here and the different versions of the XML schema definitions (RELAX NG format) are available here.
The current LRG XML schema version is 1.10.

LRG archived data

Previous versions of the LRGs (in different LRG XML schemas) are available:
Schema version    Link
Schema 1.8        FTP site
Schema 1.7        FTP site
Schema 1.6        FTP site
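
When handling files from several schema versions, note that version labels such as 1.10 and 1.8 are dotted identifiers rather than decimal numbers: read as a float or compared as text, 1.10 would sort before 1.6. A minimal sketch of an ordering that treats each component as an integer is shown below; `version_key` is an illustrative helper and is not part of any LRG tooling.

```python
def version_key(version):
    """Split '1.10' into (1, 10) so components compare as integers."""
    return tuple(int(part) for part in version.split("."))


CURRENT = "1.10"
ARCHIVED = ["1.8", "1.7", "1.6"]

ordered = sorted(ARCHIVED + [CURRENT], key=version_key)
naive = sorted(ARCHIVED + [CURRENT])  # plain string comparison

print(ordered)  # ['1.6', '1.7', '1.8', '1.10']
print(naive)    # ['1.10', '1.6', '1.7', '1.8']
```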

Web services

EMBL-EBI provides RESTful web services for LRG.