Title Section

This section contains records used to describe the experiment and the biological macromolecules present in the entry: HEADER, OBSLTE, TITLE, SPLIT, CAVEAT, COMPND, SOURCE, KEYWDS, EXPDTA, NUMMDL, MDLTYP, AUTHOR, REVDAT, SPRSDE, JRNL, and REMARK records.

HEADER

Overview

The HEADER record uniquely identifies a PDB entry through the idCode field. This record also provides a classification for the entry. Finally, it contains the date when the coordinates were deposited to the PDB archive.

Record Format

COLUMNS       DATA TYPE      FIELD           DEFINITION
------------------------------------------------------------------------------------
 1 -  6       Record name    "HEADER"
11 - 50       String(40)     classification  Classifies the molecule(s).
51 - 59       Date           depDate         Deposition date. This is the date the
                                             coordinates were received at the PDB.
63 - 66       IDcode         idCode          This identifier is unique within the PDB.

Details

The classification string is left-justified and exactly matches one of a collection of strings. A class list is available from the current wwPDB Annotation Documentation Appendices (http://www.wwpdb.org/docs.html). In the case of macromolecular complexes, the classification field must present a class for each macromolecule present. Due to the limited length of the classification field, strings must sometimes be abbreviated. In these cases, the full terms are given in KEYWDS.
Classification may be based on function, metabolic role, molecule type, cellular location, etc. This record can describe dual functions of a molecule; when applicable, the functions are separated by a comma ",". Entries with multiple molecules in a complex list the classification of each macromolecule, separated by a slash "/".
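As a rough illustration of the fixed columns and the separator conventions described above, the following Python sketch slices a HEADER line and splits its classification into per-molecule function lists. The names `Header`, `parse_header`, and `split_classification` are hypothetical, and splitting on "/" before "," is an assumption drawn from the description above, not an official wwPDB parser.

```python
from dataclasses import dataclass


@dataclass
class Header:
    classification: str
    dep_date: str
    id_code: str


def parse_header(line: str) -> Header:
    """Slice a HEADER record by its fixed columns (1-based, inclusive)."""
    if not line.startswith("HEADER"):
        raise ValueError("not a HEADER record")
    line = line.ljust(80)  # tolerate lines whose trailing blanks were trimmed
    return Header(
        classification=line[10:50].strip(),  # columns 11-50, left-justified
        dep_date=line[50:59],                 # columns 51-59
        id_code=line[62:66],                  # columns 63-66
    )


def split_classification(classification: str) -> list[list[str]]:
    """One inner list per macromolecule ("/"), one entry per function (",")."""
    return [
        [term.strip() for term in molecule.split(",")]
        for molecule in classification.split("/")
    ]


# Build a column-exact line: the classification is padded to 40 characters.
line = "HEADER    " + "TRANSFERASE/TRANSFERASE INHIBITOR".ljust(40) + "17-SEP-04   1XH6"
header = parse_header(line)
print(header.id_code, split_classification(header.classification))
```

Slicing by column rather than splitting on whitespace matters here, because the classification itself may contain blanks.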

Verification/Validation/Value Authority Control

The verification program checks that the deposition date is a legitimate date and that the ID code is well-formed.

PDB coordinate entry ID codes do not begin with 0. "No coordinates", or NOC files, given as 0xxx codes, contained no structural information and were bibliographic only. These entries were subsequently removed from PDB archive.
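The two checks above can be sketched in Python as follows. What counts as "well-formed" is an assumption here: four upper-case alphanumeric characters whose first character is a digit from 1 to 9, which excludes the retired 0xxx NOC codes. The function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed shapes: dates look like 28-MAR-07; ID codes look like 2UXK.
DATE_SHAPE = re.compile(r"\d{2}-[A-Z]{3}-\d{2}")
ID_CODE_SHAPE = re.compile(r"[1-9][A-Z0-9]{3}")


def is_valid_dep_date(text: str) -> bool:
    """True if text has the DD-MMM-YY shape and names a real calendar date."""
    if not DATE_SHAPE.fullmatch(text):
        return False
    try:
        # "28-MAR-07".title() gives "28-Mar-07"; %b matches English month
        # abbreviations in the default C locale.
        datetime.strptime(text.title(), "%d-%b-%y")
    except ValueError:  # e.g. 30-FEB-07 or an unknown month name
        return False
    return True


def is_valid_id_code(text: str) -> bool:
    """True for four-character codes such as 2UXK; rejects 0xxx (NOC) codes."""
    return ID_CODE_SHAPE.fullmatch(text) is not None
```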

Relationships to Other Record Types

The classification found in HEADER also appears in KEYWDS, unabbreviated and in no strict order.

Example

 1     2     3     4     5     6     7     8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
HEADER  PHOTOSYNTHESIS             28-MAR-07  2UXK       
HEADER  TRANSFERASE/TRANSFERASE INHIBITOR    17-SEP-04  1XH6       
HEADER  MEMBRANE PROTEIN, TRANSPORT PROTEIN   20-JUL-06  2HRT

OBSLTE

Overview

OBSLTE appears in entries that have been removed from public distribution.

This record acts as a flag in an entry that has been removed ("obsoleted") from the PDB's full release. It indicates which, if any, new entries have replaced the entry that was obsoleted. The format allows for the case of multiple new entries replacing one existing entry.

Record Format

COLUMNS       DATA TYPE      FIELD           DEFINITION
---------------------------------------------------------------------------------------
 1 -  6       Record name    "OBSLTE"
 9 - 10       Continuation   continuation    Allows concatenation of multiple records.
12 - 20       Date           repDate         Date that this entry was replaced.
22 - 25       IDcode         idCode          ID code of this entry.
32 - 35       IDcode         rIdCode         ID code of entry that replaced this one.
37 - 40       IDcode         rIdCode         ID code of entry that replaced this one.
42 - 45       IDcode         rIdCode         ID code of entry that replaced this one.
47 - 50       IDcode         rIdCode         ID code of entry that replaced this one.
52 - 55       IDcode         rIdCode         ID code of entry that replaced this one.
57 - 60       IDcode         rIdCode         ID code of entry that replaced this one.
62 - 65       IDcode         rIdCode         ID code of entry that replaced this one.
67 - 70       IDcode         rIdCode         ID code of entry that replaced this one.
72 - 75       IDcode         rIdCode         ID code of entry that replaced this one.
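Because the nine rIdCode fields sit at a fixed 5-column stride, collecting every replacement entry reduces to a loop over column offsets. This is a minimal Python sketch; the function name `replacement_ids` is hypothetical, and it assumes the OBSLTE lines are supplied in file order.

```python
def replacement_ids(lines: list[str]) -> list[str]:
    """Collect rIdCode values from OBSLTE records, in order of appearance."""
    ids: list[str] = []
    for line in lines:
        if not line.startswith("OBSLTE"):
            continue
        padded = line.ljust(80)  # short lines simply have empty trailing fields
        # Nine 4-character fields start at columns 32, 37, ..., 72 (0-based 31, ..., 71).
        for start in range(31, 72, 5):
            code = padded[start:start + 4].strip()
            if code:
                ids.append(code)
    return ids


# Column-exact copy of the example record: 1MBP in 22-25, 2MBP in 32-35.
record = "OBSLTE" + " " * 5 + "31-JAN-94" + " " + "1MBP" + " " * 6 + "2MBP"
print(replacement_ids([record]))
```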

Details

It is PDB policy that only the principal investigator and/or the primary author who submitted an entry has the authority to obsolete it. All OBSLTE entries are available from the PDB archive (https://files.wwpdb.org/pub/pdb/data/structures/obsolete).
Though the obsolete entry is removed from the public archive, the initial citation that reported the structure is carried over to the superseding entry.

Verification/Validation/Value Authority Control

wwPDB staff adds this record at the time an entry is removed from release.

Relationships to Other Record Types

None.

Example

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
OBSLTE     31-JAN-94 1MBP      2MBP

TITLE

Overview

The TITLE record contains a title for the experiment or analysis that is represented in the entry.
It should identify an entry in the same way that a citation title identifies a publication.

Record Format

COLUMNS       DATA TYPE      FIELD           DEFINITION
----------------------------------------------------------------------------------
 1 -  6       Record name    "TITLE "
 9 - 10       Continuation   continuation    Allows concatenation of multiple records.
11 - 80       String         title           Title of the experiment.

Details

The title of the entry is free text and should describe the contents of the entry and any procedures or conditions that distinguish this entry from similar entries. It presents an opportunity for the depositor to emphasize the underlying purpose of this particular experiment.
Some items that may be included in TITLE are:

- Experiment type.
- Description of the mutation.
- The fact that only alpha carbon coordinates have been provided in the entry.

Verification/Validation/Value Authority Control

This record is free text, so no verification of format is required. The title is supplied by the depositor, but staff may exercise editorial judgment in consultation with depositors in assigning the title.

Relationships to Other Record Types

COMPND, SOURCE, EXPDTA, and REMARKs provide information that may also be found in TITLE. You may think of the title as describing the experiment, and the compound record as describing the molecule(s).

Examples

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
TITLE     RHIZOPUSPEPSIN COMPLEXED WITH REDUCED PEPTIDE INHIBITOR
TITLE     STRUCTURE OF THE TRANSFORMED MONOCLINIC LYSOZYME BY
TITLE    2 CONTROLLED DEHYDRATION
TITLE     NMR STUDY OF OXIDIZED THIOREDOXIN MUTANT (C62A,C69A,C73A)
TITLE    2 MINIMIZED AVERAGE STRUCTURE
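A continued TITLE can be reassembled by taking columns 11-80 of each line and joining the pieces. The Python sketch below assumes the lines arrive in file order and that line breaks fall between words, so pieces are joined with a single blank; `record_text` is a hypothetical helper name.

```python
def record_text(lines: list[str], record: str) -> str:
    """Join columns 11-80 of a record and its continuation lines into one string."""
    parts = [
        line[10:80].strip()  # continuation lines carry a blank in column 11
        for line in lines
        if line[:6].rstrip() == record
    ]
    return " ".join(part for part in parts if part)


title_lines = [
    "TITLE" + " " * 5 + "STRUCTURE OF THE TRANSFORMED MONOCLINIC LYSOZYME BY",
    "TITLE" + " " * 4 + "2 CONTROLLED DEHYDRATION",  # continuation "2" in columns 9-10
]
print(record_text(title_lines, "TITLE"))
```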

SPLIT (added)

Overview

The SPLIT record is used in instances where a specific entry forms part of a large macromolecular complex. It identifies the PDB entries that are required to reconstitute the complete complex.

Record Format

COLUMNS       DATA TYPE      FIELD           DEFINITION
----------------------------------------------------------------------------------
 1 -  6       Record name    "SPLIT "
 9 - 10       Continuation   continuation    Allows concatenation of multiple records.
12 - 15       IDcode         idCode          ID code of related entry.
17 - 20       IDcode         idCode          ID code of related entry.
22 - 25       IDcode         idCode          ID code of related entry.
27 - 30       IDcode         idCode          ID code of related entry.
32 - 35       IDcode         idCode          ID code of related entry.
37 - 40       IDcode         idCode          ID code of related entry.
42 - 45       IDcode         idCode          ID code of related entry.
47 - 50       IDcode         idCode          ID code of related entry.
52 - 55       IDcode         idCode          ID code of related entry.
57 - 60       IDcode         idCode          ID code of related entry.
62 - 65       IDcode         idCode          ID code of related entry.
67 - 70       IDcode         idCode          ID code of related entry.
72 - 75       IDcode         idCode          ID code of related entry.
77 - 80       IDcode         idCode          ID code of related entry.

Details

The SPLIT record can be continued on multiple lines, so that all related PDB entries are cataloged.
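Gathering the complete list of related entries means reading all fourteen idCode fields on every SPLIT line, continuation lines included. A minimal Python sketch, assuming file order is preserved; `split_ids` is a hypothetical name.

```python
def split_ids(lines: list[str]) -> list[str]:
    """Collect related-entry ID codes from SPLIT records and their continuations."""
    ids: list[str] = []
    for line in lines:
        padded = line.ljust(80)
        if padded[:6] != "SPLIT ":
            continue
        # Fourteen 4-character fields start at columns 12, 17, ..., 77 (0-based 11, ..., 76).
        for start in range(11, 77, 5):
            code = padded[start:start + 4].strip()
            if code:
                ids.append(code)
    return ids


first = "SPLIT" + " " * 6 + " ".join(["1VOQ", "1VOR", "1VOS", "1VOU", "1VOV"])
continued = "SPLIT" + " " * 4 + "2 " + "1VOW"  # continuation number in columns 9-10
print(split_ids([first, continued]))
```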

Verification/Validation/Value Authority Control

This record is generated when the component PDB files of the large macromolecular complex are processed, once all constituents of the complex have been deposited.

Relationships to Other Record Types

REMARK 350 will contain an amended statement to reflect the entire complex.

Examples

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
SPLIT      1VOQ 1VOR 1VOS 1VOU 1VOV 1VOW 1VOX 1VOY 1VP0 1VOZ

CAVEAT

Overview

CAVEAT warns of errors and unresolved issues in the entry. Use caution when using an entry containing this record.

Record Format

COLUMNS       DATA TYPE      FIELD           DEFINITION
---------------------------------------------------------------------------------------
 1 -  6       Record name    "CAVEAT"
 9 - 10       Continuation   continuation    Allows concatenation of multiple records.
12 - 15       IDcode         idCode          PDB ID code of this entry.
20 - 79       String         comment         Free text giving the reason for the CAVEAT.

Details

The CAVEAT will also be included in cases where the wwPDB is unable to verify the transformation of the coordinates back to the crystallographic cell. In these cases, the molecular structure may still be correct.

Verification/Validation/Value Authority Control

CAVEAT will be added to entries known to be incorrect.

COMPND

Overview

The COMPND record describes the macromolecular contents of an entry. In some cases where the entry contains a standalone drug or inhibitor, the name of the non-polymeric molecule will also appear in this record. Each macromolecule found in the entry is described by a set of token: value pairs and is referred to as a COMPND record component. Since the concept of a molecule is difficult to specify exactly, staff may exercise editorial judgment in consultation with depositors in assigning these names.

Record Format

COLUMNS       DATA TYPE      FIELD           DEFINITION
----------------------------------------------------------------------------------
 1 -  6       Record name    "COMPND"
 8 - 10       Continuation   continuation    Allows concatenation of multiple records.
11 - 80       Specification  compound        Description of the molecular components.
              list

Details

The compound record is a Specification list. The specifications, or tokens, that may be used are listed below:

TOKEN           VALUE DEFINITION
-------------------------------------------------------------------------
MOL_ID          Numbers each component; also used in SOURCE to associate
                the information.
MOLECULE        Name of the macromolecule.
CHAIN           Comma-separated list of chain identifier(s).
FRAGMENT        Specifies a domain or region of the molecule.
SYNONYM         Comma-separated list of synonyms for the MOLECULE.
EC              The Enzyme Commission number associated with the molecule.
                If there is more than one EC number, they are presented
                as a comma-separated list.
ENGINEERED      Indicates that the molecule was produced using
                recombinant technology or by purely chemical synthesis.
MUTATION        Indicates if there is a mutation.
OTHER_DETAILS   Additional comments.

In the case of synthetic molecules, the depositor will provide the description.
For chimeric proteins, the protein name is comma-separated and may refer to the presence of a linker (protein_1, linker, protein_2).
Asterisks in nucleic acid names (in MOLECULE) are for ease of reading.
No specific rules apply to the ordering of the tokens, except that the occurrence of MOL_ID or FRAGMENT indicates that the subsequent tokens are related to that specific molecule or fragment of the molecule.
When insertion codes are given as part of the residue name, they must be given within square brackets, e.g., H57[A]N. This might occur when listing residues in FRAGMENT or OTHER_DETAILS.
For multi-chain molecules, e.g., the hemoglobin tetramer, a comma-separated list of CHAIN identifiers is used.
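The grouping rule above (a MOL_ID opens a new component and the following tokens belong to it) can be sketched in Python as below. This is an illustration under stated assumptions, not a complete parser: it joins continuation lines with a single blank, splits the list on ";" and each pair at its first ":", and it does not give FRAGMENT any special treatment. `parse_spec_list` and the helper `compnd_line` are hypothetical names.

```python
def parse_spec_list(lines: list[str], record: str = "COMPND") -> list[dict[str, str]]:
    """Group the token: value pairs of a specification list by MOL_ID."""
    text = " ".join(
        line[10:80].strip() for line in lines if line[:6].rstrip() == record
    )
    components: list[dict[str, str]] = []
    for item in text.split(";"):
        token, sep, value = item.partition(":")
        if not sep:
            continue  # skips the empty tail and anything without a colon
        token, value = token.strip(), value.strip()
        if token == "MOL_ID" or not components:
            components.append({})  # MOL_ID opens a new component
        components[-1][token] = value
    return components


def compnd_line(n: int, text: str) -> str:
    """Hypothetical helper: continuation number in columns 8-10, text from column 11."""
    return "COMPND" + (" " * 4 if n == 1 else f" {n:>3} ") + text


lines = [
    compnd_line(1, "MOL_ID: 1;"),
    compnd_line(2, "MOLECULE: HEVAMINE A;"),
    compnd_line(3, "CHAIN: A;"),
    compnd_line(4, "EC: 3.2.1.14, 3.2.1.17;"),
    compnd_line(5, "OTHER_DETAILS: PLANT ENDOCHITINASE/LYSOZYME"),
]
components = parse_spec_list(lines)
print(components)
```

Values such as CHAIN and EC stay as comma-separated strings here; splitting them further is left to the caller.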

Verification/Validation/Value Authority Control

CHAIN must match the chain identifier(s) of the molecule(s). EC numbers are also checked.

Relationships to Other Record Types

In the case of mutations, the SEQADV records will present differences from the reference molecule. REMARK records may further describe the contents of the entry. Also see verification above.

Examples

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: HEMOGLOBIN ALPHA CHAIN;
COMPND   3 CHAIN: A, C;
COMPND   4 SYNONYM: DEOXYHEMOGLOBIN ALPHA CHAIN;
COMPND   5 ENGINEERED: YES;
COMPND   6 MUTATION: YES;
COMPND   7 MOL_ID: 2;
COMPND   8 MOLECULE: HEMOGLOBIN BETA CHAIN;
COMPND   9 CHAIN: B, D;
COMPND  10 SYNONYM: DEOXYHEMOGLOBIN BETA CHAIN;
COMPND  11 ENGINEERED: YES;
COMPND  12 MUTATION: YES

COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: COWPEA CHLOROTIC MOTTLE VIRUS;
COMPND   3 CHAIN: A, B, C;
COMPND   4 SYNONYM: CCMV;
COMPND   5 MOL_ID: 2;
COMPND   6 MOLECULE: RNA (5'-(*AP*UP*AP*U)-3');
COMPND   7 CHAIN: D, F;
COMPND   8 ENGINEERED: YES;
COMPND   9 MOL_ID: 3;
COMPND  10 MOLECULE: RNA (5'-(*AP*U)-3');
COMPND  11 CHAIN: E;
COMPND  12 ENGINEERED: YES

COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: HEVAMINE A;
COMPND   3 CHAIN: A;
COMPND   4 EC: 3.2.1.14, 3.2.1.17;
COMPND   5 OTHER_DETAILS: PLANT ENDOCHITINASE/LYSOZYME

SOURCE

Overview

The SOURCE record specifies the biological and/or chemical source of each biological molecule in the entry. In some cases where the entry contains a standalone drug or inhibitor, the source information for this molecule will also appear in this record. Sources are described by both the common name and the scientific name, e.g., genus and species. Strain and/or cell line for immortalized cells are given when they help to uniquely identify the biological entity studied.

Record Format

COLUMNS       DATA TYPE      FIELD           DEFINITION
--------------------------------------------------------------------------------------
 1 -  6       Record name    "SOURCE"
 8 - 10       Continuation   continuation    Allows concatenation of multiple records.
11 - 79       Specification  srcName         Identifies the source of the
              list                           macromolecule in a token: value format.

Details

TOKEN                                VALUE DEFINITION
--------------------------------------------------------------------------------------
MOL_ID                               Numbers each molecule. Same as appears in COMPND.
SYNTHETIC                            Indicates a chemically-synthesized source.
FRAGMENT                             A domain or fragment of the molecule may be
                                     specified.
ORGANISM_SCIENTIFIC                  Scientific name of the organism.
ORGANISM_COMMON                      Common name of the organism.
ORGANISM_TAXID                       NCBI Taxonomy ID number of the organism.
STRAIN                               Identifies the strain.
VARIANT                              Identifies the variant.
CELL_LINE                            The specific line of cells used in the experiment.
ATCC                                 American Type Culture Collection tissue
                                     culture number.
ORGAN                                Organized group of tissues that carries on
                                     a specialized function.
TISSUE                               Organized group of cells with a common
                                     function and structure.
CELL                                 Identifies the particular cell type.
ORGANELLE                            Organized structure within a cell.
SECRETION                            Identifies the secretion, such as saliva, urine,
                                     or venom, from which the molecule was isolated.
CELLULAR_LOCATION                    Identifies the location inside/outside the cell.
PLASMID                              Identifies the plasmid containing the gene.
GENE                                 Identifies the gene.
EXPRESSION_SYSTEM                    Scientific name of the organism in which the
                                     molecule was expressed.
EXPRESSION_SYSTEM_COMMON             Common name of the organism in which the molecule
                                     was expressed.
EXPRESSION_SYSTEM_TAXID              NCBI Taxonomy ID of the organism used as the
                                     expression system.
EXPRESSION_SYSTEM_STRAIN             Strain of the organism in which the molecule
                                     was expressed.
EXPRESSION_SYSTEM_VARIANT            Variant of the organism used as the
                                     expression system.
EXPRESSION_SYSTEM_CELL_LINE          The specific line of cells used as the
                                     expression system.
EXPRESSION_SYSTEM_ATCC_NUMBER        Identifies the ATCC number of the expression system.
EXPRESSION_SYSTEM_ORGAN              Specific organ which expressed the molecule.
EXPRESSION_SYSTEM_TISSUE             Specific tissue which expressed the molecule.
EXPRESSION_SYSTEM_CELL               Specific cell type which expressed the molecule.
EXPRESSION_SYSTEM_ORGANELLE          Specific organelle which expressed the molecule.
EXPRESSION_SYSTEM_CELLULAR_LOCATION  Identifies the location inside or outside
                                     the cell which expressed the molecule.
EXPRESSION_SYSTEM_VECTOR_TYPE        Identifies the type of vector used, i.e.,
                                     plasmid, virus, or cosmid.
EXPRESSION_SYSTEM_VECTOR             Identifies the vector used.
EXPRESSION_SYSTEM_PLASMID            Plasmid used in the recombinant experiment.
EXPRESSION_SYSTEM_GENE               Name of the gene used in recombinant experiment.
OTHER_DETAILS                        Used to present information on the source which
                                     is not given elsewhere.

The srcName is a list of token: value pairs describing each biological component of the entry.
As in COMPND, the order is not specified except that MOL_ID or FRAGMENT indicates subsequent specifications are related to that molecule or fragment of the molecule.
Only the relevant tokens need to appear in an entry.
Molecules prepared by purely chemical synthetic methods are described by the specification SYNTHETIC followed by "YES" or an optional value, such as NON-BIOLOGICAL SOURCE or BASED ON THE NATURAL SEQUENCE. ENGINEERED must appear in the COMPND record.
In the case of a chemically synthesized molecule using a biologically functional sequence (nucleic or amino acid), SOURCE reflects the biological origin of the sequence and COMPND reflects its synthetic nature by inclusion of the token ENGINEERED. The token SYNTHETIC appears in SOURCE.
If made from a synthetic gene, ENGINEERED appears in COMPND and the expression system is described in SOURCE (SYNTHETIC does NOT appear in SOURCE).
If the molecule was made using recombinant techniques, ENGINEERED appears in COMPND and the system is described in SOURCE.
When multiple macromolecules appear in the entry, each MOL_ID, as given in the COMPND record, must be repeated in the SOURCE record along with the source information for the corresponding molecule.
Hybrid molecules prepared by fusion of genes are treated as multi-molecular systems for the purpose of specifying the source. The token FRAGMENT is used to associate the source with its corresponding fragment.

- When necessary to fully describe hybrid molecules, tokens may appear more than once 
 for a given MOL_ID.
- All relevant token: value pairs that taken together fully describe each fragment are 
 grouped following the appropriate FRAGMENT.
- Descriptors relative to the full system appear before the FRAGMENT (see third example 
 below).

ORGANISM_SCIENTIFIC provides the Latin genus and species. Virus names are listed as the scientific name.
Cellular origin is described by giving cellular compartment, organelle, cell, tissue, organ, or body part from which the molecule was isolated.
CELLULAR_LOCATION may be used to indicate where in the organism the compound was found. Examples are: extracellular, periplasmic, cytosol.
Entries containing molecules prepared by recombinant techniques are described as follows:

- The expression system is described.
- The organism and cell location given are for the source of the gene used in 
 the cloning experiment.
- Transgenic organisms, such as mouse producing human proteins, are treated as 
 expression systems.

New tokens may be added by the wwPDB.

Verification/Validation/Value Authority Control

The biological source is compared to that found in the sequence databases. The Tax ID is identified, and the corresponding scientific and common names for the organism are matched to a standard taxonomy database (such as NCBI).

Relationships to Other Record Types

Each macromolecule listed in COMPND must have a corresponding source.
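The pairing rule above, together with the earlier requirement that ENGINEERED appear in COMPND whenever SOURCE uses SYNTHETIC, can be checked mechanically once both records have been parsed into one dictionary of token/value strings per MOL_ID. This Python sketch assumes that parsed form as input; the function name and the problem messages are hypothetical.

```python
def check_source_against_compnd(
    compnd: list[dict[str, str]], source: list[dict[str, str]]
) -> list[str]:
    """Report MOL_ID pairing gaps and SYNTHETIC entries lacking ENGINEERED."""
    compnd_by_id = {c.get("MOL_ID"): c for c in compnd}
    source_by_id = {s.get("MOL_ID"): s for s in source}
    problems: list[str] = []
    for mol_id in compnd_by_id:
        if mol_id not in source_by_id:
            problems.append(f"MOL_ID {mol_id}: no SOURCE information")
    for mol_id, src in source_by_id.items():
        comp = compnd_by_id.get(mol_id)
        if comp is None:
            problems.append(f"MOL_ID {mol_id}: no matching COMPND component")
        elif "SYNTHETIC" in src and "ENGINEERED" not in comp:
            # Chemically synthesized molecules must carry ENGINEERED in COMPND.
            problems.append(f"MOL_ID {mol_id}: SYNTHETIC without ENGINEERED")
    return problems


compnd = [
    {"MOL_ID": "1", "MOLECULE": "HEMOGLOBIN ALPHA CHAIN"},
    {"MOL_ID": "2", "MOLECULE": "PEPTIDE", "ENGINEERED": "YES"},
]
print(check_source_against_compnd(compnd, [{"MOL_ID": "1"}]))
```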

Examples

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: AVIAN SARCOMA VIRUS;
SOURCE   3 ORGANISM_TAXID: 11876;
SOURCE   4 STRAIN: SCHMIDT-RUPPIN B;
SOURCE   5 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   6 EXPRESSION_SYSTEM_TAXID: 562;
SOURCE   7 EXPRESSION_SYSTEM_PLASMID: PRC23IN

SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: GALLUS GALLUS;
SOURCE   3 ORGANISM_COMMON: CHICKEN;
SOURCE   4 ORGANISM_TAXID: 9031;
SOURCE   5 ORGAN: HEART;
SOURCE   6 TISSUE: MUSCLE

For a chimeric protein:

SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: MUS MUSCULUS, HOMO SAPIENS;
SOURCE   3 ORGANISM_COMMON: MOUSE, HUMAN;
SOURCE   4 ORGANISM_TAXID: 10090, 9606;
SOURCE   5 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   6 EXPRESSION_SYSTEM_TAXID: 344601;
SOURCE   7 EXPRESSION_SYSTEM_STRAIN: B171;
SOURCE   8 EXPRESSION_SYSTEM_VECTOR_TYPE: PLASMID;
SOURCE   9 EXPRESSION_SYSTEM_PLASMID: P4XH-M13

KEYWDS

Overview

The KEYWDS record contains a set of terms relevant to the entry. Terms in the KEYWDS record provide a simple means of categorizing entries and may be used to generate index files. This record addresses some of the limitations found in the classification field of the HEADER record. It provides the opportunity to add further annotation to the entry in a concise and computer-searchable fashion.

Record Format

COLUMNS       DATA TYPE      FIELD           DEFINITION
---------------------------------------------------------------------------------
 1 -  6       Record name    "KEYWDS"
 9 - 10       Continuation   continuation    Allows concatenation of records if necessary.
11 - 79       List           keywds          Comma-separated list of keywords relevant
                                             to the entry.

Details

The KEYWDS record contains a list of terms relevant to the entry, similar to that found in journal articles. A phrase may be used if it presents a single concept (e.g., reaction center). Terms provided in this record may include those that describe the following:

- Functional classification.
- Metabolic role.
- Known biological or chemical activity.
- Structural classification.

Other classifying terms may be used. No particular ordering is required. A number of PDB entries contain complexes of macromolecules. In these cases, all terms applicable to each molecule should be provided separated by a comma.
Note that the terms in the KEYWDS record duplicate those found in the classification field of the HEADER record. Terms abbreviated in the HEADER record are unabbreviated in KEYWDS.

Verification/Validation/Value Authority Control

Terms used in the KEYWDS record are subject to scientific and editorial review. A list of terms, definitions, and synonyms will be maintained by the wwPDB. Every attempt will be made to provide some level of consistency with keywords used in other biological databases.

Relationships to Other Record Types

HEADER records contain a classification term which must also appear in KEYWDS. Scientific judgment will dictate when terms used in one entry to describe a molecule should be included in other entries with the same or similar molecules.
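The relationship above can be approximated by reading the keyword list (joined across continuation lines and split on commas) and reporting HEADER classification terms that are not found verbatim. Because HEADER terms may be abbreviated while KEYWDS spells them out, a term reported by this Python sketch is a candidate for review rather than a definite error. Both function names are hypothetical.

```python
def keywords(lines: list[str]) -> list[str]:
    """Join KEYWDS columns 11-79 across continuation lines and split on commas."""
    text = " ".join(line[10:79].strip() for line in lines if line.startswith("KEYWDS"))
    return [term.strip() for term in text.split(",") if term.strip()]


def header_terms_missing(classification: str, keywds: list[str]) -> list[str]:
    """HEADER classification terms that do not appear verbatim in KEYWDS."""
    terms = [t.strip() for part in classification.split("/") for t in part.split(",")]
    present = set(keywds)
    return [t for t in terms if t and t not in present]


kw_lines = [
    "KEYWDS" + " " * 4 + "LYASE, TRICARBOXYLIC ACID CYCLE, MITOCHONDRION, OXIDATIVE",
    "KEYWDS" + "   2 " + "METABOLISM",  # the wrapped keyword continues here
]
print(keywords(kw_lines))
```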

Example

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
KEYWDS    LYASE, TRICARBOXYLIC ACID CYCLE, MITOCHONDRION, OXIDATIVE
KEYWDS   2 METABOLISM

EXPDTA (updated)

Overview

The EXPDTA record presents information about the experiment.

The EXPDTA record identifies the experimental technique used. This may refer to the type of radiation and sample, or include the spectroscopic or modeling technique. Permitted values include:

 X-RAY DIFFRACTION
 FIBER DIFFRACTION
 NEUTRON DIFFRACTION
 ELECTRON CRYSTALLOGRAPHY
 ELECTRON MICROSCOPY
 SOLID-STATE NMR 
 SOLUTION NMR 
 SOLUTION SCATTERING

Note: Since October 15, 2006, theoretical models are no longer accepted for deposition. Any theoretical models deposited prior to this date are archived at https://files.wwpdb.org/pub/pdb/data/structures/models.
Please see the documentation from previous versions for the related file format description.

Record Format

COLUMNS       DATA TYPE      FIELD           DEFINITION
------------------------------------------------------------------------------------
 1 -  6       Record name    "EXPDTA"
 9 - 10       Continuation   continuation    Allows concatenation of multiple records.
11 - 79       SList          technique       The experimental technique(s) with
                                             optional comment describing the
                                             sample or experiment.

Details

EXPDTA is mandatory and appears in all entries. The technique must match one of the permitted values. See above.
If more than one technique was used for the structure determination and is represented in the entry, EXPDTA presents the techniques as a semicolon-separated list.

Verification/Validation/Value Authority Control

The verification program checks that the EXPDTA record appears in the entry and that the technique matches one of the allowed values. For NMR and electron microscopy studies, it also checks that the relevant standard REMARK records are present and that the appropriate CRYST1 and SCALE values are used.
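The presence and permitted-value checks can be sketched in Python by joining the EXPDTA lines, splitting the semicolon-separated list, and comparing each entry with the permitted values listed above. This sketch is stricter than the record format, which also allows an optional comment: any text beyond a bare permitted value is rejected. `parse_expdta` is a hypothetical name.

```python
PERMITTED = {
    "X-RAY DIFFRACTION", "FIBER DIFFRACTION", "NEUTRON DIFFRACTION",
    "ELECTRON CRYSTALLOGRAPHY", "ELECTRON MICROSCOPY",
    "SOLID-STATE NMR", "SOLUTION NMR", "SOLUTION SCATTERING",
}


def parse_expdta(lines: list[str]) -> list[str]:
    """Return the techniques listed in EXPDTA; raise if missing or not permitted."""
    text = " ".join(line[10:79].strip() for line in lines if line.startswith("EXPDTA"))
    if not text:
        raise ValueError("EXPDTA record is missing")
    techniques = [t.strip() for t in text.split(";") if t.strip()]
    unknown = [t for t in techniques if t not in PERMITTED]
    if unknown:
        raise ValueError(f"technique(s) not permitted: {unknown}")
    return techniques


print(parse_expdta(["EXPDTA" + " " * 4 + "NEUTRON DIFFRACTION; X-RAY DIFFRACTION"]))
```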

Relationships to Other Record Types

If the experiment is an NMR or electron microscopy study, this may be stated in the TITLE, and the appropriate EXPDTA and REMARK records should appear. Specific details of the data collection and experiment appear in the REMARKs.

In the case of a polycrystalline fiber diffraction study, CRYST1 and SCALE contain the normal unit cell data.

Examples

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
EXPDTA    X-RAY DIFFRACTION
EXPDTA    NEUTRON DIFFRACTION; X-RAY DIFFRACTION
EXPDTA    SOLUTION NMR
EXPDTA    ELECTRON MICROSCOPY

NUMMDL (added)

Overview

The NUMMDL record indicates the total number of models in a PDB entry.

Record Format

COLUMNS       DATA TYPE      FIELD           DEFINITION
------------------------------------------------------------------------------------
 1 -  6       Record name    "NUMMDL"
11 - 14       Integer        modelNumber     Number of models.

Details

The modelNumber field gives the total number of models in a PDB entry and is left-justified.
NUMMDL is mandatory if a PDB entry contains more than one model; in that case, it must state the number of models included.

Verification/Validation/Value Authority Control

The verification program checks that the modelNumber field is correctly formatted.
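A format check of this kind might look like the Python sketch below, which treats "correctly formatted" as digits starting in column 11 and followed only by blanks up to column 14. That reading, and the function names, are assumptions for illustration.

```python
import re


def format_nummdl(n_models: int) -> str:
    """NUMMDL record with the model count left-justified in columns 11-14."""
    if not 1 <= n_models <= 9999:
        raise ValueError("model count must fit in columns 11-14")
    return "NUMMDL" + " " * 4 + str(n_models)


def parse_nummdl(line: str) -> int:
    """Read modelNumber, rejecting fields that are not left-justified digits."""
    field = line.ljust(14)[10:14]  # columns 11-14
    if not line.startswith("NUMMDL") or not re.fullmatch(r"[0-9]+ *", field):
        raise ValueError("modelNumber must be a left-justified integer in columns 11-14")
    return int(field)


def nummdl_required(n_models: int) -> bool:
    """NUMMDL is mandatory only for entries with more than one model."""
    return n_models > 1


print(parse_nummdl(format_nummdl(20)))
```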

Example

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
NUMMDL    20

MDLTYP (added)

Overview

The MDLTYP record contains additional annotation pertinent to the coordinates presented in the entry.

Record Format

COLUMNS       DATA TYPE      FIELD           DEFINITION
------------------------------------------------------------------------------------
 1 -  6       Record name    "MDLTYP"
 9 - 10       Continuation   continuation    Allows concatenation of multiple records.
11 - 80       SList          comment         Free text providing additional structural
                                             annotation.

Details

The MDLTYP record will be used by the wwPDB to highlight certain features of the deposited coordinates as described below.
For entries that are determined by NMR methods and the coordinates deposited are either a minimized average or regularized mean structure, this record will contain the tag "MINIMIZED AVERAGE" to highlight the nature of the deposited coordinates in the entry.
Where the entry contains entire polymer chains that contain only C-alpha atoms (for proteins) or only P atoms (for nucleic acids), the MDLTYP record will be used to describe the contents of such chains along with the chain identifier. For these polymeric chains, REMARK 470 (Missing Atoms) will be omitted.
If multiple features need to be described in this record, they are separated by a ";" delimiter.
Where an entry has multiple features requiring description in this record including MINIMIZED AVERAGE, the MINIMIZED AVERAGE value will precede all other annotation.
New descriptors may be added by the wwPDB.

Verification/Validation/Value Authority Control

The chain identifiers described in this record must be present in the COMPND and SEQRES records and in the coordinate section of the entry.
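The ordering and separator rules in the Details, together with the chain check above, can be sketched in Python as a function that composes the MDLTYP text. The input shape (a feature label mapped to its chains, plus a `known_chains` set assumed to have been gathered from COMPND, SEQRES, and the coordinates) is hypothetical, and wrapping the result onto continuation lines is not shown.

```python
def mdltyp_comment(
    minimized_average: bool,
    chain_features: dict[str, list[str]],
    known_chains: set[str],
) -> str:
    """Compose MDLTYP text: MINIMIZED AVERAGE first, features separated by " ; "."""
    listed = {chain for chains in chain_features.values() for chain in chains}
    unknown = sorted(listed - known_chains)
    if unknown:
        raise ValueError(f"chain(s) not present in the entry: {', '.join(unknown)}")
    parts = ["MINIMIZED AVERAGE"] if minimized_average else []
    for feature, chains in chain_features.items():
        parts.append(f"{feature}, CHAIN {', '.join(chains)}")
    return " ; ".join(parts)


print(mdltyp_comment(True, {"CA ATOMS ONLY": ["A", "B"]}, {"A", "B", "C"}))
```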

Example

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
MDLTYP    MINIMIZED AVERAGE
MDLTYP    CA ATOMS ONLY, CHAIN A, B, C, D, E, F, G, H, I, J, K ; P ATOMS ONLY,
MDLTYP   2 CHAIN X, Y, Z
MDLTYP    MINIMIZED AVERAGE ; CA ATOMS ONLY, CHAIN A, B

AUTHOR

Overview

The AUTHOR record contains the names of the people responsible for the contents of the entry.

Record Format

COLUMNS       DATA TYPE      FIELD           DEFINITION
------------------------------------------------------------------------------------
 1 -  6       Record name    "AUTHOR"
 9 - 10       Continuation   continuation    Allows concatenation of multiple records.
11 - 79       List           authorList      List of the author names, separated
                                             by commas.

Details

The authorList field lists author names separated by commas with no subsequent spaces.
Representation of personal names:

- First and middle names are indicated by initials, each followed by a period, 
 and precede the surname.
- Only the surname (family or last name) of the author is given in full.
- Hyphens can be used if they are part of the author's name.
- Apostrophes are allowed in surnames.
- Umlauts and other character modifiers are not given.

Structure of personal names:

- There is no space after any initial and its following period. 
- Blank spaces are used in a name only if properly part of the surname (e.g., J.VAN DORN),
 or between the surname and Jr., II, or III.
- Abbreviations that are part of a surname, such as Jr., St. or Ste., are followed by a 
 period and a space before the next part of the surname.

Representation of corporate, organization or university names:

- Group names used for one or all of the authors should be spelled out in full. 
- The name of the larger group comes before the name of a subdivision, e.g., 
 University of Somewhere, Department of Chemistry.

Structure of list:

- Line breaks between multiple lines in the authorList occur only after a comma.
- Personal names are not split across two lines.

Special cases:

- Names are given in English if there is an accepted English version; otherwise in the 
 native language, transliterated if necessary.
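The list-structure rules above (break only after a comma, never split a personal name) amount to a greedy line-filling pass. In this Python sketch the first line holds 69 characters (columns 11-79) and continuation lines hold 68, because column 11 carries a blank after the continuation number, as in the example below. The function name is hypothetical, and a single name longer than one line is not handled.

```python
def author_records(names: list[str]) -> list[str]:
    """Lay out AUTHOR records, breaking only after a comma and never inside a name."""
    pieces = [name + "," for name in names[:-1]] + names[-1:]
    rows: list[str] = []
    current = ""
    for piece in pieces:
        # Columns 11-79 hold 69 characters; continuation lines spend column 11 on a blank.
        width = 69 if not rows else 68
        if current and len(current) + len(piece) > width:
            rows.append(current)
            current = ""
        current += piece
    if current:
        rows.append(current)
    records = []
    for number, row in enumerate(rows, start=1):
        # First line: text from column 11; later lines: number in 9-10, text from 12.
        prefix = "AUTHOR    " if number == 1 else f"AUTHOR  {number:>2} "
        records.append(prefix + row)
    return records


names = [f"A.B.SURNAME{i:02d}" for i in range(10)]  # hypothetical author names
for record in author_records(names):
    print(record)
```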

Verification/Validation/Value Authority Control

The verification program checks that the authorList field is correctly formatted. It does not perform any spelling checks or name verification.

Relationships to Other Record Types

The format of the names in the AUTHOR record is the same as in JRNL and REMARK 1 references.

Example

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
AUTHOR    M.B.BERRY,B.MEADOR,T.BILDERBACK,P.LIANG,M.GLASER,
AUTHOR   2 G.N.PHILLIPS JR.,T.L.ST. STEVENS
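The concatenation rules above can be exercised with a short parser. This is a minimal sketch, assuming the AUTHOR lines have already been pulled out of the file as Python strings; `parse_author_list` is a hypothetical helper, not part of any wwPDB software.

```python
def parse_author_list(lines: list[str]) -> list[str]:
    """Rebuild the authorList from AUTHOR records (hypothetical helper)."""
    # Columns 11-79 hold the list (0-based slice 10:79). Continuation lines
    # carry a leading blank in column 11, and breaks only follow a comma, so
    # stripping each field and concatenating restores the full list.
    text = "".join(line[10:79].strip() for line in lines if line.startswith("AUTHOR"))
    # Names are comma-separated with no following space; blanks inside a name
    # (e.g. "G.N.PHILLIPS JR.") are significant and are kept.
    return [name for name in text.split(",") if name]


example = [
    "AUTHOR    M.B.BERRY,B.MEADOR,T.BILDERBACK,P.LIANG,M.GLASER,",
    "AUTHOR   2 G.N.PHILLIPS JR.,T.L.ST. STEVENS",
]
print(parse_author_list(example))
# ['M.B.BERRY', 'B.MEADOR', 'T.BILDERBACK', 'P.LIANG', 'M.GLASER',
#  'G.N.PHILLIPS JR.', 'T.L.ST. STEVENS']
```

Splitting on commas alone is enough because commas never occur inside a personal name under the representation rules above.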

REVDAT (updated)

Overview

REVDAT records contain a history of the modifications made to an entry since its release.

Record Format

COLUMNS       DATA TYPE      FIELD         DEFINITION
-------------------------------------------------------------------------------------
 1 -  6       Record name    "REVDAT"
 8 - 10       Integer        modNum        Modification number.
11 - 12       Continuation   continuation  Allows concatenation of multiple records.
14 - 22       Date           modDate       Date of modification (or release for
                                           new entries) in DD-MMM-YY format. This is
                                           not repeated on continued lines.
24 - 27       IDcode         modId         ID code of this entry. This is not repeated
                                           on continuation lines.
32            Integer        modType       An integer identifying the type of
                                           modification: 0 for the initial release,
                                           1 for all later modifications.
40 - 45       LString(6)     record        Modification detail.
47 - 52       LString(6)     record        Modification detail.
54 - 59       LString(6)     record        Modification detail.
61 - 66       LString(6)     record        Modification detail.

Details

Each time the entry is revised, the revision is assigned the next modification number (incremented by 1). REVDAT records appear in descending order, with the most recent modification first. New entries have a REVDAT record with modNum equal to 1 and modType equal to 0. Allowed modTypes are:

0     Initial released entry.
1     Other modification.

Each revision may span more than one REVDAT record; continuation numbering is kept separately for each revision.
Modification details are typically PDB record names such as JRNL, SOURCE, TITLE, or COMPND. A special modification detail VERSN indicates that the file has undergone a change in version. The current version will be specified in REMARK 4.

Verification/Validation/Value Authority Control

The modType must be one of the defined types, and the given record type must be valid. If modType is 0, the modId must match the entry's ID code in the HEADER record.

Relationships to Other Record Types

In the case of a version revision, the current version is specified in REMARK 4.

Template

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
REVDAT   2   15-OCT-99 1ABC    1       REMARK
REVDAT   1   09-JAN-89 1ABC    0

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
REVDAT   2   11-MAR-08 2ABC    1       JRNL   VERSN
REVDAT   1   09-DEC-03 2ABC    0
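A minimal sketch of reading and checking REVDAT records, assuming each line is a Python string and that a revision's continuation lines directly follow its first line. `Revision`, `col`, `parse_revdat`, and `check_revdat` are hypothetical names; the checks only mirror the rules stated in this section (modType 0 or 1, modNum counting down to 1, modId matching HEADER when modType is 0).

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    mod_num: int
    mod_date: str
    mod_id: str
    mod_type: int
    details: list[str] = field(default_factory=list)


def col(line: str, start: int, end: int) -> str:
    """Return 1-based, inclusive columns start..end with blanks stripped."""
    return line.ljust(80)[start - 1:end].strip()


def parse_revdat(lines: list[str]) -> list[Revision]:
    """Group REVDAT lines into revisions, keeping file order (hypothetical)."""
    revisions: list[Revision] = []
    for line in lines:
        if not line.startswith("REVDAT"):
            continue
        details = [col(line, s, s + 5) for s in (40, 47, 54, 61)]
        details = [d for d in details if d]
        if col(line, 11, 12):
            # Continuation line: modDate and modId are not repeated, so only
            # the extra modification details are added to the open revision.
            revisions[-1].details.extend(details)
        else:
            revisions.append(Revision(
                mod_num=int(col(line, 8, 10)),
                mod_date=col(line, 14, 22),
                mod_id=col(line, 24, 27),
                mod_type=int(col(line, 32, 32)),
                details=details,
            ))
    return revisions


def check_revdat(revisions: list[Revision], header_id: str) -> list[str]:
    """Apply the checks described in this section; returns problem messages."""
    problems = []
    nums = [r.mod_num for r in revisions]
    if nums != list(range(len(nums), 0, -1)):
        problems.append(f"modNum sequence {nums} is not N, N-1, ..., 1")
    for r in revisions:
        if r.mod_type not in (0, 1):
            problems.append(f"revision {r.mod_num}: unknown modType {r.mod_type}")
        if r.mod_type == 0 and r.mod_id != header_id:
            problems.append(f"revision {r.mod_num}: modId {r.mod_id} != HEADER {header_id}")
    return problems


example = [
    "REVDAT   2   11-MAR-08 2ABC    1       JRNL   VERSN",
    "REVDAT   1   09-DEC-03 2ABC    0",
]
revisions = parse_revdat(example)
print([(r.mod_num, r.mod_type, r.details) for r in revisions])
# [(2, 1, ['JRNL', 'VERSN']), (1, 0, [])]
print(check_revdat(revisions, "2ABC"))  # []
```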

SPRSDE

Overview

The SPRSDE records contain a list of the ID codes of entries that were made obsolete by the given coordinate entry and removed from the PDB release set. One entry may replace many.

It is wwPDB policy that only the principal investigator of a structure has the authority to obsolete it.

Record Format

COLUMNS       DATA TYPE      FIELD         DEFINITION
-----------------------------------------------------------------------------------
 1 -  6       Record name    "SPRSDE"
 9 - 10       Continuation   continuation  Allows for multiple ID codes.
12 - 20       Date           sprsdeDate    Date this entry superseded the listed
                                           entries. This field is not copied on
                                           continuations.
22 - 25       IDcode         idCode        ID code of this entry. This field is not
                                           copied on continuations.
32 - 35       IDcode         sIdCode       ID code of a superseded entry.
37 - 40       IDcode         sIdCode       ID code of a superseded entry.
42 - 45       IDcode         sIdCode       ID code of a superseded entry.
47 - 50       IDcode         sIdCode       ID code of a superseded entry.
52 - 55       IDcode         sIdCode       ID code of a superseded entry.
57 - 60       IDcode         sIdCode       ID code of a superseded entry.
62 - 65       IDcode         sIdCode       ID code of a superseded entry.
67 - 70       IDcode         sIdCode       ID code of a superseded entry.
72 - 75       IDcode         sIdCode       ID code of a superseded entry.

Details

The ID code list is terminated by the first blank sIdCode field.

Verification/Validation/Value Authority Control

wwPDB checks that the superseded entries have actually been removed from release.

Relationships to Other Record Types

The sprsdeDate is usually the date the entry is released, and therefore matches the date in the REVDAT 1 record. The idCode field must match the idCode field of the HEADER record.

Example

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
SPRSDE     17-JUL-84 4HHB      1HHB
SPRSDE     27-FEB-95 1GDJ      1LH4 2LH4
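The terminating-blank rule can be sketched as follows. `parse_sprsde` is a hypothetical helper; it assumes the SPRSDE lines of one entry are passed in file order and treats the sIdCode slots on continuation lines as a continuation of the same list.

```python
def parse_sprsde(lines: list[str]) -> tuple[str, str, list[str]]:
    """Return (sprsdeDate, idCode, superseded ID codes); hypothetical helper."""
    records = [line.ljust(80) for line in lines if line.startswith("SPRSDE")]
    # sprsdeDate (columns 12-20) and idCode (22-25) are not copied on
    # continuations, so they are read from the first line only.
    date, id_code = records[0][11:20].strip(), records[0][21:25].strip()
    superseded: list[str] = []
    for rec in records:
        # Nine 4-character sIdCode slots start at columns 32, 37, ..., 72
        # (0-based offsets 31, 36, ..., 71).
        for start in range(31, 72, 5):
            code = rec[start:start + 4].strip()
            if not code:  # the first blank slot terminates the list
                return date, id_code, superseded
            superseded.append(code)
    return date, id_code, superseded


print(parse_sprsde(["SPRSDE     27-FEB-95 1GDJ      1LH4 2LH4"]))
# ('27-FEB-95', '1GDJ', ['1LH4', '2LH4'])
```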

JRNL (updated)

Overview

The JRNL record contains the primary literature citation that describes the experiment which resulted in the deposited coordinate set. There is at most one JRNL reference per entry. If there is no primary reference, then there is no JRNL reference. Other references are given in REMARK 1.

Record Format

COLUMNS       DATA TYPE      FIELD         DEFINITION
-----------------------------------------------------------------------
 1 -  6       Record name    "JRNL  "
13 - 79       LString        text          See Details below.

Details

The following tables are used to describe the sub-record types of the JRNL record.
The AUTH sub-record is mandatory in JRNL. It is followed by the TITL, EDIT, REF, PUBL, REFN, PMID, and DOI sub-record types. REF and REFN are also mandatory in JRNL. EDIT and PUBL may appear only if the reference is to a non-journal.

1. AUTH

AUTH contains the list of authors associated with the cited article or contribution to a larger work (i.e., AUTH is not used for the editor of a book).
The author list is formatted similarly to the AUTHOR record. It is a comma-separated list of names. Spaces at the end of a sub-record are not significant; all other spaces are significant. See the AUTHOR record for full details.
The authorList field of continuation sub-records in JRNL differs from that in AUTHOR by leaving no leading blank in column 20 of any continuation lines.
One author's name, consisting of the initials and family name, cannot be split across two lines. If there are continuation sub-records, then all but the last sub-record must end in a comma.

COLUMNS       DATA TYPE      FIELD         DEFINITION
-------------------------------------------------------------------------------
 1 -  6       Record name    "JRNL  "
13 - 16       LString(4)     "AUTH"        Appears on all continuation records.
17 - 18       Continuation   continuation  Allows a long list of authors.
20 - 79       List           authorList    List of the authors.
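As a sketch of how the JRNL AUTH continuation rules differ from AUTHOR, the hypothetical helper below joins the authorList fields directly (there is no leading blank to remove in column 20) and rejects a non-final line that does not end in a comma. The two input lines split the FERMI example from this section across a continuation line purely for illustration.

```python
def jrnl_author_list(lines: list[str]) -> list[str]:
    """Rebuild a JRNL AUTH author list (hypothetical helper)."""
    fields = [
        line[19:79].rstrip()  # columns 20-79; trailing blanks are not significant
        for line in lines
        if line.startswith("JRNL") and line[12:16] == "AUTH"
    ]
    # All but the last AUTH line must end in a comma.
    for f in fields[:-1]:
        if not f.endswith(","):
            raise ValueError(f"AUTH line does not end in a comma: {f!r}")
    # Continuation lines have no leading blank in column 20, so the fields
    # concatenate directly into one comma-separated list.
    return [name for name in "".join(fields).split(",") if name]


lines = [
    "JRNL        AUTH   G.FERMI,M.F.PERUTZ,",
    "JRNL        AUTH 2 B.SHAANAN,R.FOURME",
]
print(jrnl_author_list(lines))  # ['G.FERMI', 'M.F.PERUTZ', 'B.SHAANAN', 'R.FOURME']
```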

2. TITL

TITL specifies the title of the reference. This is used for the title of a journal article, chapter, or part of a book. The TITL line is omitted if the author(s) listed in authorList wrote the entire book (or other work) listed in REF and no section of the book is being cited.
If an article is in a language other than English and is printed with an alternate title in English, the English language title is given, followed by a space and then the name of the language (in its English form, in square brackets) in which the article is written.
If the title of an article is in a non-Roman alphabet, the title is transliterated.
The actual title cited is reconstructed in a manner identical to other continued records, i.e., trailing blanks are discarded and the continuation line is concatenated with a space inserted.
A line cannot end with a hyphen. Compound terms (two elements connected by a hyphen) and chemical names that include a hyphen must appear on a single line, unless they are too long to fit on one line, in which case the split is made at a normally occurring hyphen. An individual word cannot be hyphenated at the end of a line and carried over to the next. An exception is a repeated compound term whose second element is omitted, e.g., "DOUBLE- AND TRIPLE-RESONANCE". In such a case the incomplete word "DOUBLE-" may end a line without altering reconstruction of the title.

COLUMNS       DATA TYPE      FIELD         DEFINITION
-----------------------------------------------------------------------------------
 1 -  6       Record name    "JRNL  "
13 - 16       LString(4)     "TITL"        Appears on all continuation records.
17 - 18       Continuation   continuation  Permits long titles.
20 - 79       LString        title         Title of the article.
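A minimal sketch of the title reconstruction rule, assuming the JRNL lines are available as Python strings; `jrnl_title` is a hypothetical helper. The second call shows why the "DOUBLE-" exception is safe: inserting the usual space reproduces the intended text.

```python
def jrnl_title(lines: list[str]) -> str:
    """Rebuild a JRNL TITL string (hypothetical helper)."""
    # Drop trailing blanks from each title field (columns 20-79) and join
    # continuation lines with a single space, as for other continued records.
    fields = [
        line[19:79].rstrip()
        for line in lines
        if line.startswith("JRNL") and line[12:16] == "TITL"
    ]
    return " ".join(fields)


print(jrnl_title([
    "JRNL        TITL   THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT",
    "JRNL        TITL 2 1.74 A RESOLUTION",
]))  # THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT 1.74 A RESOLUTION
print(jrnl_title([
    "JRNL        TITL   DOUBLE-",
    "JRNL        TITL 2 AND TRIPLE-RESONANCE",
]))  # DOUBLE- AND TRIPLE-RESONANCE
```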

3. EDIT

EDIT appears if editors are associated with a non-journal reference. The editor list is formatted and concatenated in the same way that author lists are.

COLUMNS       DATA TYPE      FIELD         DEFINITION
-----------------------------------------------------------------------------------
 1 -  6       Record name    "JRNL  "
13 - 16       LString(4)     "EDIT"        Appears on all continuation records.
17 - 18       Continuation   continuation  Allows a long list of editors.
20 - 79       List           editorList    List of the editors.

4. REF

REF is a group of fields that contain either the publication status or the name of the publication (and any supplement and/or report information), volume, page, and year. There are two forms of this sub-record group, depending upon the citation's publication status.

4a. If the reference has not been published yet, the sub-record type group has the form:

COLUMNS       DATA TYPE      FIELD         DEFINITION
--------------------------------------------------------------------------------
 1 -  6       Record name    "JRNL  "
13 - 16       LString(3)     "REF"
20 - 34       LString(15)    "TO BE PUBLISHED"

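4b. If the reference has been published, the sub-record type group carries the publication name, volume, first page, and year. The contents of these fields are described first, followed by the column layout.

Publication name (first item in pubName field):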

If the publication is a serial (i.e., a journal, an annual, or other non-book or non-monographic item issued in parts and intended to be continued indefinitely), use the abbreviated name of the publication as listed in PubMed, written with periods (e.g., J.MOL.BIOL.).

If the publication is a book, monograph, or other non-serial item, use its full name according to the Anglo-American Cataloguing Rules, 2nd Revised Edition (AACR2R). (Non-serial items include theses, videos, computer programs, and anything that is complete in one or a finite number of parts.) If there is a sub-title, verifiable in an online catalog, it will be included using the same punctuation as in the source of verification. Preference will be given to verification using cataloging of the Library of Congress, the National Library of Medicine, and the British Library, in that order.

If a book is part of a monographic series: the full name of the book (according to the AACR2R) is listed first, followed by the name of the series in which it was published. The series information is given within parentheses and the series name is preceded by "IN:" and a space. The series name should be listed in full unless the series has an accepted ISO abbreviation. If applicable, the series name should be followed, after a comma and a space, by a volume (V.) and/or number (NO.) and/or part (PT.) indicator and its number and/or letter in the series.

Supplement (follows publication name in pubName field):

If a reference is in a supplement to the volume listed, or if information about a "part" is needed to distinguish multiple parts with the same page numbering, such information should be put in the REF sub-record.

A supplement indication should follow the name of the publication and should be preceded by a comma and a space. Supplement should be abbreviated as "SUPPL." If there is a supplement number or letter, it should follow "SUPPL." without an intervening space. A part indication should also follow the name of the publication and be preceded by a comma and a space. A part should be abbreviated as "PT.", and the number or letter should follow without an intervening space.

If there is both a supplement and a part, their order should reflect the order printed on the work itself.

Report (follows publication name and any supplement or part information in pubName field):

If a book has a report designation, the report information should follow the title and precede series information. The name and number of the report is given in parentheses, and the name is preceded by "REPORT:" and a space.

Reconstruction of publication name:

The name of the publication is reconstructed by removing any trailing blanks from each pubName field and concatenating the pubName fields of the continuation lines with an intervening space, with two exceptions. No space is added after a line whose pubName field ends with a hyphen (-). When the field ends with a period (.), a space is added only if this is the only period in the entire pubName; if there are two or more periods in the pubName, excluding any periods after the designations "SUPPL", "V", "NO", or "PT", no space is added.
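The rule is easier to follow as code. This is one reading of it, sketched as a hypothetical helper; in particular, giving no space after a line that ends in a designation period (for example "SUPPL.") is an assumption drawn from the supplement and part rules above, not an explicit statement of this rule.

```python
import re

# Periods that follow these designations are not counted toward the rule.
DESIGNATION_PERIOD = re.compile(r"\b(?:SUPPL|V|NO|PT)\.")
ENDS_WITH_DESIGNATION = re.compile(r"\b(?:SUPPL|V|NO|PT)\.$")


def reconstruct_pub_name(fields: list[str]) -> str:
    """Join pubName fields (columns 20-47 of each REF line); hypothetical helper."""
    parts = [f.rstrip() for f in fields]  # trailing blanks are not significant
    if not parts:
        return ""
    whole = "".join(parts)
    periods = whole.count(".") - len(DESIGNATION_PERIOD.findall(whole))
    name = parts[0]
    for part in parts[1:]:
        if name.endswith("-"):
            sep = ""  # never a space after a hyphen
        elif name.endswith("."):
            if ENDS_WITH_DESIGNATION.search(name):
                sep = ""  # assumption: keeps e.g. "SUPPL.2" together
            else:
                # A space only if this is the single counted period.
                sep = " " if periods == 1 else ""
        else:
            sep = " "
        name += sep + part
    return name


print(reconstruct_pub_name(["COLD SPRING HARB.", "SYMP.QUANT.BIOL."]))
# COLD SPRING HARB.SYMP.QUANT.BIOL.
print(reconstruct_pub_name(["NUCLEIC ACIDS", "RES."]))  # NUCLEIC ACIDS RES.
```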

Volume, page, and year (volume, first page, year fields respectively):

The REF sub-record type group also contains information about volume, page, and year when applicable.

In the case of a monograph with multiple volumes which is also in a numbered series, the number in the volume field represents the number of the book, not the series. (The volume number of the series is in parentheses with the name of the series, as described above under publication name.)

COLUMNS       DATA TYPE      FIELD         DEFINITION
---------------------------------------------------------------------------------------
 1 -  6       Record name    "JRNL  "
13 - 16       LString(3)     "REF "
17 - 18       Continuation   continuation  Allows long publication names.
20 - 47       LString        pubName       Name of the publication including section
                                           or series designation. This is the only
                                           field of this sub-record which may be
                                           continued on successive sub-records.
50 - 51       LString(2)     "V."          Appears in the first sub-record only,
                                           and only if column 55 is non-blank.
52 - 55       String         volume        Right-justified blank-filled volume
                                           information; appears in the first
                                           sub-record only.
57 - 61       String         page          First page of the article; appears in
                                           the first sub-record only.
63 - 66       Integer        year          Year of publication; first sub-record only.
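A sketch of reading the fixed REF columns for both forms, assuming lines are Python strings; `parse_ref` is a hypothetical helper. The pubName fields are returned unjoined, since combining them follows the reconstruction rules described earlier in this section.

```python
def parse_ref(lines: list[str]) -> dict:
    """Read the fixed columns of a JRNL REF group (hypothetical helper)."""
    padded = (line.ljust(80) for line in lines)
    # Columns 13-16 must be "REF " exactly, so REFN lines are not picked up.
    refs = [r for r in padded if r.startswith("JRNL") and r[12:16] == "REF "]
    first = refs[0]
    if first[19:34] == "TO BE PUBLISHED":  # form 4a, columns 20-34
        return {"status": "TO BE PUBLISHED"}
    year = first[62:66].strip()             # columns 63-66
    return {
        "status": "published",              # form 4b
        "pubName": [r[19:47].rstrip() for r in refs],  # columns 20-47, may continue
        "volume": first[51:55].strip(),     # columns 52-55, first line only
        "page": first[56:61].strip(),       # columns 57-61, first line only
        "year": int(year) if year else None,
    }


line = "JRNL        REF    J.MOL.BIOL." + " " * 19 + "V. 175   159 1984"
print(parse_ref([line]))
# {'status': 'published', 'pubName': ['J.MOL.BIOL.'], 'volume': '175', 'page': '159', 'year': 1984}
```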

5. PUBL

PUBL contains the name of the publisher and place of publication if the reference is to a book or other non-journal publication. If the non-journal has not yet been published or released, this sub-record is absent.
The place of publication is listed first, followed by a space, a colon, another space, and then the name of the publisher/issuer. This arrangement is based on the ISBD(M) International Standard Bibliographic Description for Monographic Publications (Rev. Ed., 1987) and the AACR2R, and is used in public online catalogs in libraries. Details on the contents of PUBL are given below.

Place of publication:

Give the place of publication. If the name of the country, state, province, etc. is considered necessary to distinguish the place of publication from others of the same name, or for identification, then follow the city with a comma, a space, and the name of the larger geographic area.

If there is more than one place of publication, only the first listed will be used. If an online catalog record is used to verify the item, the first place listed there will be used, omitting any brackets. Preference will be given to the cataloging done by the Library of Congress, the National Library of Medicine, and the British Library, in that order.

Publisher's name (or name of other issuing entity):

Give the name of the publisher in the shortest form in which it can be understood and identified internationally, according to AACR2R rule 1.4D.

If there is more than one publisher listed in the publication, only the first will be used in the PDB file. If an online catalog record is used to verify the item, the first publisher listed there will be used. Preference will be given to the cataloging of the Library of Congress, the National Library of Medicine, and the British Library, in that order.

Ph.D. and other theses:

Theses are presented in the PUBL record if the degree has been granted and the thesis made available for public consultation by the degree-granting institution. The name of the degree-granting institution (the issuing agency) is followed by a space and "(THESIS)".

Reconstruction of place and publisher:

The PUBL sub-record type can be reconstructed by removing all trailing blanks in the pub field and concatenating all of the pub fields from the continuation lines with an intervening space. Continued lines do not begin with a space.

COLUMNS       DATA TYPE      FIELD         DEFINITION
--------------------------------------------------------------------------------------
 1 -  6       Record name    "JRNL  "
13 - 16       LString(4)     "PUBL"
17 - 18       Continuation   continuation  Allows long publisher and place names.
20 - 70       LString        pub           City of publication and name of the
                                           publisher/institution.
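As an illustration of the layout and reconstruction rules, the hypothetical helpers below compose "PLACE : PUBLISHER", wrap it into pub fields of at most 51 characters (columns 20-70), and join the fields back. Using Python's `textwrap` with hyphen breaking disabled is an implementation choice for this sketch, not part of the specification.

```python
import textwrap

PUB_WIDTH = 51  # the pub field spans columns 20-70


def publ_fields(place: str, publisher: str) -> list[str]:
    """Build 'PLACE : PUBLISHER' and split it into pub fields (hypothetical)."""
    text = f"{place} : {publisher}"
    # Break only at blanks (never inside hyphenated names), so no continued
    # line begins with a space and the text survives reconstruction.
    return textwrap.wrap(text, width=PUB_WIDTH,
                         break_long_words=False, break_on_hyphens=False)


def publ_text(fields: list[str]) -> str:
    """Reconstruct PUBL: drop trailing blanks, join fields with one space."""
    return " ".join(f.rstrip() for f in fields)


print(publ_fields("NEW YORK", "ACADEMIC PRESS"))  # ['NEW YORK : ACADEMIC PRESS']
```

For a thesis, the publisher argument would be the degree-granting institution followed by a space and "(THESIS)".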

6. REFN

REFN is a group of fields that contain encoded references to the citation. No continuation lines are possible. Each piece of coded information has a designated field.
There are two forms of this sub-record type group, depending upon the publication status.

6a. This form of the REFN sub-record type group is used if the citation has not been published.

COLUMNS       DATA TYPE      FIELD         DEFINITION
--------------------------------------------------------------------------------
 1 -  6       Record name    "JRNL  "
13 - 16       LString(4)     "REFN"

6b. This form of the REFN sub-record type group is used if the citation has been published.

COLUMNS       DATA TYPE      FIELD         DEFINITION
-------------------------------------------------------------------------------
 1 -  6       Record name    "JRNL  "
13 - 16       LString(4)     "REFN"
36 - 39       LString(4)     "ISSN" or     International Standard Serial Number or
                             "ESSN"        Electronic Standard Serial Number.
41 - 65       LString        issn          ISSN number (final digit may be a
                                           letter and may contain one or
                                           more dashes).
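A sketch of reading the REFN columns; `parse_refn` and `issn_check_digit_ok` are hypothetical helpers. The check-digit function is included only to explain why the final character may be a letter: it implements the general ISSN rule from ISO 3297, which this specification does not say wwPDB verifies.

```python
from typing import Optional


def parse_refn(line: str) -> Optional[tuple[str, str]]:
    """Return (kind, number) from a JRNL REFN line, or None for form 6a."""
    rec = line.ljust(80)
    kind = rec[35:39].strip()    # columns 36-39: "ISSN" or "ESSN"
    number = rec[40:65].strip()  # columns 41-65
    return (kind, number) if kind else None


def issn_check_digit_ok(issn: str) -> bool:
    """ISO 3297 check: weights 8..2 on the first seven digits, modulo 11.

    A check value of 10 is written as "X", which is why the final
    character of an ISSN may be a letter.
    """
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    total = sum(int(c) * w for c, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return chars[7] == ("X" if check == 10 else str(check))


line = "JRNL        REFN" + " " * 19 + "ISSN 0022-2836"
kind, number = parse_refn(line)
print(kind, number, issn_check_digit_ok(number))  # ISSN 0022-2836 True
```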

7. PMID

PMID lists the PubMed unique accession number of the publication related to the entry.

COLUMNS       DATA TYPE      FIELD         DEFINITION
--------------------------------------------------------------------------------
 1 -  6       Record name    "JRNL  "
13 - 16       LString(4)     "PMID"
20 - 79       Integer        continuation  Unique PubMed identifier number assigned to
                                           the publication describing the experiment.
                                           Allows for a long PubMed ID number.

8. DOI

DOI is the Digital Object Identifier for the related electronic publication ("e-pub"), if applicable.
Every DOI consists of a publisher prefix, a forward slash ("/"), and a suffix, which can be of any length and may combine digits and letters. For example: 10.1073/PNAS.0712393105

COLUMNS       DATA TYPE      FIELD         DEFINITION
--------------------------------------------------------------------------------
 1 -  6       Record name    "JRNL  "
13 - 16       LString(4)     "DOI "
20 - 79       LString        continuation  Unique DOI assigned to the publication
                                           describing the experiment.
                                           Allows for a long DOI string.
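A minimal sketch of extracting the PMID and DOI values and splitting a DOI into prefix and suffix. The helper names are hypothetical, and the requirement that a prefix begin with "10." is a general DOI convention rather than a rule stated in this section.

```python
from typing import Optional


def jrnl_pmid_doi(lines: list[str]) -> tuple[Optional[int], Optional[str]]:
    """Return (PMID, DOI) from JRNL lines; None when a sub-record is absent."""
    pmid: Optional[int] = None
    doi: Optional[str] = None
    for line in lines:
        if not line.startswith("JRNL"):
            continue
        kind, value = line[12:16], line[19:79].strip()  # columns 13-16, 20-79
        if kind == "PMID":
            pmid = int(value)
        elif kind == "DOI ":
            doi = value
    return pmid, doi


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI at the first '/' into (publisher prefix, suffix)."""
    prefix, slash, suffix = doi.partition("/")
    if not slash or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return prefix, suffix


print(split_doi("10.1073/PNAS.0712393105"))  # ('10.1073', 'PNAS.0712393105')
```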

Verification/Validation/Value Authority Control

wwPDB verifies that this record is correctly formatted.

Citations appearing in JRNL may not also appear in REMARK 1.

Relationships to Other Record Types

The publication cited in the JRNL record may not be repeated in REMARK 1.

Example

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
JRNL        AUTH   G.FERMI,M.F.PERUTZ,B.SHAANAN,R.FOURME
JRNL        TITL   THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT
JRNL        TITL 2 1.74 A RESOLUTION
JRNL        REF    J.MOL.BIOL.                   V. 175   159 1984
JRNL        REFN                   ISSN 0022-2836
JRNL        PMID   6726807
JRNL        DOI    10.1016/0022-2836(84)90472-8
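The sub-record layout suggests a simple first pass for a reader: group JRNL lines by sub-record type, then check that the mandatory AUTH, REF, and REFN sub-records are present. The sketch below is hypothetical (`group_jrnl` and `missing_subrecords` are illustrative names) and leaves type-specific joining to later steps.

```python
from collections import defaultdict

MANDATORY = {"AUTH", "REF", "REFN"}  # mandatory sub-records per the Details


def group_jrnl(lines: list[str]) -> dict[str, list[str]]:
    """Collect the raw text of each JRNL sub-record type, in file order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        if line.startswith("JRNL"):
            # Type in columns 13-16; text from column 20, trailing blanks dropped.
            groups[line[12:16].strip()].append(line[19:79].rstrip())
    return dict(groups)


def missing_subrecords(groups: dict[str, list[str]]) -> set[str]:
    """Mandatory sub-record types absent from a grouped JRNL record."""
    return MANDATORY - set(groups)


example = [
    "JRNL        AUTH   G.FERMI,M.F.PERUTZ,B.SHAANAN,R.FOURME",
    "JRNL        TITL   THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT",
    "JRNL        TITL 2 1.74 A RESOLUTION",
    "JRNL        REF    J.MOL.BIOL." + " " * 19 + "V. 175   159 1984",
    "JRNL        REFN" + " " * 19 + "ISSN 0022-2836",
    "JRNL        PMID   6726807",
    "JRNL        DOI    10.1016/0022-2836(84)90472-8",
]
groups = group_jrnl(example)
print(sorted(groups))  # ['AUTH', 'DOI', 'PMID', 'REF', 'REFN', 'TITL']
print(missing_subrecords(groups))  # set()
```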

Known Problems

Interchange of bibliographic information and linking with other databases is hampered by the lack of labels or specific locations for certain types of information or by more than one type of information being in a particular location. This is most likely to occur with books, series, and reports. Some of the points below provide details about the variations and/or blending of information.
Titles of the publications that require more than 28 characters on the REF line must be continued on subsequent lines. There is some awkwardness due to volume, page, and year appearing on the first REF line, thereby splitting up the title.
Information about a supplement and its number/letter is presented in the publication's title field (on the REF lines in columns 20 - 47).
When series information for a book is presented, it is added to the REF line. The number of REF lines can become large in some cases because of the 28-column limit for title information in REF.
Books that are issued in more than one series are not accommodated.
Pagination is limited to the beginning page.