Metadata Lost When Loading, Modifying and Writing Structure #364

Answered by wojdyr
katkanaz asked this question in Q&A

  • gemmi version 0.7.1

Hello, I am trying to modify a structure so that its sugar residues keep only one altloc, either A or B. In the end I want to create two files, one for each altloc.
I can traverse the loaded structure and remove the desired atoms, but the file written afterwards lacks some metadata that I need (e.g. _entity.src_method).

import gemmi
from pathlib import Path


def write_structure(input_structure: Path, output_path: Path) -> None:
    structure = gemmi.read_structure(str(input_structure))
    structure.setup_entities()
    to_remove = []

    for model_idx, model in enumerate(structure):
        for chain_idx, chain in enumerate(model):
            for residue_idx, residue in enumerate(chain):
                # simplified example of removing residues/atoms
                # in my code I filter by altloc value on the level of atoms
                if residue.name == "GLC":
                    to_remove.append([model_idx, chain_idx, residue_idx])
    for rm in reversed(to_remove):
        del structure[rm[0]][rm[1]][rm[2]]
    options = gemmi.cif.WriteOptions()
    options.misuse_hash = True
    options.align_pairs = 48
    options.align_loops = 20
    structure.make_mmcif_document().write_file(str(output_path), options)

Looking through the documentation and issues (#60 (comment)), I found it is possible to load the file as a Document and update some parts of it from the modified structure object.

def write_doc(input_structure: Path, output_path: Path) -> None:
    structure = gemmi.read_structure(str(input_structure))
    structure.setup_entities()
    to_remove = []
    for model_idx, model in enumerate(structure):
        for chain_idx, chain in enumerate(model):
            for residue_idx, residue in enumerate(chain):
                # simplified example of removing residues/atoms
                # in my code I filter by altloc value on the level of atoms
                if residue.name == "GLC":
                    to_remove.append([model_idx, chain_idx, residue_idx])
    for rm in reversed(to_remove):
        del structure[rm[0]][rm[1]][rm[2]]
    groups = gemmi.MmcifOutputGroups(True)
    groups.atoms = True
    doc = gemmi.cif.read(str(input_structure))
    # block = doc.find_block(structure.info["_entry.id"])
    block = doc.sole_block()
    structure.update_mmcif_block(block, groups)
    options = gemmi.cif.WriteOptions()
    options.misuse_hash = True
    options.align_pairs = 48
    options.align_loops = 20
    doc.write_file(str(output_path), options)

I would expect that setting groups.atoms = True would overwrite the ATOM/HETATM lines in the document with those from the modified structure object, but the resulting file does not have these lines at all.
Setting groups = gemmi.MmcifOutputGroups(True) does include the modified lines in the output, but the metadata I need is lost.

  • am I understanding update_mmcif_block() correctly and is there another groups option I need to set?
  • if I were able to create a document with updated atoms and retained metadata, would this be a valid file? (can gemmi update the rest of the file to reflect the deletion of some residues?)
  • am I overcomplicating it and is there a simpler way to remove all A (or all B) altlocs from sugar residues?

There was a somewhat related question last month: #362


Thank you for your response. Setting chem_comp and entity to False solved the problem with the missing metadata. However, I have now run into a different problem. I am using a tool that parses mmCIF files and requires _atom_site.auth_atom_id and _atom_site.auth_comp_id. I read in the documentation that gemmi reads only one of each pair (auth if present, label otherwise). In the files I modified as a Structure and wrote with gemmi, only the label_comp_id and label_atom_id columns are present; auth_comp_id and auth_atom_id are missing. Is there a way to write the two auth columns as well?

Also, does gemmi check for and/or handle differences between label and auth names/numbers, if they occur?

Does reading a file as a Structure also take care of dependent records when residues are deleted? Or would it be the same as removing the corresponding atom rows "manually"?


There is an option auth_all=True for writing both label_comp_id and auth_comp_id, etc.

Re label/auth differences, only chain names and sequence numbers differ and gemmi handles this.


Dependencies: I'm not sure, I'd need to check. Do you have something specific in mind?


> There is an option auth_all=True for writing both label_comp_id and auth_comp_id, etc.
>
> Re label/auth differences, only chain names and sequence numbers differ and gemmi handles this.

Thank you so much, that helped. It works well now.

> Dependencies: I'm not sure, I'd need to check. Do you have something specific in mind?

I'm relatively new to mmCIF files so I'm not 100% sure I am using the correct terminology. I was thinking of links as in connectivity relationships between atoms, such as the information in _struct_conn, _chem_comp_bond or similar.

Answer selected by katkanaz