Structural Superpositions

Introduction

Two structures often need to be compared, but they are usually not superimposed, i.e., they have different orientations and positions relative to each other. Even if two structures of the same molecule have been determined in the same crystal form, the molecules must still be superimposed, since their positions will differ slightly. Two approaches are commonly used:

  • Alignment based on the secondary structure (topology)

    The two structures are superimposed based on the protein topology, i.e., on the orientation of strands and helices. Use this option if the structures are rather dissimilar and the regions to be aligned are not known.

  • Alignment based on specified residues

    You specify which residues are equivalent in the two structures, and the best superposition is determined by a least-squares algorithm that minimizes the distances between equivalent atoms of the superimposed structures. Usually only the CA coordinates are used, since they show the least variation and are well determined.

    Examples:

    • The simplest case is superimposing two copies of the same protein without chain breaks, e.g., residues 1 to 120 onto residues 1 to 120. Usually the program will align the first residue of the first chain with the first residue of the other chain, and so forth.

    • If, for example, PDB file A contains residues 1 to 120 and PDB file B only residues 2 to 120, specify residues 2-120 of both chains for the alignment.

    • If there are breaks in one of the chains, the residues missing from either chain must be left out for both chains. If, for instance, chain A contains residues 1 to 53 and 62 to 120 and chain B contains residues 2 to 25 and 28 to 120, use 2-25, 28-53 and 62-120 as alignment regions.

    • If you need to align two related proteins, you need to know which residues are equivalent. This can be determined from a sequence alignment. Many programs automatically align two proteins based on a primary sequence alignment. The structural alignment will then be only as good as the primary sequence alignment. It is hard to predict whether two homologous proteins are better aligned based on a sequence alignment and superposition of the aligned residues or based on the 3D topology. For very distantly related proteins, where only the active site is conserved and the rest of the fold is homologous but highly diverged, it is preferable to specify equivalent residues of the active site for superposition. You may use characteristic motifs for a first superposition and, based on the result, decide which residues to exclude from and which further residues to include in the superposition.
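
The alignment regions in the chain-break example above are simply the residue ranges present in both chains. As a minimal sketch, assuming both chains use the same residue numbering (as for two models of the same protein), they can be computed as follows. The function name common_ranges is hypothetical and not part of COOT or CCP4.

```python
def common_ranges(ranges_a, ranges_b):
    """Intersect two lists of inclusive (start, end) residue ranges.

    Assumes both chains use the same residue numbering.
    """
    shared = []
    for a_start, a_end in ranges_a:
        for b_start, b_end in ranges_b:
            start = max(a_start, b_start)
            end = min(a_end, b_end)
            if start <= end:  # the two ranges overlap
                shared.append((start, end))
    return sorted(shared)


# Chain A: residues 1-53 and 62-120; chain B: residues 2-25 and 28-120
chain_a = [(1, 53), (62, 120)]
chain_b = [(2, 25), (28, 120)]
print(common_ranges(chain_a, chain_b))  # [(2, 25), (28, 53), (62, 120)]
```

For related proteins with different numbering, the equivalent residues have to come from a sequence alignment instead, so this sketch does not apply.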

In summary, the result of a structure superposition can depend significantly on the method and on the residues used for the superposition. Both should therefore always be stated.
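
To illustrate what a least-squares superposition of equivalent atoms does, here is a minimal sketch using the Kabsch algorithm with NumPy. This is one standard way to solve the problem and not necessarily the implementation used in COOT or CCP4. The function name kabsch_superpose and the example coordinates are made up for illustration. The inputs are assumed to be two arrays of equivalent CA coordinates in the same order.

```python
import numpy as np


def kabsch_superpose(moving, reference):
    """Least-squares fit of `moving` onto `reference` (both N x 3 arrays).

    Returns (R, t, rmsd) such that moving @ R.T + t best matches reference.
    """
    P = np.asarray(moving, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("both inputs must be N x 3 arrays of equal length")

    p_center = P.mean(axis=0)
    q_center = Q.mean(axis=0)

    # 3 x 3 covariance matrix of the centered coordinate sets
    H = (P - p_center).T @ (Q - q_center)
    U, _, Vt = np.linalg.svd(H)

    # Avoid an improper rotation (reflection)
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_center - R @ p_center

    fitted = P @ R.T + t
    rmsd = float(np.sqrt(((fitted - Q) ** 2).sum(axis=1).mean()))
    return R, t, rmsd


# Made-up CA coordinates: the moving set is a rotated and shifted copy
reference = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0],
                      [0.0, 3.8, 1.0], [1.0, 2.0, 3.0]])
angle = 0.7
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moving = reference @ rot_z.T + np.array([5.0, -2.0, 1.0])

R, t, rmsd = kabsch_superpose(moving, reference)
print("RMSD after superposition:", rmsd)
```

Repeating the fit with a different subset of equivalent residues generally gives a different matrix and rms deviation, which is why the residues used should be stated together with the result.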

Superimpose structures with COOT or CCP4

Residue-based alignment

In COOT activate → "Calculate" → "LSQ Superpose".

  • Specify the two molecules to be superimposed.
  • Define the residue ranges that are used for superimposition of both molecules. Unfortunately, you can only specify one range for each chain.

    COOT seems to be able to figure out missing residues in the specified ranges, but make sure that the correct corresponding residues are used for superimposition.

    Always check the text output!

  • You will find the aligned regions, the rms deviations and the superposition matrix in the COOT text window.
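
If several residue ranges are needed, COOT's scripting interface may be an alternative to the dialog. The sketch below only builds one argument list per range. The commented lines show how these might be passed to the scripting functions clear_lsq_matches, add_lsq_match and apply_lsq_matches. Their availability, their argument order and the meaning of match type 2 (CA atoms) are assumptions that should be checked against the scripting reference of your COOT version. The helper lsq_match_args, the chain names and the molecule numbers are hypothetical.

```python
def lsq_match_args(ranges, ref_chain, mov_chain, match_type=2):
    """Build one argument tuple per residue range.

    Assumed order: reference start, reference end, reference chain,
    moving start, moving end, moving chain, match type.
    match_type=2 is assumed to select CA atoms only.
    """
    return [(start, end, ref_chain, start, end, mov_chain, match_type)
            for start, end in ranges]


# Alignment regions from the chain-break example; chain names are hypothetical
calls = lsq_match_args([(2, 25), (28, 53), (62, 120)], "A", "B")
for args in calls:
    print(args)

# Inside COOT's Python scripting window only (assumed API, not run here):
# clear_lsq_matches()
# for args in calls:
#     add_lsq_match(*args)
# apply_lsq_matches(imol_reference, imol_moving)  # hypothetical molecule numbers
```

As with the dialog, check the text output to confirm that the intended residues were matched.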

Topology-based alignment

In COOT activate → "Calculate" → "SSM Superpose".

  • Specify the two molecules to be superimposed and the chain names if necessary.
  • If you need to superimpose only a domain or region of a protein chain, first prepare models that contain only the domains to be superimposed. You can use the COOT feature → "Edit" → "Extract fragment" for this.
  • The superimposed coordinates can also be saved to CCP4 (→ "File" → "Save mol to CCP4i2") or into a file (→ "File" → "Save coordinates").

In CCP4 go to → "Coordinate data tools" → "Structural alignment - GESAMT". Please refer also to E. Krissinel (2012) 'Enhanced fold recognition using efficient short fragment clustering', J. Mol. Biochem., 1(2), 76-85.

  • Specify the two molecules (Atomic models) to be superimposed. Both have to be imported to CCP4i2 first.
  • Selections can be made with the CCP4i2 syntax. See the CCP4i2 documentation on GitLab for details of the selection syntax.
  • Next → "Run" the job.