PyMOL selections

Selections let you show or color specific residues, or apply a command to a chosen subset of atoms only. Refer to the PyMOL selection algebra for an overview of the available selection operators. This page introduces the most important concepts.

Atom info in the pdb file

To better understand PyMOL atom selection, it is useful to have a look at the information which a protein data bank file (PDB file) of a structure provides:

ATOM     11  CA  PRO A   1      -7.497  20.769   1.930  1.00 34.87           C

For each atom, the PDB file lists the following information (more information is available in the PDB format description). The PyMOL selection names given in bold are the most important for typical applications.

| Element | Our example | PyMOL selection name | Explanation |
|---|---|---|---|
| Atom type | ATOM | ("hetatm" or "not hetatm") | "ATOM" for atoms defined by protein or nucleic acid sequence, "HETATM" for all other atoms (solvent, ligands, ions, ...) |
| Serial atom number | 11 | | A serial number, not used by most programs |
| Atom name | CA | name | Main chain atoms: CA, C, O, N; side chain atoms: CB, CG, ... |
| Alt id | " " | alt | The alternative location identifier. Necessary if a residue has alternative conformations: "A", "B" |
| Residue name | PRO | resn | The residue name, the usual 3-letter code for the proteinogenic amino acids |
| Chain id | A | chain | The chain ID. May be empty (" "). Necessary if the structure contains more than one polypeptide chain. Water molecules or other external ligands may have unique chain ids. |
| Residue number | 1 | resi | Corresponds to the number of the residue in the gene sequence for gene-encoded amino acids. |
| Insertion code | " " | (resi) | Some PDB files, usually if residues are inserted compared to the wild-type gene, have multiple residues of the same number. These are distinguished by the insertion code (usually "A", "B", ...). In PyMOL the insertion code is appended to the residue number (e.g. resi 100A, 100B, ...). |
| Atom coordinates | -7.497 ... | x, y, z | The three atom coordinates x, y, z in Ångström in a Cartesian (orthogonal) axes system. |
| Occupancy | 1.00 | q | The atomic occupancy (0.0 ... 1.0) |
| Temperature factor | 34.87 | b | The temperature factor, also known as B factor. Describes the mobility (thermal motion or static disorder) of the atoms. This column can be used to store other parameters, see e.g. AlphaFold models. |
| Segment id | "    " | segi | A segment identifier that is often the same as the chain id, but may be different. |
| Atomic element | C | elem | Describes the element type of the atom, here a carbon atom. Useful to set atom colors. |
| Charge | "  " | charge | The atomic charge (2 characters). |
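
The fixed-column layout behind these fields can be made concrete with a short script. The following is a minimal sketch in plain Python, not part of PyMOL: the function name parse_atom_record and the dictionary keys "record" and "serial" are chosen for illustration, the other keys reuse the PyMOL selection names, and the segment id is assumed to sit in columns 73-76 as in older PDB files. The example line is assembled from its column pieces so that the spacing is exact.

```python
# Column layout of a PDB ATOM/HETATM record as 0-based Python slices.
# Each entry: (key, start, end, converter).
FIELDS = [
    ("record", 0, 6, str),     # columns 1-6: ATOM or HETATM
    ("serial", 6, 11, int),    # columns 7-11: serial atom number
    ("name", 12, 16, str),     # columns 13-16: atom name
    ("alt", 16, 17, str),      # column 17: alternative location identifier
    ("resn", 17, 20, str),     # columns 18-20: residue name
    ("chain", 21, 22, str),    # column 22: chain id
    ("resi", 22, 27, str),     # columns 23-27: residue number plus insertion code
    ("x", 30, 38, float),      # columns 31-38
    ("y", 38, 46, float),      # columns 39-46
    ("z", 46, 54, float),      # columns 47-54
    ("q", 54, 60, float),      # columns 55-60: occupancy
    ("b", 60, 66, float),      # columns 61-66: temperature factor
    ("segi", 72, 76, str),     # columns 73-76: segment id (assumed legacy position)
    ("elem", 76, 78, str),     # columns 77-78: element symbol
    ("charge", 78, 80, str),   # columns 79-80: charge
]

# The example record from above, written piece by piece (80 columns in total).
LINE = (
    "ATOM  " "   11" " " " CA " " " "PRO" " " "A" "   1" " " "   "
    "  -7.497" "  20.769" "   1.930" "  1.00" " 34.87"
    "      " "    " " C" "  "
)


def parse_atom_record(line):
    """Split one ATOM/HETATM line into a dictionary of fields."""
    line = line.ljust(80)  # pad short lines so every slice exists
    return {key: conv(line[start:end].strip()) for key, start, end, conv in FIELDS}


atom = parse_atom_record(LINE)
for key, value in atom.items():
    print(f"{key:>6}: {value!r}")
```

Reading "resi" as columns 23-27 mirrors the way PyMOL appends the insertion code to the residue number.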

Assuming a proper pdb file, each atom is uniquely identified by the atom name, the chain id and the residue number, if no alternate conformations are present. Thus the atom of our example should be unambiguously selected and shown by the following command, if only one structure (pdb file) has been loaded.

show spheres, chain A and resi 1 and name CA

If we have several structures (pdb files or objects/molecules/selections), we need to specify these as well:

show spheres, pdb1 and chain A and resi 1 and name CA

because other loaded structures may contain a protein residue 1 as well.

Note that the object name ("pdb1") can be given directly. You can also refer to it via "model pdb1" or "object pdb1", provided pdb1 is an object (and not a named selection).

Boolean algebra

The individual selections of the complete selection string are combined and evaluated using Boolean algebra, with the operators "and", "or" and "not".

For more complicated selections, brackets may be used (see examples below).

show sticks, pdb1 and chain A and resi 123+234+345 and not name C+N+O
Shows the side chains of these residues. In this case pdb1 is a molecule that has been loaded under this name, e.g. with the command "fetch 4eiy, pdb1".

show sticks, pdb1 and chain A and ((resi 123+234+345 and not name C+N+O) or (resi 130-133))
Shows in addition residues 130 to 133 with all atoms, as the main chain forms interactions with a ligand.

color paleyellow, elem C and chain A
Colors all carbon atoms of chain A in yellow.

show spheres, resn HOH
Water molecules usually have the residue name HOH.

show sticks, polymer and (alt "" or alt A)
Shows all protein residues with a single conformation and conformation A of residues with multiple conformations. See below for "polymer".

show sticks, polymer and not alt ""
Shows all residues with alternate conformations.
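
One way to picture how these selection strings are evaluated is to treat every simple selection as a set of atoms: "and" is then the intersection, "or" the union and "not" the complement. The sketch below uses plain Python sets on a handful of made-up atoms; the atom list and the helper select are illustrative assumptions and do not reflect how PyMOL is implemented internally.

```python
# Toy atoms as (chain, resi, name) tuples; a stand-in for a loaded structure.
ATOMS = {
    ("A", 1, "N"), ("A", 1, "CA"), ("A", 1, "C"), ("A", 1, "O"), ("A", 1, "CB"),
    ("A", 2, "N"), ("A", 2, "CA"), ("A", 2, "C"), ("A", 2, "O"),
    ("B", 1, "N"), ("B", 1, "CA"), ("B", 1, "C"), ("B", 1, "O"),
}


def select(predicate):
    """Return the set of toy atoms for which the predicate is true."""
    return {atom for atom in ATOMS if predicate(atom)}


chain_a = select(lambda a: a[0] == "A")            # chain A
resi_1 = select(lambda a: a[1] == 1)               # resi 1
resi_2 = select(lambda a: a[1] == 2)               # resi 2
name_cno = select(lambda a: a[2] in {"C", "N", "O"})  # name C+N+O

# chain A and resi 1 and not name C+N+O
side = chain_a & resi_1 & (ATOMS - name_cno)

# chain A and ((resi 1 and not name C+N+O) or (resi 2))
combined = chain_a & ((resi_1 & (ATOMS - name_cno)) | resi_2)

print(sorted(side))
print(sorted(combined))
```

The brackets in the second expression play the same role as in the PyMOL command: without them the grouping of "and" and "or" would be left to the evaluation order.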

As these examples show, you can specify a range of residues with "resi 100-200" and select multiple items of the same kind with the "+" sign: "resn GLU+ASP" is the same selection as "resn GLU or resn ASP".
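
The meaning of the "+" and "-" shorthands can be spelled out in a few lines of plain Python. This is only a sketch of the notation: the helper names expand_plus and resi_matches are hypothetical, and the range handling assumes positive residue numbers without insertion codes.

```python
def expand_plus(keyword, value):
    """Rewrite 'resn GLU+ASP' style lists as an explicit 'or' expression."""
    return " or ".join(f"{keyword} {item}" for item in value.split("+"))


def resi_matches(spec, number):
    """Check whether a residue number is covered by a spec such as '123+130-133'."""
    for part in spec.split("+"):
        if "-" in part:
            first, last = part.split("-")
            if int(first) <= number <= int(last):
                return True
        elif int(part) == number:
            return True
    return False


print(expand_plus("resn", "GLU+ASP"))
print(resi_matches("123+234+345", 234))
print(resi_matches("130-133", 134))
```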

Selection macros

The strings that are printed in the output window when you left-click on an atom are called "selection macros". They have the form:

/object-name/segi-identifier/chain-identifier/resi-identifier/name-identifier

or in terms of PyMOL selection language:

object/segi/chain/resi/name (object name/segment identifier/chain id/residue number/atom name)

These macros can be used as a short alternative for specifying selections. For example, "ligand and chain A and resi 100 and name CA" is equivalent to "ligand//A/100/CA". Note that a field can be left empty if it should not restrict the selection, as done here for the segment identifier. So, "/////CA" will select all Cα atoms. The resi-identifier can include the residue name in the form: "GLU`100".

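The correspondence between a macro and the long form of a selection can be sketched as a small translation function. The code below is plain Python and only handles the complete five-field form described above (with or without the leading slash); the function name macro_to_selection is hypothetical, shortened macros are deliberately rejected, and returning "all" for a macro with only empty fields is an assumption of this sketch.

```python
FIELD_NAMES = ("object", "segi", "chain", "resi", "name")


def macro_to_selection(macro):
    """Translate a five-field selection macro into a selection expression."""
    parts = macro.split("/")
    if macro.startswith("/"):
        parts = parts[1:]  # drop the empty piece in front of the leading slash
    if len(parts) != len(FIELD_NAMES):
        raise ValueError("this sketch only handles complete five-field macros")

    terms = []
    for key, value in zip(FIELD_NAMES, parts):
        if not value:
            continue  # an empty field does not restrict the selection
        if key == "object":
            terms.append(value)  # object names are used directly
        elif key == "resi" and "`" in value:
            resn, resi = value.split("`", 1)  # form GLU`100
            if resn:
                terms.append(f"resn {resn}")
            if resi:
                terms.append(f"resi {resi}")
        else:
            terms.append(f"{key} {value}")
    return " and ".join(terms) if terms else "all"


print(macro_to_selection("ligand//A/100/CA"))
print(macro_to_selection("/////CA"))
print(macro_to_selection("/pdb1//A/GLU`100/CA"))
```
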
Further selections based on chemical class or properties

PyMOL provides some selections based on chemical class or properties:

organic
Non-polymer organic compounds (e.g. ligands, cofactors, lipids or molecules bound from the crystallization buffer)
solvent
water molecules
polymer
protein residues or nucleic acid residues (may be refined to polymer.protein and polymer.nucleic)
backbone
polymer backbone atoms (name CA+C+O+N for proteins)
sidechain
Side chain atoms. Be aware that "show sticks, sidechain" will not show the CA-CB bond, which is usually displayed as well.
metals
Selects metal ions.
donors
Hydrogen bond donor atoms (For ligands be aware that the assignment may be wrong)
acceptors
Hydrogen bond acceptors (For ligands be aware that the assignment may be wrong)

Selections based on distances or bonding

There are several very similar operators that select by pairwise atom distances.

Syntax 1: s1 operator X of s2
Syntax 2: s1 and (s2 operator X)

| Operator | Distance is ... | Measured from | Includes s2 | Syntax | Notes |
|---|---|---|---|---|---|
| near_to | ≤ X | center | never | 1 | equivalent to "around" |
| within | ≤ X | center | if matches s1 | 1 | |
| beyond | > X | center | never | 1 | |
| gap | > X | center+vdw | never | 2 | |
| around | ≤ X | center | never | 2 | equivalent to "near_to" |
| expand | ≤ X | center | always | 2 | |
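
The difference between these operators lies mainly in whether the reference selection s2 is part of the result. The sketch below mimics "around", "expand", "within" and "near_to" on a few invented points in plain Python, following the rows of the table as read here. The atom names, coordinates and function names are illustrative assumptions; "beyond" and "gap" are left out, and PyMOL itself works on the atoms of loaded objects.

```python
import math

# Invented atom centers in Å: two ligand atoms and three protein atoms on a line.
COORDS = {
    "lig1": (0.0, 0.0, 0.0),
    "lig2": (1.5, 0.0, 0.0),
    "p1": (3.0, 0.0, 0.0),
    "p2": (6.0, 0.0, 0.0),
    "p3": (20.0, 0.0, 0.0),
}


def close_to(s2, cutoff):
    """All atoms whose center lies within cutoff of any atom in s2."""
    return {
        atom for atom, xyz in COORDS.items()
        if any(math.dist(xyz, COORDS[ref]) <= cutoff for ref in s2)
    }


def around(s2, cutoff):
    """s2 around X: neighbours of s2, never s2 itself."""
    return close_to(s2, cutoff) - s2


def expand(s2, cutoff):
    """s2 expand X: neighbours of s2, always including s2."""
    return close_to(s2, cutoff) | s2


def within(s1, cutoff, s2):
    """s1 within X of s2: s2 atoms are kept if they are also part of s1."""
    return s1 & close_to(s2, cutoff)


def near_to(s1, cutoff, s2):
    """s1 near_to X of s2: like within, but s2 atoms are never kept."""
    return (s1 & close_to(s2, cutoff)) - s2


ligand = {"lig1", "lig2"}
everything = set(COORDS)

print(sorted(around(ligand, 5.0)))
print(sorted(expand(ligand, 5.0)))
print(sorted(within(everything, 5.0, ligand)))
print(sorted(near_to(everything, 5.0, ligand)))
```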

Examples

create env, byres (resn ZMA around 5.0) and not resn ZMA
Creates a new object "env" that contains all residues with at least one atom within 5.0 Å of the ligand with residue name ZMA. The ligand itself is excluded.

dist hbonds, ligand or env, ligand or env, mode=2
Draws all polar interactions (or hydrogen bonds) between and within the ligand and its environment. Creates a new object "hbonds". See the documentation of the "dist" command for more details.

dist hbonds, ligand, env, mode=2
Draws all direct polar interactions (or hydrogen bonds) between the ligand and its environment.
