Many proteins bind small organic ligands. Such ligands are usually integral to the functions of their cognate proteins; therefore understanding the structural and dynamical aspects of their binding is an essential component of the overall structural picture.
In some cases, following purification of overexpressed protein, ligand binding sites may be fully or partially occupied by ligand acquired from the expression host. In other cases, protein is purified with no ligand bound, either because the ligand was lost during purification, because it was not present during overexpression, or because the protein does not bind any ligand.
The task of identifying the ligand could involve high-throughput screening, trial and error experimentation, or educated guessing based on prior knowledge derived from the amino acid sequence or the protein structure.
With protein NMR spectroscopy, ligand binding is usually inferred when ligand added to a protein sample causes chemical shift changes in a subset of peaks in HSQC spectra of the protein. If a ligand binding site is partially occupied by a sub-stoichiometric amount of slow-exchanging ligand, such as might occur when the quantity of overexpressed protein exceeds the amount of ligand available in the host organism, then adding ligand to the sample may cause extra HSQC peaks to disappear.
This topic addresses only the steps involved in determining the solution NMR structure of a protein with a bound ligand when the identity of the ligand is known.
Acquisition of NMR Data
Three types of data provide specific structural information about bound ligands:
- Chemical shift perturbations in 1H-15N HSQC and 1H-13C HSQC spectra. Proximity to the ligand is inferred from effects on chemical shifts of 1H, 13C, and 15N nuclei in the protein. Note that ligand-induced conformational changes in the protein could also cause such shifts.
- Intermolecular NOEs observed between protein and ligand protons. For example, a 3D f1-13C-edited, f3-13C-filtered NOESY is often recorded. This experiment selects for (edits) protons attached to 13C in f1 and selects against (filters) protons attached to 13C in f3. If unlabeled ligand is bound to labeled protein, the experiment ideally yields only NOESY peaks between protein and ligand protons; intra-protein NOESY peaks are absent. N.B.: Both intra- and intermolecular NOEs are observed in the normal 3D 13C-edited NOESY recorded on a sample containing labeled protein and unlabeled ligand; however, a 4D 1H-13C-HMQC-NOESY-HMQC experiment yields only intramolecular NOEs for such a sample because 13C editing occurs for both donor and acceptor protons. Comparing the 4D spectrum with the 3D spectrum can therefore help distinguish intermolecular from intramolecular peaks.
- A third experiment can also be helpful: a double 13C-filtered 1H-1H NOESY provides NOEs only within the unlabeled ligand.
N.B.: If protein is purified from the expression culture with ligand already bound, then the ligand is isotope labeled as well. Adding unlabeled ligand can dilute the label as the bound and free ligand exchange over time.
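The chemical shift perturbations in the first item above are often summarized as one combined 1H/15N value per residue. The following is a minimal sketch of that calculation, assuming peak lists are available as dictionaries keyed by residue number. The function name, the example shifts, and the 15N scaling factor of 0.14 are illustrative assumptions rather than values from this page.

```python
import math

# Assumed weighting of 15N shift changes relative to 1H shift changes.
# The value 0.14 is only an illustration; choose a factor suited to your data.
N_SCALE = 0.14


def combined_csp(free, bound, n_scale=N_SCALE):
    """Return {residue: combined 1H/15N shift change in ppm}.

    free and bound map residue number -> (1H shift, 15N shift) in ppm.
    Residues missing from either peak list are skipped.
    """
    csp = {}
    for res in sorted(set(free) & set(bound)):
        d_h = bound[res][0] - free[res][0]
        d_n = bound[res][1] - free[res][1]
        csp[res] = math.sqrt(d_h ** 2 + (n_scale * d_n) ** 2)
    return csp


# Hypothetical peak lists for ligand-free and ligand-bound protein.
free = {10: (8.20, 120.0), 11: (7.95, 118.5), 12: (8.60, 125.1)}
bound = {10: (8.20, 120.0), 11: (8.10, 119.5), 13: (9.00, 110.0)}

result = combined_csp(free, bound)
for res, value in result.items():
    print(f"residue {res}: {value:.3f} ppm")
```

Residues assigned in only one of the two spectra are dropped rather than guessed. As noted above, a large value may reflect a ligand-induced conformational change rather than direct contact with the ligand.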
Structure Calculation (XPLOR/CNS)
In order to calculate the protein-ligand structure using restrained molecular dynamics and simulated annealing, it is necessary to have appropriate parameter and topology files for the ligand. You also need a .pdb file for the ligand.
If these files are not available from someone who has already constructed them for a particular ligand, they must be created. This can be done using the HICCUP server, which automatically generates them based on ligand coordinates in the PDB. Alternatively, the PRODRG server can be used to generate parameter and topology files.
Both sources must be used carefully, and the resulting files will require further editing--for example to add hydrogens and to allow free rotation about single bonds while planarity of aromatic rings and conjugated double bonds is maintained. It is essential that all atom names, atom types, atomic masses, charges, bond distances, bond angles, torsion angles, impropers, and stereochemistry be correct. Using some common sense organic chemistry, you often can borrow parameters from amino acids and nucleic acids in XPLOR parameter and topology files. Make sure the atom names are consistent between the pdb files and the parameter and topology files. Look up atom naming conventions for common ligands in the literature.
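Atom name consistency between the .pdb file and the topology file can be checked with a short script. The sketch below assumes XPLOR-style topology text with one ATOM statement per line and `!` comments. The function names and the atom types in the example are hypothetical placeholders.

```python
def pdb_atom_names(pdb_text, resname):
    """Collect atom names for one residue name from ATOM/HETATM records."""
    names = set()
    for line in pdb_text.splitlines():
        # Atom name is in columns 13-16; residue name in columns 18-21
        # (XPLOR allows four-character residue names).
        if line.startswith(("ATOM", "HETATM")) and line[17:21].strip() == resname:
            names.add(line[12:16].strip())
    return names


def topology_atom_names(top_text, resname):
    """Collect atom names declared inside one RESIdue block."""
    names = set()
    in_residue = False
    for line in top_text.splitlines():
        tokens = line.split("!")[0].split()  # drop trailing '!' comments
        if not tokens:
            continue
        key = tokens[0].upper()
        if key.startswith("RESI") and len(tokens) > 1:
            in_residue = tokens[1].upper() == resname.upper()
        elif key == "END":
            in_residue = False
        elif in_residue and key == "ATOM" and len(tokens) > 1:
            names.add(tokens[1])
    return names


def compare_atom_names(pdb_text, top_text, resname):
    """Return (names only in the pdb file, names only in the topology file)."""
    pdb_names = pdb_atom_names(pdb_text, resname)
    top_names = topology_atom_names(top_text, resname)
    return sorted(pdb_names - top_names), sorted(top_names - pdb_names)


# Hypothetical fragments; the atom types are placeholders only.
top_text = """
RESIdue COA
  GROUp
    ATOM N1A  TYPE=NC   CHARge=-0.60 END
    ATOM C2A  TYPE=CR1E CHARge=0.30  END  ! ring carbon
    ATOM H2A  TYPE=HA   CHARge=0.10  END
END {COA}
"""
pdb_text = "\n".join([
    "HETATM    1  N1A COA   999       1.000   2.000   3.000",
    "HETATM    2  C2A COA   999       2.000   2.000   3.000",
    "HETATM    3  H2  COA   999       3.000   2.000   3.000",
])

only_pdb, only_top = compare_atom_names(pdb_text, top_text, "COA")
print("only in pdb:", only_pdb)
print("only in topology:", only_top)
```

Both lists should be empty before a structure calculation is attempted. This check covers names only; atom types, charges, and geometry still need to be reviewed by hand.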
Example Parameter and Topology Files
These were generated with the PRODRG server and extensively modified by John Cort.
Coenzyme A
coa.par.txt: parameter file for coenzyme A
coa.top.txt: topology file for coenzyme A
coa.pdb: pdb file for coenzyme A with consistent atom names
Folinic Acid (5-formyltetrahydrofolate)
FON_top.txt: topology file for folinic acid
FON_par.txt: parameter file for folinic acid
FON_6S_H.pdb: pdb file for folinic acid with consistent atom names
Note that in the above examples, most dihedral parameters are commented out. Instead, improper angles are used to maintain planarity of sp2 hybridized carbons. As always, impropers are also used to maintain stereochemistry at all sp3 hybridized carbons.
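To see what a planarity improper measures, it can help to compute the angle directly from ligand coordinates. The sketch below is a generic four-atom dihedral calculation in plain Python, not XPLOR's own code. Sign conventions differ between programs, so only the magnitude is interpreted here, and the coordinates are made up for illustration.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p1, p2, p3, p4):
    """Angle in degrees between the planes (p1, p2, p3) and (p2, p3, p4)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    length = math.sqrt(_dot(b2, b2))
    b2_unit = (b2[0] / length, b2[1] / length, b2[2] / length)
    m1 = _cross(n1, b2_unit)
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


# Made-up coordinates: a central atom followed by its three substituents.
substituents = [(1.0, 0.0, 0.0), (-0.5, 0.866, 0.0), (-0.5, -0.866, 0.0)]
planar = dihedral((0.0, 0.0, 0.0), *substituents)
pyramidal = dihedral((0.0, 0.0, 0.3), *substituents)

print(f"planar center:    {planar:.1f} degrees")
print(f"pyramidal center: {pyramidal:.1f} degrees")
```

With the central atom listed first, a flat sp2 center gives an angle near zero, and the angle moves away from zero as the center is pushed out of the plane of its substituents.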
Generating a .psf or .mtf file
The next step is to generate a single .psf or .mtf file for your protein and the ligand.
.psf file (XPLOR)
In the parameter and topology statements, read the ligand .par and .top files as well as those for the protein (e.g. parallhdg.pro and topallhdg.pro). When you write the .psf file, it will contain both protein and ligand atoms.
.mtf file (CNS)
The CNS scripts have a line telling you where to put ligand parameter and topology files.
Once you have the .psf or .mtf file, you need to make a starting structure. Concatenate the protein and ligand pdb files, giving the ligand a unique residue number such as 999. Try both extended and folded protein structures. Then add intermolecular distance restraints to your restraint table and calculate structures of your protein-ligand complex.
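The concatenation and renumbering step can be scripted. The sketch below assumes fixed-column PDB records with the residue number in columns 23-26, and it writes restraints in the XPLOR `assign` form with a distance followed by its minus and plus bounds. The function names, coordinates, atom names, and distances are hypothetical.

```python
def atom_records(pdb_text):
    """Return only the ATOM/HETATM lines of a PDB-format string."""
    return [ln for ln in pdb_text.splitlines()
            if ln.startswith(("ATOM", "HETATM"))]


def set_resid(line, resid):
    """Overwrite the residue number field (PDB columns 23-26) of one record."""
    line = line.ljust(26)
    return line[:22] + f"{resid:>4d}" + line[26:]


def merge_complex(protein_pdb, ligand_pdb, ligand_resid=999):
    """Concatenate protein and ligand records, renumbering the ligand."""
    protein = atom_records(protein_pdb)
    used = {int(ln[22:26]) for ln in protein}
    if ligand_resid in used:
        raise ValueError(f"residue number {ligand_resid} is already used")
    ligand = [set_resid(ln, ligand_resid) for ln in atom_records(ligand_pdb)]
    return "\n".join(protein + ligand + ["END"]) + "\n"


def noe_restraint(atom_a, atom_b, d, d_minus, d_plus):
    """Format one distance restraint as an XPLOR-style assign statement."""
    (resid_a, name_a), (resid_b, name_b) = atom_a, atom_b
    return (f"assign (resid {resid_a} and name {name_a}) "
            f"(resid {resid_b} and name {name_b}) "
            f"{d:.1f} {d_minus:.1f} {d_plus:.1f}")


# Hypothetical input fragments.
protein_pdb = (
    "ATOM      1  N   MET     1       1.000   2.000   3.000\n"
    "ATOM      2  CA  MET     1       2.000   2.000   3.000\n"
)
ligand_pdb = "HETATM    1  N1A COA     1       5.000   5.000   5.000\n"

merged = merge_complex(protein_pdb, ligand_pdb)
restraint = noe_restraint((45, "HD1"), (999, "H2A"), 4.0, 2.2, 1.0)
print(merged, end="")
print(restraint)
```

Raising an error when the chosen residue number is already present guards against the ligand silently merging into an existing protein residue.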
HADDOCK can be used to calculate protein-ligand complex structures based on chemical shift perturbation data alone. Optionally, distance restraints can be added. For this you need a structure of the ligand-free protein; you then dock the ligand to that structure, assuming that the perturbed chemical shifts arise from proximity to the ligand. HADDOCK runs on top of CNS.
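Before docking, the perturbation data have to be reduced to a list of residues presumed to lie near the ligand. The sketch below applies one simple heuristic, keeping residues whose perturbation exceeds the mean by a chosen number of standard deviations. The threshold rule, the function name, and the example values are assumptions for illustration and are not part of HADDOCK itself.

```python
import statistics


def perturbed_residues(csp, n_sd=1.0):
    """Return (residues above cutoff, cutoff) for a {residue: CSP} mapping.

    The cutoff is mean + n_sd * population standard deviation, an assumed
    heuristic that should be adjusted to the data set at hand.
    """
    values = list(csp.values())
    cutoff = statistics.mean(values) + n_sd * statistics.pstdev(values)
    selected = sorted(res for res, value in csp.items() if value > cutoff)
    return selected, cutoff


# Hypothetical per-residue perturbations in ppm.
csp = {10: 0.01, 11: 0.02, 12: 0.30, 13: 0.02, 14: 0.25, 15: 0.01}

selected, cutoff = perturbed_residues(csp)
print(f"cutoff = {cutoff:.3f} ppm, selected residues: {selected}")
```

Residues selected this way are only candidates, because a buried residue can shift through a conformational change without contacting the ligand.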