The Assignment Validation Suite (AVS) checks a chemical shift list in BioMagResBank (BMRB) format for a number of possible problems, such as inconsistency with IUPAC residue/atom naming and chemical shifts that lie far outside the typical range for the particular atom/residue. It also reports useful statistics about the examined chemical shift set (e.g. percentage of assignments, number of stereospecifically assigned methyls, percentage of aromatic sidechain assignments, etc.). AVS is run on every chemical shift set that is submitted to the BMRB, and is included as part of the Protein Structure Validation Suite (PSVS).
Several versions of the standalone AVS routine exist, adapted for different BMRB format versions (2.1 or 3.1). Two Perl scripts can be run from any directory on any computer running Perl, either by pointing to the local AutoAssign script repository directory or by downloading the scripts linked below. Here an example script is provided that generates the BMRB file in 2.1 format directly from the Sparky resonance list ('rl') and the protein sequence. The script then validates the generated chemical shift list and computes its completeness statistics. As modifications are made in the Sparky project, the operation is repeated until a final BMRB file is achieved.
/Local/AutoAssign1.14/bin/sparkyRL2bmrb.pl HsR50_bb.rl test_bmrb.bmrb 1 MSPIPLPVTDTDDAWRARIAA HRADKDEFLATHDQSPIPPADRGAFDGLRYFDIDASFRVAARYQPARDPEAVELETTRGPPAEYTRAAVLGFDLGDSHHTLTAFRVEGESSLF VPFTDETTDDGRTYEHGRYLDVDPAGADGGDEVALDFNLAYNPFCAYGGSFSCALPPADNHVPAAITAGERVDADLEHHHHHH -diasterio
/Local/AutoAssign1.14/bin/missing_shifts.pl -printstats test_bmrb.bmrb > missing_HsR50_101109
/Local/AutoAssign1.14/bin/validate_assignments.pl test_bmrb.bmrb > vali_HsR50_101109
cp test_bmrb.bmrb HsR50_bb.bmrb
rm test_bmrb.bmrb
Three scripts are run: 1) sparkyRL2bmrb.pl, 2) missing_shifts.pl, and 3) validate_assignments.pl. In addition, a BMRB parsing module, BMRBParsing.pm, is called; it interprets the sequence in single-letter code and returns the residue numbering used in the BMRB file, in this case starting from residue 1.
Newer versions of these files, distributed with later versions of the AutoAssign program, should handle the BMRB 3.1 format.
Interpreting the output is straightforward. The output of the validation script for residues 189-191 is shown below; the error summary at the bottom of the file gives the scientist a quick list of the overall problems:
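Because the same sequence of commands is rerun every time the Sparky project changes, it can help to generate it from a few parameters. The following is a minimal Python sketch that only builds the command lines from the example above and does not execute them. The function name `build_commands` and its parameters are hypothetical. It also assumes that the `1` argument is the starting residue number and that the sequence is passed as a single argument.

```python
import shlex

# Assumption: install location copied from the example; adjust to the local setup.
AUTOASSIGN_BIN = "/Local/AutoAssign1.14/bin"


def build_commands(resonance_list, sequence, final_bmrb, tag,
                   start_residue=1, tmp_bmrb="test_bmrb.bmrb"):
    """Return the five shell command lines of the example as strings.

    Only script names and flags that appear in the example are used.
    """
    convert = shlex.join([
        f"{AUTOASSIGN_BIN}/sparkyRL2bmrb.pl", resonance_list, tmp_bmrb,
        str(start_residue), sequence, "-diasterio",
    ])
    missing = (
        shlex.join([f"{AUTOASSIGN_BIN}/missing_shifts.pl", "-printstats", tmp_bmrb])
        + " > " + shlex.quote(f"missing_{tag}")
    )
    validate = (
        shlex.join([f"{AUTOASSIGN_BIN}/validate_assignments.pl", tmp_bmrb])
        + " > " + shlex.quote(f"vali_{tag}")
    )
    copy = shlex.join(["cp", tmp_bmrb, final_bmrb])
    remove = shlex.join(["rm", tmp_bmrb])
    return [convert, missing, validate, copy, remove]


# Print the commands for inspection; "MSPIPLPV" is a shortened placeholder sequence.
for line in build_commands("HsR50_bb.rl", "MSPIPLPV", "HsR50_bb.bmrb", "HsR50_101109"):
    print(line)
```

Printing the commands rather than running them keeps the sketch safe to try on a machine without AutoAssign installed; the printed lines can be pasted into a shell once they look right.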
D189
Overall: Consistent Typing: Consistent SRO: Consistent C Shifts: Consistent H Shifts: Consistent
PRTL>> D 0.28 L 0.17 N 0.12 C 0.11 K 0.09 F 0.09 Y 0.08 R 0.01
HN Overlap>> D13 R126
C Shift Assignments>> C :: 176.670 CA :: 54.404 CB :: 40.809
H Shift Assignments>> H :: 8.458 HA :: 4.591

L190
Overall: Consistent Typing: Consistent SRO: Consistent C Shifts: Consistent H Shifts: Consistent
PRTL>> L 0.22 K 0.2 D 0.14 R 0.12 C 0.1 F 0.09 Y 0.07 N 0.02
HN Overlap>> A20
C Shift Assignments>> C :: 177.898 CA :: 55.723 CB :: 41.968
H Shift Assignments>> H :: 8.115 HA :: 4.191

E191
Overall: Consistent Typing: Consistent SRO: Consistent C Shifts: Consistent H Shifts: Consistent
PRTL>> E 0.14 H 0.13 W 0.13 R 0.13 Q 0.13 C 0.11 K 0.1 M 0.05 I 0.02 V 0.01
C Shift Assignments>> C :: 176.524 CA :: 57.003 CB :: 29.910
H Shift Assignments>> H :: 8.226 HA :: 4.107

Error Summary:
G92 HA2 = 5.318(S), Expected = 3.95, Std = 0.4000, ChiSquare = 6.2621e-04
P116 HA = 5.681(S), Expected = 4.41, Std = 0.3600, ChiSquare = 4.1469e-04
R132 Typing: Mistyped
R132 CB = 38.217(S), Expected = 30.66, Std = 1.7700, ChiSquare = 1.9592e-05
A160 HB = 0.090(S), Expected = 1.38, Std = 0.2500, ChiSquare = 2.4695e-07
T181 HA = 2.217(S), Expected = 4.48, Std = 0.5000, ChiSquare = 6.0111e-06
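The error summary for this entry flags several issues: four proton shifts (G92 HA2, P116 HA, A160 HB and T181 HA) lie well outside their expected ranges, and R132 is flagged as mistyped because its CB shift is out of range. Each of these indicates a possible misassignment.
Below is a view of the output of the missing_shifts.pl script, which lists the atoms that lack an assignment for each residue (atoms not listed are present in the BMRB file). For the protein in the example, only backbone assignment was conducted: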
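To see how far each flagged shift lies from its expected value, the error summary lines can be turned into standardized deviations, (observed - expected) / Std. The sketch below is a minimal Python example using the lines from the error summary above; the regular expression and the function names are illustrative assumptions, not part of AVS. For the values shown here, the reported ChiSquare column appears consistent with the two-sided normal tail probability of that deviation. That is an observation from these numbers only, not documented AVS behavior.

```python
import math
import re

# Lines copied from the Error Summary of the example output.
ERROR_SUMMARY = """\
G92 HA2 = 5.318(S), Expected = 3.95, Std = 0.4000, ChiSquare = 6.2621e-04
P116 HA = 5.681(S), Expected = 4.41, Std = 0.3600, ChiSquare = 4.1469e-04
R132 Typing: Mistyped
R132 CB = 38.217(S), Expected = 30.66, Std = 1.7700, ChiSquare = 1.9592e-05
A160 HB = 0.090(S), Expected = 1.38, Std = 0.2500, ChiSquare = 2.4695e-07
T181 HA = 2.217(S), Expected = 4.48, Std = 0.5000, ChiSquare = 6.0111e-06
"""

# Hypothetical pattern for the shift lines; other lines (e.g. "Typing:") are skipped.
LINE_RE = re.compile(
    r"^(?P<res>[A-Z]\d+)\s+(?P<atom>\w+)\s*=\s*(?P<obs>-?[\d.]+)\(\w\),"
    r"\s*Expected\s*=\s*(?P<exp>-?[\d.]+),\s*Std\s*=\s*(?P<std>[\d.]+)"
)


def deviations(text):
    """Return (residue, atom, z) with z = (observed - expected) / std."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        obs, exp, std = (float(match[key]) for key in ("obs", "exp", "std"))
        results.append((match["res"], match["atom"], (obs - exp) / std))
    return results


def two_sided_tail(z):
    """Two-sided tail probability of a standard normal deviate z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


for res, atom, z in deviations(ERROR_SUMMARY):
    print(f"{res} {atom}: z = {z:+.2f}, tail probability = {two_sided_tail(z):.3e}")
```

Sorting the result by the absolute value of z gives a simple priority list of which assignments to re-inspect first in the Sparky project.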
D189: HB2 HB3
L190: CD1 CD2 CG HB2 HB3 HD1 HD2 HG
E191: CG HB2 HB3 HG2 HG3

AtomType Completeness Statistics:
aromatic completeness :: 0 / 174 = 0.00%
backbone completeness :: 845 / 965 = 87.56%
sidechain completeness :: 227 / 1244 = 18.25%
unambiguous CH2 completeness :: 0 / 20 = 0.00%
unambiguous CH3 completeness :: 0 / 32 = 0.00%
C :: 168 / 197 = 85.28%
CA :: 181 / 197 = 91.88%
CB :: 167 / 183 = 91.26%
H :: 160 / 180 = 88.89%
HA :: 156 / 183 = 85.25%
HA2 :: 11 / 14 = 78.57%
HA3 :: 9 / 14 = 64.29%
HB :: 8 / 56 = 14.29%
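When the operation is repeated over several rounds of assignment, it can be convenient to read the completeness statistics programmatically. The following minimal Python sketch parses lines of the form "name :: found / total = percent%" and lists the categories that fall below a chosen threshold. It assumes each statistic sits on its own line in the script's output; the function names and the 85% threshold are illustrative choices, not part of AVS.

```python
import re

# A subset of the statistics from the example output.
STATS = """\
aromatic completeness :: 0 / 174 = 0.00%
backbone completeness :: 845 / 965 = 87.56%
sidechain completeness :: 227 / 1244 = 18.25%
C :: 168 / 197 = 85.28%
CA :: 181 / 197 = 91.88%
HB :: 8 / 56 = 14.29%
"""

# Hypothetical pattern for one statistics line.
STAT_RE = re.compile(
    r"^(?P<name>.+?)\s*::\s*(?P<found>\d+)\s*/\s*(?P<total>\d+)\s*=\s*[\d.]+%$"
)


def parse_completeness(text):
    """Map each category name to its (assigned, expected) atom counts."""
    stats = {}
    for line in text.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            stats[match["name"]] = (int(match["found"]), int(match["total"]))
    return stats


def below_threshold(stats, threshold_pct):
    """Return the sorted names of categories whose completeness is below threshold_pct."""
    return sorted(
        name
        for name, (found, total) in stats.items()
        if total and 100.0 * found / total < threshold_pct
    )


stats = parse_completeness(STATS)
print(below_threshold(stats, 85.0))
```

The percentage is recomputed from the counts rather than read from the line, so the check does not depend on how the script rounds its output.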