Uppsala Software Factory

Uppsala Software Factory - LSQMAN Manual

1 LSQMAN - GENERAL INFORMATION
2 REFERENCES
3 VERSION HISTORY
4 START-UP MACRO
5 INTRODUCTION
6 QUICK 'N' DIRTY GETTING STARTED GUIDE
7 WHICH ALIGNMENT IS BETTER ?
8 WHAT'S THE DEAL WITH O'S "TRANSPOSE" MATRIX NOTATION ?
9 FEATURES

9.1 chains

9.2 zone definition

9.3 atom types

9.4 improving operators

9.5 O datablocks

9.6 O macros

9.7 independence

9.8 aligning multiple models

9.9 analysing multiple, NCS and NMR models

9.10 "ab initio" (brute force) alignment

9.11 O compatibility
10 INTERFACE
11 STARTUP
12 GENERAL COMMANDS

12.1 ? (list commands)

12.2 ! (ignored comment)

12.3 QUit (stop working with the program)

12.4 ECho (toggle command-line echo on/off)

12.5 #

12.6 $ (issue shell command)

12.7 @ (execute LSQMAN macro)

12.8 & (manipulate symbols)
13 I/O AND BOOK-KEEPING COMMANDS

13.1 REad (read molecule into memory)

13.2 WRite (write molecule to PDB file)

13.3 DElete (erase molecule from memory)

13.4 ANnotate (comment string for molecule)

13.5 LIst (information about molecule)

13.6 CHain_mode (naming of chains/models when read from PDB file)

13.7 TYpe_residues (list residues of molecule)

13.8 BFactor_range (exclude atoms with undesired temperature factors)

13.9 CEll (edit cell constants of molecule)

13.10 FRactionalise (Cartesian to fractional)

13.11 ORthogonalise (fractional to Cartesian)

13.12 NUcleic_acid_pdb_nomenclature (use PDB nucleic acid atom and residue names)

13.13 NOmenclature (check side-chain atom names)

13.14 FIx_atom_names (correct side-chain atom names)

13.15 NMr_model_mode (keep all or only first model when reading NMR ensemble)

13.16 HYdrogens (keep or strip when reading or writing)

13.17 AA_substitution_matrix

13.18 HEtatm (keep or strip when reading)

13.19 SUbtract_ave_b (subtract average temperature factor)

13.20 ATom_types (select atom types to use in superpositioning)

13.21 SEt (set parameters for operator-improvement algorithm)

13.22 OMacro (used in macros created by DEJAVU, SPASM, SPANA, etc.)

13.23 INvert_ncs (invert one or more RT operators)

13.24 ALter (manipulate chain and segment IDs)
14 SUPERIMPOSING AND COMPARING TWO MOLECULES

14.1 EXplicit (explicit superpositioning of two molecules)

14.2 BRute_force (find alignment of two molecules automagically)

14.3 FAst_force (find alignment of two molecules automagically)

14.4 NWunsch (sequence-based alignment of two structures)

14.5 XAlignment (apply external sequence alignment)

14.6 IMprove (improve alignment of two molecules)

14.7 DP_improve (dynamic-programming-based operator improvement)

14.8 GLobal_nw (global-superposition-distance-based Needleman-Wunsch sequence alignment)
15 MANIPULATING OPERATORS

15.1 EDit_operator (edit operator between two molecules)

15.2 SHow_operator (show operator between two molecules)

15.3 SAve_operator (write operator to O datablock file)

15.4 OLd_o_operator (read operator from O datablock file)

15.5 PErturb_operator (perturb an operator)

15.6 APply_operator (apply an operator to the coordinates of a molecule)
16 MISCELLANEOUS COMMANDS

16.1 GEt XYz (select residues near a point in space)

16.2 RMsd_calc (calculate RMSD for subset of atoms with current operator)

16.3 SOap (visualise structural differences)

16.4 JUdge (judge homology model by comparing with target and parent)

16.5 CAsp (assess sequence-identical residues of model and target)
17 PLOTS

17.1 PHipsi (make delta-Phi, delta-Psi plot)

17.2 ETa_theta_plot (make delta-Eta, delta-Theta plot for nucleic acids)

17.3 DIst_plot (plot distance between atoms in two molecules)

17.4 QDiff_dist_plot (difference-distance plot for rigid domain identification)

17.5 DDihe (delta-dihedral plots for backbone comparison)

17.6 DChi (list large side-chain torsion angle differences)

17.7 LEsk_plot (plot RMSD as a function of number of aligned residues)

17.8 SImilarity_plot (plot RMSD as a function of number of aligned residues)

17.9 WAters (compare solvent structure in two molecules)

17.10 HIsto_disto (histogram of distances)

17.11 D1_D2 (delta-1, delta-2 plot)
18 MORPHING

18.1 MOrph (morph transition between two conformational states)
19 SUPERIMPOSING AND COMPARING MULTIPLE MOLECULES

19.1 MCentral (find central NCS or NMR structure)

19.2 MAlign (align multiple NCS/NMR models)

19.3 MDihedral (multiple-model phi, psi analysis for NCS/NMR models)

19.4 MBfactors (compare temperature factors for NCS models)

19.5 MRamachandran (multiple-model Ramachandran plot for NCS/NMR models)

19.6 MSidechain (multiple-model chi1/chi2 analysis for NCS/NMR models)

19.7 MTorsion (multiple-model chi1, chi2 plot for NCS/NMR models)

19.8 MPlot (multi-RMS (distance) plot for multiple models)

19.9 VMain_chain (phi, psi circular variance plot for multiple models)

19.10 VSide_chain (chi1, chi2 circular variance plot for multiple models)
20 VRML COMMANDS

20.1 VRml SEtup (define some parameters)

20.2 VRml INit (open a new VRML file)

20.3 VRml COlour_list (list predefined colour names)

20.4 VRml ADd (add a molecule to the current VRML file)

20.5 VRml ALl_chains (add each chain/model of a molecule to the current VRML file)
21 EXAMPLE

21.1 (1) read the molecules

21.2 (2) do the initial, explicit superposition

21.3 (3) play with the improve option until you're happy

21.4 (4) save the operator (just in case), create an O macro file

21.5 (5) the O macro

21.6 (6) run O and execute the macro

21.7 (7) centre on one of the atoms and admire a beautiful fit !
22 IMPROVING OPERATORS

22.1 differences with Lsq_Improve in O

22.2 (1) defining which atoms to use

22.3 (2) sequentiality constraint

22.4 (3) no double use of residues

22.5 (4) optimisation criteria

22.6 (5) decaying parameters

22.7 (6) informative output
23 IMPROVING ROUGH DEJAVU ALIGNMENTS
24 LSQMAN AND MACROMOLECULES OTHER THAN PROTEINS
25 KNOWN BUGS

1 LSQMAN - GENERAL INFORMATION

Program : LSQMAN
Version : 081126
Author : Gerard J. Kleywegt, Dept. of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Box 596, SE-751 24 Uppsala, SWEDEN
E-mail : gerard@xray.bmc.uu.se
Purpose : alignment and comparison of macromolecules
Package : DEJAVU

2 REFERENCES

Reference(s) for this program:

* 1 * G.J. Kleywegt & T.A. Jones (1994). Halloween ... Masks and Bones. In "From First Map to Final Model", edited by S. Bailey, R. Hubbard and D. Waller. SERC Daresbury Laboratory, Warrington, pp. 59-66. [http://xray.bmc.uu.se/gerard/papers/halloween.html]

* 2 * G.J. Kleywegt & T.A. Jones (1994). A super position. CCP4/ESF-EACBM Newsletter on Protein Crystallography 31, November 1994, pp. 9-14. [http://xray.bmc.uu.se/usf/factory_4.html]

* 3 * G.J. Kleywegt & T.A. Jones (1995). Where freedom is given, liberties are taken. Structure 3, 535-540. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=8590014&dopt=Citation]

* 4 * G.J. Kleywegt (1996). Use of non-crystallographic symmetry in protein structure refinement. Acta Cryst. D52, 842-857. [http://scripts.iucr.org/cgi-bin/paper?gr0471]

* 5 * G.J. Kleywegt (1996). Making the most of your search model. CCP4/ESF-EACBM Newsletter on Protein Crystallography 32, June 1996, pp. 32-36. [http://xray.bmc.uu.se/usf/factory_6.html]

* 6 * G.J. Kleywegt & T.A. Jones (1996). Phi/Psi-chology: Ramachandran revisited. Structure 4, 1395-1400. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=8994966&dopt=Citation]

* 7 * G.J. Kleywegt & T.A. Jones (1997). Detecting folding motifs and similarities in protein structures. Methods in Enzymology 277, 525-545.

* 8 * T.A. Jones & G.J. Kleywegt (1999). CASP3 comparative modelling evaluation. Proteins: Struct. Funct. Genet. Suppl. 3, 30-46. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=10526350&dopt=Citation] [http://xray.bmc.uu.se/casp3]

* 9 * G.J. Kleywegt (1999). Experimental assessment of differences between related protein crystal structures. Acta Cryst. D55, 1878-1884. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=10531486&dopt=Citation] [http://scripts.iucr.org/cgi-bin/paper?se0283]

* 10 * Y.W. Chen, E.J. Dodson & G.J. Kleywegt (2000). Does NMR mean "Not for Molecular Replacement" ? Using NMR-based search models to solve protein crystal structures. Structure 8, R213-R220. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=11080645&dopt=Citation]

* 11 * D. Madsen & G.J. Kleywegt (2002). Interactive motif and fold recognition in protein structures. J. Appl. Cryst. 35, 137-139. [http://scripts.iucr.org/cgi-bin/paper?wt0007]

* 12 * M. Novotny, D. Madsen & G.J. Kleywegt (2004). An evaluation of protein-fold-comparison servers. Proteins 54, 260-270. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=14696188&dopt=Citation]

* 13 * G.J. Kleywegt, J.Y. Zou, M. Kjeldgaard & T.A. Jones (2001). Around O. In "International Tables for Crystallography, Vol. F. Crystallography of Biological Macromolecules", edited by M.G. Rossmann & E. Arnold. Chapter 17.1, pp. 353-356, 366-367. Kluwer Academic Publishers, Dordrecht, The Netherlands.

3 VERSION HISTORY

931007 - 0.1 - initial version (READ, WRITE, QUIT, DELETE, ANNOTATE, LIST, ATOM_TYPES)
931008 - 0.2 - second version (EXPLICIT; IMPROVE without checking for fragment size so far)
931021 - 0.3 - continued (implemented minimum fragment length, various optimisation criteria, maximum number of optimisation cycles); works well !
931022 - 0.4 - continued (implemented sequential_hits_only option, rms-weight, fragment length decay, show-operator, edit-operator, save_operator, old-o-operator, OMACRO commands)
931023 - 1.0 - first production version (removed some minor bugs, wrote manual)
931027 - 1.1 - removed minor bugs; implemented on ESV and ALPHA; minor corrections to the manual; added OMacro WRite option
931103 - 1.2 - removed some bugs
931121 - 1.3 - implemented ATom_types ALl and NOn-hydrogen
931124 - 1.4 - open all PDB files with READONLY
931129 - 1.5 - added CHain_mode and TYpe_residues
931130 - 1.6 - pre-cooked SEt options; debugged use of empty chain identifiers (i.e., chain-id = space); added .lsq_stats_x_y datablock to O macro output
931206 -1.6.1- minor extension of allowed zone designators; proper clean-up when a molecule is DEleted
940323 - 1.7 - store XPLOR segment IDs; implement APply option
940519 - 1.8 - removed nasty bug from IMprove option (when using zones instead of wildcards)
940524 -1.8.1- removed another nasty bug (showed up on ALPHAs whenever the C-terminal residue in "mol 2" showed up in an alignment in the IMprove option)
940525 -1.8.2- use standard routine to print and analyse RT operators

(* code changes of intermediate versions lost due to disk crash *)
940901 -1.9.0- calculate RMS delta-B in case of EXplicit LSQ
940906 - 2.0 - implemented command RMsd_current_operator
940908 -2.0.1- SHow command prints a comment in case of NCS regarding the quality of the refinement (coordinates & B-factors)

941021 - 3.0 - calculate RMS delta-B for EXplicit and IMprove commands; removed a bug that caused the first matched residue after IMprove never to be listed; added a comment w.r.t. con/restraints on positions and B-factors in case of NCS (i.e., SHow mol1 mol1); implemented RMsd_calc command; implemented PHipsi command
941218 -3.0.1- added more statistics to PHipsi command
941223 - 3.1 - added DIstance and delta-dihedral (DD) plots
941230 -3.1.1- DD plots now also contain |delta(X-X-X angle)| curve
950224 - 3.2 - new option to compare WAters in different models or chains
950331 -3.2.1- long-standing bug in superpositioning with ALL and NONH atom types fixed (I think); add O datablock header lines to plot files
950412 -3.2.2- add B-factor cut-offs to EXplicit and RMsd commands (set with command BFactor_range)
950413 - 3.3 - cell constants read from CRYST1 card; CEll command to set or alter cell constants; command ORthogonalise and FRactionalise to carry out superpositioning in fractional space (may help to detect spacegroup errors)
950528 - 4.0 - removed bug from ORthog and FRact commands (always used the CEll parameters of the first molecule); new options for multiple (NCS or NMR) alignment: MCentral to find the central chain/model; MAlign to align all chains/models to one reference chain or model; MDihedral to analyse PHI and PSI angle distributions (and to plot SIGMA(phi) and SIGMA(psi) as a function of residue number); MBfactors to analyse B-factors of a particular atom type (e.g., CA atoms) and to plot SIGMA(B) and RANGE(B) as a function of residue number.
950529 - 4.1 - minor bug fixes; CHain_mode BReak to delineate chains/models by breaks in the consecutive numbering of residues, and CHain_mode LOwer, which uses a drop in residue number between two consecutive residues to delineate chains; new option MRamachandran to produce a plot for multiple models/chains; MSidechains to analyse the distribution of CHI1 and CHI2 angles (and plot SIGMA(chi1) and SIGMA(chi2) as a function of residue number); MTorsions to produce a multiple CHI1/CHI2 plot; new option NOmenclature to enforce proper names for side-chain atoms of PHE, TYR, ASP, GLU and ARG residues (important for comparisons involving these atoms)
950530 -4.1.1- added option to produce MRama and MTor plots in a polar coordinate frame (add "P" as the last parameter)
950616 - 4.2 - improved multiple Ramachandran (MR) and multiple chi1/chi2 (MT) plots (no centroid if only two molecules; no more long lines across the plot). New HYdrogen command to keep or strip hydrogen atoms on reading/writing of PDB files (NOTE: the default behaviour of the program is now to STRIP them, since usually one is not interested in them and they slow down some parts of the program). New SUbtract_ave_b command to subtract the average chain B-factor in order to get meaningful RMS delta-B values and multiple-model B-factor plots (MB).
950705 -4.2.1- minor bug fix for connecting residues in multiple Ramachandran (MR) or side-chain torsion (MT) plots.
950830 -4.2.2- calc Maiorov-Crippen "rho" (not the scaled one) for EXplicit and IMproved superpositionings (use the SHow command to see the actual values). Reference: Proteins 22, pp. 273-283 (note: equation (16), the definition of rho, contains an error: R^2(B) should be 2*R^2(B)).
950913 - 4.3 - added D1/D2 plots
951031 -4.3.1- calc angle in RMsd command
960409 - 4.4 - implemented macro facility
960415 -4.4.1- minor bug fixes
960417 -4.4.2- minor bug fixes
960508 - 4.5 - new HIsto_disto command
960517 - 4.6 - implemented simple symbol mechanism
960710 -4.6.1- print average RMSD between chains in MCentral command
960729 - 4.7 - implemented FIx_atom_names command; fixed a long-standing bug in the EXplicit command if all (non-H) atoms were compared !!!
960801 -4.7.1- option MRama now uses our new definition of core regions in the Ramachandran plot
960804 -4.7.2- average dihedrals (phi, psi, chi1, chi2) properly, i.e., use <DIHE> = RTODEG * ATAN2 ( <SIN>, <COS> )
960821 -4.7.3- PHipsi command now also prints the correlation coefficient between the PHI angles of both chains and between the PSI angles of both chains
970127 - 4.8 - fixed two terrible bugs in the MAlign option (thanks to Tim Allison)
970131 - 5.0 - implemented BRute_force alignment option
970210 - 5.1 - fixed two more bugs in the DIstance and DDihedral plot commands, so that these commands now also work correctly for macromolecules other than proteins (see new example for DNA in the manual; thanks to Armin Maeder); improved BRute_force command a trifle; softened judgment of NCS restraint quality a tad; implemented HEtatm command
970221 - 5.2 - implemented ATom type SI(de_chain) which is any type except N, CA, C and O, OT1 etc. (i.e.: this includes hydrogen atoms if they have been read in !); also implemented ATom_type PH(osphorous) for DNA and RNA work; error traps if certain options are used with inappropriate atom types (e.g., IMprove with ALL, NONH or SIDE)
970505 - 5.3 - added optional "chain" parameter to the APply command
970626 - 5.4 - support initialisation macro (setenv GKLSQMAN macrofile)
970630 -5.4.1- removed small bugs which under exceptional circumstances led to wrong results for ALL atoms, NONHydrogens and SIDEchain atoms
970707 -5.4.2- improved statistics summary for MRama, MDihe, MSide and MTors commands
970722 - 5.5 - implemented VM and VS commands to plot the circular variance of phi,psi and chi1,chi2 for multiple models
970722 - 6.0 - new VRml commands !
970808 - 6.1 - default for SEquential hits is now ON; added frameshift correction to IMprove algorithm which is ON by default (toggle with SEt SHift); change convergence test in IMprove so that "no improvement" is used instead of "fit deteriorated" (this should speed up the BRute_force command slightly)
970827 -6.1.1- new optional chain_id parameter for the WRite command to enable writing of just a single chain or model (default = * = all chains/models); new VRml ALl_chains command to write VRML instructions for all chains/models of a molecule, each in a different colour
971111 -6.1.2- in the EXplicit command, a single residue may now be given as e.g. "A54" instead of "A54-54"
980901 - 6.2 - new INvert_ncs command to invert one or more O-style RT-operators (Cartesian space only)
981019 - 6.3 - new JUdge command to check how good a homology model is compared to both its TARGET and the PARENT structure from which it was (or could have been) derived
981021 -6.3.1- new ECho command to echo command-line input (useful in scripts)
981022 - 6.4 - implemented command history (# command)
981030 - 6.5 - new MOrph command to morph the transition between two conformational states (to make movies) - COOL !!!
981101 -6.5.1- continued with MOrph command
981102 -6.5.2- continued with MOrph command
981102 -6.5.3- continued with MOrph command
981103 - 6.6 - new ATom_types TRace command (selects CA atoms plus all non-hydrogen side chain atoms); changed ATom_type SIde_chain to exclude hydrogen atoms; implemented MOrphing using CA atoms plus all non-hydrogen side-chain atoms (using ATom_type TRace)
981104 -6.6.1- continued with MOrph command
981105 -6.6.2- continued with MOrph command
981106 - 7.0 - touched up MOrph command for general release
981108 -7.0.1- implemented SImilarity_plot command; extended functionality of the JUdge command
981111 -7.0.2- print histogram(s) for some of the plot commands (PHipsi, DIstance, DDihedral, and D1_D2)
981117 -7.0.3- DIstance_plot now also includes residues from mol1 that were not found in mol2 (distance plotted at a negative value)
981119 -7.0.4- trap when no atoms found in input PDB file; print D-values after IMprove
981123 -7.0.5- changed definition of D-value to %Matched(i)*%SeqID(i)/10000
981126 -7.0.6- skip alternative conformations when reading PDB files
981207 - 7.1 - new CAsp command to assess RMS distances and number of matching residues between sequence-identical residues as a function of distance cut-off
990119 -7.1.1- minor changes to CAsp command
990120 -7.1.2- added extra optional parameter to BRute_force command to speed up the calculations if the two molecules are different models of the same protein (i.e., same residue numbering)
990301 -7.1.3- echo some PDB header lines when reading a PDB file
990823 -7.1.4- new QDiff_dist_plot command to plot difference-distance matrices
990923 -7.1.5- minor changes
991110 -7.1.6- MOrph command now also generates an O macro that in turn will create a big O plot file (for later rendering)
991119 - 7.2 - MOrph command improved such that internal coordinate morphing with TRAC atom type works much better; it also works for hetero-entities provided you take some precautions (same atom names, same residue number, at least one atom called " CA ", etc.)
991122 -7.2.1- minor changes
991221 - 7.3 - several bug fixes for linux/g77
000630 -7.3.1- changed the default CA-CA distance cut-off from 3.8 A to 3.5 A (you can still change it with the SEt DIst command, of course)
001122 - 7.4 - added optional "first residue" and "last residue" parameters to the WRite command, so you can selectively write a stretch of residues; ditto for the APply command; added JUdge and CAsp commands to the menu (used during CASP3 evaluation); new PErturb_operator command (to see how stable an operator is); added an optional "cutoff" parameter to the commands MDihedral, MRamachandran, MSide, MTors, VMain, VSide and MBfactors so you can list only the residues whose NCS-mates show the largest spread in torsion angles, circular variance, or B-factors
001206 - 7.5 - new set of ALter commands to manipulate chain and segment IDs without having to go through MOLEMAN(2) (or to use sed or to edit PDB files)
001213 - 7.6 - speeded up BRute_force command a little bit; implemented some commands to make it easier to use LSQMAN with nucleic acids: ATom_types C4*, ATom_types NUcleic_acid_backbone, NUcleic_acid_pdb_nomenclature, SEt NUcleic_acid_defaults, and OMacro DEfine
001229 -7.6.1- new command FAst_force, a quicker (but dirtier) variant of the BRute_force command for ab initio superpositioning of two structures (only using the first of the user-selected atom types, e.g. CA or C4*)
001229 -7.6.2- huh ? undocumented changes ?
010104 -7.6.3- removed bug in APply command (using chain id '*' did not work as intended; thanks to Aaron Chandler for persevering ;-); also, the operators of the moved molecule are only reset if the entire molecule was moved
010118 - 7.7 - the PHipsi_plot, DIstance_plot, DDihe_plot and D1_D2_plot commands now all have the parameters: mol1 range1 mol2 range2 plot_file [cut-off] [hist_bin] [hist_max]. The cut-off is used to print all residue pairs for which delta-phi or delta-psi etc. exceeds the cut-off value (so you can find out which residues show the largest differences; use a negative value for cut-off to suppress printing). The hist_bin and hist_max parameters are used for the histograms of delta-phi etc. values; new ALter REnumber command to renumber residues in a certain chain sequentially; new DChi command to list large side-chain torsion angle differences between two similar models
010126 -7.7.1- minor changes (add "pdb" to sam_at_in commands in O macros to handle case of filenames with .ent); include sketch_stick objects in OM macros and do centre_xyz on centre of first molecule
010316 - 7.8 - implemented LEsk_plot command
010326 -7.8.1- minor changes
010410 - 8.0 - NWunsch command to do a quick sequence-based alignment that will be applied to the structure; GLobal_nw command to obtain a structure-based sequence alignment based on the current operator
010418 - 8.1 - MPlot command to generate multi-RMS (distance) plots
010525 - 8.2 - DP_improve command as an alternative way of improving operators; calculate significance of a structural alignment using the Levitt-Gerstein method (in the GLobal command)
010611 -8.2.1- Correct count of number of gaps in Levitt-Gerstein method (namely: the sum of the nr of gaps in the two sequences, excluding terminal gaps)
010726 - 8.3 - GEt XYz command to quickly superimpose residues near a certain point in space
010727 - 8.4 - SOap_film command to visualise structural differences
010730 -8.4.1- the IMprove and GLobal commands now print a load of statistics about the distribution of the distances between the matched atoms
010803 - X - added some example figures to demonstrate some of the options that produce plots
010906 - 8.5 - the EXplicit, IMprove, RMsd, and DP_improve commands now calculate and print the relative RMSD as defined by MR Betancourt & J Skolnick (Biopolymers 59, pp. 305-309 (2001)) - identical structures have an RRMSD of zero, a value around one means that two structures are as different as two random proteins of the same sizes; two new optimisation criteria in the IMprove option: the CRippen statistic and the RRmsd; RRMSD is now also stored for every pair of structures
011012 - 8.6 - the MPlot command has been extended to also produce a "CD plot" (grey-scale mapping of pair-wise distances; see Jones, T.A. and Kleywegt, G.J. (1999). CASP3 comparative modelling evaluation. Proteins: Struct. Funct. Genet. Suppl. 3, 30-46)
011012 -8.6.1- in the MPlot CD plot, show areas with missing residues in pink
011022 -8.6.2- increased the maximum number of steps in a MOrph to 999
011023 - 8.7 - buffer size for 2D plots can now be passed through the environment variable or command-line argument GKBUFFER (e.g.: run lsqman gkbuffer 1000000); otherwise, the default is 500000 points; this affects the QD command, for instance
011024 - 8.8 - removed a terrible bug from the MOrph code when you use Internal coordinate morphing with the TRACe quasi atom type (thanks to Jinghua Tang for informing me !)
011120 - 8.9 - implemented NMr_model_mode command to decide if all models or only the first model of an NMR ensemble should be read; changed the default values for some of the SEt commands (e.g., most settings now use the Maiorov-Crippen RHO as the optimisation criterion for operator IMprovement)
011121 -8.9.1- added two more buffers (size controlled by the user through GKBUFFER) so that the following commands can be given enough memory: LO, GL, DP, SO, NW, QD; the default value of the buffer size was changed to 1000000
011122 - 9.0 - BRute_force now ignores residues with negative or zero residue number (rather than simply failing); OMacro APpend also writes Crippen RHO and relative RMSD to the lsq_stats_* datablocks; the SOap_film command now has an optional 'verbose' parameter (default value is 'no' to reduce output); the DP_improve command has an extra 'max_cycles' parameter to enable iterative use until convergence of the superposition; improved recipe in the quick'n'dirty getting-started guide
011123 -9.0.1- minor changes
011213 -9.0.2- minor changes
020201 -9.0.3- added option to GEt XYz command to create an O macro that draws a 'zone' object of the selected residues
020207 -9.0.4- removed nasty bug from the calculation of the Levitt-Gerstein statistics (GLobal command, Z-score and P (z > Z) were affected; thanks to Mike Sierk for pointing out the bug)
020208 - 9.1 - implemented calculation of the normalised RMSD (100) [O Carugo & S Pongor, Protein Sci 10, 1470-3 (2001)]. This will be listed with the SHow command and can be used as the optimisation criterion in the IMprove command (SEt OPtim NR)
020219 - 9.2 - new CHain_mode option NOn-blank (keeps original chain names, but replaces blanks by _underscores_); chains may now have names other than A-Z (but not 0-9 !)
020222 -9.2.1- added optional log_file parameter to the GLobal command that allows you to save the structure-based sequence alignment plus some key statistics to a log file
020225 -9.2.2- minor bug fix
020306 -9.2.3- in the BRute_force and FAst_force commands, the min number of matched residues may now also be entered as a fraction. For instance, if you supply a value of 0.9, then the algorithm will finish as soon as at least 90% of the residues of the smallest protein have been matched to residues in the other protein
020312 -9.2.4- in the GLobal command, evaluate the Levitt-Gerstein P-value in double precision (thanks once again to Mike Sierk for noticing the problem)
020402 -9.2.5- minor bug fix
020610 -9.2.6- minor bug fix
020925 - 9.3 - IMprove command: single chain names may now be used and are interpreted to mean the entire chains (e.g.: IMprove m1 a m2 c); REad command: optional parameters chain and atom to read just a single chain (default * = all ) and/or a single type of atom (default * = all), for example: read m1 pdb1pmp.ent c " ca " will only read the CA atoms of chain C from file pdb1pmp.ent
021121 -9.3.1- fixed a bug that prevented the MR, MD and VM commands from working ...
021126 -9.3.2- all rotation matrices are now printed out in the "normal" (non-O-ish) way
021206 -9.3.3- new AA_substitution_matrix command with which you can read in an SBIN-style substitution matrix (will be used by the NWunsch command)
021220 - 9.4 - several minor changes to the NW, DP, and LO commands; new XAlignment command to read in an external sequence alignment and apply it to two structures
030402 -9.4.1- minor bug fix in MCentral command
030404 -9.4.2- GLobal command now also prints residue numbers in the initial structure-based sequence alignment
030904 -9.4.3- FAst and BRute commands now print the number of trials as well as the number of those that were subjected to operator IMprovement (IMprovement is carried out only for those trials that yield an RMSD less than 10 Å)
030918 -9.4.4- MCentral and MAlign commands are now more tolerant to differences in the lengths of the chains that are superimposed (the number of atoms may differ by 25% instead of merely 5% in the past)
031110 -9.4.5- minor bug fix in GLobal command
031204 -9.4.6- increased dimensioning to 200,000 atoms and 30,000 residues (for the ribosome folks)
040225 -9.4.7- whenever an operator is shown, the value of RMSD/Nalign is also printed (see: Sierk & Pearson, Prot Sci 13, 773-785 (2004))
040302 -9.4.8- made O macros produced by SAVANA compatible with current O (i.e., they read stereo_chem.odb and they use the pdb_read command instead of sam_at_in)
040701 -9.4.9- changed checks of dynamic memory allocation to allow for pointers with negative values as returned by some recent Linux versions
041001 - 9.5 - replaced Kabsch's routine U3BEST by a quaternion-based routine (U3QION) to do least-squares superpositioning
041014 -9.5.1- RMsd command now also prints some statistics about the difference-distance matrix; some minor changes
050121 -9.5.2- minor changes
050218 - 9.6 - implemented geometric SAS(n) scores (n=1..4; can also be used as optimisation criteria when improving alignments with the IMprove or DP_improve commands; see Kolodny et al., J Mol Biol 346, 1173-1188 (2005))
050427 -9.6.1- the RMsd command can now also print the TM-score provided you supply the value of Ltarget (see Zhang and Skolnick, Nucl Acids Res 33, 2302-2309 (2005))
050428 -9.6.2- the GLobal_nw command now also prints the TM-score
050920 - 9.7 - new command ETa_theta_plot to analyse differences between two nucleic acid structures
060721 -9.7.1- DNA and RNA nucleotides are now also translated to the one-letter code (but in lower case) for the GLobal command so you get a proper structure-based sequence alignment (of course, some of the output statistics only apply to proteins, but you can ignore those)
060801 -9.7.2- changes to the BRute_force and FAst_force commands to speed them up (new slide step size parameter); additional output for the FAst_force option to help you assess how much longer a comparison of (biiig) molecules is going to take
060802 -9.7.3- GLobal command: the detailed, residue-by-residue dynamic programming structural alignment is no longer printed to the screen when a log file is used (to prevent clutter with long alignments)
070301 -9.7.4- RMsd command: fixed a couple of nasty bugs (although they're unlikely to have bothered anyone)
070504 -9.7.5- GLobal command: if this is used without the DP command having been used previously, the two reported RMSD values may differ. An explanation is now printed to explain why this is the case
070706 -9.7.6- MCentral command: extra optional parameter to write all operators to an O-style .LSQ_RT operator file
070913 -9.7.7- NUcleic_acid_pdb_nomenclature changed so it recognises post-PDB-remediation deoxy-RN names (DA etc.). However, it still converts quotes to asterisks as the 4th character of an atom name. If you want to use new style PDB files of DNA/RNA, use this command first
071128 -9.7.8- minor bug fix
081126 -9.7.9- minor bug fix in DD command
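The 960804 entry averages dihedrals as <DIHE> = RTODEG * ATAN2(<SIN>, <COS>), and the 970722 entry introduces plots of the circular variance of phi/psi and chi1/chi2. As an illustration only, the Python sketch below computes both quantities for a list of angles in degrees. The function names are hypothetical, and the circular variance uses the textbook definition (1 minus the length of the mean resultant vector), which is assumed, not confirmed, to match what the VM and VS commands compute.

```python
import math


def _mean_sin_cos(angles_deg):
    """Average sine and cosine of a list of angles given in degrees."""
    n = len(angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    return s, c


def circular_mean_deg(angles_deg):
    """Circular mean, RTODEG * ATAN2(<SIN>, <COS>); result lies in [-180, 180]."""
    s, c = _mean_sin_cos(angles_deg)
    return math.degrees(math.atan2(s, c))


def circular_variance(angles_deg):
    """Assumed definition: 1 - R, R = length of the mean resultant vector.

    0 means all angles are identical; values near 1 mean they are spread out.
    """
    s, c = _mean_sin_cos(angles_deg)
    return 1.0 - math.hypot(s, c)


# A naive arithmetic mean of 170 and -170 would be 0; the circular mean is +/-180.
print(circular_mean_deg([170.0, -170.0]))
print(circular_variance([60.0, 60.0, 60.0]))
```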
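The 010118 entry gives the plot commands a cut-off (residue pairs whose delta exceeds it are listed; a negative value suppresses the listing) and hist_bin/hist_max histogram parameters. For torsion angles the delta must be wrapped so that, for example, 179° and -179° differ by 2°, not 358°. The Python sketch below shows one way to do this; the function names and the (label, angle1, angle2) input layout are hypothetical, and putting deltas at or above hist_max into the last bin is an assumption, not documented LSQMAN behaviour.

```python
def delta_dihedral(a, b):
    """Smallest signed difference b - a (degrees), wrapped into [-180, 180)."""
    return (b - a + 180.0) % 360.0 - 180.0


def list_and_histogram(pairs, cutoff, hist_bin, hist_max):
    """pairs: iterable of (residue_label, angle_in_mol1, angle_in_mol2).

    Returns (flagged, counts): flagged lists (label, |delta|) for pairs whose
    |delta| exceeds a non-negative cut-off; counts is a histogram of |delta|.
    """
    nbins = int(round(hist_max / hist_bin))
    counts = [0] * nbins
    flagged = []
    for label, a, b in pairs:
        d = abs(delta_dihedral(a, b))
        if cutoff >= 0 and d > cutoff:  # negative cut-off: no listing
            flagged.append((label, d))
        # Assumption: values >= hist_max are lumped into the last bin.
        counts[min(int(d // hist_bin), nbins - 1)] += 1
    return flagged, counts


pairs = [("A10", 179.0, -179.0), ("A11", -60.0, -65.0), ("A12", -60.0, 60.0)]
flagged, counts = list_and_histogram(pairs, cutoff=30.0, hist_bin=10.0, hist_max=180.0)
print(flagged)  # [('A12', 120.0)]
print(counts)
```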
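Several entries above define simple scores: the D-value (981123), RMSD/Nalign (040225), the normalised RMSD(100) of Carugo & Pongor (020208), and the TM-score (050427), which needs Ltarget. The Python functions below are a hedged sketch of these formulas, not LSQMAN's code. Assumptions: %Matched and %SeqID are supplied as percentages (0-100); RMSD(100) uses the formula RMSD / (1 + ln(sqrt(N/100))) from Carugo & Pongor's paper; and the TM-score is evaluated for an already fixed superposition (given per-residue distances in Å), with d0 = 1.24 (Ltarget - 15)^(1/3) - 1.8 and the lower bound of 0.5 Å used by common TM-score programs. The function names are hypothetical.

```python
import math


def d_value(pct_matched, pct_seq_id):
    """D-value = %Matched * %SeqID / 10000 (inputs assumed to be percentages)."""
    return pct_matched * pct_seq_id / 10000.0


def rmsd_per_aligned(rmsd, n_aligned):
    """RMSD/Nalign, as printed whenever an operator is shown."""
    return rmsd / n_aligned


def rmsd100(rmsd, n_aligned):
    """Normalised RMSD(100) = RMSD / (1 + ln(sqrt(N / 100))).

    Only meaningful for N well above ~14 (the denominator vanishes near N = 100/e^2).
    """
    return rmsd / (1.0 + math.log(math.sqrt(n_aligned / 100.0)))


def tm_score(distances, l_target):
    """TM-score for a fixed superposition: (1/Ltarget) * sum 1 / (1 + (d_i/d0)^2)."""
    d0 = 1.24 * max(l_target - 15, 0) ** (1.0 / 3.0) - 1.8
    d0 = max(d0, 0.5)  # assumption: 0.5 A lower bound, as in common TM-score codes
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


print(d_value(80.0, 50.0))          # 0.4
print(rmsd100(2.0, 100))            # 2.0
print(tm_score([0.0] * 100, 100))   # 1.0
```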
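The 041001 entry replaced Kabsch's U3BEST with a quaternion-based routine, U3QION, whose source is not given in this manual. As an illustration of the general technique, the Python/NumPy sketch below implements Horn's closed-form unit-quaternion solution: the optimal rotation follows from the eigenvector belonging to the largest eigenvalue of a symmetric 4x4 matrix built from the 3x3 correlation matrix of the centred coordinates. It is not claimed to be identical to U3QION; the function name is hypothetical and both coordinate sets are assumed to be (N, 3) arrays of already-paired atoms.

```python
import numpy as np


def quaternion_superpose(x, y):
    """Return rotation r and translation t minimising sum |r @ x_i + t - y_i|^2."""
    xc, yc = x.mean(axis=0), y.mean(axis=0)
    s = (x - xc).T @ (y - yc)  # correlation matrix, s[i, j] = sum a_i * b_j
    sxx, sxy, sxz = s[0]
    syx, syy, syz = s[1]
    szx, szy, szz = s[2]
    n = np.array([
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ])
    w, v = np.linalg.eigh(n)            # symmetric matrix: real eigenpairs
    q0, q1, q2, q3 = v[:, np.argmax(w)]  # optimal unit quaternion
    r = np.array([
        [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2 * (q1*q2 - q0*q3), 2 * (q1*q3 + q0*q2)],
        [2 * (q2*q1 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2 * (q2*q3 - q0*q1)],
        [2 * (q3*q1 - q0*q2), 2 * (q3*q2 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3],
    ])
    t = yc - r @ xc
    return r, t


# Usage: rotate random points by 30 degrees about z, shift them, and recover the fit.
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3))
theta = np.radians(30.0)
true_r = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
y = x @ true_r.T + np.array([1.0, -2.0, 0.5])
r, t = quaternion_superpose(x, y)
rmsd = np.sqrt(np.mean(np.sum((x @ r.T + t - y) ** 2, axis=1)))
print(rmsd)
```

A practical advantage of the quaternion route is that it always yields a proper rotation (determinant +1), so no separate reflection check is needed.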
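The 010611 entry defines the gap count for the Levitt-Gerstein statistic as the sum of the numbers of gaps in the two aligned sequences, excluding terminal gaps. The Python sketch below assumes that a "gap" is a maximal run of '-' characters in an aligned sequence (so a three-residue gap counts once) and that terminal gaps are runs touching either end; both are interpretations, and the function names are hypothetical.

```python
import re


def internal_gap_count(aligned_seq):
    """Number of gap runs ('-') in one aligned sequence, ignoring terminal runs."""
    core = aligned_seq.strip("-")        # drop leading/trailing (terminal) gaps
    return len(re.findall(r"-+", core))  # each maximal run of '-' counts once


def levitt_gerstein_gaps(aligned1, aligned2):
    """Sum of the internal gap counts of the two aligned sequences."""
    return internal_gap_count(aligned1) + internal_gap_count(aligned2)


print(levitt_gerstein_gaps("--ACD--EF-GH-", "MKA-CDEEFQGH-"))  # 2 + 1 = 3
```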

4 START-UP MACRO

From version 5.4 on, LSQMAN can execute a macro at start-up (whether it is run interactively or in batch mode). This can be used to execute commands that you (almost) always want to have executed. To use this feature, set the environment variable GKLSQMAN to point to an LSQMAN macro file, e.g.:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 setenv GKLSQMAN /home/gerard/lsqman.init
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

5 INTRODUCTION

LSQMAN is a program for performing least-squares superpositioning of biomacromolecules. The program offers a superset of the LSQ functionality inside O and removes some of the limitations and irritations of the LSQ commands.

The "heart" of the program is Kabsch's subroutine U3BEST; see the following references:
W. Kabsch, Acta Cryst. (1976). A32, 922-923
W. Kabsch, Acta Cryst. (1978). A34, 827-828

Phi/Psi difference plots are discussed in: A.P. Korn & D.R. Rose, Protein Eng. (1994). 7(8), 961-967
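
U3BEST itself is not reproduced in this manual. To illustrate what a least-squares superposition computes, here is a minimal pure-Python sketch. It deliberately swaps in a different technique, Horn's closed-form quaternion method, rather than Kabsch's formulation; both minimise the same sum of squared distances between matched atoms, so for well-conditioned input (at least three non-collinear atom pairs) they should give the same optimal operator and RMSD. The function names are hypothetical; following the LSQMAN convention, the first argument is the fixed molecule and the second is the one brought on top of it.

```python
import math


def _jacobi_eigen(a, max_sweeps=50):
    """Eigen-decomposition of a small symmetric matrix (cyclic Jacobi).

    Returns (eigenvalues, eigenvectors); eigenvector k is column k."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def superpose(fixed, moving):
    """Least-squares operator bringing `moving` on top of `fixed`.

    Both are equal-length lists of (x, y, z) for matched atoms.
    Returns (rot, trans, rmsd) such that rot * m + trans ~ f."""
    n = len(fixed)
    if n != len(moving) or n < 3:
        raise ValueError("need >= 3 matched atom pairs")
    cf = [sum(p[k] for p in fixed) / n for k in range(3)]
    cm = [sum(p[k] for p in moving) / n for k in range(3)]
    y = [[p[k] - cf[k] for k in range(3)] for p in fixed]   # fixed, centred
    x = [[p[k] - cm[k] for k in range(3)] for p in moving]  # moving, centred
    # Correlation matrix S[a][b] = sum_i x_i[a] * y_i[b]
    S = [[sum(xi[a] * yi[b] for xi, yi in zip(x, y)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    evals, evecs = _jacobi_eigen(N)
    best = max(range(4), key=lambda k: evals[k])
    q0, q1, q2, q3 = (evecs[r][best] for r in range(4))  # unit quaternion
    rot = [
        [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3), 2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2), 2*(q2*q3 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3],
    ]
    trans = [cf[i] - sum(rot[i][j] * cm[j] for j in range(3)) for i in range(3)]
    # Minimum sum of squared deviations = sum|x|^2 + sum|y|^2 - 2*lambda_max
    e = sum(c * c for p in x + y for c in p) - 2.0 * evals[best]
    rmsd = math.sqrt(max(e, 0.0) / n)
    return rot, trans, rmsd
```

For example, `rot, trans, rmsd = superpose(ca_m1, ca_m2)`, where `ca_m1` and `ca_m2` are (hypothetical) lists of matched CA coordinates, returns a rotation matrix and translation vector to be applied as rot * x + trans to the coordinates of the second molecule.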

6 QUICK 'N' DIRTY GETTING STARTED GUIDE

If you want to use LSQMAN to superimpose two structures and to obtain a structure-based sequence alignment, and if you don't want to learn the ins and outs of the program, just follow this recipe (we will use 1CEL and 2AYH as a non-trivial example).

Note that a (slightly more elaborate) version of this recipe is also available as a ready-to-run LSQMAN macro from the OMAC repository (in a file called "align.lsqmac"), which would typically be used by crystallographers. If you are a bioinformatician, you may be interested in longer alignments at the expense of higher RMSDs; in that case, use another LSQMAN macro from the OMAC repository, "align_long.lsqmac".

- start the program and read the two structures:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 pdb1cel.ent
 [...]
 LSQMAN > re m2 pdb2ayh.ent
 [...]
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

- do a fast brute-force structural superposition using coarse-fit parameter settings (assuming you are interested in chain A of both molecules; note that LSQMAN renames chains A, B, ... unless you tell it otherwise with the CHain_mode command):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > set coarse
 Setting coarse 6 A fit defaults
 LSQMAN > fast m1 a m2 a 50 25 1000
 Fast-force fit of  M1 A
 And                M2 A
 Atom type      | CA |
 Fragment length            50
 Fragment step size         25
 Min matched residues     1000
 Central atoms mol 1 : (        434)
 Central atoms mol 2 : (        214)
 Max match so far : (         69)
 RMSD (A)         : (   3.589)
 Max match so far : (        171)
 RMSD (A)         : (   3.796)
   
 Max match : (        171)
 RMSD (A)  : (   3.796)
   
 Regenerating best alignment ...
 The    171 atoms have an RMS distance of    3.796 A
 SI = RMS * Nmin / Nmatch             =      4.75095
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.27604
 CR = Maiorov-Crippen RHO (0-2)       =      0.24889
 RR = relative RMSD                   =      0.23055
 RMS delta B for matched atoms        =     5.888 A2
 Corr. coefficient matched atom Bs    =        0.466
 Rotation     :  -0.01142788  0.73876214  0.67386937
                 -0.99739343  0.03959398 -0.06032123
                 -0.07124421 -0.67280221  0.73638403
 Translation  :      48.3822     53.5505     45.8239
 CPU total/user/sys :       5.6       5.6       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

- improve the superimposition operator with intermediate-fit or default parameter settings:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > set reset
 Resetting program defaults
 LSQMAN > im m1 * m2 *
 Improve fit of  M1 *
 And             M2 *
 Atom type      | CA |
 Nr of atoms in mol1 : (        868)
 Nr of atoms in mol2 : (        214)
   
 Found fragment of length : (       5)
 Found fragment of length : (       4)
 [...]
          LYS-A 422 <===> LYS-A 210 @     0.77 A *
          PHE-A 423 <===> TYR-A 211 @     0.81 A
          GLY-A 424 <===> THR-A 212 @     0.71 A
   
 Nr of residues in mol1   : (     868)
 Nr of residues in mol2   : (     214)
 Nr of matched residues   : (     126)
 Nr of identical residues : (      18)
 % identical of matched   : (  14.286)
 % matched   of mol1      : (  14.516)
 % identical of mol1      : (   2.074)
 D-value    for mol1      : (   0.003)
 % matched   of mol2      : (  58.879)
 % identical of mol2      : (   8.411)
 D-value    for mol2      : (   0.050)
   
 Analysis of distance distribution:
 Number of distances                    :        126
 Average (A)                            :       1.41
 Standard deviation (A)                 :       0.76
 Variance (A**2)                        :       0.57
 Minimum (A)                            :       0.25
 Maximum (A)                            :       3.22
 Range (A)                              :       2.98
 Sum (A)                                :     178.03
 Root-mean-square (A)                   :       1.60
 Harmonic average (A)                   :       1.00
 Median (A)                             :       1.32
 25th Percentile (A)                    :       0.78
 75th Percentile (A)                    :       1.91
 Semi-interquartile range (A)           :       1.13
 Trimean (A)                            :       1.33
 50% Trimmed mean (A)                   :       1.30
 10th Percentile (A)                    :       0.53
 90th Percentile (A)                    :       2.50
 20% Trimmed mean (A)                   :       1.34
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
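
The distance-distribution summary can be reproduced from the list of distances between matched atoms. This sketch is not LSQMAN's code, and several conventions in it are assumptions: population variance, "inclusive" percentile interpolation from Python's statistics module, and an "X% trimmed mean" that removes X% of the values in total, half from each tail. Two printed values give a useful cross-check: the trimean is (Q1 + 2*median + Q3)/4 = (0.78 + 2.64 + 1.91)/4 = 1.33, and the value labelled "Semi-interquartile range" (1.13) equals Q3 - Q1 (1.91 - 0.78) rather than half of it, so the sketch returns that difference under the name "iqr".

```python
import math
import statistics


def distance_summary(dist):
    """Summary statistics for matched-atom distances (Angstrom).

    Assumed conventions: population variance, 'inclusive' percentiles,
    and 'X% trimmed mean' = X% of values removed in total, half per tail."""
    d = sorted(dist)
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / n
    q1, med, q3 = statistics.quantiles(d, n=4, method="inclusive")
    deciles = statistics.quantiles(d, n=10, method="inclusive")

    def trimmed_mean(total_fraction):
        k = int(n * total_fraction / 2)  # values dropped from each tail
        kept = d[k:n - k]
        return sum(kept) / len(kept)

    return {
        "number": n,
        "average": mean,
        "sd": math.sqrt(var),
        "variance": var,
        "minimum": d[0],
        "maximum": d[-1],
        "range": d[-1] - d[0],
        "sum": sum(d),
        "rms": math.sqrt(sum(x * x for x in d) / n),
        "harmonic": n / sum(1.0 / x for x in d),  # assumes all distances > 0
        "median": med,
        "p25": q1,
        "p75": q3,
        "iqr": q3 - q1,  # printed by LSQMAN as "Semi-interquartile range"
        "trimean": (q1 + 2.0 * med + q3) / 4.0,
        "trimmed50": trimmed_mean(0.50),
        "p10": deciles[0],
        "p90": deciles[-1],
        "trimmed20": trimmed_mean(0.20),
    }
```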

- improve the operator with (max) 10 cycles of DP_improve:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > dp m1 a m2 a sq 3.5 10
 Dynamic-Programming-based operator improvement (Needleman-Wunsch)
 Of               M1 A
 And              M2 A
 Atom type       | CA |
 Cut-off distance     3.50
 Matrix mode      SQ
 Max nr of cycles       10
 Verbose output   NO
 Central atoms mol 1 : (        434)
 Central atoms mol 2 : (        214)
   
 DP_improve iteration : (          1)
 [...]
 DP_improve iteration : (          4)
 Calculating squared distance matrix ...
   
 Executing Needleman-Wunsch ...
   
 Gap penalty         : (   6.125)
 Raw alignment score : ( -2.506E+03)
 Length sequence 1   : (     434)
 Length sequence 2   : (     214)
 Alignment length    : (     492)
 Nr of identities    : (      17)
 Perc identities     : (   7.944)
 Nr of matched res   : (     156)
 RMSD for those (A)  : (   1.696)
   
 The    156 atoms have an RMS distance of    1.696 A
 SI = RMS * Nmin / Nmatch             =      2.32599
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.27090
 CR = Maiorov-Crippen RHO (0-2)       =      0.11294
 Estimated RMSD for 2 random proteins =     16.062 A
 RR = Relative RMSD                   =      0.10557
 Rotation     :   0.00677837  0.80088258  0.59878302
                 -0.99593049  0.05922168 -0.06793585
                 -0.08986957 -0.59588575  0.79802483
 Translation  :      47.4868     52.0458     47.7760
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
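
The SI and MI lines print their own formulas, so they are easy to recompute. In this sketch, Nmin is taken to be the smaller of the two central-atom counts (214 here) and W is passed in explicitly: back-calculating from the printed values gives W = 1.0 for this run, whereas the FAst_force run with coarse settings earlier in this guide is consistent with W = 0.5. RR is modelled as the RMSD divided by the "Estimated RMSD for 2 random proteins" (1.696 / 16.062 = 0.1056), which is an inference from this output rather than a documented definition. Small differences in the last digits arise because the printed RMSD is rounded.

```python
def similarity_index(rms, n_match, n_min):
    """SI = RMS * Nmin / Nmatch (formula as printed by LSQMAN)."""
    return rms * n_min / n_match


def match_index(rms, n_match, n_min, w=1.0):
    """MI = (1 + Nmatch) / ((1 + W*RMS) * (1 + Nmin)); W must be supplied."""
    return (1.0 + n_match) / ((1.0 + w * rms) * (1.0 + n_min))


def relative_rmsd(rms, random_rms):
    """RR modelled as RMSD / estimated RMSD of two random proteins (inferred)."""
    return rms / random_rms


# Values from the DP_improve output above: 156 matched atoms, RMSD 1.696 A,
# Nmin = 214 central atoms (mol 2), estimated random RMSD 16.062 A.
print(similarity_index(1.696, 156, 214))
print(match_index(1.696, 156, 214, w=1.0))
print(relative_rmsd(1.696, 16.062))
```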

- get the global sequence alignment based on the current superimposition operator. Note how nicely the catalytic residues E-x(1)-D-x(1,2)-E align. Also note that the Levitt-Gerstein statistics at the bottom suggest that this is a very significant structural similarity.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > gl m1 a m2 a 3.5
 Global-superposition-distance-based Needleman-Wunsch alignment
 Of               M1 A
 And              M2 A
 Atom type       | CA |
 Cut-off distance     3.50
 Central atoms mol 1 : (        434)
 Central atoms mol 2 : (        214)
   
 Applying current operator to mol 2 : (   0.007    0.801    0.599   -0.996
     0.059   -0.068   -0.090   -0.596    0.798   47.487   52.046   47.776)
   
 Calculating superposition-distance matrix ...
   
 Executing Needleman-Wunsch ...
   
      1 -   Q  DIST =        -
      2 -   T  DIST =        -
 [...]
    253 C   -  DIST =        -
    254 C   W  DIST =     2.34 A
    255 S   D  DIST =     2.17 A
    256 E | E  DIST =     2.10 A
    257 M   I  DIST =     1.94 A
    258 D | D  DIST =     1.71 A
    259 I   -  DIST =        -
    260 W   I  DIST =     1.17 A
    261 E | E  DIST =     1.11 A
    262 A   F  DIST =     0.65 A
    263 -   L  DIST =        -
    264 N   G  DIST =     3.04 A
    265 -   K  DIST =        -
 [...]
    491 S   -  DIST =        -
    492 G   -  DIST =        -
   
 Sequence 1 ------?SACTLQSETHPPLTWQ---------KCSSGGTCTQQTGSVVI--DAN------
    |=ID
 Sequence 2 QTGGSF----------------FEPFNSYNSG------------TWEKADG--YSNGGVF
   
 Sequence 1 ------WRWTHATNSSTNCYDGNTWSSTLCPDNETCAKNCCLDGAAYASTYGVTT---SG
    |=ID                                                        |
 Sequence 2 NCTWRA----------------------------------------N----NVNFTNDG-
   
 Sequence 1 NSLSIGFV-TQSAQK-----NVGARLYLMASDTTYQEFTLLGNEFSFDVDVSQLPCGLNG
    |=ID      |  |                 |                                |
 Sequence 2 -KLKLGLTS-----SAYNKF-DCAEYRS------TNIYG-Y-GLYEVSMKP-AKNTGIVS
   
 Sequence 1 ALYFVS---M---DADGGVSKYPTNTAGAKYGTGYCDSQCPRDLKFINGQANVEGWEPSS
    |=ID
 Sequence 2 SFFTYTGPAHGTQ-----------------------------------------------
   
 Sequence 1 NNANTGIGGHGSCCSEMDIWEA-N-SI--SEALTPHPCTTVGQEICEGDGCGGTYSDNRY
    |=ID                   | |  |
 Sequence 2 -------------WDEID-IEFLGK-DTTKVQFNYYTN----------------------
   
 Sequence 1 GGTCDPDG-CDWNPYRLGNTSFYGPGSSFTLD-T-TKKLTVVTQFETSGAINRYYV-QNG
    |=ID           |                               |          |  |
 Sequence 2 ---GV--GGHEKVI-------SL------G-FDASKGFHTYAFDWQPG-YIKWYVDG---
   
 Sequence 1 VTFQQ-PNAELGSYSGNELNDDYCTAEEAEFGGSSFSDKGGL-TQFKKATSGGMVLVMSL
    |=ID                                                             | |
 Sequence 2 --VLKH-----------TATA--------------------NI--P-ST---PGKIMMNL
   
 Sequence 1 WDDYYANMLW--LDSTYPTNETSSTPGAVRGSCSTSSGVPAQVESQSPNAKVTFSNIKFG
    |=ID    |        |                                               |
 Sequence 2 WNGTGVD-DWLG-------------------------SY--N-G--ANPLYAEYDWVKYT
   
 Sequence 1 --PIGSTGNPSG
    |=ID
 Sequence 2 SN----------
   
 Analysis of distance distribution:
 Number of distances                    :        156
 Average (A)                            :       1.51
 Standard deviation (A)                 :       0.77
 Variance (A**2)                        :       0.59
 Minimum (A)                            :       0.15
 Maximum (A)                            :       3.42
 Range (A)                              :       3.27
 Sum (A)                                :     235.95
 Root-mean-square (A)                   :       1.70
 Harmonic average (A)                   :       1.06
 Median (A)                             :       1.50
 25th Percentile (A)                    :       0.87
 75th Percentile (A)                    :       1.98
 Semi-interquartile range (A)           :       1.10
 Trimean (A)                            :       1.46
 50% Trimmed mean (A)                   :       1.44
 10th Percentile (A)                    :       0.56
 90th Percentile (A)                    :       2.61
 20% Trimmed mean (A)                   :       1.45
   
 Gap penalty         : (   6.125)
 Raw alignment score : ( -2.506E+03)
 Length sequence 1   : (     434)
 Length sequence 2   : (     214)
 Alignment length    : (     492)
 Nr of identities    : (      17)
 Perc identities     : (   7.944)
 Nr of matched res   : (     156)
 RMSD (A) for those  : (   1.696)
   
 Levitt-Gerstein statistics:
 Nr of gaps       : (         38)
 Similarity score : (  2.442E+03)
 Z-score          : (  1.787E+01)
 P (z > Z)        : (  0.000E+00)
 P (z > Z) is the probability of matching any two
 random structures and finding a Z-score z which
 is greater than the Z-score Z of the current pair.
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
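
The GLobal output above shows two steps: building a matrix of superposition distances between the CA atoms of the two (already superimposed) molecules, and running Needleman-Wunsch on it. The sketch below is a generic version of that idea, not LSQMAN's implementation. The pair score (cut-off squared minus distance squared) is a hypothetical choice, the default gap penalty of cut-off squared over two was picked only because it equals the 6.125 printed for a 3.5 Å cut-off, and reporting only pairs within the cut-off as matched is likewise an assumption.

```python
import math


def superposition_distances(ca1, ca2):
    """Distances between every CA of mol 1 and every (superimposed) CA of mol 2."""
    return [[math.dist(a, b) for b in ca2] for a in ca1]


def nw_align(dist, cutoff=3.5, gap=None):
    """Needleman-Wunsch global alignment on a distance matrix.

    Hypothetical scoring: an aligned pair scores cutoff**2 - d**2 and every
    gap costs `gap`. Returns aligned (i, j) index pairs with d <= cutoff."""
    if gap is None:
        gap = cutoff ** 2 / 2.0  # 6.125 for a 3.5 A cut-off
    n, m = len(dist), len(dist[0])

    def pair_score(i, j):
        return cutoff ** 2 - dist[i][j] ** 2

    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i
    for j in range(1, m + 1):
        score[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i - 1, j - 1),
                              score[i - 1][j] - gap,
                              score[i][j - 1] - gap)
    # Trace back from the bottom-right corner, preferring diagonal steps.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + pair_score(i - 1, j - 1):
            if dist[i - 1][j - 1] <= cutoff:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

With two identical CA traces where the second lacks the first residue, the sketch gaps that residue and matches the rest one-to-one.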

- inspect the final superimposition operator:

 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > sh m1 m2
 Operator bringing : (M2)
 on top of         : (M1)
 Last command was  : (IM M1 * M2 *)
 The    156 atoms have an RMS distance of    1.696 A
 SI = RMS * Nmin / Nmatch             =      2.32599
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.27090
 CR = Maiorov-Crippen RHO (0-2)       =      0.11294
 RR = relative RMSD                   =      0.10557
 RMS delta B for matched atoms        =  1000.000 A2
 Corr. coefficient matched atom Bs    =     1000.000
 Rotation     :   0.00677837  0.80088258  0.59878302
                 -0.99593049  0.05922168 -0.06793585
                 -0.08986957 -0.59588575  0.79802483
 Translation  :      47.4868     52.0458     47.7760
 Nr of RT operators : 1
   
 RT-OP  1 =     0.0067784   -0.9959305   -0.0898696                 47.487
                0.8008826    0.0592217   -0.5958858                 52.046
                0.5987830   -0.0679358    0.7980248                 47.776
 Determinant of rotation matrix           1.000000
 Column-vector products (12,13,23)        0.000000    0.000000    0.000000
 Crowther Alpha Beta Gamma                  81.423     -37.058       6.473
 Spherical polars Omega Phi Chi             25.777     -52.525      93.898
 Direction cosines of rotation axis       0.264587   -0.345125    0.900490
 X-PLOR polars Phi Psi Kappa               106.374      69.811      93.898
 Lattmann Theta+ Theta2 Theta-             -87.896      37.058     254.951
 Rotation angle                             93.898
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
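
The "Rotation angle" and "Direction cosines of rotation axis" lines can be cross-checked with the standard axis-angle relations: cos(kappa) = (trace - 1)/2, and the axis is proportional to (R32 - R23, R13 - R31, R21 - R12). The sketch below applies them to the RT-OP rows, which are in the equation form X(new) = ROT(1,1)*X(old) + ...; in this example output the "Rotation :" block appears to be the transpose of the RT-OP listing, and feeding it in instead would flip the sign of the axis. The function names are hypothetical, and the sketch assumes a proper rotation whose angle is not close to 0 or 180 degrees (where the axis formula becomes ill-conditioned).

```python
import math


def rotation_angle_axis(rot):
    """Rotation angle (degrees) and unit axis of an equation-form 3x3 matrix.

    Assumes a proper rotation with an angle not close to 0 or 180 degrees."""
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    cos_k = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against rounding
    kappa = math.acos(cos_k)
    two_sin = 2.0 * math.sin(kappa)
    axis = (
        (rot[2][1] - rot[1][2]) / two_sin,
        (rot[0][2] - rot[2][0]) / two_sin,
        (rot[1][0] - rot[0][1]) / two_sin,
    )
    return math.degrees(kappa), axis


def spherical_polars(axis):
    """Omega (angle of the axis from z) and Phi (azimuth from x), in degrees."""
    omega = math.degrees(math.acos(max(-1.0, min(1.0, axis[2]))))
    phi = math.degrees(math.atan2(axis[1], axis[0]))
    return omega, phi


# RT-OP rows from the SHow output above.
rt_op = [[0.0067784, -0.9959305, -0.0898696],
         [0.8008826, 0.0592217, -0.5958858],
         [0.5987830, -0.0679358, 0.7980248]]
kappa, axis = rotation_angle_axis(rt_op)
print(kappa, axis, spherical_polars(axis))
```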

- apply the operator, save the superimposed molecule to a file, and create a VRML file if you like:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > apply m1 m2
 Bring Mol 2 on top of Mol 1 ...
 Molecule 1 : (M1)
 Molecule 2 : (M2)
 Apply to mol 2 chain : (*)
 Applying operator to mol 2 ...
 Updating selected chain(s)/zone ...
 Nr of atoms moved : (       1900)
 Resetting ALL operators of mol 2 ...
 LSQMAN > wr m1 1cel.pdb a
 Command > (wr m1 1cel.pdb a)
 Write mol : (M1)
 Chain id  : (A)
 PDB file  : (1cel.pdb)
 Number of atoms written : (       3518)
 LSQMAN > wr m2 2ayh_rt.pdb a
 Command > (wr m2 2ayh_rt.pdb a)
 Write mol : (M2)
 Chain id  : (A)
 PDB file  : (2ayh_rt.pdb)
 Number of atoms written : (       1900)
 LSQMAN > vr ini
 Open VRML file : (lsqman.wrl)
 Opened VRML file
 LSQMAN > vr ad m1 a green
 VRML - Add mol M1                   chain A
 Nr of central atoms written : (        434)
 LSQMAN > vr ad m2 a red
 VRML - Add mol M2                   chain A
 Nr of central atoms written : (        214)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

7 WHICH ALIGNMENT IS BETTER ?

There are a million ways to superimpose two structures and to express the degree of their structural similarity. Usually, the number of aligned residues, and the RMSD of the CA atoms of these residues, are quoted, but small differences in parameters or programs can lead to different alignments and different statistics. Is an alignment of 100 residues with an RMSD of 0.5 Å better or worse than one of 200 residues with an RMSD of 1.0 Å ?

Several statistics have been proposed that attempt to quantify structural similarity in a normalised way, and LSQMAN calculates a number of them. Commands that do the actual superpositioning of two structures calculate several useful statistics (which are also listed by the SHow command), including:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > sh m1 m2
 Operator bringing : (M2)
 on top of         : (M1)
 Last command was  : (FA M1 A M2 A 25 10 80)
 The     71 atoms have an RMS distance of    1.646 A
 SI = RMS * Nmin / Nmatch             =      3.10737
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.20153
 CR = Maiorov-Crippen RHO (0-2)       =      0.13964
 RR = relative RMSD                   =      0.14144
 NR = normalised RMSD (100)           =      1.987 A
 RMS delta B for matched atoms        =    12.116 A2
 Corr. coefficient matched atom Bs    =        0.215
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Of these statistics, CR, RR and NR are normalised, so they can be used to compare different alignments. Another way to get useful information about the quality of a structural alignment is to use the GLobal command, which reports the Levitt-Gerstein statistics (as well as many others):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > global m1 A m2 A 3.5 q.q
[...]
 Analysis of distance distribution:
 Number of distances                    :         79
 Average (A)                            :       1.46
 Standard deviation (A)                 :       0.82
 Variance (A**2)                        :       0.67
 Minimum (A)                            :       0.23
 Maximum (A)                            :       3.38
 Range (A)                              :       3.15
 Sum (A)                                :     115.70
 Root-mean-square (A)                   :       1.68
 Harmonic average (A)                   :       0.98
 Median (A)                             :       1.31
 25th Percentile (A)                    :       0.80
 75th Percentile (A)                    :       2.02
 Semi-interquartile range (A)           :       1.22
 Trimean (A)                            :       1.36
 50% Trimmed mean (A)                   :       1.32
 10th Percentile (A)                    :       0.50
 90th Percentile (A)                    :       2.70
 20% Trimmed mean (A)                   :       1.39
   
 Gap penalty            :        6.125
 Raw alignment score    :  -1.1409E+03
 L1 = Length sequence 1 :          134
 L2 = Length sequence 2 :          174
 Alignment length       :          229
 NI = Nr of identities  :            8
 L3 = Nr of matched res :           79
 RMSD for those (A)     :        1.668
 ID = NI/min(L1,L2) (%) :         5.97
 ID = NI/L3 (%)         :        10.13
   
 Levitt-Gerstein statistics:
 Nr of gaps       :           17
 Similarity score :   1.2638E+03
 Z-score          :   1.6596E+01
 P (z > Z)        :   6.1998E-08
 P (z > Z) is the probability of matching any two
 random structures and finding a Z-score z which
 is greater than the Z-score Z of the current pair.
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

The above results were obtained after using the FAst_force command to align 1CRB and 1RBP. If we align the same two structures with the NWunsch command (using the BLOSUM45 matrix), with the XAlign command (importing an alignment made in Indonesia with the Gonnet matrix), and with the OMAC macro "align.lsqmac", we get the following results:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
                              FAst_fo  NWunsch   XAlign   lsqmac
 Number of matched residues        71      129      126       79
 Their RMSD (A)                  1.65    15.12    14.78     1.66
 Maiorov-Crippen RHO             0.14     1.18     1.16     0.14
 Relative RMSD                   0.14     0.96     0.95     0.14
 Normalised RMSD (100) (A)       1.99    13.41    13.25     1.88
 Z-score                         16.6      2.4      2.5     16.3
 P (z > Z)                    6.2E-08  8.4E-02  7.5E-02  8.5E-08
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
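
The NR row can be reproduced with the RMSD100 normalisation proposed by Carugo & Pongor, NR = RMSD / (1 + ln(sqrt(N/100))), where N is the number of matched residues. This formula is inferred here because it reproduces all four NR values in the table from the listed RMSDs and residue counts; it is not quoted from the LSQMAN source.

```python
import math


def rmsd100(rmsd, n_matched):
    """Normalised RMSD for 100 residues: RMSD / (1 + ln(sqrt(N / 100))).

    Only meaningful for N well above ~14; below that the denominator
    approaches zero or becomes negative."""
    if n_matched <= 0:
        raise ValueError("need at least one matched residue")
    return rmsd / (1.0 + math.log(math.sqrt(n_matched / 100.0)))


# RMSD (A) and number of matched residues, as listed in the table above.
table = {"FAst_fo": (1.65, 71), "NWunsch": (15.12, 129),
         "XAlign": (14.78, 126), "lsqmac": (1.66, 79)}
for method, (rms, n) in table.items():
    print(f"{method:8s} NR = {rmsd100(rms, n):6.2f} A")
```

Run as-is, the loop prints NR values of 1.99, 13.41, 13.25 and 1.88 Å, matching the table. By this one measure, the two alignments in the question at the start of this section would score rmsd100(0.5, 100) = 0.50 Å and rmsd100(1.0, 200) ≈ 0.74 Å; other normalisations may rank them differently.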

The two sequence-based alignments are clearly much worse than either structure-based one. Judging from the Levitt-Gerstein statistics, both structure-based alignments appear to be significant. In that case, I would prefer the one generated with "align.lsqmac", because it has a lower normalised RMSD (also reflected by the fact that it has 8 more residues aligned at the expense of a negligible increase in the overall RMSD).

8 WHAT'S THE DEAL WITH O'S "TRANSPOSE" MATRIX NOTATION ?

This is an issue that confuses many people to no end. Let's try and explain it once and for all (famous last words) ...

The problem stems from the fact that O stores its operators internally in a one-dimensional array, say RT(12), that contains both the 3-by-3 rotation matrix and the 3 components of the translation vector. O also sometimes uses these arrays in subroutines as if they were composed of a separate matrix (e.g., ROT(3,3)) and vector (e.g., TRANS(3)). However, this is all done consistently inside the program and is nothing the user ever has to worry about. (And the same is true for LSQMAN, by the way.)

Problems arise only when the operators are exposed to users or need to be exchanged with other programs (such as CNS or DM). This is because Fortran and humans store/write operators in a different fashion. The human way of writing an RT-transformation (RT = rotation + translation) is:

X(new) = ROT(1,1) * X(old) + ROT(1,2) * Y(old) + ROT(1,3) * Z(old) + TRANS(1)

Y(new) = ROT(2,1) * X(old) + ROT(2,2) * Y(old) + ROT(2,3) * Z(old) + TRANS(2)

Z(new) = ROT(3,1) * X(old) + ROT(3,2) * Y(old) + ROT(3,3) * Z(old) + TRANS(3)

However, Fortran always stores multidimensional arrays with the first index varying fastest, followed by the second, etc. This means that the matrix elements of ROT are stored in the order ROT(1,1), ROT(2,1), ..., ROT(2,3), ROT(3,3). When the matrix is handled as a one-dimensional vector, writing it out with something like (RT(I),I=1,12) will therefore write the elements in the order: RT(1) (=ROT(1,1)), RT(2) (=ROT(2,1)), RT(3) (=ROT(3,1)), RT(4) (=ROT(1,2)), ..., RT(9) (=ROT(3,3)), RT(10) (=TRANS(1)), RT(11) (=TRANS(2)), RT(12) (=TRANS(3)). So, referring to the X(new) etc. notation above, the rotation matrix (and translation vector) is written column-wise rather than row-wise. This does NOT mean that O and LSQMAN use transposed matrices; they just write them out using a different order of the matrix elements than certain other programs and people do or expect them to. And, again, both O and LSQMAN are internally entirely consistent, and the user rarely if ever has to be concerned with this issue (the only exception being the exchange of operators with certain other programs).

As far as LSQMAN is concerned, I think it always prints the operators in the same way as in the X(new) etc. equations, e.g.:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Rotation     :  -0.93637782 -0.23617344  0.25965112
                 -0.29769513  0.92627919 -0.23105074
                 -0.18594138 -0.29364768 -0.93765497
 Translation  :      71.0262    -16.4275     86.6216
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

and:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 RT-OP  1 =    -0.9363778   -0.2361734    0.2596511                 71.026
               -0.2976951    0.9262792   -0.2310507                -16.427
               -0.1859414   -0.2936477   -0.9376550                 86.622
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

The only exception is when it prints an operator just prior to applying it to a molecule, e.g. compare the following set of 12 numbers with those in the above two examples:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Applying current operator to mol 2 : (  -0.936   -0.298   -0.186   -0.236
     0.926   -0.294    0.260   -0.231   -0.938   71.026  -16.427   86.622)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

(Note: if you find other places in the program where I haven't yet changed the output to be in the equation-like format, let me know !)

Of course, when you read (OLd_o_operator command) or write (SAve_operator command) an operator in O-style format, LSQMAN adheres to the O-way of ordering the elements of the operator:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > save gmod1 gmod2
 Operator bringing : (GMOD2)
 on top of         : (GMOD1)
 File name ? (rt_gmod2_to_gmod1.odb)
 Save in file   : (rt_gmod2_to_gmod1.odb)
 Datablock name : (.lsq_rt_gmod2_to_gmod1)
 LSQMAN > $ cat rt_gmod2_to_gmod1.odb
 Spawn system command : (  cat rt_gmod2_to_gmod1.odb)
! Created by LSQMAN V. 031110/9.4.5 at Fri Dec 5 23:32:12 2003 for gerard
.lsq_rt_gmod2_to_gmod1 r 12 (3f15.7)
     -0.9363778     -0.2976951     -0.1859414
     -0.2361734      0.9262792     -0.2936477
      0.2596511     -0.2310507     -0.9376550
     71.0262375    -16.4274998     86.6216049
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

I'm not sure if this made matters any clearer, though :-)
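
The reordering described above fits in a few lines of code. In this hypothetical sketch (the function names are not part of LSQMAN or O), the first nine numbers of an O-style RT datablock are read as ROT stored column by column, so element rt[i + 3*j] (counting from 0) is ROT(i+1, j+1), and the last three are TRANS(1..3). Applied to the 12 numbers in the .lsq_rt_gmod2_to_gmod1 datablock shown above, it reproduces the row-wise Rotation printout earlier in this section.

```python
def o_rt_to_equation_form(rt):
    """Split a 12-number O-style RT operator into (rot, trans).

    rt[0..8] hold ROT column by column (Fortran order): rt[i + 3*j] is
    ROT(i+1, j+1). rt[9..11] hold TRANS(1..3)."""
    if len(rt) != 12:
        raise ValueError("an RT operator has exactly 12 numbers")
    rot = [[rt[i + 3 * j] for j in range(3)] for i in range(3)]
    trans = list(rt[9:12])
    return rot, trans


def equation_form_to_o_rt(rot, trans):
    """Inverse of o_rt_to_equation_form: flatten ROT column by column."""
    return [rot[i][j] for j in range(3) for i in range(3)] + list(trans)


def apply_operator(rot, trans, xyz):
    """X(new) = ROT(1,1)*X(old) + ROT(1,2)*Y(old) + ROT(1,3)*Z(old) + TRANS(1), etc."""
    return [sum(rot[i][k] * xyz[k] for k in range(3)) + trans[i] for i in range(3)]


# The 12 numbers from the .lsq_rt_gmod2_to_gmod1 datablock above, in file order.
odb = [-0.9363778, -0.2976951, -0.1859414,
       -0.2361734, 0.9262792, -0.2936477,
       0.2596511, -0.2310507, -0.9376550,
       71.0262375, -16.4274998, 86.6216049]
rot, trans = o_rt_to_equation_form(odb)
for row, t in zip(rot, trans):
    print(row, t)
```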

9 FEATURES

Some of the features of LSQMAN:

9.1 chains

* when reading a PDB file, separate chains (XRAY) and separate models (NMR) are recognised and are automatically given chain identifiers A, B, ... Z (i.e., at most 26 chains or NMR models can be accommodated; this is the default behaviour)

9.2 zone definition

* definition of zones of a molecule is flexible:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   *      ... means all chains
   A*     ... means all residues in the first chain
   B3-36  ... means residues 3 through 36 in the second chain
   B3:36  ... means the same thing
   B3:B36 ... ditto
   A73    ... only residue A73 (same as A73-73 etc.)
   A1-999 ... means all residues in chain A with numbers
              between 1 and 999 that exist (use this if
              you're not sure how many residues a protein
              contains)
   A1-B36 ... is NOT a valid zone selection (use two zones,
              one for each chain)
   "A1-36 B3-59 C5 C12" ... defines multiple zones (note use
              of "double quotes")
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
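
The zone rules listed above are regular enough to capture in a small parser. This is a hypothetical sketch for illustration, not LSQMAN's own parser: it assumes single-letter chain identifiers and unsigned residue numbers without insertion codes, and represents a zone as (chain, first, last), with None meaning "all residues". Checking which residues in a range actually exist (as with A1-999) is left to the caller.

```python
import re

# chain letter, first residue, then an optional "-" or ":" separator followed
# by an optional (repeated) chain letter and the last residue
_ZONE = re.compile(r"^([A-Z])(\d+)(?:[-:]([A-Z])?(\d+))?$")


def parse_zone(text):
    """Parse one zone into (chain, first, last); None means no residue limit."""
    t = text.strip().upper()
    if t == "*":
        return ("*", None, None)  # all chains
    if len(t) == 2 and t[0].isalpha() and t[1] == "*":
        return (t[0], None, None)  # all residues of one chain
    m = _ZONE.match(t)
    if m is None:
        raise ValueError(f"cannot parse zone {text!r}")
    chain, first, chain2, last = m.groups()
    if chain2 is not None and chain2 != chain:
        raise ValueError(f"zone {text!r} spans two chains; use one zone per chain")
    first = int(first)
    last = int(last) if last is not None else first
    return (chain, first, last)


def parse_zones(text):
    """Parse a (possibly double-quoted) list of zones separated by spaces."""
    return [parse_zone(z) for z in text.strip().strip('"').split()]
```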

9.3 atom types

* the atom types that are to be used for an explicit least-squares fit can be defined by the user; some types (for proteins) have been pre-defined, but if you want to fit two DNA molecules, or two ligands, this is just as easy

9.4 improving operators

* using the default settings, the improve option functions in a way similar to the LSQ_IMPROVE command in O (albeit considerably faster); however, there are lots of optional embellishments

9.5 O datablocks

* the program can read and write O-style datablocks containing (least-squares) rotation-translation operators

9.6 O macros

* the program can create macro files for O which will read the molecules that you are studying, apply the latest operator and display them

9.7 independence

* the operators from molecule A TO B and from B TO A are completely independent of one another

9.8 aligning multiple models

* from version 4.0 onwards, there are facilities for aligning multiple chains/models in a molecule. This can be used for analysis of NCS-related molecules or to create composite search models for Molecular Replacement.

9.9 analysing multiple, NCS and NMR models

* from version 4.0 onwards, there are several facilities for analysing and aligning multiple NCS chains and NMR models.

9.10 "ab initio" (brute force) alignment

* from version 5.0 onwards, there is a BRute_force command which will systematically try to align two molecules (chains), improve each alignment, and keep the one that gives the largest number of aligned residues

9.11 O compatibility

LSQMAN also contains an equivalent of the LSQ_MOLECULE command in O, even though this may screw up your operators completely when you're analysing several molecules at the same time.

For consistency with O:

* RT-operators are used in Alwyn's "transpose-matrix" formalism

* when referring to an operator, the FIRST molecule is always the one that is FIXED and the SECOND is the one which will be brought on top of the first if the operator is applied

10 INTERFACE

LSQMAN uses the same simple and easy-to-use command interpreter that you know from MAMA, MAPMAN and other programs. The first two characters of (sub-)command names are unique; parameters may be supplied on the same line as the command, and if they are not, LSQMAN will prompt you for them (using fairly reasonable default values; to use a default, just hit RETURN at such a prompt).

NOTE: parameter values with SPACES in them MUST be delimited by "DOUBLE QUOTES" !

The program runs in interactive mode by default; it can be run in batch mode by supplying the -b flag when you start the program.

All new files are opened as UNKNOWN, so any existing files will be overwritten !
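
The two input rules above (commands are recognised by their first two characters, and values containing spaces must be wrapped in double quotes) can be mimicked in a few lines. This is a hypothetical sketch, not LSQMAN's command interpreter: the command table is a small subset of names taken from this manual, and quoting is handled by Python's shlex module, which additionally treats single quotes and backslashes as special characters.

```python
import shlex

# Hypothetical subset of the command table; the capitals mark the two
# characters that identify each command.
COMMANDS = ["REad", "WRite", "DElete", "LIst", "CHain_mode", "FAst_force",
            "BRute_force", "GLobal_nw", "SHow", "DP_improve", "SAve_operator",
            "QUit"]


def parse_command(line):
    """Split a command line and resolve its first word to a command name.

    Only the first two characters of the command word are compared
    (case-insensitively); the remaining words are returned as parameters."""
    words = shlex.split(line)
    if not words:
        return None, []
    key = words[0][:2].upper()
    matches = [c for c in COMMANDS if c[:2].upper() == key]
    if len(matches) != 1:
        raise ValueError(f"unknown command: {words[0]!r}")
    return matches[0], words[1:]
```

For example, `parse_command('wr m2 "my file.pdb" a')` resolves to WRite with the file name kept as a single parameter.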

11 STARTUP

When you start the program, you see something like this:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN ***
   
 Version  - 060801/9.7.2
 (c) 1992-2005 Gerard J. Kleywegt, Dept. Cell Mol. Biol., Uppsala (SE)
 User I/O - routines courtesy of Rolf Boelens, Univ. of Utrecht (NL)
 Others   - T.A. Jones, G. Bricogne, Rams, W.A. Hendrickson
 Others   - W. Kabsch, CCP4, PROTEIN, E. Dodson, etc. etc.
   
 Started  - Wed Aug 2 12:37:39 2006
 User     - gerard
 Mode     - interactive
 Host     - localhost (Linux/i386)
 ProcID   - 6243
 Tty      - /dev/pts/1
   
 *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN ***
   
 Reference(s) for this program:
   
 *  1 * G.J. Kleywegt & T.A. Jones (1994).  Halloween ... Masks and
        Bones. In "From First Map to Final Model", edited by
        S. Bailey, R. Hubbard and D. Waller.  SERC Daresbury
        Laboratory, Warrington, pp. 59-66.
        [http://xray.bmc.uu.se/gerard/papers/halloween.html]
   
 *  2 * G.J. Kleywegt & T.A. Jones (1994). A super position.
        CCP4/ESF-EACBM Newsletter on Protein Crystallography 31,
        November 1994, pp. 9-14.
        [http://xray.bmc.uu.se/usf/factory_4.html]
   
 *  3 * G.J. Kleywegt & T.A. Jones (1995). Where freedom is given,
        liberties are taken. Structure 3, 535-540.
        [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=8590014&dopt=Citation]
   
 *  4 * G.J. Kleywegt (1996). Use of non-crystallographic symmetry
        in protein structure refinement. Acta Cryst D52, 842-857.
        [http://scripts.iucr.org/cgi-bin/paper?gr0471]
   
 *  5 * G.J. Kleywegt (1996). Making the most of your search model.
        CCP4/ESF-EACBM Newsletter on Protein Crystallography 32,
        June 1996, pp. 32-36.
        [http://xray.bmc.uu.se/usf/factory_6.html]
   
 *  6 * G.J. Kleywegt & T.A. Jones (1996). Phi/Psi-chology:
        Ramachandran revisited. Structure 4, 1395-1400.
        [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=8994966&dopt=Citation]
   
 *  7 * G.J. Kleywegt & T.A. Jones (1997). Detecting folding motifs
        and similarities in protein structures. Methods in
        Enzymology 277, 525-545.
   
 *  8 * T.A. Jones & G.J. Kleywegt (1999). CASP3 comparative
        modelling evaluation.
        Proteins: Struct. Funct. Genet. Suppl. 3, 30-46.
        [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=10526350&dopt=Citation]
        [http://xray.bmc.uu.se/casp3]
   
 *  9 * G.J. Kleywegt (1999). Experimental assessment of
        differences between related protein crystal structures.
        Acta Cryst. D55, 1878-1884.
        [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=10531486&dopt=Citation]
        [http://scripts.iucr.org/cgi-bin/paper?se0283]
   
 * 10 * Y.W. Chen, E.J. Dodson & G.J. Kleywegt (2000). Does NMR
        mean "Not for Molecular Replacement" ? Using NMR-based
        search models to solve protein crystal structures.
        Structure 8, R213-R220.
        [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=11080645&dopt=Citation]
   
 * 11 * D. Madsen & G.J. Kleywegt (2002). Interactive motif and
        fold recognition in protein structures. J. Appl. Cryst.
        35, 137-139.
        [http://scripts.iucr.org/cgi-bin/paper?wt0007]
   
 * 12 * M. Novotny, D. Madsen & G.J. Kleywegt (2004). An evaluation
        of protein-fold-comparison servers. Proteins, 54, 260-270
        (2004).
        [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=14696188&dopt=Citation]
   
 * 13 * M.L. Sierk & G.J. Kleywegt (2004). Deja vu all over again:
        finding and analyzing protein structure similarities.
        Structure 12, 2103-2111.
        [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=15576025&dopt=Citation]
   
 * 14 * Kleywegt, G.J., Zou, J.Y., Kjeldgaard, M. & Jones, T.A. (2001).
        Around O. In: "International Tables for Crystallography, Vol. F.
        Crystallography of Biological Macromolecules" (Rossmann, M.G.
        & Arnold, E., Editors). Chapter 17.1, pp. 353-356, 366-367.
        Dordrecht: Kluwer Academic Publishers, The Netherlands.
   
 ==> For manuals and up-to-date references, visit:
 ==>     http://xray.bmc.uu.se/usf
 ==> For reprints, visit:
 ==>     http://xray.bmc.uu.se/gerard
 ==> For downloading up-to-date versions, visit:
 ==>     ftp://xray.bmc.uu.se/pub/gerard
   
 *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN ***
   
 Allocate buffer arrays of size  : (    5000000)
   
 Max nr of molecules             : (          8)
 Max nr of residues per molecule : (      30000)
 Max nr of atoms per molecule    : (     200000)
 Max nr of atom types            : (         15)
 Max nr of chains/models per mol : (         26)
   
 *** BLOSUM-45 substitution matrix loaded ***
   
 Symbol START_TIME : (Wed Aug  2 12:37:39 2006)
 Symbol USERNAME : (gerard)
   
 Initialising : (XVRML - 20051205/0.8)
 Nr of predefined colours : (        411)
   
 LSQMAN options :
   
 ? (list options)                     ! (comment)
 QUit                                 $ shell_command
 & symbol value                       & ? (list symbols)
 @ macro_file                         ECho on_off
 # parameter(s) (command history)
   
 REad mol pdb_file [chain] [atom]     WRite mol pdb_file [chain] [first] [last]
 DElete mol                           ANnotate mol comment_string
 LIst [mol]                           CHain_mode mode
 TYpe_residues mol                    BFactor_range b_lo b_hi
 FRactionalise mol                    ORthogonalise mol
 CEll mol a b c al be ga              SUbtract_ave_b mol
 HYdrogens keep_or_strip              HEtatm keep_or_strip
 NMr_model_mode all_or_first          AA_substitution_matrix filename
   
 ALter CHain_id mol chain new_chain   ALter SEgid mol segid new_segid
 ALter FOrce mol chain new_segid      ALter SAme mol chain
 ALter REnumber mol chain [first]
   
 EXplicit mol1 range1 mol2 range2     NWunsch mol1 chain1 mol2 chain2 gap
 BRute_force mol1 chain1 mol2 chain2 frag_length frag_step min_match [S|D] [slide_step]
 FAst_force  mol1 chain1 mol2 chain2 frag_length frag_step min_match [S|D] [slide_step]
 XAlignment mol1 chain1 mol2 chain2 pir_alignment_file
   
 IMprove mol1 range1 mol2 range2
 DP_improve mol1 chain1 mol2 chain2 mode cut_off max_cycles [verbose]
   
 GLobal_nw mol1 chain1 mol2 chain2 cut_off [log_file]
   
 EDit_operator mol1 mol2 val1 ...     SHow_operator mol1 mol2
 SAve_operator mol1 mol2 file [name]  PErturb_operator mol1 mol2 [amplitude]
 APply_operator mol1 mol2_to_move [chain] [first] [last]
 OLd_o_operator mol1 mol2 file        RMsd_calc mol1 range1 mol2 range2 [Ltarget]
 WAters mol1 mol2 cut_off plot_file   HIsto_dist mol1 mol2 cut_off bin
   
 MOrph mol1 range1 mol2 range2 nsteps basename type oid range3 cutoff
 SImilarity_plot mol1 chain1 mol2 chain2 plot_file [start] [end] [step]
 LEsk_plot mol1 chain1 mol2 chain2 plot_file
 PHipsi_plot mol1 range1 mol2 range2 plot_file [cut-off] [hist_bin] [hist_max]
 ETa_theta_plot mol1 range1 mol2 range2 plot_file [cut-off] [hist_bin] [hist_max]
 DIstance_plot mol1 range1 mol2 range2 plot_file [cut-off] [hist_bin] [hist_max]
 DDihe_plot mol1 range1 mol2 range2 plot_file [cut-off] [hist_bin] [hist_max]
 D1_D2_plot mol1 range1 mol2 range2 plot_file [cut-off] [hist_bin] [hist_max]
 QDiff_dist_plot mol1 range1 mol2 range2 2d_plot_file
 DChi mol1 range1 mol2 range2 [cut-off]
 SOap_film mol1 chain1 mol2 chain2 odl_file [verbose]
   
 MCentral mol residue_range exp_imp   MAlign mol residue_range exp_imp chain
 MDihedral mol chain plot_file [cut]  MRamachandran mol chain ps_file [cut] [how]
 MSide_ch mol chain plot_file [cut]   MTorsion mol chain ps_file [cut] [how]
 VMain_ch mol chain plot_file [cut]   VSide_ch mol chain plot_file [cut]
 MPlot mol chain plot_file ps_file [dmax_black] [cut_dist_print]
 MBfactors mol chain plot_file [cut]
   
 JUdge target tchn parent pchn model mchn dist phi chi
 CAsp target tchn model mchn [start] [end] [step]
   
 GEt XYz mol chain x y z radius symbol_name [O_macro]
   
 FIx_atom_names mol1 range1 mol2 range2 mode how what [min_gain] [cut_off]
 NOmenclature mol                     INvert_ncs infile outfile
 NUcleic_acid_pdb_nomenclature mol
   
 ATom_types ?                         ATom_types CA
 ATom_types MAin_chain                ATom_types SIde_chain
 ATom_types EXtended_main_chain       ATom_types ALl
 ATom_types NOn_hydrogen              ATom_types DEfine type1 [type2 ...]
 ATom_types PHosphorous               ATom_types TRace_and_side_chain
 ATom_types C4*                       ATom_types NUcleic_acid_backbone
   
 SEt ?                                SEt REset_defaults
 SEt COarse_6A_fit_defaults           SEt INtermediate_4A_fit_defaults
 SEt FIne_tune_3A_fit_defaults        SEt SImilar_mols_2A_fit_defaults
   
 SEt MAx_nr_improve_cycles value      SEt DIst_max value
 SEt MIn_fragment_length value        SEt DEcay value
 SEt OPtimisation_criterion value     SEt SEquential_hits on_off
 SEt RMs_weight value                 SEt FRagment_length_decay value
 SEt SHift_correction on_off          SEt NUcleic_acid_defaults
   
 OMacro INit mol1 file                OMacro APpend mol2
 OMacro WRite o_command_string        OMacro CLose_file
 OMacro DEfine central_atom max_dist connect_file
   
 VRml SEtup central_atom max_dist backgr_col default_col
 VRml INit [vrml_file]                VRml COlour_list
 VRml ADd mol [chain] [colour]        VRml ALl_chains mol
   
 Max nr of molecules             : (          8)
 Max nr of residues per molecule : (      30000)
 Max nr of atoms per molecule    : (     200000)
 Max nr of atom types            : (         15)
   
 Execute initialisation macro : (/home/gerard/lsqman.init)
 ... Opened macro file : (/home/gerard/lsqman.init)
 ... On unit : (      61)
 Command > (! LSQMAN initialisation macro)
 Command > (echo on)
        1  @ /home/gerard/lsqman.init
        2  ! LSQMAN initialisation macro
        3  echo on
 Command > (!)
 ... End of macro file
 ... Control returned to terminal
 LSQMAN >
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

12 GENERAL COMMANDS

12.1 ? (list commands)

Print a list of all available commands and a summary of the dimensioning of the program (maximum number of molecules, etc.).

12.2 ! (ignored comment)

If the first character of a line is '!', then this line is treated as a comment line (for use in input files).

12.3 QUit (stop working with the program)

Stop working with the program.

12.4 ECho (toggle command-line echo on/off)

If you run the program with scripts, it is sometimes useful to see input commands echoed. The parameter to the ECho command may be ON, OFf, or ? (to list the echo status).

12.5 #

Command history. Possible uses (blank spaces are optional):
- # ? => list history of commands
- # ! => ditto, but without numbers (handy for copying into macros)
- # ON => switch command history on
- # OFf => switch command history off
- # # => repeat previous command
- # 14 => repeat command number 14 from the list
- # 0 => repeat previous command
- # -1 => repeat penultimate command, etc.
- # 7 more => repeat command number 7, but add "more" to it (e.g., if command 7 was "$ ls" you could type "#7 -FartCos" to get "$ ls -FartCos")
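The history references listed above can be modelled with the minimal Python sketch below. The function name `resolve_history` is hypothetical, and it assumes that the history list holds the earlier commands oldest first (command number k is entry k) and does not include the "#" command itself; this illustrates the rules, it is not LSQMAN's own code.

```python
def resolve_history(history, ref):
    """Return the command that a '#' history reference stands for (hypothetical helper).

    history : earlier commands, oldest first; command number k is history[k - 1]
    ref     : the text after the '#', e.g. '#', '14', '0', '-1' or '7 -FartCos'
    """
    ref = ref.strip()
    if ref == "#":                          # "# #" repeats the previous command
        return history[-1]
    number, _, extra = ref.partition(" ")   # "7 -FartCos" -> "7", "-FartCos"
    n = int(number)
    # n >= 1 counts from the start of the list; n <= 0 counts back from the end:
    # 0 -> history[-1] (previous command), -1 -> history[-2] (penultimate), ...
    command = history[n - 1]
    extra = extra.strip()
    return f"{command} {extra}" if extra else command
```

For example, with the history `["re m1 a.pdb", "li", "$ ls"]`, the reference "3 -FartCos" yields "$ ls -FartCos", just like the "#7 -FartCos" case described above.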

12.6 $ (issue shell command)

Issue a shell command.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > $ ls -FartCos *.odb
   1 -rw-r--r--   1 gerard       297 Oct 22 20:38 rt_1ace_to_lipa.odb
   1 -rw-r--r--   1 gerard       297 Oct 22  1993 rt_1etu_to_eftu.odb
   1 -rw-r--r--   1 gerard       297 Oct 22  1993 rt_1lap_to_eftu.odb
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

12.7 @ (execute LSQMAN macro)

Execute a macro, i.e. a file containing LSQMAN commands.

Example of an LSQMAN macro:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 ! ana_ncs.lsqmac
 !
 ! do some basic NCS analyses
 !
 ! Enter LO for normal PDB files, or XP for X-PLOR PDB files:
 chain_mode
 !
 ! Enter PDB file name:
 read mymol
 !
 ! Enter PostScript file for sigma(phi),sigma(psi) plot:
 mdihedral mymol a
 !
 ! Enter PostScript file for multiple Ramachandran plot:
 mramachandran mymol a
 !
 ! Enter PostScript file for sigma(chi1),sigma(chi2) plot:
 mside_chains mymol a
 !
 ! Enter PostScript file for multiple chi1,chi2 plot:
 mtorsion mymol a
 !
 ! Enter PostScript file for sigma(B),range(B) plot:
 mbfactors mymol a
 !
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

When executed, this gives:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > @ana_ncs.lsqmac
 ... Opened macro file : (ana_ncs.lsqmac)
 ... On unit : (      61)
 > (! ana_ncs.lsqmac)
 > (!)
 > (! do some basic NCS analyses)
 > (!)
 > (! Enter LO for normal PDB files, or XP for X-PLOR PDB files:)
 > (chain_mode)
 Select one of the following modes:
 REname   = chains are renamed A, B, .. Z
 ORiginal = chain names are not altered
 XPlor    = rename; SEGIds delineate chains
 BReak    = rename; use breaks in residue numbers
 LOwer    = rename; use drop in residue numbers
 Chain mode ? (LO)
 Chain-mode LOwer
 > (!)
 > (! Enter PDB file name:)
 > (read mymol)
 File name ? ( ) /nfs/pdb/full/1cbr.pdb
 Cell : (  41.440   41.440  202.800   90.000   90.000   90.000)
 New chain name |A| at residue PRO     1
 New chain name |B| at residue PRO     1
 Nr of lines read from file : (       2596)
 Nr of atoms in molecule    : (       2246)
 Nr of chains or models     : (          2)
 Stripped hydrogen atoms    : (          0)
 > (!)
 > (! Enter PostScript file for sigma(phi),sigma(psi) plot:)
 > (mdihedral mymol a)
 Multiple chain/model dihedral analysis
 Plot file ? (mymol_phi_psi_sigma.plt)
 Reference chain : (A)
 Residue range :     1 -   313
 PRO     1 |      0.0     0.0     0.0     0.0 |   -163.0     0.0  -163.0  -163.0 |   0  2
 ASN     2 |   -120.0     0.0  -120.0  -120.0 |     84.6     0.0    84.6    84.6 |   2  2
 ...
 Plot file written
 > (!)
 > (! Enter PostScript file for multiple Ramachandran plot:)
 > (mramachandran mymol a)
 Multiple Ramachandran plot
 PostScript file ? (mymol_multi_rama.ps)
 Reference chain : (A)
 Residue range :     1 -   313
 ...
 PostScript file written
 > (!)
 > (! Enter PostScript file for sigma(chi1),sigma(chi2) plot:)
 > (mside_chains mymol a)
 Multiple side-chain torsion analysis
 Plot file ? (mymol_chi12_sigma.plt)
 Reference chain : (A)
 Residue range :     1 -   313
 ...
 Plot file written
 > (!)
 > (! Enter PostScript file for multiple chi1,chi2 plot:)
 > (mtorsion mymol a)
 Multiple torsion plot
 PostScript file ? (mymol_chi12_dist.ps)
 Reference chain : (A)
 Residue range :     1 -   313
 ...
 PostScript file written
 CPU total/user/sys :       1.2       1.1       0.1
 > (!)
 > (! Enter PostScript file for sigma(B),range(B) plot:)
 > (mbfactors mymol a)
 Multiple chain/model B-factor analysis
 Plot file ? (mymol_bfac_multi.plt)
 Reference chain : (A)
 Residue range :     1 -   313
 Central atom type : ( CA)
 ...
 Nr of residues found : (        136)
 SIGMA(B) Ave, Sdv, Min, Max :      0.0     0.0     0.0     0.0
 RANGE(B) Ave, Sdv, Min, Max :      0.0     0.0     0.0     0.0
 Plot file written
 > (!)
 ... End of macro file
 ... Control returned to terminal
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

12.8 & (manipulate symbols)

This command can be used to manipulate symbols. These are probably only useful for advanced users who want to write fancier macros. The command can be used in three ways:
(1) & ? -> lists currently defined symbols
(2) & symbol value -> sets "SYMBOL" to "value"
(3) & symbol -> prompts the user to supply a value for "SYMBOL" (even if the program is executing a macro)

A few symbols are predefined:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > & ?
 Nr of defined symbols : (       4)
 Symbol PROGRAM : (LSQMAN)
 Symbol VERSION : (960517/4.6)
 Symbol START_TIME : (Fri May 17 20:34:27 1996)
 Symbol USERNAME : (gerard)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

The symbol mechanism is fairly simplistic and has some limitations:
- max length of a symbol name is 20 characters
- max length of a symbol value is 256 characters
- max number of symbols is 100
- symbols cannot be deleted, but they can be redefined
- symbol values are accessed by supplying $SYMBOL_NAME as an argument on the command line; the line that you type on the terminal (or in a macro) is parsed only once, so if the program subsequently prompts you for additional parameters, you cannot use symbols in your answers
- only one substitution per argument (e.g., "$file1 $file2" will lead to the substitution of the entire argument by the value of symbol FILE1 only !)
- command names (first argument on any command line) cannot be replaced by a symbol (e.g.: "$command $arg1 $arg2" is not valid)
- symbols may be equated to each other, e.g. "& file2 $file1" will give FILE2 the same value as FILE1
- symbol substitution is not recursive (e.g., if you set the value of FILE2 to be "$file1", any reference to $FILE2 will be replaced by "$file1", not by the value of FILE1)
- symbols on comment lines (starting with "!") are not expanded
- symbols on system command lines (starting with "$") are not expanded
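The substitution rules above can be summarised in a small Python sketch. It is a model of the documented behaviour, not LSQMAN's parser. Assumptions: symbol names are case-insensitive, substitution is triggered only when an argument begins with "$", an undefined symbol leaves the argument unchanged, and arguments are split on blanks except inside "double quotes". The names `define`, `expand` and `split_args` are hypothetical.

```python
import re

MAX_NAME, MAX_VALUE, MAX_SYMBOLS = 20, 256, 100   # limits listed above

# an argument is either "double-quoted text" or a run of non-blank characters
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')

def split_args(line):
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in _TOKEN.finditer(line)]

def define(symbols, name, value):
    """'& name value': (re)define a symbol; there is no way to delete one."""
    name = name.upper()
    if len(name) > MAX_NAME or len(value) > MAX_VALUE:
        raise ValueError("symbol name or value too long")
    if name not in symbols and len(symbols) >= MAX_SYMBOLS:
        raise ValueError("too many symbols")
    symbols[name] = value

def expand(symbols, line):
    """Split a command line into arguments, substituting $SYMBOL arguments once."""
    if line.lstrip().startswith(("!", "$")):
        return [line]                  # comment and shell lines are never expanded
    args = split_args(line)
    expanded = args[:1]                # the command name itself is never substituted
    for arg in args[1:]:
        words = arg[1:].split() if arg.startswith("$") else []
        if words:
            # only the first symbol counts and it replaces the WHOLE argument;
            # the value is inserted verbatim, i.e. it is not expanded again
            expanded.append(symbols.get(words[0].upper(), arg))
        else:
            expanded.append(arg)
    return expanded
```

With FILE1 set to "a.pdb", the line `re m1 "$file1 $file2"` expands to `["re", "m1", "a.pdb"]`, which reproduces the "one substitution per argument" limitation.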

13 I/O AND BOOK-KEEPING COMMANDS

13.1 REad (read molecule into memory)

Read a molecule into memory. You must provide a NAME for the molecule (by which you will refer to it later) and the name of a PDB file.

Only ATOM/HETATM and MODEL (for multiple NMR structures) cards are handled. Every chain or NMR model is assigned a chain identifier A, B, ..., Z. Therefore, no more than 26 chains or NMR models can be read into memory (unless you set CHain_mode to ORiginal).

Note: some older PDB files contain residues with insertion codes (e.g., PDB entry 1HBT). These are not handled by LSQMAN and would effectively lead to the removal of residues that share a residue number (but have different insertion codes). To prevent this from happening, you can renumber the residues first, for example with MOLEMAN2 (PDb REnum command).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ch xp
 Chain-mode XPlor
 LSQMAN > re m3 m12abcd.pdb
 XPLOR SEGId |GTAA| becomes chain A
 XPLOR SEGId |GTAB| becomes chain B
 XPLOR SEGId |GTAC| becomes chain C
 XPLOR SEGId |GTAD| becomes chain D
 Nr of lines read from file : (       7222)
 Nr of atoms in molecule    : (       7184)
 Nr of chains or models     : (          4)
 CPU total/user/sys :       6.4       5.7       0.6
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

13.2 WRite (write molecule to PDB file)

Write a molecule to a PDB file. The chain identifiers are those that were assigned during the REad step (see CHain_mode, section 13.6). All atoms are written as ATOM cards (i.e., HETATM information is lost). NMR models will be (re-)numbered 1, 2, ..., 26.

Optional parameters:
- chain id (e.g., A, B, ..., Z, or * to denote all chains)
- first residue (e.g., 1, 163, ...)
- last residue (e.g., 99, 1000, ...)
By default, all residues of all chains are written.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > wr m1 q.pdb
 Number of atoms written : (        987)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > wr m2 q.pdb a 15 25
 Command > (wr m2 q.pdb a 15 25)
 Write mol : (M2)
 Chain id  : (A)
 PDB file  : (q.pdb)
 First res : (      15)
 Last  res : (      25)
 Number of atoms written : (         69)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

13.3 DElete (erase molecule from memory)

Delete a molecule from memory.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > del m2
 Deleted : (M2)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

13.4 ANnotate (comment string for molecule)

Edit the comment string for a molecule. If you supply the comment string on the command line, be sure to use "DOUBLE QUOTES" if your comment contains one or more spaces !

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > an m1
 Label ? (Read from 2aza.pdb) azurin 2aza
 LSQMAN > an m2 "azurin 1azu"
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

13.5 LIst (information about molecule)

List some information about any or all molecules currently in memory. If you don't supply a molecule name (or if you enter an *asterisk*), the program lists this information for all molecules.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > li *
   
 List    : (M1)
 File    : (q.pdb)
 Comment : (azurin 2aza)
 Nr of atoms in mol  : (        987)
 Multiple NMR models ? (F)
 Nr of chains/models : (          1)
   
 List    : (M2)
 File    : (1azu.pdb)
 Comment : (azurin 1azu)
 Nr of atoms in mol  : (        930)
 Multiple NMR models ? (F)
 Nr of chains/models : (          1)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

13.6 CHain_mode (naming of chains/models when read from PDB file)

Determine what LSQMAN should do with the various chains or NMR models in a given PDB file. You have the following choices:

REname = chains and NMR models are renamed A, B, ..., Z

ORiginal = chain names are not altered

NOn-blank = chain names are not altered, except that blank chain IDs are replaced by _underscores_

XPlor = chains are renamed A, B, ..., Z; X-PLOR SEGIds delineate the chains

BReak = chains are renamed A, B, ..., Z; breaks in the "i, i+1, i+2" numbering of residues are used to delineate chains (e.g., residue numbers 354, 355, 501 would introduce a new chain at residue 501)

LOwer = chains are renamed A, B, ..., Z; a new chain is started only where a residue has a lower residue number than the previous residue (e.g., if your protein is numbered 5-193, your ligand 200 and your waters 300-389, then all of these will be considered part of one single chain)

When you read one of your own PDB files in which the chains have names that you are familiar with, use ORiginal or NOn-blank mode.
When you read PDB files that you're not familiar with, or PDB files containing multiple NMR models, REname is probably the best option.
When you read a PDB file created for or by X-PLOR, use the XPlor chain mode (in which X-PLOR SEGIds are used to recognise where one chain ends and the next begins).
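The difference between the BReak and LOwer rules can be illustrated with the Python sketch below. It works on one residue number per residue, in file order, and is only a model of the rules as described above. The function name `split_chains` is hypothetical, and treating a repeated residue number as a break in BReak mode (but not in LOwer mode) is an assumption.

```python
import string

def split_chains(resnums, mode):
    """Return one chain letter per residue, given residue numbers in file order.

    mode "BREAK": start a new chain whenever a number is not previous + 1
    mode "LOWER": start a new chain only when a number is lower than the previous one
    """
    if mode not in ("BREAK", "LOWER"):
        raise ValueError(f"unknown mode {mode!r}")
    chains = []
    index = 0
    for i, number in enumerate(resnums):
        if i > 0:
            previous = resnums[i - 1]
            if mode == "BREAK":
                new_chain = number != previous + 1
            else:
                new_chain = number < previous
            if new_chain:
                index += 1
        if index >= 26:
            raise ValueError("more than 26 chains (A-Z) would be needed")
        chains.append(string.ascii_uppercase[index])
    return chains
```

For the protein (5-193), ligand (200) and waters (300-389) example, LOwer keeps everything in chain A, whereas BReak produces three chains (A, B and C).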

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ch
 Select one of the following modes:
 REname   = chains are renamed A, B, .. Z
 ORiginal = chain names are not altered
 XPlor    = rename; SEGIds delineate chains
 BReak    = rename; use breaks in residue numbers
 LOwer    = rename; use drop in residue numbers
 Chain mode ? (XP) re
 Chain-mode REname
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ch or
 Chain-mode ORiginal
 LSQMAN > re m1 m12abcd.pdb
 Old chain name |A| kept
 Old chain name |B| kept
 Old chain name |C| kept
 Old chain name |D| kept
 Nr of lines read from file : (       7222)
 Nr of atoms in molecule    : (       7184)
 Nr of chains or models     : (          4)
 CPU total/user/sys :       6.8       5.9       0.9
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ch re
 Chain-mode REname
 LSQMAN > re m2 m12abcd.pdb
 Old chain |A| becomes chain A
 Old chain |B| becomes chain B
 Old chain |C| becomes chain C
 Old chain |D| becomes chain D
 Nr of lines read from file : (       7222)
 Nr of atoms in molecule    : (       7184)
 Nr of chains or models     : (          4)
 CPU total/user/sys :       6.5       5.8       0.7
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ch xp
 Chain-mode XPlor
 LSQMAN > re m3 m12abcd.pdb
 XPLOR SEGId |GTAA| becomes chain A
 XPLOR SEGId |GTAB| becomes chain B
 XPLOR SEGId |GTAC| becomes chain C
 XPLOR SEGId |GTAD| becomes chain D
 Nr of lines read from file : (       7222)
 Nr of atoms in molecule    : (       7184)
 Nr of chains or models     : (          4)
 CPU total/user/sys :       6.4       5.7       0.6
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

13.7 TYpe_residues (list residues of molecule)

This will simply list the first atom of every residue in the selected molecule.

 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ty m3
 List of residues in : (M3)
     1      1  CB  ALA A   2      82.844  39.989  -7.039  1.00 68.35
     2      6  N   GLU A   3      83.214  36.859  -8.993  1.00102.90
     3     15  N   LYS A   4      84.386  34.656  -8.538  1.00 47.66
 ...
   883   7162  N   ARG D 221      84.092  37.862  71.505  1.00 60.42
   884   7173  N   PHE D 222      84.004  35.762  73.551  1.00 92.91
 CPU total/user/sys :       1.5       0.9       0.6
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

13.8 BFactor_range (exclude atoms with undesired temperature factors)

Define a range of temperature factors for atoms to be used in the EXplicit and RMsd commands. Atoms with a B-factor outside this range will be skipped by these commands (the IMprove command, however, ignores this range).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ex m1 a1-999 m1 b1
 ...
 B-factor range used:    -1.00 - 10000.00 A2
 Nr of atoms to match  : (        370)
 Nr skipped (B limits) : (          0)
 The    370 atoms have an RMS distance of    0.759 A
 RMS delta B  =    4.459 A2
 Corr. coeff. =      0.8359
 ...
 LSQMAN > bf
 Lower B-factor cut-off ? (  -1.000000)
 Upper B-factor cut-off ? (   9999.999) 30
 Lower B cut-off : (  -1.000)
 Upper B cut-off : (  30.000)
 LSQMAN > ex m1 a1-999 m1 b1
 ...
 B-factor range used:    -1.00 -    30.00 A2
 Nr of atoms to match  : (        259)
 Nr skipped (B limits) : (        111)
 The    259 atoms have an RMS distance of    0.491 A
 RMS delta B  =    3.584 A2
 Corr. coeff. =      0.7609
 ...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
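The effect of the B-factor range on the statistics printed above can be sketched in Python as follows. This is a simplified model, not LSQMAN's code. It assumes that a matched atom pair is dropped if *either* atom lies outside the (inclusive) range and that the coordinates are already superimposed; in LSQMAN the least-squares fit itself is redone on the remaining atoms, which this sketch does not do. The function name `rms_with_b_limits` is hypothetical.

```python
import math

def rms_with_b_limits(pairs, b_lo=-1.0, b_hi=10000.0):
    """Filter matched atom pairs on B-factor, then report RMS statistics.

    pairs: list of ((xyz1, b1), (xyz2, b2)), atoms already matched and superimposed.
    Returns (n_used, n_skipped, rms_distance, rms_delta_b).
    """
    kept, skipped = [], 0
    for (xyz1, b1), (xyz2, b2) in pairs:
        # assumption: a pair is dropped if EITHER atom lies outside [b_lo, b_hi]
        if b_lo <= b1 <= b_hi and b_lo <= b2 <= b_hi:
            kept.append((xyz1, b1, xyz2, b2))
        else:
            skipped += 1
    if not kept:
        return 0, skipped, float("nan"), float("nan")
    n = len(kept)
    rms_dist = math.sqrt(sum(math.dist(p, q) ** 2 for p, _, q, _ in kept) / n)
    rms_db = math.sqrt(sum((b1 - b2) ** 2 for _, b1, _, b2 in kept) / n)
    return n, skipped, rms_dist, rms_db
```

Lowering the upper limit to 30 A2 removes the high-B (usually more mobile) atoms, which is why both the number of matched atoms and the RMS distance drop in the example above.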

13.9 CEll (edit cell constants of molecule)

Set or change the cell constants of a molecule.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > cel m1
 A axis (A) ? (   111.8900)
 B axis (A) ? (   111.8900)
 C axis (A) ? (   148.4900)
 Alpha (deg) ? (   90.00000)
 Beta (deg) ? (   90.00000)
 Gamma (deg) ? (   90.00000)
 Molecule : (M1)
 Cell axes (A) : ( 111.890  111.890  148.490)
 Angles (deg)  : (  90.000   90.000   90.000)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

13.10 FRactionalise (Cartesian to fractional)

Fractionalise the coordinates of a molecule. This may help you detect spacegroup errors or special translations. For example, for PDB entry 1CHR the spacegroup error (I4 with two-fold NCS is really I422 without NCS) is especially clear from the fractional operator (see below): the "NCS" operator relating the two molecules in fractional space is (X, -Y+1, -Z+1). Note that the program doesn't know whether your coordinates are fractional or orthogonal (in A) (in principle you could even read them in as fractional coordinates) !!! After fractionalisation, the reported RMSD is therefore not a distance in A and is not very useful as a number !!!

NOTE: it may (or may not) also help in detecting origin differences and relations between molecules solved in the same spacegroup but within different asymmetric units of the cell.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 /nfs/pdb/full/1chr.pdb
 Cell : ( 111.890  111.890  148.490   90.000   90.000   90.000)
 ...
 LSQMAN > fr m1
 Operator : (   0.009    0.000    0.000    0.000    0.009    0.000
  0.000    0.000    0.007    0.483    0.643    0.751)
 Atom #1 before : (  11.840   26.637   68.001)
 Atom #1 after  : (   0.106    0.238    0.458)
 Fractionalised : (M1)
 LSQMAN > ex m1 a1-999 m1 b1
 ...
 The    370 atoms have an RMS distance of    0.006 A
 RMS delta B  =    4.459 A2
 Corr. coeff. =      0.8359
 Rotation    :   0.999999 -0.001595  0.000409
                -0.001594 -0.999992 -0.003703
                 0.000415  0.003702 -0.999993
 Translation :      0.001     0.999     1.001
 CPU total/user/sys :       2.0       2.0       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

The two molecules in 1CEL are related by an almost pure translation of (0.46, 1/2, 1/2) in fractional coordinates:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 /nfs/pdb/full/1cel.pdb
 Cell : (  84.000   86.200  111.800   90.000   90.000   90.000)
 ...
 LSQMAN > fr m1
 Operator : (   0.012    0.000    0.000    0.000    0.012    0.000
  0.000    0.000    0.009    0.000    0.000    0.000)
 Atom #1 before : (  37.768   59.322   40.174)
 Atom #1 after  : (   0.450    0.688    0.359)
 Fractionalised : (M1)
 LSQMAN > ex m1 a1-999 m1 b1
 WARNING - mol1 == mol2 !
 ...
 The    434 atoms have an RMS distance of    0.001 A
 RMS delta B  =    2.201 A2
 Corr. coeff. =      0.9738
 Rotation    :   0.999859 -0.016626  0.002191
                 0.016642  0.999833 -0.007543
                -0.002066  0.007578  0.999969
 Translation :      0.461     0.497     0.503
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
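The conversion between Cartesian and fractional coordinates can be sketched in Python as follows. The sketch assumes the common PDB orthogonalisation convention (x along a, y in the a-b plane, z along c*); for orthogonal cells such as those of 1CHR and 1CEL it reduces to dividing by a, b and c, which reproduces the "Atom #1 after" values shown above. Whether LSQMAN uses exactly this convention for non-orthogonal cells is an assumption, and the function names are hypothetical.

```python
import math

def orth_matrix(a, b, c, alpha, beta, gamma):
    """Fractional -> Cartesian matrix (angles in degrees), assuming the
    common PDB convention: x along a, y in the a-b plane, z along c*."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    vol = a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    return [[a, b * cg, c * cb],
            [0.0, b * sg, c * (ca - cb * cg) / sg],
            [0.0, 0.0, vol / (a * b * sg)]]

def orthogonalise(frac, cell):
    """Fractional -> Cartesian (A): multiply by the upper-triangular matrix."""
    m = orth_matrix(*cell)
    return tuple(sum(m[i][j] * frac[j] for j in range(3)) for i in range(3))

def fractionalise(xyz, cell):
    """Cartesian (A) -> fractional: invert the triangular matrix by back-substitution."""
    m = orth_matrix(*cell)
    z = xyz[2] / m[2][2]
    y = (xyz[1] - m[1][2] * z) / m[1][1]
    x = (xyz[0] - m[0][1] * y - m[0][2] * z) / m[0][0]
    return (x, y, z)
```

For the first atom of 1CHR, `fractionalise((11.840, 26.637, 68.001), (111.89, 111.89, 148.49, 90, 90, 90))` gives approximately (0.106, 0.238, 0.458). Once both copies of a molecule are in fractional units, translations such as 1/2 along an axis become easy to recognise.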

13.11 ORthogonalise (fractional to Cartesian)

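Orthogonalise the coordinates of a molecule, i.e. convert fractional coordinates back into Cartesian coordinates (in A) using the cell constants of the molecule (the inverse of FRactionalise).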

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > or m1
 Operator : ( 111.890    0.000    0.000    0.000  111.890    0.000
  0.000    0.000  148.490    0.009    0.000    0.000)
 Atom #1 before : (   0.106    0.238    0.458)
 Atom #1 after  : (  11.840   26.637   68.001)
 Orthogonalised : (M1)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

13.12 NUcleic_acid_pdb_nomenclature (use PDB nucleic acid atom and residue names)

Enforce PDB nomenclature for nucleotide names (" A", etc.) and atom names (i.e., " C4*" rather than " C4'").

IMPORTANT NOTICE - as of version 9.7.7, this option has been changed so that it recognises DA, DC, DG and DT. However, LSQMAN continues to convert primes (') into asterisks (*) as the fourth character of atom names!

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > nucl m1
 PDB NA nomenclature for : (M1)
 Atoms with changed residue type : (        309)
 Atoms with changed atom type    : (        126)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

13.13 NOmenclature (check side-chain atom names)

Enforce proper nomenclature for the equivalent side-chain atoms of Asp, Glu, Phe, Tyr and Arg residues. This is important if these atoms are going to be used in a comparison (e.g., all-atom RMSD or side-chain torsion analyses).
Normally, this command would be used in conjunction with the FIx_atom_names command.

 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m2 /nfs/alien/gerard/acbp/o/probe14.pdb
 ...
 LSQMAN > nomenclature m2
 Enforce proper nomenclature for : (M2)
 Nr of atoms    : (       9772)
 Nr of residues : (       1204)
 Error in GLU A  10 ...
 Error in GLU A  11 ...
 Error in ASP A  21 ...
 ...
 Error in GLU N  79 ...
 Error in TYR N  84 ...
 # of PHE checked :     42 # errors :     18
 # of TYR checked :     56 # errors :     35
 # of ASP checked :     98 # errors :     35
 # of GLU checked :    140 # errors :     46
 # of ARG checked :     14 # errors :      5
 WARNING - any attached hydrogens NOT renamed
 LSQMAN > nomenclature m2
 Enforce proper nomenclature for : (M2)
 Nr of atoms    : (       9772)
 Nr of residues : (       1204)
   
 # of PHE checked :     42 # errors :      0
 # of TYR checked :     56 # errors :      0
 # of ASP checked :     98 # errors :      0
 # of GLU checked :    140 # errors :      0
 # of ARG checked :     14 # errors :      0
 No problem, mon !
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
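The manual does not spell out the exact naming rule that NOmenclature enforces. The Python sketch below assumes the IUPAC-IUB convention: the lower-numbered atom of each ambiguous pair is the one that gives a torsion angle of at most 90 degrees in absolute value (for Arg, NH1 is cis to CD). All names (`dihedral`, `RULES`, `fix_nomenclature`) are hypothetical, and, like LSQMAN, the sketch ignores attached hydrogens.

```python
import math

def _sub(u, v): return [u[i] - v[i] for i in range(3)]
def _dot(u, v): return sum(u[i] * v[i] for i in range(3))
def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]

def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# assumption (IUPAC-IUB convention): names are chosen so that the torsion to the
# lower-numbered atom has an absolute value of at most 90 degrees
RULES = {
    "ASP": (("CA", "CB", "CG", "OD1"), [("OD1", "OD2")]),
    "GLU": (("CB", "CG", "CD", "OE1"), [("OE1", "OE2")]),
    "PHE": (("CA", "CB", "CG", "CD1"), [("CD1", "CD2"), ("CE1", "CE2")]),
    "TYR": (("CA", "CB", "CG", "CD1"), [("CD1", "CD2"), ("CE1", "CE2")]),
    "ARG": (("CD", "NE", "CZ", "NH1"), [("NH1", "NH2")]),
}

def nomenclature_errors(resname, atoms):
    """Return the atom-name pairs to swap for one residue (empty list if correct).

    atoms: dict mapping atom name -> (x, y, z)
    """
    torsion_atoms, swaps = RULES[resname]
    chi = dihedral(*(atoms[name] for name in torsion_atoms))
    return swaps if abs(chi) > 90.0 else []

def fix_nomenclature(resname, atoms):
    """Swap the coordinates of ambiguous atom pairs in place; True if anything changed."""
    swaps = nomenclature_errors(resname, atoms)
    for a, b in swaps:
        atoms[a], atoms[b] = atoms[b], atoms[a]
    return bool(swaps)
```

Running the check a second time on a corrected residue finds nothing to fix, just as the second NOmenclature run above reports zero errors.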

13.14 FIx_atom_names (correct side-chain atom names)

When comparing two *PROTEINS* using all (non-hydrogen) atoms or side-chain torsion angles, a few residue types may show artificially large differences: Asp, Glu, Arg, Phe and Tyr. For example, if you have two-fold NCS and one Phe ring is "flipped" in molecule 2 compared to molecule 1, the CD1,2/CE1,2 atoms may have an RMSD of ~1.5 A even though the rings superimpose perfectly. In this situation, the CHI2 torsions may also differ by ~180 degrees, giving large spikes in chi1,2 plots !
The solution is to rename such atoms in molecule 2, i.e. to swap the labels CD1<->CD2 and CE1<->CE2.
A similar situation may arise for Asn, Gln and His if the crystallographer was unable to distinguish the N/O or C/N atoms unambiguously.
Note that this command is *hard-wired* for proteins !!!

The FIx_atom_names command takes the following arguments:
- mol1 range1 = molecule 1 and zone(s)
- mol2 range2 = molecule 2 and begin point of zone(s)
- mode = Strict (only check Asp, Glu, Arg, Phe, Tyr) or All (also check Asn, Gln and His)
- how = Sequential (assumes a 1:1 correspondence in the sequences of the zones) or Nearest (finds the residue in molecule 2 whose CA atom, after the least-squares superposition, is nearest to that of the residue in molecule 1; this is an unreliable method and should only be used if you are interested in all-atom comparisons of molecules with different sequences, which is not a good idea in the first place); if you use Nearest, you *MUST* have superimposed molecule 2 onto molecule 1 previously, since this operator is needed to find the nearest residue
- what = Rmsd (minimises the RMSD of the ambiguous atoms) or Torsion (minimises the absolute difference between the affected side-chain torsions, e.g. CHI2 for Asp, Phe, Tyr and CHI3 for Glu); if you use Rmsd, you *MUST* have superimposed molecule 2 onto molecule 1 previously, since this operator is needed to calculate the rmsd-values
- min_gain = optional parameter which defines how much must be gained (in terms of rmsd or torsion-angle differences) before the atoms are renamed (if you would gain 0.000001 A by renaming the atoms, it's not worth the trouble)
- cut_off = optional parameter, only used when "how" is set to "Nearest"; it defines the maximum allowable CA-CA distance before a residue in molecule 2 is matched to one in molecule 1
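The Rmsd criterion for a single residue can be sketched in Python as below: compare the RMSD of the ambiguous atoms as currently named with the RMSD after swapping their labels in molecule 2, and only rename if the gain exceeds min_gain. This is a simplified model under stated assumptions (molecule 2 already superimposed, Sequential matching); the Torsion criterion and Nearest matching are not modelled, and the function names are hypothetical.

```python
import math

def rmsd(xyz_a, xyz_b):
    """RMS distance between two equally long lists of (x, y, z) points."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(xyz_a, xyz_b)) / len(xyz_a))

def swap_gain_rmsd(ref, mov, pairs, min_gain=0.01):
    """Decide whether swapping ambiguous labels in 'mov' lowers their RMSD to 'ref'.

    ref, mov : dicts atom name -> (x, y, z); mov must already be superimposed on ref
    pairs    : ambiguous label pairs, e.g. [("CD1", "CD2"), ("CE1", "CE2")] for Phe
    Returns (swap, rmsd_as_named, rmsd_if_swapped).
    """
    names = [name for pair in pairs for name in pair]
    partner = {a: b for x, y in pairs for a, b in ((x, y), (y, x))}
    ref_xyz = [ref[n] for n in names]
    as_named = rmsd(ref_xyz, [mov[n] for n in names])
    if_swapped = rmsd(ref_xyz, [mov[partner[n]] for n in names])
    return as_named - if_swapped > min_gain, as_named, if_swapped
```

The two RMSD values returned correspond to the "( 1.48 versus 0.30)"-style numbers that FIx_atom_names prints for each residue it fixes.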

So, how would you go about this in practice ?
- identical sequences, calculating all-atom RMSDs: NOmen m1; NOmen m2; ATom CA; EXplicit m1 a1-999 m2 a1; FIx m1 a1-999 m2 a1 str seq rmsd 0.01; ATom NOnh; EXplicit m1 a1-999 m2 a1
- NCS (identical sequences), comparing torsion angles: NOmen m1; FIx m1 a1-999 m1 b1 str seq tors 0.1; then repeat for each of the other NCS units (e.g., chains C, D, etc.); then use MSide or MTors to generate the plots
- different sequences: don't do all-atom comparisons

For example, if you look at PDB entry 1CEL, there are a few instances of swapped sidechains. Correcting for this reduces the all-atom RMSD by about a third !!!

 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 1cel.pdb
 LSQMAN > at no
 LSQMAN > ex m1 a1-999 m1 b1
 ...
 The   3518 atoms have an RMS distance of    0.255 A
 ...
 LSQMAN > fix m1 a1-999 m1 b1 strict seq rmsd 0.01
 WARNING - mol1 == mol2 !
 Reference atoms M1 A1-999
 Fix atoms for   M1 B1
 Only fix Asp/Glu/Arg/Phe/Tyr
 Use sequential residues (1:1 correspondence)
 Minimise RMSD
 Minimum improvement : (   0.010)
 Applying current operator to Mol 2 ...
 Nr of RT operators :   1
 RT-OP  1 =     0.9998649    0.0163407   -0.0018067        38.703
               -0.0163272    0.9998405    0.0072388        42.876
                0.0019247   -0.0072083    0.9999722        56.158
 Determinant of rotation matrix             1.000000
 Column-vector products (12,13,23)          0.000000    0.000000    0.000000
 Crowther Alpha Beta Gamma                104.01381     0.42750  -104.94973
 Spherical polars Omega Phi Chi           155.42732  -165.51825     1.02912
 Direction cosines of rotation axis        -0.40219    -0.10388    -0.90943
 Dave Smith                                -0.41475    89.88973    -0.93629
 Rotation angle                             1.029076
 Zone : (          1)
 Fix sidechain of ASP-B-  63 (    1.48 versus     0.30)
 Fix sidechain of PHE-B- 146 (    1.54 versus     0.24)
 Fix sidechain of TYR-B- 167 (    1.54 versus     0.29)
 Fix sidechain of GLU-B- 217 (    1.48 versus     0.18)
 Fix sidechain of TYR-B- 274 (    1.55 versus     0.20)
 Fix sidechain of PHE-B- 280 (    1.54 versus     0.30)
 Fix sidechain of TYR-B- 321 (    1.54 versus     0.23)
 Residues checked : (         85)
 Residues fixed   : (          7)
 CPU total/user/sys :       2.6       2.6       0.0
 LSQMAN > ex m1 a1-999 m1 b1
 ...
 The   3518 atoms have an RMS distance of    0.164 A
 ...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
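The RMSD criterion can be pictured as follows. After superposition, compare the reference residue with the model residue twice: once as the atoms are named, and once with the names of chemically equivalent atoms interchanged. The swap is kept only if it lowers the RMSD by more than the minimum improvement. The Python sketch below illustrates this idea and is not LSQMAN's code. Several choices are assumptions made for illustration: the pair table, the function name `maybe_swap`, and computing the RMSD over the interchangeable atoms only. The coordinates are also assumed to be already superimposed.

```python
import math

# Hypothetical table of chemically equivalent atom pairs (standard chemistry,
# assumed for this sketch; not read from LSQMAN).
SWAP_PAIRS = {
    "ASP": [("OD1", "OD2")],
    "GLU": [("OE1", "OE2")],
    "ARG": [("NH1", "NH2")],
    "PHE": [("CD1", "CD2"), ("CE1", "CE2")],
    "TYR": [("CD1", "CD2"), ("CE1", "CE2")],
}


def rms(ref, mov, names):
    """RMS distance over the named atoms (coordinates are (x, y, z) tuples)."""
    total = sum(math.dist(ref[n], mov[n]) ** 2 for n in names)
    return math.sqrt(total / len(names))


def maybe_swap(resname, ref, mov, min_improvement=0.01):
    """Return (coords, fixed, rms_as_named, rms_swapped) for one residue.

    `ref` and `mov` map atom names to superimposed coordinates; names are
    interchanged only if that lowers the RMSD by more than `min_improvement`.
    """
    pairs = SWAP_PAIRS.get(resname.upper())
    if not pairs:
        return mov, False, None, None
    names = [name for pair in pairs for name in pair]
    swapped = dict(mov)  # copy, so all other atoms (CB, CG, ...) are kept
    for a, b in pairs:
        swapped[a], swapped[b] = mov[b], mov[a]
    as_named = rms(ref, mov, names)
    alternative = rms(ref, swapped, names)
    if as_named - alternative > min_improvement:
        return swapped, True, as_named, alternative
    return mov, False, as_named, alternative
```

For an Asp whose OD1 and OD2 labels are interchanged relative to the reference, `maybe_swap("ASP", ref, mov)` returns the relabelled coordinates, with a large RMSD "as named" and a small one "swapped". This mirrors the "versus" pairs printed by FIx.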

The same effect can be observed if the torsion angles are used:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 1cel.pdb
 LSQMAN > ms m1 a m1_chi12_sigma.plt
 ...
 Nr of residues found : (        434)
 SIGMA(chi1) Ave, Sdv, Min, Max :      1.1     1.6     0.0    19.5
 RANGE(chi1) Ave, Sdv, Min, Max :      2.2     3.2     0.0    39.0
 SIGMA(chi2) Ave, Sdv, Min, Max :      2.3    10.5     0.0    89.9
 RANGE(chi2) Ave, Sdv, Min, Max :      4.6    21.0     0.0   179.8
 ...
 LSQMAN > fix m1 a1-999 m1 b1
 WARNING - mol1 == mol2 !
 Mode (Strict/All) ? (S)
 How (Sequential/Nearest) ? (S)
 Minimise what (Rmsd/Torsions) ? (R) t
 Reference atoms M1 A1-999
 Fix atoms for   M1 B1
 Only fix Asp/Glu/Arg/Phe/Tyr
 Use sequential residues (1:1 correspondence)
 Minimise torsion-angle differences
 Minimum improvement : (   0.100)
   
 Zone : (          1)
 Fix sidechain of ASP-B-  63 (  177.25 versus     3.25)
 Fix sidechain of PHE-B- 146 (  178.08 versus     2.68)
 Fix sidechain of TYR-B- 167 (  172.27 versus     2.02)
 Fix sidechain of GLU-B- 217 (  178.28 versus     0.91)
 Fix sidechain of TYR-B- 274 (  182.81 versus     1.73)
 Fix sidechain of PHE-B- 280 (  179.83 versus     1.79)
 Fix sidechain of TYR-B- 321 (  173.46 versus     3.15)
   
 Residues checked : (         85)
 Residues fixed   : (          7)
 CPU total/user/sys :       2.7       2.6       0.0
 LSQMAN > ms m1 a m1_chi12_sigma.plt
 ...
 Nr of residues found : (        434)
 SIGMA(chi1) Ave, Sdv, Min, Max :      1.1     1.6     0.0    19.5
 RANGE(chi1) Ave, Sdv, Min, Max :      2.2     3.2     0.0    39.0
 SIGMA(chi2) Ave, Sdv, Min, Max :      1.1     2.7     0.0    32.3
 RANGE(chi2) Ave, Sdv, Min, Max :      2.1     5.4     0.0    64.7
 ...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
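The torsion criterion works because interchanging the names of two equivalent atoms moves a side-chain dihedral by roughly 180 degrees. For Phe and Tyr, for example, chi2 (CA-CB-CG-CD1) shifts by about 180 degrees when CD1 and CD2 are swapped. The Python sketch below uses the common atan2 formulation of a dihedral angle (our choice, not necessarily LSQMAN's), applied to made-up coordinates with idealised geometry, to show the effect.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range -180..180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)                  # unit vector along the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))  # b0 projected perpendicular to b1
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))  # b2 projected perpendicular to b1
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up, idealised coordinates: CB at the origin, CB-CG along z, and the
# ring atoms CD1/CD2 on opposite sides of the CB-CG axis.
t = math.radians(60.0)
CA, CB, CG = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.5)
CD1 = (1.2 * math.cos(t), 1.2 * math.sin(t), 2.2)
CD2 = (-1.2 * math.cos(t), -1.2 * math.sin(t), 2.2)

chi2_named = dihedral(CA, CB, CG, CD1)    # chi2 as named
chi2_swapped = dihedral(CA, CB, CG, CD2)  # chi2 after swapping CD1/CD2
print(chi2_named, chi2_swapped)
```

Because the two values differ by 180 degrees, a misnamed ring shows up as a near-180 degree torsion difference between NCS copies. Correcting the names reduces it to a few degrees.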

13.15 NMr_model_mode (keep all or only first model when reading NMR ensemble)

By default, when an NMR ensemble is read, LSQMAN will keep all models. However, sometimes you may want to keep only the first model. This behaviour can be set with this command; if you only want the first NMR model, select the FIrst mode:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > nmr
 Select one of the following modes:
 ALl   = keep all NMR models on read
 FIrst = only keep first NMR model on read
 NMR model mode ? (ALl) first
 Only keep first NMR model
 LSQMAN > re m1 pdb3ifb.ent
 ==> Found file in GKPATH : (/portray/pub/databases/pdb/all_entries/uncompr
  essed_files/pdb3ifb.ent)
 [...]
 CRYST1 :     1.000    1.000    1.000  90.00  90.00  90.00 P 1           1
 Multiple NMR models
 NMR model   1 becomes chain A
 Skipping all but first NMR model
 Nr of lines read from file : (       2346)
 Nr of atoms in molecule    : (       1064)
 Nr of chains or models     : (          1)
 Stripped hydrogen atoms    : (       1062)
 Nr of HETATMs              : (          0)
 Stripped alt. conf. atoms  : (          0)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
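In PDB files, the models of an NMR ensemble are delimited by MODEL and ENDMDL records. The FIrst mode can be pictured as keeping ATOM/HETATM records only up to the first ENDMDL. The Python sketch below does just that, using a hypothetical helper (not LSQMAN's implementation), and leaves files without MODEL records untouched. The coordinates in the demo are made up.

```python
def first_model_records(pdb_lines):
    """Return the ATOM/HETATM records of the first model only.

    Reading stops at the first ENDMDL record; a file without MODEL/ENDMDL
    records (a single structure) is returned in full.
    """
    records = []
    for line in pdb_lines:
        tag = line[:6]  # record name occupies columns 1-6
        if tag == "ENDMDL":
            break
        if tag in ("ATOM  ", "HETATM"):
            records.append(line)
    return records


ensemble = [
    "MODEL        1",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   MET A   1      27.900  24.100   2.300  1.00  9.67           N",
    "ENDMDL",
    "END",
]
print(len(first_model_records(ensemble)))  # prints 1
```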

13.16 HYdrogens (keep or strip when reading or writing)

By default, hydrogen atoms are STRIPPED when a PDB file is read or written by LSQMAN. You can change this with the HYdrogens command which takes as argument either KE(ep) or ST(rip).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > hy
 Select one of the following modes:
 KEep   = retain hydrogens on read/write
 STrip  = strip  hydrogens on read/write
 Hydrogen mode ? (STrip)
 Strip hydrogens
 LSQMAN > hy kee
 Keep hydrogens
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

13.17 AA_substitution_matrix

By default, the NWunsch command uses the Blosum45 amino-acid substitution matrix. If you wish to experiment with different matrices (in SBIN-style format), you can read them in with this command.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > aa /home/gerard/lib/sbin_blosum60.lib
 Library file with matrix : (/home/gerard/lib/sbin_blosum60.lib)
 Comment : (! BLOSUM 60 matrix made from BLOCKS v. 5.0 and scaled in half-
  bits.)
 Comment : (! ARNDCQEGHILKMFPSTWYVBZX)
 Comment : (#  Matrix made by matblas from blosum60.iij)
 Comment : (#  * column uses minimum score)
 Comment : (#  BLOSUM Clustered Scoring Matrix in 1/2 Bit Units)
 Comment : (#  Blocks Database = /data/blocks_5.0/blocks.dat)
 Comment : (#  Cluster Percentage: >= 60)
 Comment : (#  Entropy =   0.6603, Expected =  -0.4917)
 Comment : (! integer matrix)
 Read INTR matrix with format : ((I2,30I3))
 Average matrix value : (  -1.013)
 Matrix read successfully !
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

13.18 HEtatm (keep or strip when reading)

By default, hetero atoms (on HETATM cards) are KEPT when a PDB file is read by LSQMAN. You can change this with the HEtatm command which takes as argument either KE(ep) or ST(rip). This mode switch does not influence the way PDB files are written (once read, *ALL* ATOMs and HETATMs will be written as ATOMs on output).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > het
 Select one of the following modes:
 KEep   = retain HETATMs on read
 STrip  = strip  HETATMs on read
 HETATM mode ? (STrip)
 Strip HETATMs
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

13.19 SUbtract_ave_b (subtract average temperature factor)

Calculation of RMS delta-B values between structures or NCS-related molecules makes more sense if you correct for differences in overall temperature factors. This option calculates the average B for every chain/model of a molecule, and subtracts that average from the Bs of all non-hydrogen atoms in that chain/model. The example below is for P2 myelin, for which the 3 NCS-related molecules have different average Bs. Note that after subtraction, the RMS delta-B between chains B and C drops from 14.6 to 0.9 A2, but the correlation coefficient (which is insensitive to offsets and scales) has not changed !
Use this option prior to the MBfactors command as well !!!

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ex m4 b1-200 m4 c1
 ...
 B-factor range used: -1000.00 - 10000.00 A2
 Nr of atoms to match  : (        131)
 Nr skipped (B limits) : (          0)
 The    131 atoms have an RMS distance of    0.001 A
 RMS delta B  =   14.581 A2
 Corr. coeff. =      0.9975
 ...
 LSQMAN > mb m4 a ../b1.plt
 Multiple chain/model B-factor analysis
 ...
 Nr of residues found : (        131)
 SIGMA(B) Ave, Sdv, Min, Max :      6.5     0.3     5.0     6.6
 RANGE(B) Ave, Sdv, Min, Max :     14.6     0.9    10.6    14.8
 Plot file written
 LSQMAN > su m4
 Subtract average chain B for : (M4)
 Chain A # non-H atoms =   1039 <B> =  29.42 A**2
 Chain B # non-H atoms =   1039 <B> =  27.63 A**2
 Chain C # non-H atoms =   1039 <B> =  42.08 A**2
 LSQMAN > ex m4 b1-200 m4 c1
 ...
 B-factor range used: -1000.00 - 10000.00 A2
 Nr of atoms to match  : (        131)
 Nr skipped (B limits) : (          0)
 The    131 atoms have an RMS distance of    0.001 A
 RMS delta B  =    0.889 A2
 Corr. coeff. =      0.9975
 LSQMAN > mb m4 a ../b2.plt
 ...
 Nr of residues found : (        131)
 SIGMA(B) Ave, Sdv, Min, Max :      0.2     0.3     0.1     1.6
 RANGE(B) Ave, Sdv, Min, Max :      0.5     0.7     0.2     3.8
 Plot file written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
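The behaviour shown above follows directly from the definitions: RMS delta-B drops, while the correlation coefficient stays the same. Subtracting a constant from each chain shifts all its Bs equally, and Pearson's correlation coefficient is unaffected by such shifts. The Python sketch below uses made-up Bs for a handful of matched atoms. Note one simplification: LSQMAN averages over all non-hydrogen atoms of each chain, whereas this toy example averages over the listed values only.

```python
import math


def mean(values):
    return sum(values) / len(values)


def subtract_average(bs):
    """Return the B-factors with their own average removed."""
    avg = mean(bs)
    return [b - avg for b in bs]


def rms_delta(x, y):
    """RMS difference between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def correlation(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up Bs (A**2) for four matched atoms in two NCS copies.
b_chain_b = [20.0, 25.0, 31.0, 36.0]
b_chain_c = [34.0, 40.0, 45.0, 51.0]

before = (rms_delta(b_chain_b, b_chain_c), correlation(b_chain_b, b_chain_c))
shifted_b, shifted_c = subtract_average(b_chain_b), subtract_average(b_chain_c)
after = (rms_delta(shifted_b, shifted_c), correlation(shifted_b, shifted_c))
print(before, after)
```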

13.20 ATom_types (select atom types to use in superpositioning)

Select which atom types you want to use during explicit least-squares superpositioning. Note that ONLY the FIRST of these will be used in the improvement steps (in other words, make sure that " CA " is the first atom type if you work with proteins) !
Also note that the atom types should conform to the PDB naming convention (e.g., C-alpha should be entered as " CA ") !
The following options are available:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 * ?  --- list the currently selected atom types
 * CA --- use only C-alpha atoms
 * MA --- use main-chain atoms " N  ", " CA " and " C  "
 * SI --- use all non-hydrogen atoms except N, CA, C, O, OT1, OT2, OTX, OXT
 * EX --- use extended main-chain atoms (N, CA, CB, C, O)
 * PH --- use phosphate " P  " for DNA and RNA molecules
 * DE --- define your own atom types
 * AL --- all atom types
 * NO --- all non-hydrogen atom types
 * TR --- all CA atoms plus all non-hydrogen side-chain atoms
 * C4 --- use sugar " C4*" for DNA and RNA molecules
 * NU --- use all backbone atoms (except OP1 and OP2) for DNA and RNA
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

The AL, SI, TR, and NO options only make sense when you are comparing identical molecules, e.g. before and after refinement, or NCS-related molecules. The atoms must have the SAME ORDER in both molecules ! Also, don't forget to reset the atom type to something sensible before using IMprove (e.g., to CA).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > at ca
 Nr of atom types : (       1)
 Type : ( CA)
 LSQMAN > ex m1
...
 Atom types     | CA |
 Nr of atoms to match : (         60)
 The     60 atoms have an RMS distance of    0.782 A
...
 LSQMAN > at def " ca " " n  " " c  " " co " " cb " " cg " " cd "
 Nr of atom types : (       7)
 Type : ( CA)
 Type : ( N)
 Type : ( C)
 Type : ( CO)
 Type : ( CB)
 Type : ( CG)
 Type : ( CD)
 LSQMAN > ex m1
...
 Atom types     | CA | N  | C  | CO | CB | CG | CD |
 Nr of atoms to match : (        265)
 The    265 atoms have an RMS distance of    0.893 A
...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > at nu
 Nr of atom types : (       8)
 Types : (  C4*  P  C1*  C2*  C3*  O2*  O3*  O4*)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
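Atom types are matched against the fixed-width, four-character atom-name field of the PDB format (columns 13-16). This is why the blanks in " CA " matter: a C-alpha atom is written " CA ", whereas a calcium ion is conventionally written "CA  ". The Python sketch below shows such a selection. The function names are hypothetical, and LSQMAN's own matching may differ in detail.

```python
def select_by_atom_type(pdb_lines, atom_types):
    """Keep ATOM/HETATM records whose atom-name field (columns 13-16) is in atom_types."""
    wanted = set(atom_types)
    return [line for line in pdb_lines
            if line[:6] in ("ATOM  ", "HETATM") and line[12:16] in wanted]


def pdb_record(tag, serial, name, resname, chain, resnum):
    """Build the first 26 columns of a PDB record (enough for this demo)."""
    return f"{tag:<6}{serial:>5} {name:<4} {resname:>3} {chain}{resnum:>4}"


lines = [
    pdb_record("ATOM", 1, " N  ", "ALA", "A", 1),
    pdb_record("ATOM", 2, " CA ", "ALA", "A", 1),
    pdb_record("HETATM", 3, "CA  ", "CA", "A", 201),  # calcium ion
]
ca_only = select_by_atom_type(lines, [" CA "])
print(len(ca_only))  # prints 1: the calcium ion "CA  " is not selected
```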

13.21 SEt (set parameters for operator-improvement algorithm)

Set, list or reset the various parameters for the least-squares improvement option. These are discussed in more detail below.
The following sub-options are available:

* ? --- list the current settings

* RE --- reset the program's default settings (makes the improve option behave like the LSQ_IMPROVE command in O)

* CO --- set suitable parameters for adjusting a very rough initial alignment (e.g., produced by DEJAVU)

* IN --- set parameters for refining an intermediately rough operator

* FI --- set parameters for fine-tuning an operator

* SI --- set parameters for refining an operator between very similar molecules

* MA --- maximum number of improvement cycles

* DI --- maximum distance between matched atoms (as in O)

* DE --- decay factor for the above

* MI --- minimum length of matched fragments (as in O)

* FR --- decay increment for the above

* OP --- the optimisation criterion to be used

* SE --- enforce sequential hits flag

* RM --- weight for the RMS distance in the calculation of the match index

* SH --- frameshift correction flag (used in IMprove and BRute_force); especially useful when you use a very high distance cut-off

* NU --- set reasonable parameters for nucleic acid comparisons

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > set re
 Resetting program defaults
 LSQMAN > set ?
 Current parameters:
 (DI) Max matching distance (A) : (   3.500)
 (DE) Decay factor              : (   1.000)
 (MI) Min fragment length (res) : (       3)
 (FR) Fragment length decay     : (       0)
 (MA) Max nr of improve cycles  : (      10)
 (OP) Criterion                 : (CR)
 (RM) RMS weight (MI only)      : (   1.000)
 (SE) Sequential hits only      : (ON)
 (SH) Frameshift correction     : (ON)
 LSQMAN > se opt
 Allowed values: SI/MI/RMs/NMatch/CRippen/RRmsd/
                 NRmsd/S1/S2/S3/S4
 Criterion ? (CR) si
 Criterion : (SI)
 LSQMAN > set dec 0.95
 Decay factor : (   0.950)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > set nucleic
 Setting nucleic acid defaults
 LSQMAN > set ?
 Current parameters:
 (DI) Max matching distance (A) : (   4.000)
 (DE) Decay factor              : (   1.000)
 (MI) Min fragment length (res) : (       3)
 (FR) Fragment length decay     : (       0)
 (MA) Max nr of improve cycles  : (      10)
 (OP) Criterion                 : (CR)
 (RM) RMS weight (MI only)      : (   0.500)
 (SE) Sequential hits only      : (OF)
 (SH) Frameshift correction     : (ON)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

13.22 OMacro (used in macros created by DEJAVU, SPASM, SPANA, etc.)

This command provides an interface to O in that it creates O macro files containing instructions to read, rotate/translate and display one or more molecules.
The following sub-options are available:

* DE --- define the central atom type (default CA, but could be P, C4* or C4' for nucleic acids), the maximum inter-central-atom distance (default 4.5 A, but could be 8.0 A for nucleic acids), and the O connectivity file to use (default all.dat, but could be trna.dat or whatever)

* IN --- select a new reference molecule, close the previous macro file and start a new one

* AP --- add instructions to the macro for a molecule which you have fit on top of the reference molecule defined in the INit step

* WR --- write one or more O commands to the macro file

* CL --- close the current macro file

These commands are used in the LSQMAN input files produced by DEJAVU, SPASM and SPANA. They are not intended for interactive use (but you're free to use them anyway, of course ;-).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > om in m1
 File name ? (lsq_m1.omac)
 O macro initialised
 LSQMAN > om ap m2
 O macro extended
 LSQMAN > om wr "print ... I don't like this fit"
 Written to O macro : (print ... I don't like this fit)
 LSQMAN > om close
 O macro file closed
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

13.23 INvert_ncs (invert one or more RT operators)

Use this command to invert one or more O-style Cartesian space RT-operators (e.g., NCS or inter-crystal). Provide the name of the operator file and the name of the output file with the inverted operator(s).
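For a proper rotation R (orthonormal, determinant +1), the inverse of the operator x' = R x + t is x = R^T x' - R^T t. The Python sketch below assumes that convention, with R stored as a list of rows. It does not handle how O stores the matrix elements (see section 8 on O's "transpose" notation) or the format of the operator file.

```python
def apply_rt(rot, trans, xyz):
    """Apply x' = R x + t to one point (rot is a 3x3 list of rows)."""
    return tuple(sum(rot[i][k] * xyz[k] for k in range(3)) + trans[i]
                 for i in range(3))


def invert_rt(rot, trans):
    """Invert x' = R x + t for an orthonormal R: R_inv = R^T, t_inv = -R^T t."""
    rot_t = [[rot[j][i] for j in range(3)] for i in range(3)]
    trans_inv = [-sum(rot_t[i][k] * trans[k] for k in range(3)) for i in range(3)]
    return rot_t, trans_inv


# 90-degree rotation about z plus a translation (made-up operator).
rot = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
trans = [1, 2, 3]
moved = apply_rt(rot, trans, (4, 5, 6))
back = apply_rt(*invert_rt(rot, trans), moved)
print(moved, back)  # prints (-4, 6, 9) (4, 5, 6)
```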

13.24 ALter (manipulate chain and segment IDs)

The ALter commands can be used to change or set chain IDs and segment IDs (as used by X-PLOR and CNS) from within the program. You have the following options:

- ALter CHain mol chain new_chain = set the chain ID of a particular chain to a new value (e.g., to change 'A' into 'B'); if you use the same values for chain and new_chain, you will effectively see a count of the number of atoms with that chain ID
- ALter SEgid mol segid new_segid = set the segment ID of a particular segment to a new value (e.g., to change 'AAAA' into 'PROT'); if you use the same values for segid and new_segid, you will effectively see a count of the number of atoms with that segment ID
- ALter SAme mol chain = set the segment ID of a particular chain to be the same as its chain ID (right-padded with blanks)
- ALter FOrce mol chain new_segid = set the segment ID of a particular chain to a new value (e.g., when you have read a PDB file without segment IDs)
- ALter REnumber mol chain [first] = renumber the residues of a particular chain, starting at 1 (or the value of "first")

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 1pmp.pdb
 ==> Found file in GKPATH : (/nfs/pdb/full/1pmp.pdb)
 HEADER :     CELLULAR LIPOPHILIC TRANSPORT PROTEIN   10-FEB-93   1PMP      1PMP   2
 AUTHOR :     S.W.COWAN,M.E.NEWCOMER,T.A.JONES                              1PMP   5
 REVDAT :    1   26-JAN-95 1PMP    0                                        1PMP   6
 CRYST1 :    91.800   99.500   56.500  90.00  90.00  90.00 P 21 21 21   12  1PMP 178
 Old chain |A| becomes chain A
 Old chain |B| becomes chain B
 Old chain |C| becomes chain C
 Nr of lines read from file : (       3440)
 Nr of atoms in molecule    : (       3192)
 Nr of chains or models     : (          3)
 Stripped hydrogen atoms    : (          0)
 Nr of HETATMs              : (         75)
 Stripped alt. conf. atoms  : (          0)
 LSQMAN > al ch m1 a x
 Chain ID to alter : (A)
 New chain ID      : (X)
 Nr of atoms changed : (       1064)
 LSQMAN > al ch m1 b b
 Chain ID to alter : (B)
 New chain ID      : (B)
 Nr of atoms changed : (       1064)
 LSQMAN > al fo m1 c zzzz
 Chain to alter : (C)
 New segment ID : (ZZZZ)
 Nr of atoms changed : (       1064)
 LSQMAN > al sa m1 b
 Chain to alter : (B)
 New segment ID : (B)
 Nr of atoms changed : (       1064)
 LSQMAN > li m1
   
 List    : (M1)
 File    : (1pmp.pdb)
 Comment : (Read from 1pmp.pdb)
 Cell    : (  91.800   99.500   56.500   90.000   90.000   90.000)
 Nr of atoms in mol  : (       3192)
 Multiple NMR models ? (F)
 Nr of chains/models : (          3)
 Chain/Model #  1 - Name |X| Nr of atoms     1064
 Chain/Model #  2 - Name |B| Nr of atoms     1064
 Chain/Model #  3 - Name |C| Nr of atoms     1064
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Save the PDB file and check that the changes were made:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 % 577 gerard sarek 15:10:43 gerard/junk > grep trp q.pdb | grep '  8 ' | grep ' CA '
ATOM     55  CA  TRP X   8      39.338  59.336  29.583  1.00 21.80      1PMP
ATOM   1120  CA  TRP B   8      59.783  31.997  32.869  1.00 21.80      B
ATOM   2185  CA  TRP C   8      25.458  54.801  30.571  1.00 21.80      ZZZZ
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

14 SUPERIMPOSING AND COMPARING TWO MOLECULES

14.1 EXplicit (explicit superpositioning of two molecules)

Do an explicit least-squares superposition (as LSQ_EXPLICIT in O). You must supply:
* the name of the molecule that is to be kept fixed
* one or more residue ranges
* the name of the molecule that is to be rotated/translated
* the first residue of each zone corresponding to the zones entered for the fixed molecule
Note that by using the ATom_types command you can perform this fit using any type(s) of atom !
From version 3.0 onwards, the RMS difference between the temperature factors (Bs) of the matched atoms, as well as their linear correlation coefficient, are also shown.
From version 3.2.2 onwards, a B-factor limit may be imposed with the BF command.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ex m1
 Range 1 ? (A1-10) "a4-10 a19-23 a28-36 a44-51 a53-66 a91-97 a106-111 a123:126"
 Mol 2 ? (M1) m2
 Range 2 ? (A1) "a4 a19 a28 a44 a53 a91 a106 a123"
 Explicit fit of M1 "A4-10 A19-23 A28-36 A44-51 A53-66 A91-97 A106-111 A123:126"
 And             M2 "A4 A19 A28 A44 A53 A91 A106 A123"
 Atom types     | CA | N  | C  | O  | CB |
 Nr of atoms to match : (        295)
 The    295 atoms have an RMS distance of    0.892 A
 Rotation    :  -0.956932  0.127723 -0.260706
                 0.170532 -0.479456 -0.860837
                -0.234946 -0.868222  0.437026
 Translation :     13.787    26.800    38.541
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
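Once EXplicit has produced a rotation and translation, a few consistency checks are easy to script:
- apply the operator to the moving atoms and recompute the RMS distance;
- check that the rotation matrix is a proper rotation (determinant 1);
- derive the rotation angle from the trace, cos(kappa) = (trace(R) - 1)/2, which does not depend on whether R is stored by rows or by columns.

The Python sketch below assumes that the operator acts as x' = R x + t on the moving molecule, with R stored by rows. The actual element order used by O and LSQMAN is discussed in section 8.

```python
import math


def apply_rt(rot, trans, xyz):
    """Apply x' = R x + t to one point (rot is a 3x3 list of rows)."""
    return tuple(sum(rot[i][k] * xyz[k] for k in range(3)) + trans[i]
                 for i in range(3))


def rms_distance(fixed, moving, rot, trans):
    """RMS distance between fixed atoms and the transformed moving atoms."""
    sq = 0.0
    for a, b in zip(fixed, moving):
        moved = apply_rt(rot, trans, b)
        sq += sum((p - q) ** 2 for p, q in zip(a, moved))
    return math.sqrt(sq / len(fixed))


def determinant(m):
    """Determinant of a 3x3 matrix (should be +1 for a proper rotation)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def rotation_angle_deg(m):
    """Rotation angle (degrees) from the trace of a rotation matrix."""
    c = (m[0][0] + m[1][1] + m[2][2] - 1.0) / 2.0
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding noise


# Rotation matrix printed by the EXplicit example above.
rot = [[-0.956932, 0.127723, -0.260706],
       [0.170532, -0.479456, -0.860837],
       [-0.234946, -0.868222, 0.437026]]
print(round(determinant(rot), 4), rotation_angle_deg(rot))
```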

14.2 BRute_force (find alignment of two molecules automagically)

With this command you can undertake a systematic attempt to align two molecules for which you don't know exactly which residue numbers will match, for example:
- comparing a protein from two different organisms
- comparing mutants
- comparing distantly related proteins
- comparing a partial model (e.g., a domain) to a complete one

You have to supply the molecule and chain names to be aligned, as well as values for the following parameters:
- length of fragments the program will attempt to match (e.g., 50 or 30)
- step size for trying different fragments (typically, about half the fragment length)
- the minimum number of matched residues between the two chains that would make you happy (e.g., 100); you may also supply a real number between zero and one, which will then be taken as the minimum *fraction* of the residues of the smaller of the two molecules that has to be matched (e.g., "0.8" means: match at least 80% of the residues of the shorter molecule)
- an optional parameter which tells the program whether the two molecules are just models of the same protein (value S for Same), or different proteins altogether (value D for Different, default). In the former case, only matches of identical stretches of residues will be tried, making the matching process an order of magnitude faster. (Normally, EXplicit + IMprove is good enough in cases like this, but not when you are evaluating some of the models submitted to CASP3 ...)
- the slide-step size (from version 9.7.2 - see below)

The algorithm is very simple. Suppose molecule 1 is numbered from 1 to 236, and molecule 2 starts at residue 36. If you use a fragment length of 50 residues and a step size of 10, the following will happen:
- the program will do an explicit superpositioning of residues 1 to 50 in molecule 1 and 36-85 in molecule 2. If the RMSD is less than 10 A, it will subsequently attempt to improve the alignment.
- when it's done, it will align 1 to 50 with 37-86 in molecule 2, and improve the alignment if possible, etc.
- in this way the fragment "1 to 50" of molecule 1 slides over the entire sequence of molecule 2; for each alignment the RMSD is calculated, and the alignment is improved upon if possible.
- whenever an alignment leads to a larger number of matched residues than previously obtained, the alignment will be stored as the current best one
- when this is done, the program will "jump" 10 residues (the step size), and now attempt to align residues 11 to 60 of molecule 1 with residues 36-85, 37-86, ... of molecule 2.

If at any stage the number of matched residues exceeds the minimum number you said would make you happy, the operation stops and the alignment is stored as the current best operator bringing molecule 2 on top of molecule 1. To see which residues are matched, do an IMprove molecule1 molecule2.
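The search described above boils down to two nested loops over fragment start positions, plus an early exit. The Python sketch below shows that control flow. The `evaluate` callback is a hypothetical stand-in for the explicit fit (including the 10 A RMSD gate) followed by IMprove, and is assumed to return the number of matched residues. Skipping zones with missing residues and handling a fractional minimum are not modelled.

```python
def trial_zones(first1, last1, first2, last2, frag_len, step, slide=1):
    """Yield ((start1, end1), (start2, end2)) fragment pairs in search order.

    The fragment of molecule 1 advances by `step`; for each of its positions,
    the matching stretch of molecule 2 slides along by `slide` residues.
    """
    for s1 in range(first1, last1 - frag_len + 2, step):
        for s2 in range(first2, last2 - frag_len + 2, slide):
            yield (s1, s1 + frag_len - 1), (s2, s2 + frag_len - 1)


def brute_force(first1, last1, first2, last2, frag_len, step, min_matched,
                evaluate, slide=1):
    """Return (n_matched, zone1, zone2) of the best trial found.

    `evaluate(zone1, zone2)` is a hypothetical callback returning the number
    of matched residues; the search stops once `min_matched` is reached.
    """
    best = None
    for zone1, zone2 in trial_zones(first1, last1, first2, last2,
                                    frag_len, step, slide):
        n_matched = evaluate(zone1, zone2)
        if best is None or n_matched > best[0]:
            best = (n_matched, zone1, zone2)  # new current best alignment
            if n_matched >= min_matched:
                break  # "happy": stop searching
    return best


# Toy scoring (made up): only an offset of 40 residues gives a good match.
def toy_score(zone1, zone2):
    return 120 if zone2[0] - zone1[0] == 40 else 5


print(brute_force(1, 236, 36, 135, 50, 10, 100, toy_score))
```

Here molecule 2 is assumed, for the demo only, to end at residue 135.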

The default values for the three parameters usually work well. If the similarity between the two molecules is very small, you can use a smaller value for the fragment length (e.g., 30 instead of 50).

If you want to do a more thorough search (slow !), use a small value for the step size, and a large value for the minimum number of residues to be matched.

In difficult cases, use a large number for the minimum number of residues to be matched (at least 100). The rationale for this is that for two large structures, there are often "false minima" involving a respectable number of matched residues. For example, aligning 1LTE to 1CEL with a value of 50 gives an incorrect solution; using a value of 100 gives the correct solution.

Note that you can use all the parameters that you can use for the EXplicit and IMprove commands (e.g., which atoms to use in the alignment, so it should also work for DNA, RNA, sugars, ...).

From version 9.0, residues with zero or negative residue numbers are ignored (previously, the command would simply fail in such cases).

From version 9.7.2, you can also change the size of the steps along the other sequences (previously fixed at 1) through the optional slide_step parameter (default = 1). In most cases you can easily use a value of 2 (giving a two-fold speed-up), and when you're comparing two very big molecules you can try values of 5 or more.

Example 1 - matching P2 myelin (1PMP) to cellular retinol-binding protein (1CRB). This is very easy, since the two proteins are structurally very similar, and the correct alignment is found in the first trial:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 pdb1pmp.ent
 LSQMAN > re m2 pdb1crb.ent
 LSQMAN > brute m2 a m1 a 50 25 0.5 d 2
 Brute-force fit of M2 A
 And                M1 A
 Atom types     | CA |
 B-factor range used  -1000.00 - 10000.00 A2
 Fragment length            50
 Fragment step size         25
 Sliding step size           2
 Mol 1 zone to try : (A1-373)
 Mol 2 zone to try : (A1-205)
 Min fraction matched     0.50
 Min matched residues      103
   
 Try zone : (A1-50)
 Max match so far : (        121)
 RMSD (A)         : (   1.169)
   
 Number of trials : (          1)
 Number IMproved  : (          1)
   
 Max match : (        121)
 RMSD (A)  : (   1.169)
 Mol 1 res : (          1)
 Mol 2 res : (          1)
 Regenerating best alignment ...
 The    121 atoms have an RMS distance of    1.169 A
 SI = RMS * Nmin / Nmatch             =      1.26551
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.42613
 CR = Maiorov-Crippen RHO (0-2)       =      0.08563
 RR = relative RMSD                   =      0.08117
 NR = normalised RMSD (100)           =      1.067 A
 SAS(1) = cRMS * (100/Nmatch)         =      0.966 A
 SAS(2) = cRMS * (100/Nmatch)^2       =      0.798 A
 SAS(3) = cRMS * (100/Nmatch)^3       =      0.660 A
 SAS(4) = cRMS * (100/Nmatch)^4       =      0.545 A
 RMSD / Nalign                        =    0.00966 A
 RMS delta B for matched atoms        =    19.039 A2
 Corr. coefficient matched atom Bs    =        0.500
 Rotation     :   0.68362272  0.64582211  0.33996150
                 -0.71084732  0.69475222  0.10961552
                 -0.16539684 -0.31659639  0.93402928
 Translation  :     -71.6204    -13.8609     -6.1448
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
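Several of the quality measures printed after the fit are defined by the formulas in their labels. The Python sketch below evaluates them. It assumes that Nmin is the number of residues (central atoms) in the smaller of the two chains, and that W is the RM weight from SEt (default 1.0). With Nmin = 131, which is the value that back-calculates from the printed SI, these reproduce the SI, MI, SAS and RMSD/Nalign values of Example 1 to within rounding of the printed RMSD. The Maiorov-Crippen, relative and normalised RMSD measures are not covered, since their formulas are not given here.

```python
def similarity_index(rms, n_match, n_min):
    """SI = RMS * Nmin / Nmatch."""
    return rms * n_min / n_match


def match_index(rms, n_match, n_min, w=1.0):
    """MI = (1 + Nmatch) / ((1 + W * RMS) * (1 + Nmin))."""
    return (1 + n_match) / ((1 + w * rms) * (1 + n_min))


def sas(rms, n_match, power=1):
    """SAS(n) = cRMS * (100 / Nmatch) ** n."""
    return rms * (100.0 / n_match) ** power


def rmsd_per_aligned(rms, n_match):
    """RMSD / Nalign."""
    return rms / n_match


# Values from Example 1 (RMSD 1.169 A over 121 matched residues).
rms, n_match, n_min = 1.169, 121, 131
print(similarity_index(rms, n_match, n_min), match_index(rms, n_match, n_min))
print([round(sas(rms, n_match, n), 3) for n in (1, 2, 3, 4)])
print(rmsd_per_aligned(rms, n_match))
```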

Example 2 - matching cellular retinol-binding protein (1CRB) to the serum retinol-binding protein (1RBP). This is more difficult, since, although the proteins are related, their structures are rather different (10- versus 8-stranded beta-barrel; ~130 versus ~170 residues). Therefore, be a bit more conservative with the choice of parameters:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m3 pdb1rbp.ent
 LSQMAN > brute m2 a m3 a 30 15 100 d 2
 Brute-force fit of M2 A
 And                M3 A
 Atom types     | CA |
 B-factor range used  -1000.00 - 10000.00 A2
 Fragment length            30
 Fragment step size         15
 Sliding step size           2
 Mol 1 zone to try : (A1-373)
 Mol 2 zone to try : (A1-330)
 Min matched residues      100
   
 Try zone : (A1-30)
 Max match so far : (          3)
 RMSD (A)         : (   0.810)
 Max match so far : (          6)
 RMSD (A)         : (   1.360)
 Max match so far : (         12)
 RMSD (A)         : (   2.121)
 Max match so far : (         25)
 RMSD (A)         : (   2.145)
 Try zone : (A16-45)
 Try zone : (A31-60)
 Max match so far : (         59)
 RMSD (A)         : (   1.514)
 Max match so far : (         65)
 RMSD (A)         : (   1.548)
 Try zone : (A46-75)
 Try zone : (A61-90)
 Try zone : (A76-105)
 Try zone : (A91-120)
 Try zone : (A106-135)
 Skip - missing residue(s) in zone
 Try zone : (A121-150)
 Skip - missing residue(s) in zone
 Try zone : (A136-165)
 Skip - missing residue(s) in zone
 Try zone : (A151-180)
 Skip - missing residue(s) in zone
 Try zone : (A166-195)
 Skip - missing residue(s) in zone
 Try zone : (A181-210)
 Skip - missing residue(s) in zone
 Try zone : (A196-225)
 Skip - missing residue(s) in zone
 Try zone : (A211-240)
 Skip - missing residue(s) in zone
 Try zone : (A226-255)
 Skip - missing residue(s) in zone
 Try zone : (A241-270)
 Skip - missing residue(s) in zone
 Try zone : (A256-285)
 Skip - missing residue(s) in zone
 Try zone : (A271-300)
 Skip - missing residue(s) in zone
 Try zone : (A286-315)
 Skip - missing residue(s) in zone
 Try zone : (A301-330)
 Skip - missing residue(s) in zone
 Try zone : (A316-345)
 Skip - missing residue(s) in zone
 Try zone : (A331-360)
 Skip - missing residue(s) in zone
   
 Number of trials : (        511)
 Number IMproved  : (        243)
   
 Max match : (         65)
 RMSD (A)  : (   1.548)
 Mol 1 res : (         31)
 Mol 2 res : (         35)
 Regenerating best alignment ...
 The     65 atoms have an RMS distance of    1.548 A
 SI = RMS * Nmin / Nmatch             =      3.19107
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.19188
 CR = Maiorov-Crippen RHO (0-2)       =      0.13252
 RR = relative RMSD                   =      0.13664
 NR = normalised RMSD (100)           =      1.973 A
 SAS(1) = cRMS * (100/Nmatch)         =      2.381 A
 SAS(2) = cRMS * (100/Nmatch)^2       =      3.664 A
 SAS(3) = cRMS * (100/Nmatch)^3       =      5.636 A
 SAS(4) = cRMS * (100/Nmatch)^4       =      8.671 A
 RMSD / Nalign                        =    0.02381 A
 RMS delta B for matched atoms        =    12.597 A2
 Corr. coefficient matched atom Bs    =        0.176
 Rotation     :  -0.10132343 -0.74412698  0.66030955
                  0.99473053 -0.06534031  0.07900542
                 -0.01564524  0.66483516  0.74682629
 Translation  :       2.4224    -29.7841    -51.1347
 CPU total/user/sys :       3.8       3.8       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Example 3 - a tough case is matching 1CEL and 2AYH. Both proteins have a similar core fold but they are not so easy to align. With conservative parameters, the program finds the correct alignment fairly quickly though:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 pdb1cel.ent
 LSQMAN > re m2 pdb2ayh.ent
 LSQMAN > br m1 a m2 a 30 10 100 d 2
 Brute-force fit of M1 A
 And                M2 A
 Atom types     | CA |
 B-factor range used  -1000.00 - 10000.00 A2
 Fragment length            30
 Fragment step size         10
 Sliding step size           2
 Mol 1 zone to try : (A1-764)
 Mol 2 zone to try : (A1-417)
 Min matched residues      100
   
 Try zone : (A1-30)
 Max match so far : (          9)
 RMSD (A)         : (   2.020)
 Max match so far : (         10)
 RMSD (A)         : (   2.072)
 Max match so far : (         13)
 RMSD (A)         : (   1.934)
 Max match so far : (         16)
 RMSD (A)         : (   2.196)
 Try zone : (A11-40)
 Max match so far : (         18)
 RMSD (A)         : (   1.903)
 Max match so far : (         23)
 RMSD (A)         : (   1.926)
 Max match so far : (         29)
 RMSD (A)         : (   2.377)
 Try zone : (A21-50)
 Try zone : (A31-60)
 Try zone : (A41-70)
 Try zone : (A51-80)
 Try zone : (A61-90)
 Try zone : (A71-100)
 Try zone : (A81-110)
 Max match so far : (         42)
 RMSD (A)         : (   2.177)
 Max match so far : (        128)
 RMSD (A)         : (   1.635)
   
 Number of trials : (        762)
 Number IMproved  : (        308)
   
 Max match : (        128)
 RMSD (A)  : (   1.635)
 Mol 1 res : (         81)
 Mol 2 res : (         35)
 Regenerating best alignment ...
 The    128 atoms have an RMS distance of    1.635 A
 SI = RMS * Nmin / Nmatch             =      2.73372
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.22769
 CR = Maiorov-Crippen RHO (0-2)       =      0.11465
 RR = relative RMSD                   =      0.10818
 NR = normalised RMSD (100)           =      1.455 A
 SAS(1) = cRMS * (100/Nmatch)         =      1.277 A
 SAS(2) = cRMS * (100/Nmatch)^2       =      0.998 A
 SAS(3) = cRMS * (100/Nmatch)^3       =      0.780 A
 SAS(4) = cRMS * (100/Nmatch)^4       =      0.609 A
 RMSD / Nalign                        =    0.01277 A
 RMS delta B for matched atoms        =     5.880 A2
 Corr. coefficient matched atom Bs    =        0.462
 Rotation     :  -0.00959781 -0.99757552 -0.06892973
                  0.78710389  0.03497919 -0.61582816
                  0.61674607 -0.06016544  0.78485972
 Translation  :      47.7106     52.9018     47.2431
 CPU total/user/sys :      11.2      11.1       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

14.3 FAst_force (find alignment of two molecules automagically)

Does the same as the BRute_force command (quod vide), but cuts a few corners to get the job done a tad faster. Amongst other things, it uses only the first atom type (so make sure that this is something sensible, such as CA or C4* !).

The maximum number of trial superpositionings that the program will generate is roughly equal to: (N1-L)*(N2-L)/(S1*S2), where N1 and N2 are the numbers of residues in the first and second molecule (chain), respectively, L is the fragment length, S1 is the fragment step size, and S2 is the sliding step size.

From version 9.7.2, the (maximum) total number of trials is printed and, if this number exceeds 10,000, a message is printed each time 10%, 20%, etc. of the trials have been completed.
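As a rough check of the trial-count formula, the Python sketch below evaluates it for the parameters of the FAst_force example that follows: N1 = 434 and N2 = 214 central atoms, a fragment length of 30, a fragment step size of 10 and a sliding step size of 2. It also counts fragment start positions explicitly, assuming that fragments start at residue 1 and advance by the step size for as long as a complete fragment still fits. This assumed enumeration scheme happens to reproduce the "Max number of trials" value of 3813 printed in the example, but it is not taken from the LSQMAN source, and the function names are hypothetical.

```python
def estimated_trials(n1, n2, frag_len, frag_step, slide_step):
    """Rough estimate from the text: (N1-L)*(N2-L)/(S1*S2)."""
    return (n1 - frag_len) * (n2 - frag_len) / (frag_step * slide_step)


def counted_trials(n1, n2, frag_len, frag_step, slide_step):
    """Explicit count under an ASSUMED enumeration scheme (hypothetical).

    Fragments of molecule 1 start at residue 1, 1+S1, 1+2*S1, ...; each one
    is slid along molecule 2 in steps of S2. A start position is only used
    if a complete fragment of length L still fits.
    """
    starts_mol1 = (n1 - frag_len) // frag_step + 1
    starts_mol2 = (n2 - frag_len) // slide_step + 1
    return starts_mol1 * starts_mol2


# Parameters from the example command "fa m1 a m2 a 30 10 100 d 2"
print(round(estimated_trials(434, 214, 30, 10, 2)))  # 3717
print(counted_trials(434, 214, 30, 10, 2))           # 3813
```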

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 pdb1cel.ent
 LSQMAN > re m2 pdb2ayh.ent
 LSQMAN > fa m1 a m2 a 30 10 100 d 2
 Fast-force fit of  M1 A
 And                M2 A
 Atom type      | CA |
 Fragment length            30
 Fragment step size         10
 Sliding step size           2
 Central atoms mol 1 : (        434)
 Central atoms mol 2 : (        214)
 Min matched residues      100
 Max number of trials : (       3813)
 Max match so far : (          9)
 RMSD (A)         : (   2.020)
 Max match so far : (         10)
 RMSD (A)         : (   2.072)
 Max match so far : (         13)
 RMSD (A)         : (   1.934)
 Max match so far : (         16)
 RMSD (A)         : (   2.196)
 Max match so far : (         18)
 RMSD (A)         : (   1.903)
 Max match so far : (         23)
 RMSD (A)         : (   1.926)
 Max match so far : (         29)
 RMSD (A)         : (   2.377)
 Max match so far : (         42)
 RMSD (A)         : (   2.177)
 Max match so far : (        128)
 RMSD (A)         : (   1.635)
   
 Number of trials : (        762)
 Number IMproved  : (        308)
   
 Max match : (        128)
 RMSD (A)  : (   1.635)
 Regenerating best alignment ...
 The    128 atoms have an RMS distance of    1.635 A
 SI = RMS * Nmin / Nmatch             =      2.73372
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.22769
 CR = Maiorov-Crippen RHO (0-2)       =      0.11465
 RR = relative RMSD                   =      0.10818
 NR = normalised RMSD (100)           =      1.455 A
 SAS(1) = cRMS * (100/Nmatch)         =      1.277 A
 SAS(2) = cRMS * (100/Nmatch)^2       =      0.998 A
 SAS(3) = cRMS * (100/Nmatch)^3       =      0.780 A
 SAS(4) = cRMS * (100/Nmatch)^4       =      0.609 A
 RMSD / Nalign                        =    0.01277 A
 RMS delta B for matched atoms        =     5.880 A2
 Corr. coefficient matched atom Bs    =        0.462
 Rotation     :  -0.00959781 -0.99757552 -0.06892973
                  0.78710389  0.03497919 -0.61582816
                  0.61674607 -0.06016544  0.78485972
 Translation  :      47.7106     52.9018     47.2431
 CPU total/user/sys :      10.0      10.0       0.1
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

14.4 NWunsch (sequence-based alignment of two structures)

If two structures have very similar sequences, their sequence- and structure-based alignments will also be very similar. NWunsch exploits this to carry out an "ab initio" superpositioning based on sequence-derived equivalences of residues: the residues that are matched in the sequence alignment are used to calculate an operator. (Note: the substitution matrix used is BLOSUM-45, unless you read in another matrix with the AA_substitution_matrix command.)
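The core of such a sequence alignment is the Needleman-Wunsch dynamic-programming recursion. The Python sketch below illustrates it with a linear gap penalty; it is not LSQMAN's code. The scoring function `toy_score` is a hypothetical stand-in for BLOSUM-45 (+5 for an identity, -1 otherwise), and penalising every gap position equally, including gaps at the ends, is an assumption.

```python
def needleman_wunsch(seq1, seq2, score, gap):
    """Global alignment with a linear gap penalty.

    Returns (best_score, aligned_seq1, aligned_seq2), with '-' for gaps.
    """
    n, m = len(seq1), len(seq2)
    # f[i][j] = best score for aligning seq1[:i] with seq2[:j]
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = -gap * i          # leading gaps are penalised too
    for j in range(1, m + 1):
        f[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1]),
                          f[i - 1][j] - gap,   # gap in seq2
                          f[i][j - 1] - gap)   # gap in seq1
    # Trace back from the bottom-right cell to recover one optimal alignment
    out1, out2, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1]):
            out1.append(seq1[i - 1]); out2.append(seq2[j - 1]); i -= 1; j -= 1
        elif i > 0 and f[i][j] == f[i - 1][j] - gap:
            out1.append(seq1[i - 1]); out2.append("-"); i -= 1
        else:
            out1.append("-"); out2.append(seq2[j - 1]); j -= 1
    return f[n][m], "".join(reversed(out1)), "".join(reversed(out2))


def toy_score(a, b):
    # Hypothetical stand-in for BLOSUM-45: +5 for an identity, -1 otherwise
    return 5 if a == b else -1


print(needleman_wunsch("ACDEFG", "ACEFG", toy_score, gap=5))
# (20, 'ACDEFG', 'AC-EFG')
```

In LSQMAN, the residues paired by the sequence alignment are then used to calculate the operator.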

For example, after reading in 1CEL and 1EG1, NWunsch yields:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > nw m1 a m2 a 5
 Sequence-based Needleman-Wunsch alignment
 Of              M1 A
 And             M2 A
 Atom type      | CA |
 Gap penalty         5.00
 Central atoms mol 1 : (        434)
 Central atoms mol 2 : (        371)
   
 Executing Needleman-Wunsch ...
   
 Sequence 1 ?SACTLQSETHPPLTWQKCS-SGGTC-TQQTGSVVIDANWRWTHATNSSTNC-YDGNTWS
    |=ID        |   | || ||  ||+ ||| |  |+| |||+| |+|| |  | ++ |  +|   +
 Sequence 2 ?QPGTSTPEVHPKLTTYKCTKSGG-CVAQDT-SVVLDWNYRWMHDANYNS-CTVNGGV-N
   
 Sequence 1 STLCPDNETCAKNCCLDGAAYASTYGVTTSGNSLSIG-FV-TQS-AQKNVGARLYLMASD
    |=ID    +|||||  || ||| + |  ||++ ||||||+||++  ++ + |    +|  ||||+ ||
 Sequence 2 TTLCPDEATCGKNCFIEGVDYAAS-GVTTSGSSLTMNQYMPSSSGGYSSVSPRLYLLDSD
   
 Sequence 1 TTYQEFTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTGYCDS
    |=ID      |  + | | |+|||||+| |||| ||+||+  || +|| ++|  ||||| ||+||||+
 Sequence 2 GEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGANQY--NTAGANYGSGYCDA
   
 Sequence 1 QCPRDLKFINGQANVEGWEPSSNNANTGIGGH-GSCCSEMDIWEANSISEALTPHPCTTV
    |=ID    |||     +  |     |+  +   ||    | | ||+|||| | || + ||||| ||
 Sequence 2 QCP-----V--Q--T--WR--NGTLNT---SHQGFCCNEMDILEGNSRANALTPHSCTAT
   
 Sequence 1 GQEICEGDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTKKLTVVTQ
    |=ID        |   || | +  | | |+    |     |+    |+||||   |+||+| +|++||
 Sequence 2 A---CDSAGC-G-F--NPY-GS----G-----YK----SYYGPG-D-TVDTSKTFTIITQ
   
 Sequence 1 FET-----SG---AINRYYVQNGVTFQQPNAELGSYSGNELNDDYCTAEEAEFGGSSFSD
    |=ID    | |     ||   +| | | ||||  + |+|+ |   |+ ++   | +  |    |++
 Sequence 2 FNTDNGSPSGNLVSITRKYQQNGV--DIPSAQPG---GDTIS-S-CPS--A----SAY--
   
 Sequence 1 KGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGSCSTSSGVPA
    |=ID     |||    || | ||||| |+|+|    | ||||    |       |  | ||++ | |+
 Sequence 2 -GGLATMGKALSSGMVLVFSIWNDNSQYMNWLDS---GN-------A--GPCSSTEGNPS
   
 Sequence 1 QVESQSPNAKVTFSNIKFGPIGSTGNPSG
    |=ID     + + +||  | ||||++| ||||   +
 Sequence 2 NILANNPNTHVVFSNIRWGDIGST---T-
   
 Gap penalty         : (   5.000)
 Raw alignment score : (  8.370E+02)
 Length sequence 1   : (     434)
 Length sequence 2   : (     371)
 Alignment length    : (     449)
 Nr of identities    : (     190)
 Perc identities     : (  51.213)
 Nr of similarities  : (     255)
 Perc similarities   : (  68.733)
 Nr of matched residues : (        356)
 RMSD (A) for those : (   6.038)
 Operator stored : (  -0.378    0.841   -0.386   -0.907   -0.421   -0.029
   -0.187    0.339    0.922   93.902  -36.155   11.547)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

This gives 356 aligned residues with an RMSD (on CA atoms) of more than 6 A. However, a simple IMprove will give you the correct alignment of the structures:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > im m1 a* m2 a*
 Improve fit of  M1 A*
 And             M2 A*
 Atom type      | CA |
 Nr of atoms in mol1 : (        434)
 Nr of atoms in mol2 : (        371)
   
 [...]
   
          THR-A 429 <===> THR-A 370 @     1.00 A *
          GLY-A 430 <===> THR-A 371 @     1.37 A
   
 Nr of residues in mol1   : (     434)
 Nr of residues in mol2   : (     371)
 Nr of matched residues   : (     315)
 Nr of identical residues : (     160)
 % identical of matched   : (  50.794)
 % matched   of mol1      : (  72.581)
 % identical of mol1      : (  36.866)
 D-value    for mol1      : (   0.268)
 % matched   of mol2      : (  84.906)
 % identical of mol2      : (  43.127)
 D-value    for mol2      : (   0.366)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Subsequently, the GLobal_nw command gives the structure-based sequence alignment:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > gl m1 a m2 a 5
 Global-superposition-distance-based Needleman-Wunsch alignment
 Of               M1 A
 And              M2 A
 Atom type       | CA |
 Cut-off distance     5.00
 Central atoms mol 1 : (        434)
 Central atoms mol 2 : (        371)
   
 Applying current operator to mol 2 : (  -0.398    0.810   -0.431   -0.898
    -0.440    0.002   -0.188    0.388    0.902   96.201  -36.992   17.480)
   
 Calculating superposition-distance matrix ...
   
 Executing Needleman-Wunsch ...
   
      1 ?   ?  DIST =     2.31 A
      2 S   Q  DIST =     1.88 A
      3 A   P  DIST =     2.09 A
      4 C   G  DIST =     1.12 A
      5 T | T  DIST =     1.51 A
   
 [...]
   
    452 G | G  DIST =     1.66 A
    453 S | S  DIST =     0.98 A
    454 T | T  DIST =     1.00 A
    455 G   T  DIST =     1.37 A
    456 N   -  DIST =        -
    457 P   -  DIST =        -
    458 S   -  DIST =        -
    459 G   -  DIST =        -
   
 Sequence 1 ?SACTLQSETHPPLTWQKCSSGGTCTQQTGSVVIDANWRWTHATNSSTNCYDGN-TW-SS
    |=ID        |   | || ||  ||   | |  |  ||| | | || |       |   |
 Sequence 2 ?QPGTSTPEVHPKLTTYKCTKSGGCVAQDTSVVLDWNYRWMHD-ANYNSCTV-NG-GVNT
   
 Sequence 1 TLCPDNE--TCAK-NCCLDGAAYASTYGVTTSGNSLSIGFVTQSA----QKNVGARLYLM
    |=ID    |||      |    ||   |  ||   |||||| ||       |        |  ||||
 Sequence 2 TLC-P-DEAT-CGKNCFIEGVDYA-ASGVTTSGSSLTMNQYMPSSSGGY-SSVSPRLYLL
   
 Sequence 1 AS-DTTYQEFTLLGNEFSFDVDVSQLPCGLNGALYFVSMDADGGVSKYPTNTAGAKYGTG
    |=ID     |    |    | | | ||||| | |||| || ||   ||  ||      ||||| || |
 Sequence 2 DSD-GEYVMLKLNGQELSFDVDLSALPCGENGSLYLSQMDENGGA--NQYNTAGANYGSG
   
 Sequence 1 YCDSQCPRDLKFIN-GQAN--VEGWEPSSNNANTGIGGHGSCCSEMDIWEANSISEALTP
    |=ID    ||| |||      |    |                    | || |||| | ||   ||||
 Sequence 2 YCDAQCPV-QTWRNGTL-NTS----------------HQGFCCNEMDILEGNSRANALTP
   
 Sequence 1 HPCTT-VGQEICEGDGCGGTYSDNRYGGTCDPDGCDWNPYRLGNTSFYGPGSSFTLDTTK
    |=ID    | ||                         ||  ||  |||  |  | ||||   | || |
 Sequence 2 HSCTAT----------------------ACDSAGCGFNPYGSGYKSYYGPG--DTVDTSK
   
 Sequence 1 KLTVVTQFETS-------G-AINRYYVQNGVTFQQPN-AELGSYSG-NELNDDYCTAEEA
    |=ID      |  ||| |           | | | ||||              |        |
 Sequence 2 TFTIITQFNTDNGSPSGNLVSITRKYQQNGVDIPSAQP-------GGDTISS--CP----
   
 Sequence 1 EFGGSSFSDKGGLTQFKKATSGGMVLVMSLWDDYYANMLWLDSTYPTNETSSTPGAVRGS
    |=ID         | |  |||    || | ||||| | | |    | ||||               |
 Sequence 2 -----SASAYGGLATMGKALSSGMVLVFSIWNDNSQYMNWLDSG------------NAGP
   
 Sequence 1 CSTSSGVPAQVESQ-SPNAKVTFSNIKFGPIGSTGNPSG
    |=ID    ||   | |        ||  | ||||  | ||||
 Sequence 2 CSSTEGNPSNI-LANNPNTHVVFSNIRWGDIGSTT----
   
 Gap penalty         : (  12.500)
 Raw alignment score : ( -2.315E+03)
 Length sequence 1   : (     434)
 Length sequence 2   : (     371)
 Alignment length    : (     459)
 Nr of identities    : (     169)
 Perc identities     : (  45.553)
 Nr of matched res   : (     346)
 RMSD (A) for those  : (   1.606)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

14.5 XAlignment (apply external sequence alignment)

You may want to investigate how good a sequence alignment obtained with some external program is in terms of the 3D structure. In that case, you can use this command to read in a sequence alignment and apply it to two structures.

The sequence alignment is provided through an external file. The first two sequences in this file are assumed to be those of your two molecules. They must be in PIR format, with insertions marked by dashes (---) or dots (...), and unknown or unusual amino acids marked by question marks (???) or X's (XXX). For instance, the following alignment was obtained for 1CRB and 1RBP using Indonesia with the Gonnet substitution matrix and default gap parameters:

----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
>1crb__;P1
TITLE
------------------------------------PVD--FNGYWKMLS---NENFEEY
LRALDVNVALRKIANLLKPDKEIVQDGD-HMIIRTLSTF-RNYIMDFQVGKEFEEDLTGI
DDRKCMTTVSWDGDKLQCVQKGEKEGRGWTQWIEGDE-LHLEMRAEGVTC----KQVFKK
VH *
>1rbp__;P1
TITLE
ERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGR
VR-LLNNWDVC--ADMVGTFTDTEDPAKFKMKYWGVASFLQKGNDDHWI-VDTDYDTY-A
VQYSCRL-LNLDGTCADSYSFVFSRDPN-GLPPEAQKIVRQRQEELCLARQYRLIVHNGY
C- *

----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

The program checks that the sequences correspond to those it found in the PDB files of your molecules, and it will bail out at the first sign of incongruity !
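To illustrate what this check and the subsequent residue pairing involve, here is a Python sketch. It assumes a simplified PIR layout (a '>' header line, one title line, then sequence lines ending with '*'), treats '-' and '.' as gaps and 'X' as equivalent to '?', and counts a residue pair as aligned when neither sequence has a gap in that alignment column. The function names are hypothetical and the code is not LSQMAN's implementation.

```python
GAP_CHARS = set("-.")


def read_first_two_pir(text):
    """Return the first two aligned sequences (gaps kept) from PIR-style text.

    Assumed layout per entry: a '>' header line, one title line, then
    sequence lines; a trailing '*' ends the sequence.
    """
    entries, current, expect_title = [], None, False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            current = []
            entries.append(current)
            expect_title = True
        elif expect_title:
            expect_title = False            # skip the title line
        elif current is not None:
            current.append(line.replace(" ", "").rstrip("*"))
    return ["".join(parts) for parts in entries[:2]]


def residues(aligned):
    """Ungapped sequence, with 'X' normalised to '?' (assumption)."""
    return "".join("?" if c == "X" else c for c in aligned if c not in GAP_CHARS)


def check_and_pair(aligned1, aligned2, pdb_seq1, pdb_seq2):
    """Bail out on any mismatch with the PDB sequences, then pair residues.

    Returns (pairs, identities), where pairs holds 0-based residue indices
    (i, j) for every alignment column without a gap in either sequence.
    """
    for aligned, pdb_seq in ((aligned1, pdb_seq1), (aligned2, pdb_seq2)):
        if residues(aligned) != residues(pdb_seq):
            raise ValueError("alignment-file and PDB sequences differ")
    pairs, identities, i, j = [], 0, 0, 0
    for a, b in zip(aligned1, aligned2):
        if a not in GAP_CHARS and b not in GAP_CHARS:
            pairs.append((i, j))
            identities += a == b
        i += a not in GAP_CHARS
        j += b not in GAP_CHARS
    return pairs, identities


example = """>seq1;P1
TITLE
AC-DE
F *
>seq2;P1
TITLE
ACWD-
G *
"""
a1, a2 = read_first_two_pir(example)
print(a1, a2)                                   # AC-DEF ACWD-G
print(check_and_pair(a1, a2, "ACDEF", "ACWDG"))
# ([(0, 0), (1, 1), (2, 3), (4, 4)], 3)
```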

To see how good the alignment is, you could issue the following set of commands in LSQMAN:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > read m1 pdb1crb.ent
[...]
 LSQMAN > read m2 pdb1rbp.ent
[...]
 LSQMAN > xali m1 A m2 A 1crb_1rbp.pir
 External alignment
 Of               M1 A
 And              M2 A
 Atom type       | CA |
 Alignment file   1crb_1rbp.pir
 Central atoms mol 1 : (        134)
 Central atoms mol 2 : (        174)
   
 Reading external alignment ...
 Found sequence : (>1crb__;P1)
 Title : (TITLE)
 Length (incl. gaps) : (        182)
 Found sequence : (>1rbp__;P1)
 Title : (TITLE)
 Length (incl. gaps) : (        182)
   
 Sequences deduced from PDB files:
   
 Sequence 1 PVDFNGYWKMLSNENFEEYLRALDVNVALRKIANLLKPDKEIVQDGDHMIIRTLSTFRNY
 Sequence 1 IMDFQVGKEFEEDLTGIDDRKCMTTVSWDGDKLQCVQKGEKEGRGWTQWIEGDELHLEMR
 Sequence 1 AEGVTCKQVFKKVH
   
 Sequence 2 ERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGR
 Sequence 2 VRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAVQYSC
 Sequence 2 RLLNLDGTCADSYSFVFSRDPNGLPPEAQKIVRQRQEELCLARQYRLIVHNGYC
   
 Sequences read from alignment file:
   
 Sequence 1 ------------------------------------PVD--FNGYWKMLS---NENFEEY
    |=ID                                          |
 Sequence 2 ERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGR
   
 Sequence 1 LRALDVNVALRKIANLLKPDKEIVQDGD-HMIIRTLSTF-RNYIMDFQVGKEFEEDLTGI
    |=ID     | |  |      |                |       |      |         |
 Sequence 2 VR-LLNNWDVC--ADMVGTFTDTEDPAKFKMKYWGVASFLQKGNDDHWI-VDTDYDTY-A
   
 Sequence 1 DDRKCMTTVSWDGDKLQCVQKGEKEGRGWTQWIEGDE-LHLEMRAEGVTC----KQVFKK
    |=ID        |      ||                    |
 Sequence 2 VQYSCRL-LNLDGTCADSYSFVFSRDPN-GLPPEAQKIVRQRQEELCLARQYRLIVHNGY
   
 Sequence 1 VH
    |=ID
 Sequence 2 C-
   
 Checking integrity of sequences ...
 PDB and alignment-file sequences identical !
 Nr of aligned residues : (        126)
   
 Length sequence 1   : (     134)
 Length sequence 2   : (     174)
 Alignment length    : (     182)
 Nr of identities    : (      13)
 Perc identities     : (   9.701)
   
 Nr of aligned residues : (        126)
 RMSD (A) for those     : (  14.778)
 Operator stored        : (  -0.010    0.542    0.840   -0.087   -0.838
  0.539    0.996   -0.067    0.056  -28.793    7.657  -42.583)
   
 The    126 atoms have an RMS distance of   14.778 A
 SI = RMS * Nmin / Nmatch             =     15.71633
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.05962
 CR = Maiorov-Crippen RHO (0-2)       =      1.16053
 Estimated RMSD for 2 random proteins =     15.597 A
 RR = Relative RMSD                   =      0.94751
 NR = Normalised RMSD (100)           =     13.247 A
 LSQMAN > global m1 A m2 A 3.5 q.q
 Global-superposition-distance-based Needleman-Wunsch alignment
 Of                   M1 A
 And                  M2 A
 Atom type           | CA |
 Cut-off distance         3.50
 Log file (optional)  q.q
   
 Central atoms mol 1 : (        134)
 Central atoms mol 2 : (        174)
   
 Applying current operator to mol 2 : (  -0.010    0.542    0.840   -0.087
    -0.838    0.539    0.996   -0.067    0.056  -28.793    7.657  -42.583)
   
 Calculating superposition-distance matrix ...
   
 Executing Needleman-Wunsch ...
   
      1 -   E  DIST =        -
      2 -   R  DIST =        -
[...]
    278 V   -  DIST =        -
    279 H   -  DIST =        -
   
 Sequence 1 --------------PVD-FNG--YW-KM---------------LSNENF-----------
 .=ALI |=ID                 .  ..   .  .                    .
 Sequence 2 ERDCRVSSFRVKEN--FD-KARF-SG-TWYAMAKKDPEGLFLQ-----DNIVAEFSVDET
   
 Sequence 1 ---------E-EYLRALDVN----VALRKIA-----------NLLKPDKEIVQD-GDHMI
 .=ALI |=ID          .        ..    . .  .|                      .    ..
 Sequence 2 GQMSATAKGRV-------RLLNNWD-V--CADMVGTFTDTED-----------PA---KF
   
 Sequence 1 ------IRTLSTF--------RNYIMD-----------------FQVGKEFEE-------
 .=ALI |=ID             .            .|                         .
 Sequence 2 KMKYWG------VASFLQKGN----DDHWIVDTDYDTYAVQYSC--------RLLNLDGT
   
 Sequence 1 --DLTGIDDRKCMTTVSWDGDKLQCVQ-KGEKEGRGWTQWIEGDELHL------------
 .=ALI |=ID            .              .        ...         .
 Sequence 2 CA---------D--------------SY-------SFV---------FSRDPNGLPPEAQ
   
 Sequence 1 ----------------EMRAEGVTC------KQVFKKVH
 .=ALI |=ID                      . ..
 Sequence 2 KIVRQRQEELCLARQY-----R-LIVHNGYC--------
   
 Analysis of distance distribution:
 Number of distances                    :         29
 Average (A)                            :       2.16
 Standard deviation (A)                 :       0.81
 Variance (A**2)                        :       0.66
 Minimum (A)                            :       0.64
 Maximum (A)                            :       3.42
 Range (A)                              :       2.78
 Sum (A)                                :      62.69
 Root-mean-square (A)                   :       2.31
 Harmonic average (A)                   :       1.77
 Median (A)                             :       2.21
 25th Percentile (A)                    :       1.69
 75th Percentile (A)                    :       2.72
 Semi-interquartile range (A)           :       1.04
 Trimean (A)                            :       2.21
 50% Trimmed mean (A)                   :       2.13
 10th Percentile (A)                    :       0.87
 90th Percentile (A)                    :       3.26
 20% Trimmed mean (A)                   :       2.13
   
 Gap penalty            :        6.125
 Raw alignment score    :  -1.6860E+03
 L1 = Length sequence 1 :          134
 L2 = Length sequence 2 :          174
 Alignment length       :          279
 NI = Nr of identities  :            2
 L3 = Nr of matched res :           29
 RMSD for those (A)     :        2.278
 ID = NI/min(L1,L2) (%) :         1.49
 ID = NI/L3 (%)         :         6.90
   
 Levitt-Gerstein statistics:
 Nr of gaps       :           20
 Similarity score :   2.8417E+02
 Z-score          :   2.5478E+00
 P (z > Z)        :   7.5270E-02
 P (z > Z) is the probability of matching any two
 random structures and finding a Z-score z which
 is greater than the Z-score Z of the current pair.
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Obviously, this is not a very good alignment. Although 126 residues were aligned in the sequence alignment, applying the corresponding operator results in only 29 matched residues (within 3.5 Å of each other) !

14.6 IMprove (improve alignment of two molecules)

Improve an existing operator between two molecules. As in 2.10, you must supply two molecule names and two ranges (usually, 'A*' for both molecules). This option is discussed in more detail below. Note that ONLY the first atom type is used now, since we want to compare the two structures on a residue-by-residue basis !

From version 3.0 onwards, the RMS difference between the temperature factors (Bs) of the matched atoms and the linear correlation coefficient of these Bs are also shown.
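A minimal Python sketch of these two quantities, assuming that the RMS difference is the root-mean-square of the per-atom B-factor differences and that the linear correlation coefficient is Pearson's. The function name is hypothetical.

```python
import math


def b_factor_agreement(b1, b2):
    """RMS difference and linear (Pearson) correlation of matched-atom B-factors.

    b1[k] and b2[k] are the temperature factors of the k-th matched atom pair.
    """
    n = len(b1)
    rms_delta_b = math.sqrt(sum((x - y) ** 2 for x, y in zip(b1, b2)) / n)
    mean1, mean2 = sum(b1) / n, sum(b2) / n
    cov = sum((x - mean1) * (y - mean2) for x, y in zip(b1, b2))
    var1 = sum((x - mean1) ** 2 for x in b1)
    var2 = sum((y - mean2) ** 2 for y in b2)
    return rms_delta_b, cov / math.sqrt(var1 * var2)


print(b_factor_agreement([10.0, 20.0, 30.0], [12.0, 22.0, 32.0]))  # (2.0, 1.0)
```

The example shows why both numbers are useful: a constant offset of 2 Å² between the two sets of Bs gives an RMS difference of 2 Å² but a perfect correlation.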

Version 6.1 made a number of changes (see version history).

From version 8.4.1, the IMprove and GLobal commands print a load of statistics about the distribution of the distances between the matched atoms. See also: ACW May, Proteins 37, 20-29 (1999). For formulas etc., see for instance HyperStat Online.
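The Python sketch below computes a subset of these statistics for a list of matched-atom distances. Percentiles are obtained by linear interpolation between order statistics, which is one common convention; LSQMAN may use a different one, and the population form of the standard deviation is also an assumption. In the example outputs in this manual, the value printed as "Semi-interquartile range" equals Q3 - Q1 to within rounding (e.g., 1.89 - 0.89 = 1.00 in the IMprove example), so the sketch reports that difference under a neutral name; the trimean formula (Q1 + 2*median + Q3)/4 is consistent with the printed values (e.g., (0.89 + 2*1.44 + 1.89)/4 ≈ 1.41). Trimmed means are omitted. Function names are hypothetical.

```python
import math
import statistics


def percentile(sorted_vals, p):
    """Percentile by linear interpolation between order statistics.

    This is one common convention; LSQMAN may use a different one.
    """
    k = (len(sorted_vals) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def distance_stats(distances):
    """Summary statistics of matched-atom distances (a subset of LSQMAN's list)."""
    d = sorted(distances)
    q1, med, q3 = percentile(d, 25), statistics.median(d), percentile(d, 75)
    return {
        "average": statistics.fmean(d),
        "std_dev": statistics.pstdev(d),       # population form (assumption)
        "variance": statistics.pvariance(d),
        "minimum": d[0],
        "maximum": d[-1],
        "range": d[-1] - d[0],
        "sum": sum(d),
        "rms": math.sqrt(sum(x * x for x in d) / len(d)),
        "harmonic_average": statistics.harmonic_mean(d),
        "median": med,
        "q1": q1,
        "q3": q3,
        "q3_minus_q1": q3 - q1,
        "trimean": (q1 + 2 * med + q3) / 4,
    }


stats = distance_stats([1.0, 2.0, 3.0, 4.0])
print(stats["median"], stats["q1"], stats["q3"], stats["trimean"])  # 2.5 1.75 3.25 2.5
```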

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > improve m3 * m4 *
 Improve fit of  M3 *
 And             M4 *
 Atom type      | CA |
 Nr of atoms in mol1 : (        137)
 Nr of atoms in mol2 : (        174)
   
 Found fragment of length : (      12)
 Found fragment of length : (      10)
 Found fragment of length : (      10)
 Found fragment of length : (       6)
 Found fragment of length : (      10)
 Found fragment of length : (      10)
   
 Cycle : (          1)
 Distance cut-off (A)      : (   3.500)
 Min fragment length (res) : (       5)
 The     58 atoms have an RMS distance of    1.634 A
 SI = RMS * Nmin / Nmatch             =      3.85847
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.16234
 RMS delta B for matched atoms        =     7.912 A2
 Corr. coefficient matched atom Bs    =        0.373
 Rotation     :   0.99988919 -0.01435977  0.00391792
                  0.01433638  0.99987960  0.00593596
                 -0.00400269 -0.00587913  0.99997473
 Translation  :      -0.2265      0.5644     -0.2414
   
 [...]
   
 Fit did not improve in this cycle !
 Alignment based on previous operator !
   
 Fragment ASN-A   2 <===> ARG-A  19 @     1.23 A
          PHE-A   3 <===> PHE-A  20 @     1.00 A *
          SER-A   4 <===> SER-A  21 @     0.65 A *
          GLY-A   5 <===> GLY-A  22 @     1.39 A *
   
 [...]
   
          VAL-A 135 <===> SER-A 138 @     1.14 A
          ARG-A 136 <===> ARG-A 139 @     1.07 A *
          GLU-A 137 <===> ASP-A 140 @     2.97 A
   
 Nr of residues in mol1   : (     137)
 Nr of residues in mol2   : (     174)
 Nr of matched residues   : (      58)
 Nr of identical residues : (       7)
 % identical of matched   : (  12.069)
 % matched   of mol1      : (  42.336)
 % identical of mol1      : (   5.109)
 D-value    for mol1      : (   0.022)
 % matched   of mol2      : (  33.333)
 % identical of mol2      : (   4.023)
 D-value    for mol2      : (   0.013)
   
 Analysis of distance distribution:
 Number of distances                    :         58
 Average (A)                            :       1.46
 Standard deviation (A)                 :       0.73
 Variance (A**2)                        :       0.53
 Minimum (A)                            :       0.16
 Maximum (A)                            :       3.35
 Range (A)                              :       3.18
 Sum (A)                                :      84.78
 Root-mean-square (A)                   :       1.63
 Harmonic average (A)                   :       0.94
 Median (A)                             :       1.44
 25th Percentile (A)                    :       0.89
 75th Percentile (A)                    :       1.89
 Semi-interquartile range (A)           :       1.00
 Trimean (A)                            :       1.41
 50% Trimmed mean (A)                   :       1.39
 10th Percentile (A)                    :       0.61
 90th Percentile (A)                    :       2.39
 20% Trimmed mean (A)                   :       1.40
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

14.7 DP_improve (dynamic-programming-based operator improvement)

This command offers an alternative way to improve an operator, using a (squared) distance matrix and dynamic programming. This method was pioneered by Cohen (Y Satow, GH Cohen, EA Padlan, DR Davies, J Mol Biol 190, 593-604 (1986), and GH Cohen, J Appl Cryst 30, 1160-1161 (1997)).

You supply the names and chain IDs of two molecules for which you already have a crude operator. You must also supply a distance cut-off (e.g., 3.5 A). The program calculates a matrix containing the (squared) distances between all pairs of central atoms from the two molecules, and uses a standard Needleman-Wunsch algorithm to find the optimal global alignment (i.e., residue-pairing). The paired residues are used to calculate a new operator. If you like, you can repeat this command a number of times.

If you use mode S, the matrix will contain the squares of the distances (actually, times -1.0), and the gap penalty will be set to 0.5 times the square of the cut-off distance. If you use mode D, the plain distances (times -1.0) will be used, and the gap penalty will be half the cut-off. If the final parameter is "Y" (for "Yes"), the program will print the distances between matched residues and the structure-based sequence alignment.
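A minimal Python sketch of the alignment step described above, for illustration only. It assumes that every gap position (including end gaps) costs the same fixed penalty and that any mode other than S behaves like D; the function name `dp_align` is hypothetical. It stops at the residue pairing: the matched pairs would then be fed into an ordinary least-squares superposition to obtain the new operator, which is not shown.

```python
import math


def dp_align(coords1, coords2, cutoff, mode="S"):
    """Needleman-Wunsch alignment driven by inter-atomic distances.

    coords1, coords2: central-atom coordinates (x, y, z); coords2 must
    already be transformed by the current (crude) operator.
    Mode "S": pair score -d**2, gap penalty 0.5 * cutoff**2.
    Mode "D": pair score -d,    gap penalty 0.5 * cutoff.
    Returns the matched (i, j) index pairs in sequence order.
    """
    power = 2 if mode == "S" else 1
    gap = 0.5 * cutoff ** power
    n, m = len(coords1), len(coords2)
    s = [[-math.dist(a, b) ** power for b in coords2] for a in coords1]
    # Border cells hold the cost of leading gaps; inner cells are filled below
    f = [[-gap * (i + j) if i == 0 or j == 0 else 0.0 for j in range(m + 1)]
         for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + s[i - 1][j - 1],
                          f[i - 1][j] - gap,
                          f[i][j - 1] - gap)
    pairs, i, j = [], n, m
    while i > 0 and j > 0:              # trace back one optimal path
        if f[i][j] == f[i - 1][j - 1] + s[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif f[i][j] == f[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


# Molecule 2 is molecule 1 with its third residue deleted (already superimposed)
mol1 = [(3.8 * k, 0.0, 0.0) for k in range(5)]
mol2 = [mol1[k] for k in (0, 1, 3, 4)]
print(dp_align(mol1, mol2, cutoff=3.5, mode="S"))
# [(0, 0), (1, 1), (3, 2), (4, 3)]
```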

Compared to the regular IMprove command, you tend to find a higher number of matched residue pairs. This is mainly due to the fact that the DP_improve command does not impose a minimum fragment length (i.e., you will get isolated residues matching residues in the other molecule), but also because the DP_improve command gives an alignment that is globally optimal.

From version 9.0, an extra parameter (before the 'verbose' parameter) sets the maximum number of iterations for this command (default: 10). Iteration stops when this maximum number of cycles has been carried out, or when both the number of aligned residues and their RMSD remain constant in two successive cycles.
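The stopping rule can be expressed as a short Python loop. `one_cycle` stands for a hypothetical function that carries out one DP_improve cycle and returns the number of aligned residues and their RMSD; rounding the RMSD before comparing successive cycles is an assumption, since the precision LSQMAN uses for this test is not documented here.

```python
def iterate_until_stable(one_cycle, max_cycles=10, decimals=3):
    """Run one_cycle() repeatedly, following the stopping rule described above.

    one_cycle is a hypothetical callable that performs a single DP_improve
    cycle and returns (n_aligned, rmsd). Iteration stops after max_cycles
    (must be >= 1), or as soon as (n_aligned, rmsd) equals the previous
    cycle's values; comparing the RMSD after rounding to `decimals` places
    is an assumption.
    Returns (cycles_run, (n_aligned, rmsd)).
    """
    previous = None
    for cycle in range(1, max_cycles + 1):
        n_aligned, rmsd = one_cycle()
        current = (n_aligned, round(rmsd, decimals))
        if current == previous:
            break
        previous = current
    return cycle, current


# Made-up cycle results, for illustration only
results = iter([(66, 1.737), (72, 1.700), (76, 1.681), (76, 1.681), (80, 1.9)])
print(iterate_until_stable(lambda: next(results)))  # (4, (76, 1.681))
```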

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > read m1 pdb1cbs.ent
[...]
 LSQMAN > read m2 pdb1rbp.ent
[...]
 LSQMAN > fast m1 a m2 a 30 15 100
[...]
 Regenerating best alignment ...
 The     66 atoms have an RMS distance of    1.737 A
[...]
 LSQMAN > dp m1 a m2 a sq 3.5
 Max nr of cycles ? (          10)
 Dynamic-Programming-based operator improvement (Needleman-Wunsch)
 Of               M1 A
 And              M2 A
 Atom type       | CA |
 Cut-off distance     3.50
 Matrix mode      SQ
 Max nr of cycles       10
 Verbose output   NO
 Central atoms mol 1 : (        137)
 Central atoms mol 2 : (        174)
   
 DP_improve iteration : (          1)
 [...]
 DP_improve iteration : (          4)
 Calculating squared distance matrix ...
   
 Executing Needleman-Wunsch ...
   
 Gap penalty         : (   6.125)
 Raw alignment score : ( -1.189E+03)
 Length sequence 1   : (     137)
 Length sequence 2   : (     174)
 Alignment length    : (     235)
 Nr of identities    : (       7)
 Perc identities     : (   5.109)
 Nr of matched res   : (      76)
 RMSD for those (A)  : (   1.681)
   
 The     76 atoms have an RMS distance of    1.681 A
 SI = RMS * Nmin / Nmatch             =      3.02961
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.20815
 CR = Maiorov-Crippen RHO (0-2)       =      0.14077
 Estimated RMSD for 2 random proteins =     11.933 A
 RR = Relative RMSD                   =      0.14084
 Rotation     :   0.54386294  0.82775360 -0.13797504
                  0.83284891 -0.55256689 -0.03213305
                 -0.10283868 -0.09743638 -0.98991430
 Translation  :      -8.6507     13.7936     78.0548
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

14.8 GLobal_nw (global-superposition-distance-based Needleman-Wunsch sequence alignment)

In a sense this is the "opposite" of the NWunsch command. Whereas the NWunsch command generates a sequence alignment to superimpose two structures, the GLobal_nw command uses the current structural superimposition to generate a sequence alignment.

This uses a standard Needleman-Wunsch alignment algorithm. However, the substitution matrix S(i,j) contains the squares of the distances between the CA atoms of residues i and j (after applying the current superimposition operator), with a negative sign so that closer pairs score higher.

You must supply a distance cut-off: residues that are more than this distance apart will never be matched in the sequence alignment. This is achieved simply by setting the gap penalty to 0.5*cutoff*cutoff (i.e., if two residues are further apart than the cut-off distance, it is "cheaper" to introduce two gaps than to match them to each other in the alignment).
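The cut-off argument can be written out explicitly. Assume, consistently with the negative raw alignment scores printed in the examples, that an alignment is scored as minus the sum of the squared distances d_ij of the matched pairs, minus the gap penalty g for every residue left unmatched, with g = c²/2 for cut-off c:

```latex
S \;=\; -\sum_{(i,j)\ \text{matched}} d_{ij}^{2} \;-\; g\,N_{\text{gap}},
\qquad g = \tfrac{1}{2}c^{2}

% Un-matching one pair (i,j) replaces -d_{ij}^2 by two gaps, -2g = -c^2:
\Delta S \;=\; -c^{2} - \left(-d_{ij}^{2}\right) \;=\; d_{ij}^{2} - c^{2} \;>\; 0
\quad\text{whenever}\quad d_{ij} > c
```

So any alignment that contains a pair further apart than c can be improved by un-matching that pair, and the optimal alignment never contains one. As a numerical check against the 1CRB/1RBP GLobal example in the XAlignment section: with c = 3.5 Å, g = 6.125; 29 matched pairs with an RMS distance of 2.31 Å give Σd² ≈ 29 × 2.31² ≈ 154.7, and (134 − 29) + (174 − 29) = 250 unmatched residues cost 250 × 6.125 = 1531.25, for a total of about −1686, in agreement with the printed raw alignment score of −1.6860E+03.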

As of version 8.2, this command also calculates the significance of the structural alignment, using the method of Levitt and Gerstein (M Levitt & M Gerstein, PNAS 95, 5913-5920 (1998)). The formulas used are on pages 5917 and 5918 (note: the values quoted there for "a" and "b" are wrong and should be: a = 171.7 and b = -419.2). The number of matched residues and their similarity score are converted into a Z-score and P (z > Z).

Because it calculates the Levitt-Gerstein statistics, this command is very well suited to comparing different alignments.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Levitt-Gerstein statistics:
 Nr of gaps       :           12
 Similarity score :   2.1323E+03
 Z-score          :   2.6627E+01
 P (z > Z)        :   0.0000E+00
 P (z > Z) is the probability of matching any two
 random structures and finding a Z-score z which
 is greater than the Z-score Z of the current pair.
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

From version 8.4.1, statistics about the distribution of distances are printed as well. See the IMprove command for more information.

From version 9.2.1, there is an optional parameter which can be the name of a log file to which you want to save the structure-based sequence alignment and some of the key statistics. From version 9.7.3, the detailed structural alignment will no longer be printed to the screen when a log file is used (to prevent clutter with long alignments).

From version 9.6.2 onwards, this command also prints the value of TM-score as defined by Zhang and Skolnick, Nucl Acids Res 33, 2302-2309 (2005). It is unclear whether Ltarget should be the length of a full protein or of a domain, and whether it should be equal to L1 (the length of protein or domain nr 1), L2, min(L1,L2) or (L1+L2)/2. This command uses Ltarget = min(L1,L2), but you can easily calculate the corresponding value for any of the other definitions by multiplying the printed TM-score by the value of Ltarget (which is also printed) and then dividing by the value you wish to use yourself (e.g., L1, L2, (L1+L2)/2, or any other suitable number).
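A small Python sketch of this conversion. It also writes out the TM-score in its usual form, TM = (1/Ltarget) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24(Ltarget − 15)^(1/3) − 1.8 (Zhang & Skolnick), to show that d0 itself depends on Ltarget: the rescaling only changes the 1/Ltarget prefactor and keeps d0 as it was, so it is best read as an approximation. Whether LSQMAN evaluates d0 in exactly this way is an assumption, the function names are hypothetical, and the TM-score of 0.50 in the usage line is a made-up value.

```python
def d0(l_target):
    """Distance scale of the usual TM-score definition (assumed here)."""
    if l_target <= 15:
        raise ValueError("d0 formula needs Ltarget > 15")
    return 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, l_target):
    """TM-score from the distances of the matched residue pairs."""
    scale = d0(l_target)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / l_target


def rescale_tm_score(tm_printed, l_target_printed, l_target_new):
    """Conversion from the text: multiply by the printed Ltarget, divide by yours."""
    return tm_printed * l_target_printed / l_target_new


# Hypothetical printed TM-score 0.50 with Ltarget = min(L1, L2) = 134,
# converted to Ltarget = (L1 + L2)/2 = (134 + 174)/2 = 154
print(round(rescale_tm_score(0.50, 134, 154), 3))  # 0.435
```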

From version 9.7.1 onwards, this command also properly translates DNA and RNA three-letter codes into one-letter codes, so you can make sense of the printed alignment. To prevent confusion with proteins, the one-letter code is in lower case, though. For example:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 [...]
    227 [B 163] g   -     -    ... DIST =        -
    228 [B 164] g   -     -    ... DIST =        -
    229 [B 165] u | u  [A 180] ... DIST =     1.66 A
    230 [B 166] g . a  [A 181] ... DIST =     1.37 A
    231 [B 167] a . g  [A 182] ... DIST =     0.86 A
    232 [B 168] a . u  [A 183] ... DIST =     0.93 A
    233 [B 169] g | g  [A 184] ... DIST =     0.74 A
    234 [B 170] g . a  [A 185] ... DIST =     2.39 A
 [...]
 Sequence 1 gccgugugccuugcg---ccggga--aaccacgca----agggauggugucaaauucggc
 .=ALI |=ID      |.|||||...        .   |.......        |.....||.|||||.|.
 Sequence 2 -----gagccuuuauaca-----gua-auguauaucgaa----aaauccucuaauucagg
   
 Sequence 1 gaaacc----uaagcgcccgcccgggcguaug-gcaacgccgagccaagcuucgca---g
 .=ALI |=ID |||..|                         |  |||..|.||||.|||||.....
 Sequence 2 gaacaccuaa---------------------gg-caauccugagcuaagcucuuaguaa-
 [...]
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
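For readers who post-process such output, the three-letter to one-letter conversion for nucleotides can be sketched as a lookup table in Python. The set of residue names below (current PDB names such as DA, and older names such as ADE or URI) is an illustrative assumption, not the list LSQMAN actually recognises; unrecognised names map to '?'. The table and function names are hypothetical.

```python
# Hypothetical table: nucleotide residue names -> lower-case one-letter codes
NUCLEOTIDE_CODES = {
    "A": "a", "C": "c", "G": "g", "U": "u", "T": "t",            # one-letter names
    "DA": "a", "DC": "c", "DG": "g", "DT": "t", "DU": "u",       # deoxy names
    "ADE": "a", "CYT": "c", "GUA": "g", "THY": "t", "URI": "u",  # older names
}


def nucleotide_one_letter(residue_name):
    """Lower-case one-letter code, or '?' for unknown residue names."""
    return NUCLEOTIDE_CODES.get(residue_name.strip().upper(), "?")


print("".join(nucleotide_one_letter(r) for r in ["G", "C", "C", "G", "U", "PSU"]))
# gccgu?
```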

Example: if you superimpose 1RBP on top of 1CBS using the OMAC macro "align.lsqmac", the complete sequence alignment is as follows:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Command > (global gmod1 $chn1 gmod2 $chn2 $cutdist $logfile)
 Global-superposition-distance-based Needleman-Wunsch alignment
 Of                   GMOD1 _
 And                  GMOD2 _
 Atom type           | CA |
 Cut-off distance         3.50
 Log file (optional)  q.log
   
 Central atoms mol 1 : (        137)
 Central atoms mol 2 : (        174)
   
 Applying current operator to mol 2 : (   0.541    0.829   -0.138    0.835
    -0.550   -0.029   -0.100   -0.099   -0.990   -8.758   13.729   77.958)
   
 Calculating superposition-distance matrix ...
   
 Executing Needleman-Wunsch ...
   
      1    -    -   E  [_   1] ... DIST =        -
      2    -    -   R  [_   2] ... DIST =        -
      3    -    -   D  [_   3] ... DIST =        -
      4    -    -   C  [_   4] ... DIST =        -
      5    -    -   R  [_   5] ... DIST =        -
      6    -    -   V  [_   6] ... DIST =        -
      7    -    -   S  [_   7] ... DIST =        -
      8    -    -   S  [_   8] ... DIST =        -
      9    -    -   F  [_   9] ... DIST =        -
     10    -    -   R  [_  10] ... DIST =        -
   
 [...]
   
    235    -    -   Y  [_ 173] ... DIST =        -
    236    -    -   C  [_ 174] ... DIST =        -
   
 Sequence 1 ---------------P--NFSGNWKIIRSENFEEL-L---KVLGVNVMLRKIAVAAASKP
 .=ALI |=ID                .  .|||.|...... .  . .                   ....
 Sequence 2 ERDCRVSSFRVKENFDKARFSGTWYAMAKK-D--PEGLFL----------------QDNI
   
 Sequence 1 AVEIKQ---EGDTFYIKTST----------TVRTTEI--NFKVGEEFEEQTVDGRPCKSL
 .=ALI |=ID ..|...     .........          . .....
 Sequence 2 VAEFSVDET--GQMSATAKGRVRLLNNWDVC-ADMVGTF---------------------
   
 Sequence 1 V-K-WE-SENKMVC--------------EQKLLKGEGPKTSWTRE-LT-NDG---ELILT
 .=ALI |=ID . .  .    ....                          .....  .   .   .....
 Sequence 2 TDTE-DP---AKFKMKYWGVASFLQKGN------------DDHWIV-DT--DYDTYAVQY
   
 Sequence 1 MTAD-------DVVCTRVYVRE----------------------------------
 .=ALI |=ID ....       ......|..|.
 Sequence 2 SCRLLNLDGTCADSYSFVFSRDPNGLPPEAQKIVRQRQEELCLARQYRLIVHNGYC
   
 Analysis of distance distribution:
 Number of distances                    :         75
 Average (A)                            :       1.49
 Standard deviation (A)                 :       0.68
 Variance (A**2)                        :       0.47
 Minimum (A)                            :       0.19
 Maximum (A)                            :       3.24
 Range (A)                              :       3.05
 Sum (A)                                :     112.07
 Root-mean-square (A)                   :       1.64
 Harmonic average (A)                   :       1.04
 Median (A)                             :       1.44
 25th Percentile (A)                    :       1.02
 75th Percentile (A)                    :       1.91
 Semi-interquartile range (A)           :       0.89
 Trimean (A)                            :       1.46
 50% Trimmed mean (A)                   :       1.47
 10th Percentile (A)                    :       0.54
 90th Percentile (A)                    :       2.46
 20% Trimmed mean (A)                   :       1.46
   
 Gap penalty            :        6.125
 Raw alignment score    :  -1.1886E+03
 L1 = Length sequence 1 :          137
 L2 = Length sequence 2 :          174
 Alignment length       :          236
 NI = Nr of identities  :            7
 L3 = Nr of matched res :           75
 RMSD for those (A)     :        1.643
 ID = NI/min(L1,L2) (%) :         5.11
 ID = NI/L3 (%)         :         9.33
   
 Levitt-Gerstein statistics:
 Nr of gaps       :           17
 Similarity score :   1.1918E+03
 Z-score          :   1.5768E+01
 P (z > Z)        :   1.4185E-07
 P (z > Z) is the probability of matching any two
 random structures and finding a Z-score z which
 is greater than the Z-score Z of the current pair.
   
 TM-score statistics:
 Ltarget          :          137
 d0 (Ltarget) (A) :        4.340
 TM-score         :        0.483
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Doing the same thing with the "align_long.lsqmac" macro gives:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Sequence 1 ---------------P--NFSGNWKIIRSENFEELLKVLGVNVMLRKIAVAAASKPAVEI
 .=ALI |=ID                .  .|||.|...... .  ..    .    .  .  .......|.
 Sequence 2 ERDCRVSSFRVKENFDKARFSGTWYAMAKK-D--PE----G----L--F--LQDNIVAEF
   
 Sequence 1 KQE-GDTFYIKTSTT--------V-RTTEINFKVGEEFEEQTVDGRPCKSLVKWESENKM
 .=ALI |=ID ... ...........        | ......| .                 .....  ..
 Sequence 2 SVDETGQMSATAKGRVRLLNNWDVCADMVGTF-T-----------------DTEDP--AK
   
 Sequence 1 VCE------------QKLLKGEGPKTSWTRELTNDG-ELILTMTAD------D-VVCTRV
 .=ALI |=ID ...            .         .......|... .........      . .....|
 Sequence 2 FKMKYWGVASFLQKGN---------DDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFV
   
 Sequence 1 YVRE----------------------------------
 .=ALI |=ID ..|.
 Sequence 2 FSRDPNGLPPEAQKIVRQRQEELCLARQYRLIVHNGYC
   
 Analysis of distance distribution:
 Number of distances                    :         93
 Average (A)                            :       2.18
 Standard deviation (A)                 :       1.40
 Variance (A**2)                        :       1.95
 Minimum (A)                            :       0.09
 Maximum (A)                            :       7.24
 Range (A)                              :       7.15
 Sum (A)                                :     202.71
 Root-mean-square (A)                   :       2.59
 Harmonic average (A)                   :       1.20
 Median (A)                             :       1.77
 25th Percentile (A)                    :       1.19
 75th Percentile (A)                    :       2.65
 Semi-interquartile range (A)           :       1.46
 Trimean (A)                            :       1.84
 50% Trimmed mean (A)                   :       1.85
 10th Percentile (A)                    :       0.73
 90th Percentile (A)                    :       4.28
 20% Trimmed mean (A)                   :       1.99
   
 Gap penalty            :       32.000
 Raw alignment score    :  -4.6232E+03
 L1 = Length sequence 1 :          137
 L2 = Length sequence 2 :          174
 Alignment length       :          218
 NI = Nr of identities  :           10
 L3 = Nr of matched res :           93
 RMSD for those (A)     :        2.589
 ID = NI/min(L1,L2) (%) :         7.30
 ID = NI/L3 (%)         :        10.75
   
 Levitt-Gerstein statistics:
 Nr of gaps       :           18
 Similarity score :   1.3636E+03
 Z-score          :   1.6864E+01
 P (z > Z)        :   4.7428E-08
 P (z > Z) is the probability of matching any two
 random structures and finding a Z-score z which
 is greater than the Z-score Z of the current pair.
   
 TM-score statistics:
 Ltarget          :          137
 d0 (Ltarget) (A) :        4.340
 TM-score         :        0.539
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

15 MANIPULATING OPERATORS

15.1 EDit_operator (edit operator between two molecules)

Supply the names of the two molecules plus the twelve elements of the operator (note: this uses Alwyn's transpose-matrix formalism). The program checks:
* that all nine elements of the rotation matrix have values between -1 and +1
* that the determinant of the rotation matrix lies between 0.9995 and 1.0005
If both checks are successful, the operator is overwritten.
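The two checks, and the "Rotation angle" that the program prints, can be sketched as follows. This is not LSQMAN's code. The function name `validate_operator` is hypothetical. The rotation angle uses the standard relation cos(kappa) = (trace - 1)/2 for a proper rotation. Neither the determinant nor the trace depends on whether the nine rotation elements are read row-wise or column-wise, so O's transpose convention does not matter here.

```python
import math


def validate_operator(elements, tol=0.0005):
    """Check a 12-element operator as described for EDit_operator (a sketch).

    elements[0:9] are the rotation-matrix elements, elements[9:12] the
    translation. Returns (ok, determinant, rotation_angle_in_degrees).
    """
    if len(elements) != 12:
        raise ValueError("an operator has exactly 12 elements")
    rot = elements[:9]
    # Check 1: every rotation-matrix element must lie between -1 and +1.
    if any(abs(x) > 1.0 for x in rot):
        return False, None, None
    a, b, c, d, e, f, g, h, i = rot
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    # Check 2: the determinant must lie between 1 - tol and 1 + tol.
    ok = abs(det - 1.0) <= tol
    # Rotation angle of a proper rotation: cos(kappa) = (trace - 1) / 2,
    # clamped so that rounding errors cannot push acos out of its domain.
    cos_kappa = max(-1.0, min(1.0, (a + e + i - 1.0) / 2.0))
    angle = math.degrees(math.acos(cos_kappa))
    return ok, det, angle
```

With the element values from the example below, this gives ok = True, a determinant of about 1.000 and a rotation angle of about 177.67 degrees. These agree with the values the program reports.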

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ed m1 m2
 Operator bringing : (M2)
 on top of         : (M1)
 Operator element  1 ? ( -0.9570150)
 Operator element  2 ? (  0.1134560)
 Operator element  3 ? ( -0.2669272)
 Operator element  4 ? (  0.1823791)
 Operator element  5 ? ( -0.4801987)
 Operator element  6 ? ( -0.8579901)
 Operator element  7 ? ( -0.2255222)
 Operator element  8 ? ( -0.8697913)
 Operator element  9 ? (  0.4388654)
 Operator element 10 ? (   13.46905) 13
 Operator element 11 ? (   27.21351) 27
 Operator element 12 ? (   38.58303) 39
 Determinant of rotation matrix =   1.000000
 Rotation angle                 = 177.671234
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

15.2 SHow_operator (show operator between two molecules)

List information about an operator. Note that the last LSQMAN command pertaining to this operator is also shown (to help remind you where the operator came from).
From version 3.0 onwards, the RMS difference between, and the linear correlation coefficient of, the temperature factors (Bs) of the matched atoms are also shown. In addition, if you compare two chains within one molecule (i.e., the case of non-crystallographic symmetry), the program will also tell you what it thinks of the way in which the NCS was constrained or restrained by whoever refined the structure.
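The two B-factor statistics are the RMS of the differences and the Pearson (linear) correlation coefficient over the matched atoms. A minimal sketch, assuming the matched Bs are available as two equally long lists. The function name `compare_bfactors` is hypothetical. The sketch does not reproduce the program's *POOR* verdicts, because their thresholds are not documented here.

```python
import math


def compare_bfactors(b1, b2):
    """RMS difference and linear correlation coefficient of matched-atom Bs."""
    n = len(b1)
    if n != len(b2) or n < 2:
        raise ValueError("need two equally long lists with at least two values")
    rms_delta = math.sqrt(sum((x - y) ** 2 for x, y in zip(b1, b2)) / n)
    mean1, mean2 = sum(b1) / n, sum(b2) / n
    cov = sum((x - mean1) * (y - mean2) for x, y in zip(b1, b2))
    var1 = sum((x - mean1) ** 2 for x in b1)
    var2 = sum((y - mean2) ** 2 for y in b2)
    if var1 == 0.0 or var2 == 0.0:
        corr = float("nan")  # undefined if one set of Bs is constant
    else:
        corr = cov / math.sqrt(var1 * var2)
    return rms_delta, corr
```

Identical B-factor lists give an RMS delta B of 0 and a correlation coefficient of 1.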

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > sh m2 m2
 Operator bringing : (M2)
 on top of         : (M2)
 Last command was  : (EX M2 A1-400 M2 B1)
 The   2324 atoms have an RMS distance of    0.338 A
 SI = RMS * Nmin / Nmatch             =    0.33790
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =    0.74744
 RMS delta B for matched atoms        =    9.455 A2
 Corr. coefficient matched atom Bs    =    0.707
 Rotation    :   0.27394941  0.93619686  0.22019802
                 0.93816668 -0.31052074  0.15303643
                 0.21164827  0.16465820 -0.96337569
 Translation :    -3.6828    -0.9881    27.3222
 Nr of NCS operators : 1
 NCSOP 1 =  0.2739494  0.9381667  0.2116483    -3.683
            0.9361969 -0.3105207  0.1646582    -0.988
            0.2201980  0.1530364 -0.9633757    27.322
 Determinant of rotation matrix =   1.000000
 Crowther Alpha Beta Gamma            37.88222  164.44548  145.20091
 Spherical polars Omega Phi Chi       82.22273   36.34090  179.58316
 Direction cosines of rotation axis    0.79811    0.58715    0.13532
 Dave Smith                         -170.30086   77.27934  -73.72193
 Rotation angle                 = 179.583160
   
 *POOR* - NCS not restrained
 *POOR* - NCS Bs not restrained
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

15.3 SAve_operator (write operator to O datablock file)

Save an operator in an O datablock file. Provide the names of the two molecules, a filename and, optionally, a datablock name.
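For scripting outside LSQMAN, an operator can be written in the same layout and read back. The header layout is inferred from this manual, so treat it as an assumption. The header holds the name, data type, count and Fortran format, as in the commented-out header "! MOLNAM_RESIDUE_DPHI R 186 (9f8.2)" in the PHipsi plot file and the "Data type / Number / Format" fields that OLd_o_operator prints. The records use 3F15.7. All function names are hypothetical. The default names mimic the ones shown in the example below.

```python
def default_names(mol1, mol2):
    """Default file and datablock names, mimicking the SAve_operator example."""
    stem = f"rt_{mol2.lower()}_to_{mol1.lower()}"
    return stem + ".odb", ".lsq_" + stem


def format_odb(name, elements):
    """Format a 12-element operator as an O datablock (header + 3F15.7 records).

    ASSUMPTION: header = "<name> R <count> <format>", as inferred above.
    """
    if len(elements) != 12:
        raise ValueError("an operator has exactly 12 elements")
    lines = [f"{name} R 12 (3F15.7)"]
    for k in range(0, 12, 3):
        # Fortran F15.7: a field 15 characters wide with 7 decimals.
        lines.append("".join(f"{x:15.7f}" for x in elements[k:k + 3]))
    return "\n".join(lines) + "\n"


def parse_odb(text):
    """Read back one real-valued datablock written by format_odb."""
    header, *records = text.strip().splitlines()
    name, dtype, count, _fmt = header.split()
    values = [float(tok) for line in records for tok in line.split()]
    if dtype.upper() != "R" or len(values) != int(count):
        raise ValueError("unexpected datablock contents")
    return name, values
```

Reading the values back by splitting on whitespace works here because operator elements never fill a whole 15-character field.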

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > save m1 m2
 Operator bringing : (M2)
 on top of         : (M1)
 File name ? (rt_m2_to_m1.odb)
 Save in file   : (rt_m2_to_m1.odb)
 Datablock name : (.lsq_rt_m2_to_m1)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

15.4 OLd_o_operator (read operator from O datablock file)

Read an operator from an O datablock file. The same checks as listed for the EDit_operator command are applied here.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ol m1 m2
 Operator bringing : (M2)
 on top of         : (M1)
 File name ? (rt_m2_to_m1.odb)
 Opening O datablock : (rt_m2_to_m1.odb)
 Created by LSQMAN V. 931022/0.4 at Fri Oct 22 23:03:13 1993 for user gerard
 Datablock : (.LSQ_RT_M2_TO_M1)
 Data type : (R)
 Number    : (12)
 Format    : ((3F15.7))
 Operator : ( -9.570E-01   1.135E-01  -2.669E-01   1.824E-01  -4.802E-01
  -8.580E-01  -2.255E-01  -8.698E-01   4.389E-01   1.347E+01   2.721E+01
  3.858E+01)
 Determinant of rotation matrix =   1.000000
 Rotation angle                 = 177.671234
 Operator looks okay !
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

15.5 PErturb_operator (perturb an operator)

Perturb an operator, e.g. to investigate how "stable" it is with respect to small random changes. Provide the names of the two molecules and an amplitude (in degrees). Three random rotation angles, each between minus and plus the amplitude, are generated for rotations around X, Y and Z, and the existing operator is multiplied by this perturbing operator. Use the RMsd_calc command to see the effect, and the IMprove command to see if the operator converges back to its previous values.
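The idea can be sketched as follows. The order in which the three axis rotations are combined, and the fact that the perturbation is pre-multiplied, are assumptions; the manual does not specify either. The sketch perturbs only the 3x3 rotation part. Function names are hypothetical.

```python
import math
import random


def rot_x(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def perturb_rotation(rot, amplitude, rng=random):
    """Multiply `rot` by a random rotation with angles in [-amplitude, amplitude].

    ASSUMPTION: perturbation = Rz * Ry * Rx, applied on the left of `rot`.
    """
    angles = [rng.uniform(-amplitude, amplitude) for _ in range(3)]
    ax, ay, az = angles
    perturbation = matmul(rot_z(az), matmul(rot_y(ay), rot_x(ax)))
    return matmul(perturbation, rot), angles
```

Passing a seeded `random.Random` instance makes a perturbation reproducible. It will not reproduce the angles that LSQMAN's own random number generator produces.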

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > pe m1 m2 3
 Command > (pe m1 m2 3)
 Perturb operator ...
 Molecule 1 : (M1)
 Molecule 2 : (M2)
 Amplitude  : (   3.000)
 => Random number generator initialised with seed :        103
 Rotations around X,Y,Z (deg) :    2.57   -1.35    1.20
 LSQMAN > rm m1 a1-999 m2 a1
 Command > (rm m1 a1-999 m2 a1)
 Calculate RMSD of M1 A1-999
 And               M2 A1
 Atom types       | CA |
 B-factor range used: -1000.00 - 10000.00 A2
 Nr of atoms to match  : (        459)
 Nr skipped (B limits) : (          0)
   
 The    459 atoms have an RMS distance of    2.830 A
 RMS delta B  =    0.000 A2
 Corr. coeff. =      1.0000
 Rotation    :   0.999505 -0.019873 -0.024403
                 0.020941  0.998799  0.044305
                 0.023493 -0.044794  0.998720
 Translation :      0.000     0.000     0.000
   
 Vectors between first and last selected atoms:
 Mol 1  : (  -7.326   12.930    4.243)
 Mol 2" : (  -6.952   12.870    4.989)
 Angle between them (deg) : (   3.103)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

15.6 APply_operator (apply an operator to the coordinates of a molecule)

Apply an operator. The FIRST molecule is the one TO WHICH you want to align the SECOND molecule, using the current operator between them. In other words: the transformation is applied to the SECOND molecule !!!

Optional parameters:
- chain id (e.g., A, B, ..., Z, or * to denote all chains)
- first residue (e.g., 1, 163, ...)
- last residue (e.g., 99, 1000, ...)
By default, the operator is applied to all residues of all chains.

When the operator is applied, all operators of the second molecule to all other molecules are reset (to the identity operator). As the examples below show, this also happens when only one chain or a range of residues is moved.
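Applying an operator to a selection amounts to x' = R x + t for every selected atom of the second molecule. In this sketch, the storage order of the nine rotation elements is an assumption. It follows our reading of O's "transpose" notation (see section 8 of this manual): the elements are stored column by column, so R[i][j] = elements[3*j + i]. Check this against one of your own operators before relying on it. The atom layout (dicts with `chain`, `resnum` and `xyz` keys) and the function name are hypothetical.

```python
def apply_operator(elements, atoms, chain="*", first=None, last=None):
    """Apply a 12-element operator to selected atoms in place.

    ASSUMPTION: rotation stored column-wise, R[i][j] = elements[3*j + i];
    elements[9:12] is the translation t. Returns the number of atoms moved.
    """
    rot = [[elements[3 * j + i] for j in range(3)] for i in range(3)]
    t = elements[9:12]
    moved = 0
    for atom in atoms:
        if chain != "*" and atom["chain"] != chain:
            continue  # wrong chain
        if first is not None and atom["resnum"] < first:
            continue  # before the residue range
        if last is not None and atom["resnum"] > last:
            continue  # after the residue range
        x = atom["xyz"]
        atom["xyz"] = tuple(sum(rot[i][j] * x[j] for j in range(3)) + t[i]
                            for i in range(3))
        moved += 1
    return moved
```

The defaults (chain "*", no residue limits) move every atom, which mirrors the command's default of applying the operator to all residues of all chains.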

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ap m1 m1 b
 Bring Mol 2 on top of Mol 1 ...
 Warning: mol1 == mol2 !
 Molecule 1 : (M1)
 Molecule 2 : (M1)
 Apply to mol 2 chain : (B)
 Nr of atoms moved : (       3518)
 Resetting ALL operators of mol 2 ...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > app m1 m2 a 15 25
 Command > (app m1 m2 a 15 25)
 Bring Mol 2 on top of Mol 1 ...
 Molecule 1 : (M1)
 Molecule 2 : (M2)
 Apply to mol 2 chain : (A)
 First res : (      15)
 Last  res : (      25)
 Nr of atoms moved : (         69)
 Resetting ALL operators of mol 2 ...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

16 MISCELLANEOUS COMMANDS

16.1 GEt XYz (select residues near a point in space)

If you want to compare two different copies of the same molecule, e.g. an apo and a complex structure, you may just be interested in superimposing the residues in the binding site. This command makes it easier to do that.

You supply the name of a molecule and the chain ID that you are interested in, a triple of (X,Y,Z) coordinates, and a radius. Finally, you supply the name of a symbol (as used by the & commands and the $ mechanism). The program then loops over all residues in the selected chain, and if any atom of a residue is closer than "radius" to the point (X,Y,Z), the name of the residue (chain ID plus residue number) is written to the symbol whose name you provided. Afterwards, you can use the symbol as a zone definition in the EXplicit (or RMsd) command, e.g.: "expl m2 $chlor3 m1 $chlor3".

Limitations: the value of the symbol cannot be longer than 256 characters, and the number of residue names cannot be greater than 50.
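The selection logic, including the two limits, can be sketched like this. The atom tuple layout and the function name are hypothetical. The residues are reported in file order, each only once, and the result is a symbol value of the form " A194 A220 A245".

```python
import math


def get_xyz(atoms, chain, centre, radius, max_residues=50):
    """Collect residues in `chain` with any atom closer than `radius` to `centre`.

    atoms: iterable of (chain, resnum, atom_name, (x, y, z)) in file order.
    Returns the symbol value, e.g. " A194 A220 A245".
    """
    selected = []
    for ch, resnum, _name, xyz in atoms:
        if ch != chain:
            continue
        label = f"{ch}{resnum}"
        if label in selected:
            continue  # residue already selected via an earlier atom
        if math.dist(xyz, centre) < radius:
            selected.append(label)
    if len(selected) > max_residues:
        raise ValueError("more than 50 residue names")
    value = "".join(" " + label for label in selected)
    if len(value) > 256:
        raise ValueError("symbol value longer than 256 characters")
    return value
```

With the three atoms listed in the manganese example further down (CG ASP A194, CD GLU A220, CG ASP A245), a 3.5 A radius around (28.850, 50.835, 90.295) gives the same selection as the program: " A194 A220 A245".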

Please also note that symbol values are accessed by supplying $SYMBOL_NAME as an argument on the command line. The line that you type on the terminal (or in a macro) is parsed only once. If there are additional parameters which the program prompts you for, you cannot use symbols for those.

Note: as of version 9.0.3 there is an extra (optional) parameter, namely the name of an O macro. If provided, this macro will contain O instructions to draw a 'zone' object of the selected residues.

Note: the selected residues cannot be written to a PDB file automatically. If you want to do this, use the SElect commands in MOLEMAN2 instead as these provide much more functionality.

Example: read the structures of PDB entries 1CHR and 2CHR (without HETATMs):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
hetatm strip
read m1 1chr.pdb
read m2 2chr.pdb
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Collect the names of all residues that have an atom within 3.5 Å of the manganese ion (at 28.850 50.835 90.295) in 2CHR, and store the names in a symbol called "manganese":

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > get xyz m2 a 28.850 50.835 90.295 3.5 mn mn.omac
 X, Y, Z : (  28.850   50.835   90.295)
 Radius (A) : (   3.500)
 Creating O macro : (mn.omac)
 Object name in O : (MN_A)
 #     1 @   3.27 A ->   CG  ASP A 194      29.827  52.252  93.077  1.00 18.37      2CHR
 #     2 @   3.07 A ->   CD  GLU A 220      31.327  49.245  91.166  1.00 18.18      2CHR
 #     3 @   3.17 A ->   CG  ASP A 245      29.546  49.563  87.476  1.00 23.09      2CHR
 Selection : ( A194 A220 A245)
 Symbol MN : ( A194 A220 A245)
 O macro written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

The O macro looks as follows:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > $ cat mn.omac
 Spawn system command : (  cat mn.omac)
! LSQMAN-generated O macro mn.omac
! Command : GET XYZ M2 A 28.850 50.835 90.295 3.5 mn mn.omac
!
! molec #Molecule ?#
object MN_A
zone  A194
zone  A220
zone  A245
end
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Do the same for the residues within 4.0 Å from the chloride ion:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > get xyz m2 a 26.076 49.901 90.673 4.0 cl cl.omac
 X, Y, Z : (  26.076   49.901   90.673)
 Radius (A) : (   4.000)
 Creating O macro : (cl.omac)
 Object name in O : (CL_A)
 #     1 @   3.82 A ->   NE1 TRP A  55      25.382  52.242  87.734  1.00 27.30      2CHR
 #     2 @   3.88 A ->   NZ  LYS A 163      28.018  48.111  93.522  1.00 31.06      2CHR
 #     3 @   3.44 A ->   NZ  LYS A 165      25.748  51.719  93.578  1.00 20.53      2CHR
 #     4 @   3.53 A ->   OD2 ASP A 245      28.872  49.712  88.521  1.00 23.09      2CHR
 #     5 @   3.38 A ->   NZ  LYS A 269      25.908  48.274  87.715  1.00  2.00      2CHR
 Selection : ( A55 A163 A165 A245 A269)
 Symbol CL : ( A55 A163 A165 A245 A269)
 O macro written
 LSQMAN > & ?
 Nr of defined symbols : (       4)
 Symbol START_TIME : (Sat Feb  2 00:44:47 2002)
 Symbol USERNAME : (gerard)
 Symbol MN : ( A194 A220 A245)
 Symbol CL : ( A55 A163 A165 A245 A269)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Superimpose 1CHR onto 2CHR using the coordinates of all the side-chain atoms of the residues near the manganese ion:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > atom_type side
 Nr of atom types : (       1)
 Types : ( SIDE)
 LSQMAN > expl m2 $mn m1 $mn
 Explicit fit of M2  A194 A220 A245
 And             M1  A194 A220 A245
 Atom types     |SIDE|
 B-factor range used: -1000.00 - 10000.00 A2
 Nr of atoms to match  : (         13)
 Nr skipped (B limits) : (          0)
   
 The     13 atoms have an RMS distance of    0.898 A
 RMS delta B  =    2.453 A2
 Corr. coeff. =      0.5859
 Rotation    :   0.998520  0.031156 -0.044583
                -0.028606  0.997979  0.056735
                 0.046261 -0.055376  0.997393
 Translation :     -2.846     4.029    -1.188
 Maiorov-Crippen RHO (0-2)            =      0.31138
 Estimated RMSD for 2 random proteins =      1.729 A
 Relative RMSD                        =      0.51918
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Now calculate the RMSD of all atoms of the residues near the chloride ion (using the "manganese-based" operator):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > atom_type nonh
 Nr of atom types : (       1)
 Types : ( NONH)
 LSQMAN > rmsd m2 $cl m1 $cl
 Calculate RMSD of M2  A55 A163 A165 A245 A269
 And               M1  A55 A163 A165 A245 A269
 Atom types       |NONH|
 B-factor range used: -1000.00 - 10000.00 A2
 Nr of atoms to match  : (         49)
 Nr skipped (B limits) : (          0)
   
 The     49 atoms have an RMS distance of    0.709 A
 RMS delta B  =    8.755 A2
 Corr. coeff. =      0.7082
 Rotation    :   0.998520  0.031156 -0.044583
                -0.028606  0.997979  0.056735
                 0.046261 -0.055376  0.997393
 Translation :     -2.846     4.029    -1.188
 Maiorov-Crippen RHO (0-2)            =      0.09694
 Estimated RMSD for 2 random proteins =      6.615 A
 Relative RMSD                        =      0.10720
   
 Vectors between first and last selected atoms:
 Mol 1  : (   4.206   -7.821    2.334)
 Mol 2" : (   4.610   -8.010    2.496)
 Angle between them (deg) : (   1.640)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

16.2 RMsd_calc (calculate RMSD for subset of atoms with current operator)

With this command you can calculate the RMSD of any set of atoms using the *current* operator. This enables you, for instance, to improve an operator using CA atoms, and then to calculate the RMSD of all backbone atoms using the improved operator. The statistics are printed, but *not* stored (i.e., the numbers listed with the SHow command are not altered by this operation).

From version 3.2.2 onwards, a B-factor limit may be imposed with the BF command.

From version 4.3.1 onwards, the angle between the vectors spanned between the first and last selected atoms is also calculated. You can use this to find the angles between aligned helices and strands in different molecules. Note that the program makes no assumptions about the selected atoms forming a regular helix or strand, so it will not, for example, try to fit a helical axis. Nevertheless, the answers are close enough for government work, and the option also works for non-proteins.
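The angle is simply the angle between two difference vectors. A minimal sketch follows; the function name is hypothetical. It assumes the mol 2 coordinates have already been transformed with the current operator, which is what the double prime in "Mol 2"" in the output suggests.

```python
import math


def span_angle(first1, last1, first2, last2):
    """Angle (degrees) between the first->last atom vectors of two selections."""
    v1 = [b - a for a, b in zip(first1, last1)]
    v2 = [b - a for a, b in zip(first2, last2)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] so that rounding cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
```

Feeding it the two vectors printed in the all-atom example below, (11.058, -3.139, 37.807) and (9.212, -2.242, 38.661), gives about 3.18 degrees, in line with the listed 3.184.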

From version 9.5.1 onwards, this command also prints some statistics pertaining to the difference-distance matrix.
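A difference-distance matrix (DDM) compares intramolecular distances, so it does not depend on the superposition operator at all. For N matched atoms there are N(N-1)/2 unique elements; for the 1797 atoms in the example below this gives the listed 1,613,706. The sketch below computes the three statistics that RMsd_calc prints, assuming each element is d1(i,j) - d2(i,j) and the RMS is taken over the unique elements. The function name is hypothetical, and the pure-Python loop is quadratic in N.

```python
import itertools
import math


def ddm_stats(coords1, coords2):
    """(number of unique DDM elements, max |element|, RMS of elements)."""
    n = len(coords1)
    if n != len(coords2) or n < 2:
        raise ValueError("need two equally long lists with at least two atoms")
    count, max_abs, sum_sq = 0, 0.0, 0.0
    for i, j in itertools.combinations(range(n), 2):
        # Element (i, j): distance in molecule 1 minus distance in molecule 2.
        delta = (math.dist(coords1[i], coords1[j])
                 - math.dist(coords2[i], coords2[j]))
        count += 1
        max_abs = max(max_abs, abs(delta))
        sum_sq += delta * delta
    return count, max_abs, math.sqrt(sum_sq / count)
```

A rigidly translated or rotated copy of a molecule gives a DDM of (numerically) zero.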

From version 9.6.1 onwards, this command also prints the value of TM-score as defined by Zhang and Skolnick, Nucl Acids Res 33, 2302-2309 (2005). For this to work, you must provide the value of Ltarget. It is unclear if Ltarget should be the length of a full protein or of a domain, and if it should be equal to L1 (length of protein or domain nr 1) or L2 or min(L1,L2) or (L1+L2)/2. The NW_glob command (as of version 9.6.2) also calculates this number and uses Ltarget = min(L1,L2).
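A minimal sketch of the published TM-score formula follows: TM = (1/Ltarget) * sum over aligned pairs of 1/(1 + (d_i/d0)^2), with d0 = 1.24 * (Ltarget - 15)^(1/3) - 1.8. Choosing Ltarget is left to the caller, for the reasons given above. Function names are hypothetical. With the exact cube root, this sketch gives d0 = 4.350 A for Ltarget = 137, whereas the NW_glob output earlier in this section lists 4.340 A, so numbers from the sketch may differ slightly from LSQMAN's.

```python
def tm_d0(l_target):
    """Distance scale d0 (A) of the TM-score for a given Ltarget."""
    return 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, l_target):
    """TM-score from the distances (A) between aligned, superimposed residues.

    This sketch rejects Ltarget <= 18, for which the d0 formula is not positive.
    """
    if l_target <= 18:
        raise ValueError("d0 is not positive for Ltarget <= 18")
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target
```

A perfect match of all Ltarget residues (all distances zero) gives a TM-score of 1. A pair at distance d0 contributes half as much as a perfectly matched pair.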

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ex m1 a1-999 m1 b1
 WARNING - mol1 == mol2 !
 Explicit fit of M1 A1-999
 And             M1 B1
 Atom types     | CA |
 B-factor range used: -1000.00 - 10000.00 A2
 Nr of atoms to match  : (        221)
 Nr skipped (B limits) : (          0)
   
 The    221 atoms have an RMS distance of    0.316 A
 [...]
 LSQMAN > atom all
 Nr of atom types : (       1)
 Types : ( ALL)
 LSQMAN > rm m1 a1-999 m1 b1
 WARNING - mol1 == mol2 !
 Calculate RMSD of M1 A1-999
 And               M1 B1
 Atom types       |ALL |
 B-factor range used: -1000.00 - 10000.00 A2
 Nr of atoms to match  : (       1797)
 Nr skipped (B limits) : (          0)
   
 The   1797 atoms have an RMS distance of    0.773 A
 RMS delta B  =    4.732 A2
 Corr. coeff. =      0.8411
 Rotation    :  -0.631871  0.081430  0.770784
                 0.062239 -0.985924  0.155181
                 0.772570  0.146027  0.617909
 Translation :    109.372    32.440   -55.023
 Maiorov-Crippen RHO (0-2)            =      0.04351
 Estimated RMSD for 2 random proteins =     19.151 A
 Relative RMSD                        =      0.04039
 Normalised RMSD (100)                =      0.316 A
 RMSD / Nalign                        =    0.00043 A
   
 Nr of unique elements in DDM   : (    1613706)
 Max absolute DDM element (A)   : (   8.782)
 RMS of unique DDM elements (A) : (   0.574)
   
 Vectors between first and last selected atoms:
 Mol 1  : (  11.058   -3.139   37.807)
 Mol 2" : (   9.212   -2.242   38.661)
 Angle between them (deg) : (   3.184)
 CPU total/user/sys :       1.7       1.7       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > rmsd m1 a26-36 m2 a26
 Calculate RMSD of M1 A26-36
 And               M2 A26
 Atom types       | CA |
 B-factor range used: -1000.00 - 10000.00 A2
 Nr of atoms to match  : (         11)
 Nr skipped (B limits) : (          0)
   
 The     11 atoms have an RMS distance of    0.333 A
 RMS delta B  =   21.250 A2
 Corr. coeff. =      0.1809
 Rotation    :  -0.779411 -0.366070 -0.508440
                 0.275059 -0.929083  0.247278
                -0.562905  0.052881  0.824829
 Translation :     19.155    34.722    27.774
   
 Vectors between first and last selected atoms:
 Mol 1  : (   4.498   -1.028   14.570)
 Mol 2" : (   4.050   -0.701   14.022)
 Angle between them (deg) : (   1.524)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

16.3 SOap (visualise structural differences)

This command takes two aligned chains and will generate the set of triangles (in an ODL file that can be drawn in O) that has the minimal surface area (using dynamic programming). This is based on an idea of A Falicov and FE Cohen, JMB 258, 871-892 (1996), who in turn were inspired by papers of GE Schulz. Note that LSQMAN does not use the minimum area as a criterion to optimise the superpositioning operator - it is merely a visualisation tool !

From version 9.0, an optional 'verbose' parameter can be used to control the amount of output (default is 'no').

Example of a SOap film in O for two fairly similar structures.

Example of a SOap film in O for two fairly different structures.

(Note: given a certain operator, the order in which you specify the two chains is irrelevant - the results will be identical.)

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > soap m1 a m2 a soap_film.odl
 Soap-film ODL file generation
 Of               M1 A
 And              M2 A
 Atom type       | CA |
 ODL file name    soap_film.odl
 Central atoms mol 1 : (        370)
 Central atoms mol 2 : (        370)
   
 Applying current operator to mol 2 : (   1.000    0.000    0.000    0.000
     1.000    0.000    0.000    0.000    1.000    0.000    0.000    0.000)
   
 Calculating triangle-area matrix ...
   
 Executing Needleman-Wunsch ...
 Type 2 -> I,J-1,J =    370   369   370 Area =       6.88 A2
 Type 1 -> I-1,I,J =    369   370   369 Area =       2.96 A2
 Type 1 -> I-1,I,J =    368   369   369 Area =       2.88 A2
 Type 2 -> I,J-1,J =    368   368   369 Area =       2.75 A2
 Type 1 -> I-1,I,J =    367   368   368 Area =       2.46 A2
 Type 2 -> I,J-1,J =    367   367   368 Area =       0.36 A2
   
 [...]
   
 Type 2 -> I,J-1,J =      3     3     4 Area =       0.58 A2
 Type 2 -> I,J-1,J =      3     2     3 Area =       0.58 A2
 Type 1 -> I-1,I,J =      2     3     2 Area =       0.91 A2
 Type 1 -> I-1,I,J =      1     2     2 Area =       0.99 A2
 Type 2 -> I,J-1,J =      1     1     2 Area =       2.08 A2
 ODL file written
 Total area (A2) : (  5.290E+02)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

By the way, the surface area of each individual triangle is calculated using Hero's formula: if the three sides of a triangle have lengths a, b and c, and s = (a+b+c)/2, then the surface area A of the triangle is A = SQRT (s * (s-a) * (s-b) * (s-c)).
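The dynamic-programming step can be sketched like the Needleman-Wunsch recursion it resembles. Each step adds either a "type 1" triangle (I-1, I, J), which advances along chain 1, or a "type 2" triangle (I, J-1, J), which advances along chain 2. These are the two triangle types listed in the output above. This sketch returns only the minimal total area, not the triangles themselves or the ODL file. It assumes the two CA traces are already superimposed. Function names are hypothetical.

```python
import math


def hero_area(p, q, r):
    """Triangle area from its three side lengths (Hero's formula)."""
    a, b, c = math.dist(q, r), math.dist(p, r), math.dist(p, q)
    s = (a + b + c) / 2.0
    # max() guards against tiny negative values from rounding (degenerate triangles).
    return math.sqrt(max(0.0, s * (s - a) * (s - b) * (s - c)))


def soap_area(chain1, chain2):
    """Minimal total area of a triangle strip joining two superimposed traces."""
    n, m = len(chain1), len(chain2)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    cost[0][0] = 0.0
    for i in range(n):
        for j in range(m):
            if i > 0:  # type 1 triangle (I-1, I, J): step along chain 1
                cost[i][j] = min(cost[i][j], cost[i - 1][j] + hero_area(
                    chain1[i - 1], chain1[i], chain2[j]))
            if j > 0:  # type 2 triangle (I, J-1, J): step along chain 2
                cost[i][j] = min(cost[i][j], cost[i][j - 1] + hero_area(
                    chain1[i], chain2[j - 1], chain2[j]))
    return cost[n - 1][m - 1]
```

Two identical traces give a total area of zero, while two parallel straight lines 1 A apart and 2 A long give exactly the 2 A2 of the rectangle between them.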

16.4 JUdge (judge homology model by comparing with target and parent)

To be documented later !

16.5 CAsp (assess sequence-identical residues of model and target)

To be documented later !

17 PLOTS

17.1 PHipsi (make delta-Phi, delta-Psi plot)

With this command you can compare the PHI/PSI angles of corresponding zones of residues in different, or NCS-related, molecules. The output is a plot file for O2D, plus the RMSD values for the two torsion angles. Note that this command does *not* require the two molecules to be superimposed, since the angles are independent of orientation.
For a further discussion of this type of plot, see:
AP Korn & DR Rose, Prot. Engineering 7(8), 961-967 (1994)
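The summary statistics follow directly from the per-residue differences. For example, the "Percentage" in the example below is 150/186 = 80.645%. The sketch below assumes that each difference is wrapped into the range [-180, 180), which is consistent with the values in the plot file but not stated in the manual. Function names are hypothetical.

```python
import math


def delta_angle(a, b):
    """Difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def delta_stats(angles1, angles2, cutoff=10.0):
    """RMS delta, average |delta|, count and percentage of |delta| > cutoff."""
    deltas = [delta_angle(a, b) for a, b in zip(angles1, angles2)]
    n = len(deltas)
    n_big = sum(1 for d in deltas if abs(d) > cutoff)
    return {
        "rms": math.sqrt(sum(d * d for d in deltas) / n),
        "mean_abs": sum(abs(d) for d in deltas) / n,
        "n_big": n_big,
        "percentage": 100.0 * n_big / n,
    }
```

Wrapping matters: without it, PHI values of 170 and -170 degrees would appear to differ by 340 degrees instead of 20.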

NOTE: from version 3.2.1 onward, commented-out O datablock headers are included in the plot files. This enables you to quickly convert the plot files into O datablocks, read them into O, and use them to colour your molecule. (See the DIstance command for details on how to do this.)

Example of a PHipsi plot.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ph m1 a5-190 m1 b5 3sdp_phipsi.plt
 WARNING - mol1 == mol2 !
 Plot of M1 A5-190
 And     M1 B5
 Nr of residues matched : (        186)
 RMS delta PHI       : (  69.041)
 Average |delta PHI| : (  49.626)
 Nr |delta PHI| > 10 : (     150)
 Percentage          : (  80.645)
 RMS delta PSI       : (  71.161)
 Average |delta PSI| : (  51.375)
 Nr |delta PSI| > 10 : (     148)
 Percentage          : (  79.570)
 Plot file written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

The plot file may look as follows:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
REMARK Created by LSQMAN V. 941218/3.0.1 at Sun Dec 18 00:50:42 1994 for user gerard
XLABEL Residue number
YLABEL Delta PHI and Delta PSI
REMARK Plot of delta PHI (solid blue) and delta PSI (dotted red) as a function of residue nr
REMARK Values are those of mol M1 = file /nfs/pdb/full/3sdp.pdb
REMARK      minus those of mol M1 = file /nfs/pdb/full/3sdp.pdb
XYVIEW        4     191    -180     180
NPOINT      186
COLOUR        4
XVALUE *
     5     6     7     8     9    10    11    12    13    14    15    16
    17    18    19    20    21    22    23    24    25    26    27    28
    29    30    31    32    33    34    35    36    37    38    39    40
...
   173   174   175   176   177   178   179   180   181   182   183   184
   185   186   187   188   189   190
! MOLNAM_RESIDUE_DPHI R 186 (9f8.2)
YVALUE *
    0.00   -3.92    5.34  -59.37  139.44  155.28   92.02   -2.66  145.04
   87.80  149.14  -83.57   16.04   48.16   16.36   -0.47  -27.75  -53.48
...
   -9.94  -17.02  -20.24   68.48   27.43  138.36
REMARK RMS delta PHI =    69.04 +++ Average |delta PHI| =    49.63
REMARK Nr |delta PHI| > 10 =      150 +++ Percentage =    80.65
MORE
COLOUR        1
! MOLNAM_RESIDUE_DPSI R 186 (9f8.2)
YVALUE *
   42.14  -44.13  -64.43   39.98    0.63 -172.61 -112.61   35.95  174.50
...
   36.56   -7.45    9.34  -17.56   85.27    0.00
REMARK RMS delta PSI =    71.16 +++ Average |delta PSI| =    51.38
REMARK Nr |delta PSI| > 10 =      148 +++ Percentage =    79.57
END
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

17.2 ETa_theta_plot (make delta-Eta, delta-Theta plot for nucleic acids)

This command is similar to the PHipsi command, but it analyses differences in the backbone conformation of two related nucleic acid structures (e.g., ribosomal RNA). Instead of the Phi and Psi torsion angles used for proteins, this command uses the pseudo-torsions Eta and Theta as defined by C Duarte and AM Pyle (J Mol Biol 284, 1465-1478 (1998)):

- Eta = torsion ( C4*(i-1), P(i), C4*(i), P(i+1))
- Theta = torsion (P(i), C4*(i), P(i+1), C4*(i+1))
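Both pseudo-torsions are ordinary dihedral angles computed from C4* and P atoms. The sketch below uses the standard atan2 dihedral formula with the IUPAC sign convention, which yields values in the range -180 to 180 degrees. It assumes the C4* and P coordinates are given as lists indexed by nucleotide, so it only works for interior nucleotides that have both neighbours. Function names are hypothetical.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def torsion(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees (IUPAC sign convention)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def eta_theta(c4, p, i):
    """Eta and Theta of nucleotide i; c4[k] and p[k] hold C4* and P of nucleotide k."""
    eta = torsion(c4[i - 1], p[i], c4[i], p[i + 1])
    theta = torsion(p[i], c4[i], p[i + 1], c4[i + 1])
    return eta, theta
```

As with Phi and Psi, the resulting angles are independent of the orientation of the molecule, so no superposition is needed.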

The following applies both for ETa_theta and PHipsi plots:

- The output is a plot file for O2D, plus a bunch of statistics, a list of outliers, and two histograms.
- This command does *not* require the two molecules to be superimposed, since the angles are independent of orientation.
- Commented-out O datablock headers are included in the plot files. This enables you to quickly convert the plot files into O datablocks, read them into O, and use them to colour your molecule. (See the DIstance command for details on how to do this.)
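The histograms in the output use bins of the form [lo-hi>, with a count, a percentage and a cumulative percentage per bin. For example, 2583 of 2698 residues give 95.74%. The example output for this command lists only non-empty bins, and the sketch below does the same. How the program treats values at or above the upper limit is not documented here. The sketch puts them in the last bin, which is an assumption. The function name is hypothetical.

```python
import math


def histogram(abs_deltas, bin_size=10.0, upper=180.0):
    """Rows (lo, hi, count, percent, cumulative percent) for non-empty bins."""
    n = len(abs_deltas)
    nbins = int(math.ceil(upper / bin_size))
    counts = [0] * nbins
    for v in abs_deltas:
        # ASSUMPTION: values at or beyond the upper limit go in the last bin.
        k = min(int(v // bin_size), nbins - 1)
        counts[k] += 1
    rows, cumul = [], 0
    for k, c in enumerate(counts):
        if c == 0:
            continue  # empty bins are not listed
        cumul += c
        rows.append((k * bin_size, (k + 1) * bin_size, c,
                     100.0 * c / n, 100.0 * cumul / n))
    return rows


for lo, hi, c, pct, cum in histogram([1.0, 2.0, 15.0, 45.0]):
    print(f"{c:9d} in [{lo:6.2f}-{hi:6.2f}> = {pct:6.2f}% (Cumul {cum:6.2f} %)")
```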

Example of an ETa_theta plot.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > read m1 1kqs.pdb
 [...]
 LSQMAN > read m2 ../progs/mole2/test/1ffk.pdb
 [...]
 LSQMAN > eta_theta m1 a1-3000 m2 a1 m1_m2_et2.plt
 Command > (eta_theta m1 a1-3000 m2 a1 m1_m2_et2.plt)
 Delta-Eta/Delta-Theta plot
 Plot of M1 A1-3000
 And     M2 A1
 Outlier list cut-off  :    25.00
 Histogram upper limit :   180.00
 Histogram bin size    :    10.00
 Nr of residues found : (       2698)
   
 >>> Delta-ETA   U A 200 -   U A 200 =    41.32
 >>> Delta-ETA   C A 284 -   C A 284 =    89.80
 >>> Delta-ETA   U A 713 -   U A 713 =    26.94
 >>> Delta-ETA   G A 817 -   G A 817 =   -29.86
 >>> Delta-ETA   G A 953 -   G A 953 =    25.87
 >>> Delta-ETA   U A1524 -   U A1524 =    40.34
 >>> Delta-ETA   A A1693 -   A A1693 =   -30.79
 >>> Delta-ETA   G A2283 -   G A2283 =   -53.03
 >>> Delta-ETA   C A2636 -   C A2636 =   -45.50
 >>> Delta-ETA   A A2637 -   A A2637 =  -162.89
 >>> Delta-ETA   A A2813 -   A A2813 =    30.46
 RMS delta ETA       : (   5.937)
 Average |delta ETA| : (   2.982)
 Nr |delta ETA| > 25 : (      11)
 Percentage          : (   0.408)
 Corr. coeff. ETA    : (   0.999)
   
 >>> Delta-THETA   C A  87 -   C A  87 =    35.87
 >>> Delta-THETA   C A 284 -   C A 284 =   -84.70
 >>> Delta-THETA   U A 713 -   U A 713 =   -36.83
 >>> Delta-THETA   G A 817 -   G A 817 =    36.38
 >>> Delta-THETA   G A 953 -   G A 953 =   -26.33
 >>> Delta-THETA   G A1087 -   G A1087 =   -29.34
 >>> Delta-THETA   G A1354 -   G A1354 =   -51.73
 >>> Delta-THETA   A A1355 -   A A1355 =    40.98
 >>> Delta-THETA   A A1448 -   A A1448 =   -26.18
 >>> Delta-THETA   U A1524 -   U A1524 =   -34.36
 >>> Delta-THETA   U A2242 -   U A2242 =    33.24
 >>> Delta-THETA   U A2282 -   U A2282 =    27.83
 >>> Delta-THETA   G A2283 -   G A2283 =    49.69
 >>> Delta-THETA   C A2636 -   C A2636 =  -160.60
 >>> Delta-THETA   A A2637 -   A A2637 =   -61.00
 >>> Delta-THETA   A A2813 -   A A2813 =   -46.57
 RMS delta THETA     : (   6.073)
 Average |delta THE| : (   3.041)
 Nr |delta THE| > 25 : (      16)
 Percentage          : (   0.593)
 Corr. coeff. THETA  : (   0.999)
 Plot file written
   
 Histogram of |delta ETA| values :
     2583 in [  0.00- 10.00> =  95.74% (Cumul  95.74 %)
       86 in [ 10.00- 20.00> =   3.19% (Cumul  98.93 %)
       21 in [ 20.00- 30.00> =   0.78% (Cumul  99.70 %)
        2 in [ 30.00- 40.00> =   0.07% (Cumul  99.78 %)
        3 in [ 40.00- 50.00> =   0.11% (Cumul  99.89 %)
        1 in [ 50.00- 60.00> =   0.04% (Cumul  99.93 %)
        1 in [ 80.00- 90.00> =   0.04% (Cumul  99.96 %)
        1 in [160.00-170.00> =   0.04% (Cumul 100.00 %)
   
 Histogram of |delta THETA| values :
     2592 in [  0.00- 10.00> =  96.07% (Cumul  96.07 %)
       81 in [ 10.00- 20.00> =   3.00% (Cumul  99.07 %)
       13 in [ 20.00- 30.00> =   0.48% (Cumul  99.56 %)
        5 in [ 30.00- 40.00> =   0.19% (Cumul  99.74 %)
        3 in [ 40.00- 50.00> =   0.11% (Cumul  99.85 %)
        1 in [ 50.00- 60.00> =   0.04% (Cumul  99.89 %)
        1 in [ 60.00- 70.00> =   0.04% (Cumul  99.93 %)
        1 in [ 80.00- 90.00> =   0.04% (Cumul  99.96 %)
        1 in [160.00-170.00> =   0.04% (Cumul 100.00 %)
 CPU total/user/sys :       4.4       4.4       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

The plot file may look as follows:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
REMARK Created by LSQMAN V. 050920/9.7 at Tue Sep 20 23:08:52 2005 for gerard
XLABEL Residue number
YLABEL Delta ETA and Delta THETA
REMARK Plot of delta ETA (solid blue) and delta THETA (dotted red) as a function of residue nr
REMARK Values are those of mol M1 = file 1kqs.pdb
REMARK      minus those of mol M2 = file ../progs/mole2/test/1ffk.pdb
XYVIEW       10    2915    -180     180
NPOINT     2698
COLOUR        4
LINE         10       0    2915       0
XVALUE *
    11    12    13    14    15    16    17    18    19    20    21    22
[...]
  2905  2906  2907  2908  2909  2910  2911  2912  2913  2914
! MOLNAM_RESIDUE_DETA R 2698 (9f8.2)
YVALUE *
    4.12    2.76   -2.28   -2.52    1.84    0.49    0.36   -0.53   -0.50
    2.43   -0.88   -1.32    2.45   -5.03    1.01    1.82   -2.62    1.64
[...]
    1.05    6.96   -8.91    5.64   -0.01   -1.22    0.00
REMARK RMS delta ETA =     5.94 +++ Average |delta ETA| =     2.98
REMARK Nr |delta ETA| > 25 =       11 +++ Percentage =     0.41
REMARK Corr. coeff. ETA(1)-ETA(2) =     1.00
MORE
COLOUR        1
! MOLNAM_RESIDUE_DTHETA R 2698 (9f8.2)
YVALUE *
    0.18    4.69   -1.98   -3.45   -0.48    4.35   -2.76    1.02   -1.86
    2.61   -0.63   -2.73   -0.20    3.15    1.53    0.55   -0.43   -1.02
[...]
    3.44   -5.22   -3.13   -3.80    6.62    1.06    0.00
REMARK RMS delta THETA =     6.07 +++ Average |delta THETA| =     3.04
REMARK Nr |delta THETA| > 25 =       16 +++ Percentage =     0.59
REMARK Corr. coeff. THETA(1)-THETA(2) =     1.00
END
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

17.3 DIst_plot (plot distance between atoms in two molecules)

With this command you can calculate and plot the distances between corresponding atoms in corresponding residues after superimposing two molecules. Only one type of atom can be compared at a time (the first type set with one of the ATom_types commands). Usually this will be CA, but the command can also be used for DNA or RNA. The two molecules must have been superimposed before you use this command.
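The DIst_plot statistics amount to one distance per pair of corresponding atoms. The following is a minimal Python sketch with a hypothetical function name. It assumes both coordinate lists are already in the same frame (hence the superposition requirement) and already paired residue by residue.

```python
import math

def atom_distance_summary(coords1, coords2):
    """Distances between corresponding atoms of two superimposed chains."""
    if len(coords1) != len(coords2) or not coords1:
        raise ValueError("need two non-empty, equally long coordinate lists")
    dists = [math.dist(a, b) for a, b in zip(coords1, coords2)]
    return {
        "distances": dists,          # one value per residue, as plotted
        "average": sum(dists) / len(dists),
        "minimum": min(dists),
        "maximum": max(dists),
    }
```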

Example of a DIst plot.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m2 /nfs/pdb/full/5rub.pdb
...
 LSQMAN > at ca
 Nr of atom types : (       1)
 Type : ( CA)
 LSQMAN > ex m2 a2-457 m2 b2
...
 The    432 atoms have an RMS distance of    0.909 A
 RMS delta B  =    5.001 A2
 Corr. coeff. =      0.9451
...
 LSQMAN > di m2 a2-457 m2 b2 5rub_cadist.plt
 WARNING - mol1 == mol2 !
 Central-atom distance plot
 Central atom type : ( CA)
 Plot of M2 A2-457
 And     M2 B2
 Nr of residues matched : (        432)
 Average distance : (   0.631)
 Minimum distance : (   0.028)
 Maximum distance : (   4.912)
 Plot file written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

NOTE: from version 3.2.1 onward, commented-out O datablock headers are included in the plot files. This enables you to quickly convert the plot files into O datablocks, read them into O, and use them to colour your molecule. How to do this was explained on the O-info bulletin board:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
    > I have superimposed 2 molecules, and have made a *.plt file in LSQMAN by
    > running DIstance_plot. I would like to colour the CA trace of one of the
    > molecules according to this output. Does anyone know how to do this? The
    > LSQMAN manual says this can be done in O.
	
    izza seemple ! the plot file contains a line which looks
    something like this:
	
    ! MOLNAM_RESIDUE_DIST R 316 (9f8.2)
	
    (1) remove everything before this line
    (2) remove the "! " of the line itself, and change
	  "MOLNAM" to the name of your molecule (as you called
	  it in O !)
    (3) remove the next line (which says "YVALUE *")
    (4) scroll to the bottom and remove the REMARK and END lines
    (5) save the file as dist.odb
	
    you now have an O datablock file; start up O,
    "read dist.odb" and then select your molecule
    and do, for instance, a paint_ramp using the
    property RESIDUE_DIST, then draw the CA trace,
    and once again the ubiquitous Bob is your uncle
	
    --dvd
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
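The five editing steps in the message above can also be scripted. The following minimal Python sketch is not part of LSQMAN, and the function name `plt_to_odb` is hypothetical. It assumes the plot-file layout shown in this manual: a commented-out `! MOLNAM_...` header, a `YVALUE *` line, the values, then REMARK/END lines. For plot files that contain two curves (such as ETa_theta plots), it extracts only the first datablock.

```python
def plt_to_odb(plot_text, o_molecule):
    """Turn the first commented-out datablock of an LSQMAN plot file into O datablock text."""
    lines = plot_text.splitlines()
    # Step 1: locate the commented-out header, e.g. "! MOLNAM_RESIDUE_DIST R 316 (9f8.2)"
    start = next((i for i, line in enumerate(lines) if line.startswith("! MOLNAM_")), None)
    if start is None or start + 1 >= len(lines):
        raise ValueError("no '! MOLNAM_' datablock header found")
    # Step 2: drop the leading "! " and replace MOLNAM with the molecule name used in O
    header = lines[start][2:].replace("MOLNAM", o_molecule, 1)
    # Step 3: the next line should be "YVALUE *"; skip it
    if lines[start + 1].strip() != "YVALUE *":
        raise ValueError("expected 'YVALUE *' after the datablock header")
    values = []
    for line in lines[start + 2:]:
        # Step 4: stop at the trailing REMARK/END lines (or MORE, if a second curve follows)
        if line.startswith(("REMARK", "END", "MORE")):
            break
        values.append(line)
    return "\n".join([header] + values) + "\n"
```

Step 5 is then a matter of writing the returned text to a file such as `dist.odb`.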

17.4 QDiff_dist_plot (difference-distance plot for rigid domain identification)

With this command you can plot the 2D matrix of differences between corresponding distances in two different models of the same molecule (e.g., in different conformations). This is particularly useful for detecting rigid domains. For example, if you make the plot for the open and closed forms of ribose-binding protein (PDB codes 1URP and 2DRI), you will see that RBP consists of two domains, and that each domain is made up of two segments. The chain builds most of domain 1, then crosses over and builds most of domain 2, then comes back to finish domain 1, and finally returns to finish domain 2.
This command will work with the first selected atom type (e.g., use CA atoms for proteins).
The plot file can be inspected (or converted into PostScript) with the program O2D.
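The difference-distance matrix itself is simple to compute. Because it uses only intramolecular distances, it does not depend on any superposition, and a rigid domain shows up as a block of values close to zero. The Python sketch below uses hypothetical names, and the sign convention (model 1 minus model 2) is an assumption. It keeps the full N x N matrix, including the diagonal. That is consistent with the example output: 271 x 271 = 73441 distances, of which 271 are zero.

```python
import math

def diff_dist_matrix(coords1, coords2):
    """D[i][j] = d1(i, j) - d2(i, j) for two equally long, paired atom lists."""
    n = len(coords1)
    if n != len(coords2):
        raise ValueError("models must have the same number of atoms")
    return [[math.dist(coords1[i], coords1[j]) - math.dist(coords2[i], coords2[j])
             for j in range(n)] for i in range(n)]

def diff_dist_stats(matrix):
    """Average, extremes and RMS over all matrix elements (diagonal included)."""
    values = [v for row in matrix for v in row]
    n = len(values)
    return {
        "n": n,
        "average": sum(values) / n,
        "minimum": min(values),
        "maximum": max(values),
        "rms": math.sqrt(sum(v * v for v in values) / n),
    }
```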

Example of a QDiff_dist plot.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > qd m1 a1-999 m2 a1
 Plot file ? (m1_m2_diff_dist.plt)
 Difference-distance plot
 For M1 A1-999
 And M2 A1
 Atom type | CA |
 Nr of atoms found : (        271)
 ERROR --- XSTAT2 - Zeroes ignored for harmonic average
 Nr of zeroes : (        271)
 Nr of distances compared : (      73441)
 Average diff-dist (A): (   1.136)
 St.dev. diff-dist (A): (   2.493)
 Minimum diff-dist (A): (  -6.634)
 Maximum diff-dist (A): (  12.322)
 RMS     diff-dist (A): (   2.739)
 Harm.av diff-dist (A): (  -0.623)
 Plot file written
 CPU total/user/sys :       1.1       1.1       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

17.5 DDihe (delta-dihedral plots for backbone comparison)

With this command you can calculate delta-dihedral plots for corresponding atoms in corresponding residues. Only one type of atom can be compared at a time (the first type set with one of the ATom_types commands). Usually this will be CA, but the command can also be used for DNA or RNA. The two molecules need not be superimposed for this command.
Similar molecules should have similar CA dihedrals (and Phi/Psi angles); large differences mean that the chain trace differs considerably, which in the case of NCS-related molecules is more likely to be the result of low resolution and improper refinement practices than a manifestation of reality.
From version 3.1.1 onward, the |delta(X-X-X) angle| curve is also plotted.
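The quantities behind a DDihe plot can be sketched as follows. Each window of four consecutive central atoms gives a virtual dihedral, and each window of three gives a virtual angle, and these are compared between the two traces. In this Python sketch the names are hypothetical, and the traces are assumed to be paired residue by residue and unbroken. It also shows why no superposition is needed: a rotated copy of a trace gives zero differences.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion(p1, p2, p3, p4):
    """Dihedral angle p1-p2-p3-p4 in degrees."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))

def bond_angle(p1, p2, p3):
    """Angle p1-p2-p3 in degrees."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cos_a = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def wrap(d):
    """Map an angle difference into [-180, 180)."""
    return (d + 180.0) % 360.0 - 180.0

def ca_deltas(trace1, trace2):
    """Delta virtual dihedrals and delta virtual angles for two paired traces."""
    n = min(len(trace1), len(trace2))
    d_dih = [wrap(torsion(*trace1[i:i + 4]) - torsion(*trace2[i:i + 4]))
             for i in range(n - 3)]
    d_ang = [bond_angle(*trace1[i:i + 3]) - bond_angle(*trace2[i:i + 3])
             for i in range(n - 2)]
    return d_dih, d_ang
```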

Example of a DDihedral plot.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 /nfs/pdb/full/3sdp.pdb
...
 LSQMAN > dd m1 a5-190 m1 b5 3sdp_cadihe.plt
 WARNING - mol1 == mol2 !
 Central-atom delta-dihedral plot
 Central atom type : ( CA)
 Plot of M1 A5-190
 And     M1 B5
 Nr of residues matched : (        183)
 RMS delta DIH       : (  54.345)
 Average |delta DIH| : (  35.837)
 Nr |delta DIH| > 10 : (     120)
 Percentage          : (  65.574)
 RMS |delta ANG|     : (  15.931)
 Average |delta ANG| : (  11.835)
 Nr |delta ANG| > 5  : (     124)
 Percentage          : (  67.760)
 Plot file written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

17.6 DChi (list large side-chain torsion angle differences)

Use this command to find residues in two different models of related molecules (e.g., before and after refinement, or NCS-related, or different mutants or complexes) that display large differences between one or more of their side-chain torsion angles.

For example, for PDB entry 1CEL:

 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > dc m2 a1-999 m2 b1 20
 WARNING - mol1 == mol2 !
 Delta-CHI table
 Compare M2 A1-999
 And     M2 B1
 Outlier list cut-off :    20.00
   
 Residue 1   Residue 2     CHI-1     CHI-2     CHI-3     CHI-4
 LYS A  18 - LYS B  18 =   -3.74     -1.38      8.62     67.82 *
 ASP A  63 - ASP B  63 =    2.64    177.25 *
 GLU A  65 - GLU B  65 =  -11.81    -16.96     47.37 *
 GLN A 101 - GLN B 101 =  -26.30 *    0.65     -7.33
 PHE A 146 - PHE B 146 =   -0.92    178.08 *   -0.70     -0.10
 TYR A 167 - TYR B 167 =    0.32   -172.27 *  -10.20      0.54
 GLN A 186 - GLN B 186 =    2.54     -8.26    -83.16 *
 GLU A 193 - GLU B 193 =    7.68     -7.58     42.64 *
 GLU A 217 - GLU B 217 =   -1.06      2.91    178.28 *
 TYR A 274 - TYR B 274 =    0.17   -177.19 *   -9.89      0.58
 PRO A 276 - PRO B 276 =  -39.04 *   64.66 *
 PHE A 280 - PHE B 280 =    1.44    179.83 *   -2.92     -0.01
 ASN A 315 - ASN B 315 =    4.81     46.73 *
 TYR A 321 - TYR B 321 =   -0.63   -173.46 *   -7.06      0.15
 ASP A 328 - ASP B 328 =   13.43    -35.87 *
 ASP A 345 - ASP B 345 =    9.06     20.10 *
 GLU A 385 - GLU B 385 =    5.29    -15.36     33.13 *
 ASN A 413 - ASN B 413 =   12.42    -36.31 *
 Nr of residues : (        356)
 Nr of outliers : (         18)
 CPU total/user/sys :       2.6       2.6       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

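Note that torsion-angle differences that exceed the cut-off are marked with an asterisk. Also note that some torsion-angle differences near 180 degrees are artefacts (e.g., chi-2 of Tyr, Phe, and Asp and chi-3 of Glu). These groups are symmetric, so a difference of ~180 degrees merely reflects how the equivalent atoms were named (cf. the NOmenclature and FIx_atom_names commands).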
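One way to discount these artefacts when post-processing a DChi table is to fold the differences for symmetric torsions into the range [-90, 90). The Python sketch below is not an LSQMAN option. Its function name is hypothetical, and the set of symmetric torsions contains only the cases named above. For example, the Tyr 167 chi-2 difference of -172.27 in the 1CEL table becomes 7.73 degrees, well below the 20-degree cut-off.

```python
# Torsions whose terminal atoms are chemically equivalent (ring flip for
# Phe/Tyr, carboxylate oxygens for Asp/Glu); only the cases named in the text.
SYMMETRIC_CHI = {("PHE", 2), ("TYR", 2), ("ASP", 2), ("GLU", 3)}

def wrap(d):
    """Map an angle difference into [-180, 180)."""
    return (d + 180.0) % 360.0 - 180.0

def symmetry_corrected_delta(resname, chi_number, delta):
    """Fold a chi difference (degrees) modulo 180 if the torsion is symmetric."""
    d = wrap(delta)
    if (resname.upper(), chi_number) in SYMMETRIC_CHI:
        d = (d + 90.0) % 180.0 - 90.0
    return d
```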

17.7 LEsk_plot (plot RMSD as a function of number of aligned residues)

A single RMSD value tends not to be very informative. Lesk et al. suggested using a similarity curve instead. This command generates such a curve for you by doing an IMprove step using a distance cut-off of 3.0, 4.0, 5.0 and finally 6.0 A. This should find a reasonable approximation of the maximal common subset. The number of matched residues and their RMSD are stored. Then the worst-fitting pair of atoms is removed, a new operator is calculated for the remaining atoms, and the RMSD is stored again. This continues until the RMSD drops below 0.2 A or the number of matched residues drops below 10.
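The trimming phase of the Lesk curve can be sketched as the loop below. This Python sketch rests on stated assumptions. The preliminary IMprove passes at 3.0 to 6.0 A are not shown. `fit` is a hypothetical callable that superimposes the given atom pairs (e.g., by a least-squares Kabsch fit) and returns the post-fit distance of each pair. Only the stopping rules (RMSD below 0.2 A, or fewer than 10 pairs) come from the text.

```python
import math

def lesk_trim(pairs, fit, min_rmsd=0.2, min_pairs=10):
    """(n_pairs, rmsd) curve points obtained by repeatedly dropping the worst pair.

    fit(pairs) is a hypothetical callable: it refits the operator on the given
    pairs and returns their post-fit distances, in the same order.
    """
    pairs = list(pairs)
    curve = []
    while pairs:
        dists = fit(pairs)
        rmsd = math.sqrt(sum(d * d for d in dists) / len(dists))
        curve.append((len(pairs), rmsd))
        if rmsd < min_rmsd or len(pairs) < min_pairs:
            break
        worst = max(range(len(dists)), key=lambda k: dists[k])
        del pairs[worst]  # remove the worst-fitting pair, then refit
    return curve
```

For a quick check without a real superposition, the pairs can be stand-in distances and `fit` can simply return them unchanged.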

Note: the two chains you compare must already have been superimposed in some way (e.g., using the EXplicit or BRute_force commands) !!

Reference: JA Irving, JC Whisstock & AM Lesk, Proteins: Struct. Funct. Genet., 42, 378-382 (2001).

Example of a LEsk plot.

 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > lesk m1 a m2 a m1_m2_lesk.plt
 Lesk plot of M1 A
 And M2 A
 Atom types | CA |
   
 # 0 Cut-off 3.000 Nmatch 103 RMSD 0.561 A
 # 0 Cut-off 4.000 Nmatch 103 RMSD 0.561 A
 # 0 Cut-off 5.000 Nmatch 103 RMSD 0.561 A
 # 0 Cut-off 6.000 Nmatch 103 RMSD 0.561 A
 Nr of unique valid plot points : ( 61)
 Plot file written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

17.8 SImilarity_plot (plot RMSD as a function of number of aligned residues)

A single RMSD value tends not to be very informative. Sanchez and Sali suggested using a similarity curve instead. This command generates such a curve for you by doing an automatic operator improvement for a range of distance cut-off values (e.g., 0.1 to 10.0 in steps of 0.1). Only the unique values are included in the plot (e.g., if cut-offs between 5.0 and 6.9 A all give the same number of matched residues and the same RMSD, only one "data point" will be included in the plot).
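The reduction of a cut-off scan to "unique valid" plot points can be sketched like this (Python, hypothetical function name). Steps with Nmatch -1 (and RMSD 999.99) in the example are treated as invalid and skipped. The sketch merges only runs of consecutive identical results. The plot file in the example lists the point 135 / 0.51 twice, which fits that reading (or reflects RMSDs that differ beyond the printed precision), so the exact rule LSQMAN applies is an assumption here.

```python
def similarity_points(scan):
    """scan: (cutoff, nmatch, rmsd) per step, in order of increasing cut-off."""
    points = []
    for _cutoff, nmatch, rmsd in scan:
        if nmatch < 0:  # no valid superposition at this cut-off
            continue
        point = (nmatch, rmsd)
        if not points or points[-1] != point:  # collapse consecutive repeats
            points.append(point)
    return points
```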

Note: the two chains you compare must already have been superimposed in some way (e.g., using the EXplicit or BRute_force commands) !!

Reference: R Sanchez and A Sali, Proteins, Suppl. 1, 50-58 (1997).

Note: on second thought, this is probably not the type of plot Sali & Sanchez described after all ...

Example of a SImilarity plot.

 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > sim m1 a m2 a
 Command > (sim m1 a m2 a)
 Plot file name ? (m1_m2_simil.plt)
 Similarity plot of M1 A
 And M2 A
 Atom types | CA |
 B-factor range used -1000.00 - 10000.00 A2
 Cut-off start 0.10
 Cut-off end 10.00
 Cut-off step 0.10
 Nr of steps 100
   
 # 1 Cut-off 0.10 Nmatch -1 RMSD 999.99
 # 2 Cut-off 0.20 Nmatch -1 RMSD 999.99
 # 3 Cut-off 0.30 Nmatch -1 RMSD 999.99
 # 4 Cut-off 0.40 Nmatch 8 RMSD 0.15
 # 5 Cut-off 0.50 Nmatch 41 RMSD 0.32
 # 6 Cut-off 0.60 Nmatch 83 RMSD 0.38
 # 7 Cut-off 0.70 Nmatch 104 RMSD 0.41
 # 8 Cut-off 0.80 Nmatch 110 RMSD 0.43
 # 9 Cut-off 0.90 Nmatch 125 RMSD 0.47
 # 10 Cut-off 1.00 Nmatch 132 RMSD 0.49
 ...
 # 99 Cut-off 9.90 Nmatch 157 RMSD 1.25
 # 100 Cut-off 10.00 Nmatch 157 RMSD 1.25
 Nr of unique valid plot points : ( 25)
 Plot file written
 LSQMAN > $ cat m1_m2_simil.plt
 Command > ($ cat m1_m2_simil.plt)
 Spawn system command : ( cat m1_m2_simil.plt)
! Created by LSQMAN V. 981108/7.0.1 at Sun Nov 8 17:55:42 1998 for gerard
XLABEL Number of superimposed residues
YLABEL RMSD of superimposed residues
REMARK Similarity plot (Sanchez & Sali, 1997)
REMARK Compared mol M1 = file m13a.pdb
REMARK and mol M2 = file 1mup.pdb
NPOINT 25
COLOUR 4
XYVIEW 0.00 172.70 0.00 1.38
XVALUE *
 8 41 83 104 110 125 132 135 133 135 138 139
 140 142 143 144 145 146 147 148 150 151 152 155
 157
YVALUE *
 0.15 0.32 0.38 0.41 0.43 0.47 0.49 0.51 0.49 0.51 0.55 0.56
 0.57 0.61 0.63 0.65 0.67 0.69 0.73 0.76 0.83 0.88 0.94 1.08
 1.25
END
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

17.9 WAters (compare solvent structure in two molecules)

With this command you can compare water molecules in different models or chains of similar or identical molecules.
You supply the names of the two molecules, a distance cut-off for matching water molecules and the name of a scatter-plot file for O2D. The program retrieves all water oxygen atoms in both molecules (atom names must begin with ' O' and residue names must be one of WAT, HOH, SOL, OHH, HHO, H2O, OH2, H3O, OH3, or EAU). If there are more than two water oxygens in both molecules, it applies the current operator to bring the second molecule on top of the first. It then tries to find a matching water for each of the waters in the first molecule.
Output consists of a list of matched water molecules, some statistics and a scatter-plot file for O2D with |delta-B| versus distance for the matched water molecules. Ideally, all matched waters should lie in the bottom-left corner of this plot, i.e. small distances and small differences in temperature factors. The plot file can be converted to PostScript with O2D or the "o2dps" script (remember that it's a SCatter plot !).

NOTE: if you use too large a distance cut-off, some waters in the second molecule may be matched more than once against waters in the first molecule ! A suggested value for the cut-off is ~1-2 A.
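The water selection and matching described above can be sketched in Python as follows. The sketch assumes mol 2 has already been transformed by the current operator, uses a brute-force nearest-neighbour search, and has hypothetical names. Each mol 1 water simply takes its nearest partner, so nothing stops two mol 1 waters from claiming the same mol 2 water. That is exactly the multiple-matching problem the note warns about.

```python
import math

# Residue names accepted as water (list taken from the text above)
WATER_RESIDUES = {"WAT", "HOH", "SOL", "OHH", "HHO", "H2O", "OH2", "H3O", "OH3", "EAU"}

def is_water_oxygen(atom_name, res_name):
    """atom_name is the 4-character PDB atom-name field, e.g. ' O  '."""
    return atom_name.startswith(" O") and res_name.strip().upper() in WATER_RESIDUES

def match_waters(waters1, waters2, cutoff=1.5):
    """Nearest mol 2 water (within cutoff) for each mol 1 water.

    waters1, waters2: lists of (label, (x, y, z)); mol 2 is assumed to be
    transformed already. A mol 2 water may be matched more than once.
    """
    matches = []
    for label1, xyz1 in waters1:
        best = None
        for label2, xyz2 in waters2:
            d = math.dist(xyz1, xyz2)
            if d <= cutoff and (best is None or d < best[1]):
                best = (label2, d)
        if best is not None:
            matches.append((label1, best[0], best[1]))
    return matches
```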

Example of a WAter plot.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ch re
 LSQMAN > re m1 /nfs/pdb/full/5rub.pdb
 LSQMAN > re m2 /nfs/pdb/full/2rus.pdb
 LSQMAN > at ext
 LSQMAN > ex m1 a1-999 m2 a1
 ...
 Atom types     | CA | N  | C  | O  | CB |
 Nr of atoms to match : (       2128)
 The   2128 atoms have an RMS distance of    0.545 A
 RMS delta B  =    4.905 A2
 Corr. coeff. =      0.9261
 ...
 LSQMAN > im m1 a* m2 a*
 LSQMAN > sh m1 m2
 LSQMAN > waters m1 m2
 Distance cut-off (A) ? (1.50)
 Plot file ? (m1_m2_dist_db.plt)
 Water comparison of mol 1 : (M1)
                 and mol 2 : (M2)
 Waters in mol 1 : (        736)
 Waters in mol 2 : (        714)
 Applying current operator to mol 2 : (   1.000    0.003   -0.004   -0.003
     1.000   -0.003    0.004    0.003    1.000   -0.189   -0.236    0.186)
   
     1 HOH-C   4 <-> HOH-C   7 | D   0.20 | B   14.74  14.06
     2 HOH-C   5 <-> HOH-C   8 | D   0.19 | B   21.94  13.00
     3 HOH-C   6 <-> HOH-C   9 | D   0.17 | B   16.73  11.17
     4 HOH-C   8 <-> HOH-C 601 | D   0.30 | B   24.66  30.83
     5 HOH-C  10 <-> HOH-C  13 | D   0.41 | B   25.37  20.04
     6 HOH-C  11 <-> HOH-C  14 | D   0.41 | B   25.49  44.41
 ...
   241 HOH-C 720 <-> HOH-C 311 | D   0.83 | B   73.33  57.27
   242 HOH-C 727 <-> HOH-C 553 | D   0.90 | B   62.94  59.38
   243 HOH-C 730 <-> HOH-C 713 | D   0.51 | B   48.13  32.37
   244 HOH-C 733 <-> HOH-C 341 | D   0.77 | B   59.37  48.07
   
 Nr of matched waters   : (        244)
 % matched waters mol 1 : (  33.152)
 % matched waters mol 2 : (  34.174)
 Average B of non-matched waters in mol 1 : (  49.714)
 Average B of matched waters in mol 1     : (  38.196)
 Average B of matched waters in mol 2     : (  37.940)
 RMS distance (A) : (   0.796)
 RMS delta-B (A2) : (  10.949)
   Matching distances :
 Average :    0.70   St. dev. :    0.39
 Minimum :    0.05   Maximum  :    1.49
       Delta-B values :
 Average :    8.37   St. dev. :    7.06
 Minimum :    0.09   Maximum  :   37.02
 Correlation coefficient : (   0.274)
 Plot file written
 CPU total/user/sys :       1.1       0.9       0.1
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
REMARK Created by LSQMAN V. 950224/3.2 at Sat Feb 25 02:50:29 1995 for user gerard
XLABEL Distance (A) between matched water molecules
YLABEL |Delta-B| (A2) of matched water molecules
REMARK Scatter plot of distance and abs(delta-B) of matched water molecules in two models or chains
REMARK Waters are those of mol 1 = M1 = file /nfs/pdb/full/5rub.pdb
REMARK Matched waters are from mol 2 = M2 = file /nfs/pdb/full/2rus.pdb
REMARK Distances ave/sdv/min/max =     0.70    0.39    0.05    1.49
REMARK |Delta-B| ave/sdv/min/max =     8.37    7.06    0.09   37.02
REMARK Correlation coefficient =     0.27
REMARK Cut-off distance used (A) =     1.50
REMARK Waters in mol 1, mol 2, matched      736     714     244
XYVIEW     0.00    1.60    0.00   38.00
NPOINT      244
COLOUR        4
XVALUE *
    0.20    0.19    0.17    0.30    0.41    0.41    0.52    0.57    0.60
...
    7.99   22.04   16.04    0.09    5.96    2.61   16.06    3.56   15.76
   11.30
END
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

You can also use this option to see how many waters obey the NCS, if present, and how well they do this. Simply do an explicit fit of two chains and run the WAters command. In the following example (PDB code 5RUB), only ~40 % of the waters have an "NCS-mate" within 1.5 A.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ex m1 a1-999 m1 b1
 LSQMAN > im m1 a* m1 b*
 LSQMAN > wa m1 m1
 WARNING - mol1 == mol2 !
 Distance cut-off (A) ? (1.50)
 Plot file ? (m1_m1_dist_db.plt)
 Water comparison of mol 1 : (M1)
                 and mol 2 : (M1)
 Waters in mol 1 : (        736)
 Waters in mol 2 : (        736)
 Applying current operator to mol 2 : (   0.390   -0.071    0.918   -0.045
    -0.997   -0.058    0.920   -0.019   -0.392    5.419   17.580   -7.606)
   
     1 HOH-C   3 <-> HOH-C  59 | D   0.81 | B   41.96  39.42
     2 HOH-C   4 <-> HOH-C  32 | D   0.44 | B   14.74  15.61
     3 HOH-C   5 <-> HOH-C 116 | D   0.65 | B   21.94  23.09
     4 HOH-C   6 <-> HOH-C 115 | D   0.70 | B   16.73  13.64
 ...
   298 HOH-C 695 <-> HOH-C 592 | D   1.41 | B   50.81  69.92
   299 HOH-C 699 <-> HOH-C 486 | D   0.81 | B   61.57  41.81
   300 HOH-C 709 <-> HOH-C 368 | D   1.21 | B   37.86  35.01
   301 HOH-C 724 <-> HOH-C 456 | D   0.75 | B   57.56  46.63
   
 Nr of matched waters   : (        301)
 % matched waters mol 1 : (  40.897)
 % matched waters mol 2 : (  40.897)
 Average B of non-matched waters in mol 1 : (  50.817)
 Average B of matched waters in mol 1     : (  38.783)
 Average B of matched waters in mol 2     : (  37.359)
 RMS distance (A) : (   0.949)
 RMS delta-B (A2) : (  11.968)
   Matching distances :
 Average :    0.88   St. dev. :    0.36
 Minimum :    0.09   Maximum  :    1.50
       Delta-B values :
 Average :    8.70   St. dev. :    8.22
 Minimum :    0.05   Maximum  :   46.68
 Correlation coefficient : (   0.142)
 Plot file written
 CPU total/user/sys :       1.1       1.0       0.2
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

17.10 HIsto_disto (histogram of distances)

Yet another way to assess the similarity of two molecules. In this case, for every atom in molecule #2, the distance to the nearest atom in molecule #1 (after applying the current operator) is calculated, and a histogram is printed. You have to supply a maximum distance and the bin size for the histogram.
This has the advantage that there doesn't have to be a 1:1 correspondence between the atoms compared (so you can compare a correct and a backwards-traced model, for instance). It has the same disadvantages as all other distance-based measures (sensitivity to the operator, poor statistics when domain shifts occur).
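Here is a brute-force Python sketch of the nearest-atom search and the histogram. The names are hypothetical, and mol 2 is assumed to be already transformed by the current operator. Atoms with no partner within the maximum distance count as not matched, and empty bins are omitted, as in the example output. The percentages are relative to the matched atoms, which agrees with the second example (2227 / 3518 = 63.30 %).

```python
import math

def nearest_distances(mol1, mol2, max_dist):
    """For each mol 2 atom, distance to the nearest mol 1 atom (None if > max_dist)."""
    result = []
    for b in mol2:
        d = min(math.dist(a, b) for a in mol1)
        result.append(d if d <= max_dist else None)
    return result

def distance_histogram(distances, bin_size):
    """Rows (low, high, count, percent, cumulative percent) for non-empty bins."""
    matched = [d for d in distances if d is not None]
    counts = {}
    for d in matched:
        k = int(d // bin_size)  # bins are [k*bin_size, (k+1)*bin_size>
        counts[k] = counts.get(k, 0) + 1
    rows, running = [], 0
    for k in sorted(counts):
        running += counts[k]
        rows.append((k * bin_size, (k + 1) * bin_size, counts[k],
                     100.0 * counts[k] / len(matched),
                     100.0 * running / len(matched)))
    return rows
```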

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > hi m1 m2 50 0.25
 Distance histogram of mol 1 : (M1)
                   and mol 2 : (M2)
 Applying current operator to mol 2 : (  -0.592   -0.646    0.482   -0.645
     0.021   -0.764    0.483   -0.763   -0.429  169.723  133.330   34.824)
   
 Nr of matched atoms   : (       3092)
 Nr of atoms in mol 2  : (       3092)
 % matched atoms mol 2 : ( 100.000)
 RMS distance (A) : (   1.525)
   
 Matching distances :
 Average :    1.22   St. dev. :    0.91
 Minimum :    0.01   Maximum  :    9.36
   
       86 in [  0.00-  0.25> =   2.78% (Cumul   2.78 %)
      382 in [  0.25-  0.50> =  12.35% (Cumul  15.14 %)
      635 in [  0.50-  0.75> =  20.54% (Cumul  35.67 %)
      513 in [  0.75-  1.00> =  16.59% (Cumul  52.26 %)
      359 in [  1.00-  1.25> =  11.61% (Cumul  63.87 %)
      305 in [  1.25-  1.50> =   9.86% (Cumul  73.74 %)
 ...
        1 in [  8.00-  8.25> =   0.03% (Cumul  99.97 %)
        1 in [  9.25-  9.50> =   0.03% (Cumul 100.00 %)
 CPU total/user/sys :      18.0      18.0       0.1
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

A somewhat better result:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 ...
 NOT MATCHED :CA    CA-C 440
 NOT MATCHED : O   HOH-C 441
   
 Nr of matched atoms   : (       3518)
 Nr of atoms in mol 2  : (       7038)
 % matched atoms mol 2 : (  49.986)
 RMS distance (A) : (   0.160)
   
 Matching distances :
 Average :    0.11   St. dev. :    0.12
 Minimum :    0.00   Maximum  :    2.50
   
     2227 in [  0.00-  0.10> =  63.30% (Cumul  63.30 %)
      945 in [  0.10-  0.20> =  26.86% (Cumul  90.16 %)
      216 in [  0.20-  0.30> =   6.14% (Cumul  96.30 %)
       64 in [  0.30-  0.40> =   1.82% (Cumul  98.12 %)
       24 in [  0.40-  0.50> =   0.68% (Cumul  98.81 %)
       10 in [  0.50-  0.60> =   0.28% (Cumul  99.09 %)
        8 in [  0.60-  0.70> =   0.23% (Cumul  99.32 %)
        6 in [  0.70-  0.80> =   0.17% (Cumul  99.49 %)
        3 in [  0.80-  0.90> =   0.09% (Cumul  99.57 %)
        5 in [  0.90-  1.00> =   0.14% (Cumul  99.72 %)
        2 in [  1.00-  1.10> =   0.06% (Cumul  99.77 %)
        1 in [  1.10-  1.20> =   0.03% (Cumul  99.80 %)
        4 in [  1.20-  1.30> =   0.11% (Cumul  99.91 %)
        2 in [  1.50-  1.60> =   0.06% (Cumul  99.97 %)
        1 in [  2.40-  2.50> =   0.03% (Cumul 100.00 %)
 CPU total/user/sys :      95.2      93.8       1.4
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

17.11 D1_D2 (delta-1, delta-2 plot)

This command plots the differences between two molecules or chains in the Delta-1 and Delta-2 torsion-angle combinations, which are defined as follows:
- Delta-1 = PHI(i+1) - PSI(i)
- Delta-2 = PHI(i+1) + PSI(i)
For each residue, the plot shows the difference in Delta-1 and in Delta-2 between mol 1 and mol 2. These plots should be less noisy than delta-Phi/delta-Psi plots in cases where differences between molecules are due to coupled torsions. I'm not convinced, but Alwyn thought they would be useful (they were suggested by a referee of a submitted paper about the analysis and use of NCS).
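Taking the definitions above at face value, the quantities can be sketched in Python as below. The names are hypothetical, angles are in degrees with one phi and one psi per residue, and the difference is taken as mol 1 minus mol 2 by assumption. The check uses a compensating change: in mol 2, psi(i) goes up by 20 degrees and phi(i+1) goes down by 20 degrees. Delta-Phi and delta-Psi would each show 20 degrees, whereas the Delta-2 difference stays at zero and the whole change appears in Delta-1.

```python
def wrap(angle):
    """Map an angle (degrees) into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def d1_d2(phi, psi):
    """Delta-1 = PHI(i+1) - PSI(i) and Delta-2 = PHI(i+1) + PSI(i) for i = 0..n-2."""
    d1 = [wrap(phi[i + 1] - psi[i]) for i in range(len(psi) - 1)]
    d2 = [wrap(phi[i + 1] + psi[i]) for i in range(len(psi) - 1)]
    return d1, d2

def d1_d2_differences(phi1, psi1, phi2, psi2):
    """Per-residue differences (mol 1 minus mol 2) in Delta-1 and Delta-2."""
    a1, a2 = d1_d2(phi1, psi1)
    b1, b2 = d1_d2(phi2, psi2)
    return ([wrap(x - y) for x, y in zip(a1, b1)],
            [wrap(x - y) for x, y in zip(a2, b2)])
```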

Example of a D1_D2 plot.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > d1 m1 a1-999 m1 b1 cbh1_d1d2.plt
 WARNING - mol1 == mol2 !
 Delta-1/Delta-2 plot
 Plot of M1 A1-999
 And     M1 B1
 Nr of residues matched : (        434)
 RMS delta Delta-1   : (   2.137)
 Average |delta D-1| : (   1.455)
 Nr |delta D-1| > 10 : (       1)
 Percentage          : (   0.230)
 RMS delta Delta-2   : (   6.046)
 Average |delta D-2| : (   4.386)
 Nr |delta D-2| > 10 : (      37)
 Percentage          : (   8.525)
 Plot file written
 CPU total/user/sys :       1.1       1.0       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > d1 m2 a1-999 m2 b1 3sdp_d1d2.plt
 WARNING - mol1 == mol2 !
 Delta-1/Delta-2 plot
 Plot of M2 A1-999
 And     M2 B1
 Nr of residues matched : (        186)
 RMS delta Delta-1   : (  57.323)
 Average |delta D-1| : (  40.262)
 Nr |delta D-1| > 10 : (     144)
 Percentage          : (  77.419)
 RMS delta Delta-2   : (  78.515)
 Average |delta D-2| : (  60.437)
 Nr |delta D-2| > 10 : (     160)
 Percentage          : (  86.022)
 Plot file written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

18 MORPHING

18.1 MOrph (morph transition between two conformational states)

This command enables you to produce a set of models that show the transition between two conformational states. Examples include: a rigid-body domain motion (e.g., the open and closed conformation of ribose-binding protein, 1URP and 2DRI), the open and closed lid of lipase (1CRL and 1TRH), a loop movement, etc. Using these models you can animate the transition in O (or other programs), produce animated GIFs, etc.

Morphing can be done in two different ways, namely in internal and in Cartesian coordinate space. Internal coordinate space is usually preferable, except when any of the torsion angles changes to a very large extent (e.g., more than 60 degrees). In internal coordinate space, the central atom trace (the central atom would be CA for proteins, but can be set to anything using the ATom_types command in LSQMAN) is represented as follows:

CA1---CA2---CA3---CA4

the internal coordinates of CA4 would be:
- distance (CA4,CA3)
- angle (CA4,CA3,CA2)
- torsion angle (CA4,CA3,CA2,CA1)

To morph the transition from state A to state B, LSQMAN calculates the internal coordinates for each state, and then generates the intermediate values by simple linear interpolation. For instance, if the torsion angle (CA4,CA3,CA2,CA1) changes from 34 to 61 degrees, and if there are a total of 10 models (including start and end models), model 1 would have a torsion of 34 degrees. The torsion range is 27 degrees (for 9 more models), so model 2 would have a torsion of 34+(27/9) = 37 degrees, model 3 40 degrees, etc.
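The worked example can be checked with a few lines of Python. This is a sketch of plain linear interpolation only, and the name `interpolate` is hypothetical. It does not rebuild Cartesian coordinates from the interpolated internal coordinates. It also does not treat torsions that cross the +/-180 degree boundary specially (a change from 170 to -170 would go the long way round), because the text does not say how LSQMAN handles that case.

```python
def interpolate(start, end, n_models):
    """n_models linearly spaced values from start to end (both included)."""
    if n_models < 2:
        raise ValueError("need at least a start and an end model")
    step = (end - start) / (n_models - 1)
    return [start + k * step for k in range(n_models)]
```

`interpolate(34.0, 61.0, 10)` reproduces the series in the text: 34, 37, 40, ..., 61 degrees.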

If one of the torsions changes a lot, this means that all residues C-terminal of it will swing with it, which may lead to very strange effects (e.g., try this out on the A and B molecules of PDB entry 1SHF). In those cases, morphing has to be done in Cartesian coordinate space. For this to work, the two molecules (start and end point) *must* be superimposed ! The algorithm is very similar, in that every atom moves in regular steps from its starting point to its end point. The disadvantage of Cartesian space morphing is that the geometry will almost certainly be distorted (in internal space, the distances and angles are usually essentially fixed, e.g. CA-CA distances will be ~3.8 Å with little variation). As a rule, try morphing in internal coordinate space first; if it gives a strange effect, do it in Cartesian space instead.
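Cartesian morphing, as described above, moves every atom along a straight line in equal steps. The following is a minimal Python sketch with hypothetical names, assuming the start and end coordinates are superimposed and paired atom by atom. The second function computes the start-to-end distance of each atom, which is what LSQMAN stores in the B-factor column of Cartesian morphs. The check illustrates the geometric distortion: two atoms 3.8 A apart that swap sides coincide in the middle model.

```python
import math

def cartesian_morph(start_xyz, end_xyz, n_models):
    """List of models; each atom moves in equal steps from start to end."""
    if n_models < 2:
        raise ValueError("need at least a start and an end model")
    models = []
    for k in range(n_models):
        t = k / (n_models - 1)  # 0 for the start model, 1 for the end model
        models.append([tuple(a + t * (b - a) for a, b in zip(p, q))
                       for p, q in zip(start_xyz, end_xyz)])
    return models

def displacement_bfactors(start_xyz, end_xyz):
    """Start-to-end distance of each atom."""
    return [math.dist(p, q) for p, q in zip(start_xyz, end_xyz)]
```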

For internal coordinate morphing, only one atom type can be used (e.g., CA for proteins), and only those atoms will be written to the model PDB files. However, for Cartesian space morphing you may select any set of atoms (e.g., main-chain atoms or all atoms).

Specifically for proteins, if you select the ATom_type TRace, you can morph in internal coordinate space with both the CA trace and the non-hydrogen side-chain atoms. This enables you to visualise which residues change most drastically upon domain closure, ligand or substrate binding, or whatever. Of course, you want to use the NOmenclature and FIx commands to prevent artifactual changes (e.g., Phe-ring flip because of how you named the ring carbons). Usually, trace morphing will show large changes for surface residues, whereas you are probably more interested in the active site or ligand-binding residues. The solution is simple: prepare a PDB file which contains only the main-chain atoms of your protein PLUS those side-chain atoms that are of interest (can be done with a text editor, or with MOLEMAN2, for example). Trace morphing is the most complicated in terms of programming, but I think it should work in most cases (unless, again, there is a large change somewhere in one or more of the virtual CA torsions). It will not work very well for moieties that are not connected to your protein (e.g., waters, ligand), so you may want to exclude those.

PLEASE NOTE: simple morphing does NOT pretend to simulate a favourable pathway for effecting the conformational change ! It is merely a (suggestive) visualisation method !

HINT: if you really want to impress your boss, you could also "morph" the diffusion of a substrate etc. into an active site, binding site, or whatever. Generate your substrate inside the site, and a copy far away, and do a Cartesian morphing. The result will be your substrate floating into the binding site. (If you combine this with an internal morph of a domain motion, e.g. in ribose-binding protein: do the two morphs separately in LSQMAN, and append the morphed ligand to the morphed PDB file of the protein. You may want to reset all "B-factors" of the ligand. And you will want to draw all atoms of the ligand in O.)

The models' PDB files will only contain coordinates for the selected atoms (usually, CA atoms). Their B-factors will be replaced by the magnitude of the change of the torsion around each CA-CA bond, so that if you paint_ramp the CA model in O, the "hot spots" will be coloured towards red, whereas the parts of the structure that do not change much will be blue or green. (In Cartesian space, the B-factors will contain the distance for each central atom between its position in the starting and end model.)
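The two kinds of replacement B-factor described above can be sketched in Python as follows. This is a minimal illustration, not LSQMAN's code: it assumes that "the torsion around each CA-CA bond" is the usual virtual dihedral defined by four consecutive CA atoms, that the magnitude of the change is taken the short way round the circle, and that the coordinate lists of the start and end models are already matched atom for atom. All function names are hypothetical, and which atom receives each per-bond value is not specified here.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle (degrees) around the p1-p2 bond, IUPAC sign convention."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    axis = tuple(x / length for x in b1)
    # Project the two outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, axis) * x for x in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * x for x in axis))
    cos_part = _dot(v, w)
    sin_part = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(sin_part, cos_part))

def torsion_change(t_start, t_end):
    """Size of a torsion change taken the short way round (0 to 180 degrees)."""
    return abs((t_end - t_start + 180.0) % 360.0 - 180.0)

def virtual_torsion_changes(ca_start, ca_end):
    """One value per CA(i)-CA(i+1) bond that has a CA on either side of it."""
    changes = []
    for i in range(1, len(ca_start) - 2):
        t0 = dihedral(*ca_start[i - 1:i + 3])
        t1 = dihedral(*ca_end[i - 1:i + 3])
        changes.append(torsion_change(t0, t1))
    return changes

def cartesian_changes(xyz_start, xyz_end):
    """Distance (A) between each atom's position in the start and end models."""
    return [math.dist(a, b) for a, b in zip(xyz_start, xyz_end)]
```

Taking the change along the shorter arc means that a torsion going from +170 to -170 degrees counts as a 20-degree change rather than a 340-degree one.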

LSQMAN creates a macro for itself. This macro will superimpose all morphing models onto the very first one. By default, the same residue range will be used as that given for molecule 1 in the MOrph command; if this is not appropriate (e.g., for ribose-binding protein you could use residues 100 to 230 instead and see what happens), you can provide a different range as a parameter to the MOrph command.

LSQMAN also creates three O macros: XXX_read.omac, XXX_morph.omac and XXX_plot.omac. The first will read the models, paint_ramp them by B-factor, and draw the CA trace (or zone). The second contains the instructions to draw one model after the other so as to go from start to end and back to start again. Unfortunately, when O executes a macro, the display is not updated until the macro has finished, so you must cut and paste the instructions from the second macro into your O command window !!! The third macro can be executed after the first, once you have set a good view, zoom, clip, etc. It will generate one big O plot file with all models (for later rendering).

The parameters to the MOrph command are:
- molecule 1
- residue range 1
- molecule 2
- residue range 2
- the total number of morphing steps (including start and end)
- base name for output files (used for generating file names for the output PDB files, LSQMAN macro and O macros)
- morph type (internal or Cartesian)
- O molecule name suffix (one character; default is "M", but if you want to combine multiple morphs, you can use different characters, e.g. "X" for the protein and "Y" for the substrate)
- superpositioning range (default is same as "residue range 1"); this will be used in the LSQMAN macro to superimpose and improve the operator between all models and the first one; you can use this if you specifically want to superimpose on residues A100-230 since you know that these are rock-solid
- torsion angle range cut-off (the default is not to use this facility, by providing a number larger than 180 degrees here); if this number is smaller than 180 degrees, and if you do internal coordinate morphing, the program will attempt to replace central-atom (CA-CA) bonds whose torsions change too much by pseudo-bonds to other atoms that yield smaller torsion changes. This sometimes solves the problem of "unfolding proteins" (e.g., molecules A and B from PDB entry 1SHF can be morphed successfully in this fashion), but not always. It can also introduce artefacts (broken bonds in particular). If straightforward internal coordinate morphing produces artefacts, the use of this parameter is worth a try. If it still doesn't work, consider morphing in Cartesian coordinate space. (NOTE: bonds will still be coloured correctly if this option is used, and the O macro will change the maximum allowed central-atom spacing to 15.0 A so as still to connect sequential central atoms that may drift too far apart during some of the morphing steps.)
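To make "the total number of morphing steps (including start and end)" concrete, here is a hypothetical Python sketch of how intermediate models could be generated. It assumes simple linear interpolation: straight lines between matched atoms for a Cartesian morph, and torsions interpolated along the shorter arc for an internal-coordinate morph. Rebuilding Cartesian coordinates from the interpolated internal coordinates, and the torsion cut-off trick, are not shown; the function names are illustrative only.

```python
def step_fractions(n_steps):
    """Interpolation fractions for n_steps models: 0.0 is the start, 1.0 the end."""
    if n_steps < 2:
        raise ValueError("need at least the start and end models")
    return [k / (n_steps - 1) for k in range(n_steps)]

def wrap(angle):
    """Map an angle (degrees) into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def lerp_xyz(start, end, t):
    """Cartesian morph: move every matched atom along a straight line."""
    return [tuple(a + t * (b - a) for a, b in zip(p, q))
            for p, q in zip(start, end)]

def lerp_torsion(t0, t1, t):
    """Internal-coordinate morph: interpolate a torsion along the shorter arc."""
    return wrap(t0 + t * wrap(t1 - t0))

# Three models (start, halfway, end) for one atom and one torsion
for f in step_fractions(3):
    print(f, lerp_xyz([(0.0, 0.0, 0.0)], [(2.0, 4.0, 6.0)], f),
          lerp_torsion(170.0, -170.0, f))
```

Going the short way round, the torsion in this example passes through 180 degrees rather than sweeping back through 0, which is the kind of large spurious change that would otherwise make a protein appear to unfold.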

For example, for 1URP and 2DRI (ribose-binding protein) we will trace morph the transition between the closed (ribose-bound) and open conformations:

(1) Only select the main-chain atoms plus those side-chain atoms that are near the ligand of 2DRI (using MOLEMAN2):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 MOLEMAN2 > re /nfs/pdb/full/2dri.pdb
 MOLEMAN2 > sel none
 MOLEMAN2 > sel or res rip
 MOLEMAN2 > select dist 0.0 8.0
 MOLEMAN2 > sel by_residue
 MOLEMAN2 > sel or class main
 MOLEMAN2 > sel and type prot
 MOLEMAN2 > wr 2dri.pdb pdb selected
 MOLEMAN2 > quit
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

(2) Do the trace morphing in LSQMAN (don't forget to fix the ambiguous side-chain torsions with the NOmenclature and FIx commands !):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 1urp.pdb
 ...
 LSQMAN > re m2 2dri.pdb
 ...
 LSQMAN > nomen m1
 Command > (nomen m1)
 Enforce proper nomenclature for : (M1)
 Nr of atoms    : (       2002)
 Nr of residues : (        271)
   
 # of PHE checked :     7 # errors :     0
 # of TYR checked :     3 # errors :     0
 # of ASP checked :    21 # errors :     0
 # of GLU checked :    11 # errors :     0
 # of ARG checked :     6 # errors :     0
 No problem, mon !
 LSQMAN > fix m1 a1-999 m2 a1 strict seq torsion
 Command > (fix m1 a1-999 m2 a1 strict seq torsion)
 Reference atoms M1 A1-999
 Fix atoms for   M2 A1
 Only fix Asp/Glu/Arg/Phe/Tyr
 Use sequential residues (1:1 correspondence)
 Minimise torsion-angle differences
 Minimum improvement : (   0.010)
   
 Zone : (          1)
 Fix sidechain of PHE-A-  15 (  164.38 versus    17.20)
 Fix sidechain of ASP-A-  52 (   91.24 versus    88.01)
 Fix sidechain of GLU-A- 143 (   93.35 versus    85.79)
 Fix sidechain of GLU-A- 246 (  152.36 versus    26.93)
 Fix sidechain of GLU-A- 255 (  112.10 versus    68.33)
   
 Residues checked : (         48)
 Residues fixed   : (          5)
 LSQMAN > at trace
 Command > (at trace)
 Nr of atom types : (       1)
 Types : ( TRAC)
 LSQMAN > morph m1 a1-999 m2 a1
 Command > (morph m1 a1-999 m2 a1)
 Number of steps ? (          10) 25
 Basename for output PDB files ? (morphy) trace
 Morph type (Internal/Cartesian) ? (internal)
 O mol name suffix ? (m)
 Superpositioning range ? (A1-999) a100-230
 Torsion range cut-off ? (   999.0000)
   
 Morph in internal coordinate space
 Atom type(s) : ( TRAC)
 Morph CA and side-chain atoms
 Nr of atoms matched : (        429)
 RMSD matching atoms : (  47.653)
 Nr of CA atoms : (        271)
 Nr of side-chain atoms : (        158)
   
 B-factors replaced by torsion angle differences:
 Average : (   7.059)
 St.dev. : (  14.452)
 Minimum : (   0.012)
 Maximum : ( 150.835)
   
 Morphing ...
 Creating LSQMAN macro : (trace.lsqmac)
 Creating first O macro : (trace_read.omac)
 Morph step number : (          1)
 Morph step number : (          2)
 Morph step number : (          3)
 ...
 Morph step number : (         25)
 Creating second O macro : (trace_morph.omac)
   
 Morphing done ... execute the new LSQMAN
 macro to superimpose the morphed models !
 CPU total/user/sys :       2.0       1.9       0.1
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Note how little CPU time this operation requires (R10000) !

(3) Execute the LSQMAN macro to superimpose all morphing models on the first model:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > @trace.lsqmac
 ...
 LSQMAN > quit
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

(4) Now start a fresh O session; execute the macro that reads all models (@trace_read.omac). Then cut and paste instructions from the other macro (trace_morph.omac) into your O command window and impress your friends !!!

19 SUPERIMPOSING AND COMPARING MULTIPLE MOLECULES

19.1 MCentral (find central NCS or NMR structure)

Find the "central" structure in a molecule which contains multiple (NCS) chains or (NMR) models. The chain or model with the lowest RMS(RMSD), i.e. the root-mean-square of its RMSDs to all the other chains or models, is considered to be the central one. You are to supply the molecule name, residue range (only one, and without a chain/model name !!) and the method to use (Explicit alignment only, or explicit followed by Improvement of the alignment). Note that all features of the EX and IM commands can be used here as well (e.g., ATom type selection, different SEttings, etc.).

Note: some pairs of chains cannot be compared, namely if either of them contains fewer than 3 atoms, or if their total number of atoms differs by more than 25%, or if their RMSD ends up being greater than 900 A (e.g., because one is protein and the other DNA).
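The selection criterion can be checked against the example below: taking the RMS over chain A's seven RMSDs to the other chains gives 0.364 A, the value LSQMAN prints for chain A. A minimal Python sketch, assuming a square matrix of pairwise RMSDs is already available; marking pairs that could not be compared as None and skipping them is an assumption, not documented LSQMAN behaviour, and the function names are hypothetical.

```python
import math

def rms_of_rmsds(matrix, i):
    """RMS of chain i's RMSDs to all other chains; None entries are skipped."""
    values = [r for j, r in enumerate(matrix[i]) if j != i and r is not None]
    if not values:
        return None
    return math.sqrt(sum(r * r for r in values) / len(values))

def central_chain(names, matrix):
    """Return the name with the lowest RMS(RMSD), plus all scores."""
    scores = {name: rms_of_rmsds(matrix, i) for i, name in enumerate(names)}
    valid = {name: s for name, s in scores.items() if s is not None}
    return min(valid, key=valid.get), scores

# Chain A's row from the 1LDN example (RMSDs to A, B, ..., H)
row_a = [0.000, 0.369, 0.396, 0.377, 0.371, 0.328, 0.347, 0.354]
print(round(rms_of_rmsds([row_a], 0), 3))  # 0.364, as printed by LSQMAN
```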

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 1ldn.pdb
 HEADER :     OXIDOREDUCTASE(CHOH(D)-NAD(A))          19-NOV-91   1LDN      1LDN   2
 AUTHOR :     D.B.WIGLEY,S.J.GAMBLIN,J.P.TURKENBURG,E.J.DODSON,K.PIONTEK,   1LDN   6
 AUTHOR :    2 H.MUIRHEAD,J.J.HOLBROOK                                      1LDN   7
 REVDAT :    1   31-JAN-94 1LDN    0                                        1LDN   8
 CRYST1 :    84.900  118.200  135.500  90.00  96.07  90.00 P 21         16  1LDN 320
 Old chain |A| becomes chain A
 ...
 Old chain |H| becomes chain H
 Nr of lines read from file : (      21617)
 Nr of atoms in molecule    : (      20576)
 Nr of chains or models     : (          8)
 Stripped hydrogen atoms    : (          0)
 Nr of HETATMs              : (       1132)
 Stripped alt. conf. atoms  : (        140)
 CPU total/user/sys :       2.8       2.8       0.1
 LSQMAN > at ca
 Nr of atom types : (       1)
 Types : (  CA)
 LSQMAN > set sim
 Setting defaults for 2 A fit of similar molecules
 LSQMAN > mc m1 1-999 imp 1ldn_rt.odb
 Find central chain/model
 Save operators in file : (1ldn_rt.odb)
   
 Aligning B to A
 The    313 atoms have an RMS distance of    0.369 A
 SI = RMS * Nmin / Nmatch             =      0.37286
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.83614
 CR = Maiorov-Crippen RHO (0-2)       =      0.01919
 RR = relative RMSD                   =      0.01782
 NR = normalised RMSD (100)           =      0.235 A
 SAS(1) = cRMS * (100/Nmatch)         =      0.118 A
 SAS(2) = cRMS * (100/Nmatch)^2       =      0.038 A
 SAS(3) = cRMS * (100/Nmatch)^3       =      0.012 A
 SAS(4) = cRMS * (100/Nmatch)^4       =      0.004 A
 RMSD / Nalign                        =    0.00118 A
 RMS delta B for matched atoms        =    14.188 A2
 Corr. coefficient matched atom Bs    =        0.277
 Rotation     :  -0.81788409  0.21920536 -0.53199112
                  0.23463403 -0.71715492 -0.65622830
                 -0.52536881 -0.66154194  0.53511661
 Translation  :      61.0108      2.1554     22.1060
 Datablock name : (.lsq_rt_m1b_to_m1a)
   
 ...
   
 Chain/model A - RMSDs (A) to the others:
     0.000     0.369     0.396     0.377     0.371     0.328     0.347
     0.354
 RMS(RMSD) for chain/model A =    0.364
   
 ...
   
 Chain/model H - RMSDs (A) to the others:
     0.354     0.324     0.336     0.319     0.326     0.324     0.319
     0.000
 RMS(RMSD) for chain/model H =    0.329
   
 ==> Central chain is G
   
 Average RMSD between chains =    0.338 A
   
 CPU total/user/sys :      30.6      30.6       0.1
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

If you requested that all operators be saved to a file, it will look something like this:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
! Created by LSQMAN V. 070804/9.7.5 at Fri Jul 6 20:06:49 2007 for gerard
.lsq_rt_m1b_to_m1a r 12 (3f15.7)
     -0.8178841      0.2346340     -0.5253688
      0.2192054     -0.7171549     -0.6615419
     -0.5319911     -0.6562283      0.5351166
     61.0108032      2.1554465     22.1060314
.lsq_rt_m1c_to_m1a r 12 (3f15.7)
     -0.5160082      0.7203225      0.4635418
      0.7116343      0.0592924      0.7000436
      0.4767727      0.6911005     -0.5432016
     37.7960091    -31.2057724      8.2992077
[...]
.lsq_rt_m1h_to_m1g r 12 (3f15.7)
     -0.9032153     -0.1280665      0.4096355
     -0.1242063     -0.8356048     -0.5351052
      0.4108224     -0.5341945      0.7388242
    107.5465012    167.6539001     26.0704231
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
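Comparing this file with the MCentral output above shows that the datablock lists the rotation matrix in transposed order: read column by column, its first nine numbers reproduce the printed "Rotation" rows (cf. the section on O's "transpose" matrix notation). The Python sketch below parses such a datablock on that basis and applies the operator as x' = R x + t; the application convention is an assumption, and the function names are hypothetical.

```python
EXAMPLE = """\
! Created by LSQMAN V. 070804/9.7.5 at Fri Jul 6 20:06:49 2007 for gerard
.lsq_rt_m1b_to_m1a r 12 (3f15.7)
     -0.8178841      0.2346340     -0.5253688
      0.2192054     -0.7171549     -0.6615419
     -0.5319911     -0.6562283      0.5351166
     61.0108032      2.1554465     22.1060314
"""

def parse_rt_datablock(text):
    """Return (name, R, t) for a single 12-number .lsq_rt datablock."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("!")]
    name = lines[0].split()[0]
    nums = [float(x) for ln in lines[1:] for x in ln.split()]
    if len(nums) != 12:
        raise ValueError("expected 9 matrix elements and 3 translations")
    # The nine matrix elements are stored column by column
    R = [[nums[row + 3 * col] for col in range(3)] for row in range(3)]
    return name, R, nums[9:12]

def apply_rt(R, t, xyz):
    """Assumed convention: x' = R x + t for every atom."""
    return [tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))
            for p in xyz]

name, R, t = parse_rt_datablock(EXAMPLE)
print(name, R[0])
```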

19.2 MAlign (align multiple NCS/NMR models)

Align all chains/models in a molecule to one particular one (e.g., to the central structure as determined with the MCentral option).

You are to supply the molecule name, residue range (only one, and without a chain/model name !!), the method to use (Explicit alignment only, or explicit followed by Improvement of the alignment), and the name of the chain/model to which all others should be aligned. Note that all features of the EX and IM commands can be used here as well (e.g., ATom type selection, different SEttings, etc.).

Note: some pairs of chains cannot be aligned, namely if either of them contains fewer than 3 atoms, or if their total number of atoms differs by more than 25%, or if their RMSD ends up being greater than 900 A (e.g., because one is protein and the other DNA).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ma m1 1-999 im g
 Align all to one chain/model
 Chain/model A - RMSD (A) to the selected one =    0.347
 Applying operator to chain ...
 Chain/model B - RMSD (A) to the selected one =    0.335
 Applying operator to chain ...
 ...
 Chain/model H - RMSD (A) to the selected one =    0.319
 Applying operator to chain ...
 CPU total/user/sys :      12.7      12.7       0.0
 LSQMAN > wr m1 ../1ldn_aligned.pdb
 Number of atoms written : (      20716)
 CPU total/user/sys :       6.6       5.3       1.3
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

19.3 MDihedral (multiple-model phi, psi analysis for NCS/NMR models)

Analyse PHI and PSI dihedral-angle distributions in multiple models or chains. For every residue the average, sigma, minimum and maximum values of PHI and PSI are listed (some black magic ensures that the program knows that -179 degrees is close to +173 degrees). Residues with large sigmas or ranges have very different conformations in the various models.
A plot file of SIGMA(phi) and SIGMA(psi) as a function of residue number is also provided. Use this option to analyse how similar your NCS-related molecules or NMR models are.
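The "black magic" is not spelled out here, but one standard way to get it right is to compute a circular mean and then unwrap every angle to within 180 degrees of it before taking the ordinary statistics. A hypothetical Python sketch; the population standard deviation (dividing by N) is used because, in the example below, every residue with two observations has a sigma of half its range. Whether LSQMAN proceeds exactly like this is an assumption.

```python
import math

def wrap(angle):
    """Map an angle difference (degrees) into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def dihedral_stats(angles):
    """Average, sigma, minimum and maximum of one dihedral over several models."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    # Circular mean as reference (arbitrary if the unit vectors cancel out)
    ref = math.degrees(math.atan2(s, c))
    # Shift each value by a multiple of 360 so it lies within 180 degrees of ref
    unwrapped = [ref + wrap(a - ref) for a in angles]
    n = len(unwrapped)
    mean = sum(unwrapped) / n
    sigma = math.sqrt(sum((u - mean) ** 2 for u in unwrapped) / n)
    return mean, sigma, min(unwrapped), max(unwrapped)

print(dihedral_stats([-179.0, 173.0]))
```

Here -179 is unwrapped to +181, so the two observations end up 8 degrees apart around an average of 177 degrees. This also explains listed minima or maxima outside the -180..+180 range, such as -215.8 in the MRamachandran example.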

Example of an MDihedral plot.

A good and a poor example are given below:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > md m1 a q1.plt 10
 Command > (md m1 a q1.plt 10)
 Multiple chain/model dihedral analysis
 Reference chain : (A)
 Residue range :     1 -   764
 Cut-off for printing : (  10.000)
   Residue     <phi> S(phi)    Min    Max     <psi> S(psi)    Min    Max    #phi #psi    V(phi) V(psi)
 SER    99 |  -126.4    7.7 -134.0 -118.7 |  -121.4    9.3 -130.7 -112.1 |     2    2 |   0.009  0.013
 ALA   100 |   -78.2    9.7  -87.9  -68.6 |   -20.4    5.7  -26.1  -14.7 |     2    2 |   0.014  0.005
 GLY   308 |    75.8    0.4   75.4   76.1 |     2.8    5.1   -2.3    7.9 |     2    2 |   0.000  0.004
 GLY   402 |    77.3    6.3   71.1   83.6 |    12.4    5.8    6.6   18.2 |     2    2 |   0.006  0.005
 GLY   434 |   -94.1   11.1 -105.3  -83.0 |     0.0    0.0    0.0    0.0 |     2    0 |   0.019 -1.000
 Nr of residues found : (        434)
 Nr of residues shown : (          5)
 SIGMA(phi) Ave, Sdv, Min, Max, # :     1.19    1.17    0.00   11.13     433
 RANGE(phi) Ave, Sdv, Min, Max, # :     2.37    2.34    0.00   22.25     433
 SIGMA(psi) Ave, Sdv, Min, Max, # :     1.13    1.04    0.00    9.32     433
 RANGE(psi) Ave, Sdv, Min, Max, # :     2.26    2.09    0.00   18.64     433
 CV(phi)    Ave, Sdv, Min, Max, # :    0.000   0.001   0.000   0.019     433
 CV(psi)    Ave, Sdv, Min, Max, # :    0.000   0.001   0.000   0.013     433
 Plot file written
 CPU total/user/sys :       2.4       2.4       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

19.4 MBfactors (compare temperature factors for NCS models)

Analyse B-factor distributions for some atom type (e.g., CA) in multiple models or chains. For every residue the average, sigma, minimum and maximum values of B are listed.
A plot file of SIGMA(B) and RANGE(B) as a function of residue number is also provided. Use this option to analyse how similar your NCS-related molecules are.
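A minimal Python sketch of the per-residue statistics. Judging from the example below (every residue listed has a B range of at least 5.0 A2, while no sigma exceeds 3.4 A2), the printing cut-off appears to apply to the range; that, the dictionary layout and the function names are assumptions, and the second residue in the input is made up for illustration.

```python
import math

def b_stats(bvalues):
    """Average, sigma (population), minimum and maximum of B over the chains."""
    n = len(bvalues)
    ave = sum(bvalues) / n
    sigma = math.sqrt(sum((b - ave) ** 2 for b in bvalues) / n)
    return ave, sigma, min(bvalues), max(bvalues)

def residues_to_print(b_by_residue, cutoff):
    """Yield residues whose B range over the chains is at least `cutoff`."""
    for residue, bvalues in b_by_residue.items():
        ave, sigma, lo, hi = b_stats(bvalues)
        if hi - lo >= cutoff:
            yield residue, ave, sigma, lo, hi, len(bvalues)

# Hypothetical input: CA B-factors per residue, one value per chain
data = {"PRO 12": [11.3, 16.9], "GLY 13": [20.0, 21.0]}
for row in residues_to_print(data, 5.0):
    print(row)
```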

Example of an MBfactors plot.

A good and a poor example are given below:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > mb m1 a q7.plt 5.0
 Command > (mb m1 a q7.plt 5.0)
 Multiple chain/model B-factor analysis
 Reference chain : (A)
 Residue range :     1 -   764
 Central atom type : ( CA)
 Cut-off for printing : (   5.000)
 PRO    12 |     14.1     2.8    11.3    16.9 |   2
 ALA    77 |     16.3     3.3    13.1    19.6 |   2
 GLY   240 |     24.1     3.0    21.1    27.1 |   2
 ASP   241 |     32.6     3.2    29.4    35.8 |   2
 GLY   242 |     28.3     2.9    25.4    31.2 |   2
 GLY   254 |     23.3     3.4    19.9    26.7 |   2
 LYS   287 |     14.4     2.5    11.9    17.0 |   2
 GLU   337 |     23.0     2.8    20.2    25.7 |   2
 SER   400 |     12.0     2.8     9.2    14.8 |   2
 Nr of residues found : (        434)
 Nr of residues shown : (          9)
 SIGMA(B) Ave, Sdv, Min, Max :     0.90    0.64    0.00    3.37
 RANGE(B) Ave, Sdv, Min, Max :     1.79    1.28    0.00    6.74
 Plot file written
 CPU total/user/sys :       1.4       1.4       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

19.5 MRamachandran (multiple-model Ramachandran plot for NCS/NMR models)

Generate a Ramachandran plot for multiple models/chains. Every Gly is plotted as a square, all other residues as plus signs. In addition, every plotted point is connected by a line to the average phi/psi point for its residue. All angles are mapped to the range -180 to +180 degrees, so residues near an edge of the plot may give rise to a few very long lines.
If you add P(olar) as the last parameter, you will get a plot in a polar coordinate frame.
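Why the long lines appear can be shown with a small, hypothetical Python sketch: averages are computed on the circle, but both the observed points and the average are then mapped onto the -180..+180 plot axes, so an observation just across the ±180 edge ends up on the far side of the plot from its average. The segment representation and function names are illustrative only, not LSQMAN's plotting code.

```python
import math

def to_axis(angle):
    """Map an angle onto a -180..+180 plot axis."""
    return (angle + 180.0) % 360.0 - 180.0

def circular_mean(angles):
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))

def rama_segments(observations):
    """Segments from each (phi, psi) observation to the residue's average point.

    Each item is (point, average, crosses_plot); crosses_plot is True when the
    drawn line spans more than half an axis, i.e. the "very long line" case.
    """
    ave = (to_axis(circular_mean([phi for phi, _ in observations])),
           to_axis(circular_mean([psi for _, psi in observations])))
    segments = []
    for phi, psi in observations:
        point = (to_axis(phi), to_axis(psi))
        crosses = abs(point[0] - ave[0]) > 180.0 or abs(point[1] - ave[1]) > 180.0
        segments.append((point, ave, crosses))
    return segments

for seg in rama_segments([(-179.0, -60.0), (173.0, -40.0)]):
    print(seg)
```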

Example of a normal MRamachandran plot.

Example of a polar MRamachandran plot.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > mr m2 a q2.ps 150 c
 Command > (mr m2 a q2.ps 150 c)
 Multiple Ramachandran plot
 Reference chain : (A)
 Residue range :     5 -   191
 Cut-off for printing : ( 150.000)
 => XPS_GRAF - GJK (19981216/3.1.2)
 Opened PostScript file : (q2.ps)
 Date    : (Wed Nov 22 19:30:41 2000)
 User    : (gerard)
 Program : (LSQMAN)
   Residue     <phi> S(phi)    Min    Max     <psi> S(psi)    Min    Max    #phi #psi    V(phi) V(psi)
 ALA    10 |  -138.2   77.6 -215.8  -60.5 |    61.1   86.3  -25.2  147.4 |     2    2 |   0.786  0.936
 ALA    13 |  -124.7   72.5 -197.2  -52.2 |  -156.7   87.3 -244.0  -69.5 |     2    2 |   0.700  0.952
 GLN    15 |    93.7   74.6   19.1  168.2 |    16.2   89.1  -72.9  105.2 |     2    2 |   0.734  0.984
 HIS    31 |   -92.0   33.3 -125.4  -58.7 |   -45.8   77.7 -123.4   31.9 |     2    2 |   0.164  0.786
 TYR    34 |    -1.0   77.1  -78.1   76.1 |   -15.0   50.3  -65.3   35.4 |     2    2 |   0.776  0.362
 GLU    49 |   -32.4   54.9  -87.2   22.5 |    26.6   81.7  -55.1  108.2 |     2    2 |   0.424  0.855
 GLY    50 |   173.8   79.6   94.2  253.4 |   139.2   46.1   93.1  185.3 |     2    2 |   0.819  0.307
 VAL    57 |  -130.1   64.7 -194.8  -65.4 |  -127.7   81.5 -209.2  -46.2 |     2    2 |   0.573  0.852
 LYS    58 |     7.5   88.5  -81.0   95.9 |   -47.4    8.0  -55.4  -39.4 |     2    2 |   0.973  0.010
 SER    61 |   -69.7   49.0 -118.7  -20.7 |  -179.3   83.3 -262.6  -95.9 |     2    2 |   0.344  0.884
 GLY    87 |  -140.6   80.7 -221.3  -59.9 |    94.6   19.8   74.7  114.4 |     2    2 |   0.839  0.059
 GLN    88 |   -77.9   79.3 -157.2    1.5 |     8.4   88.9  -80.6   97.3 |     2    2 |   0.815  0.982
 SER   121 |   151.6   77.6   74.0  229.2 |   154.8    2.6  152.2  157.5 |     2    2 |   0.785  0.001
 SER   137 |   -39.4   84.5 -123.9   45.1 |    41.4    5.2   36.2   46.6 |     2    2 |   0.904  0.004
 GLY   142 |   -62.4   79.3 -141.8   16.9 |   166.9   59.7  107.3  226.6 |     2    2 |   0.815  0.495
 GLY   148 |  -175.9   87.3 -263.2  -88.5 |    24.4    3.1   21.3   27.5 |     2    2 |   0.954  0.001
 PRO   151 |   -80.4   24.9 -105.4  -55.5 |  -154.8   77.3 -232.2  -77.5 |     2    2 |   0.093  0.781
 LEU   152 |    -3.7   85.1  -88.7   81.4 |    43.0   83.7  -40.7  126.8 |     2    2 |   0.914  0.891
 TYR   166 |    31.0    3.9   27.1   34.9 |    19.6   84.8  -65.2  104.4 |     2    2 |   0.002  0.909
 ARG   167 |   -10.4   80.9  -91.3   70.6 |    40.0   60.3  -20.3  100.3 |     2    2 |   0.842  0.505
 ASN   168 |  -177.7   87.0 -264.6  -90.7 |    84.4   19.1   65.3  103.5 |     2    2 |   0.947  0.055
 Nr of residues found : (        186)
 Nr of residues shown : (         21)
 SIGMA(phi) Ave, Sdv, Min, Max, # :    24.95   23.99    0.05   88.46     185
 RANGE(phi) Ave, Sdv, Min, Max, # :    49.89   47.99    0.11  176.92     185
 SIGMA(psi) Ave, Sdv, Min, Max, # :    25.83   24.61    0.05   89.07     185
 RANGE(psi) Ave, Sdv, Min, Max, # :    51.65   49.23    0.11  178.14     185
 CV(phi)    Ave, Sdv, Min, Max, # :    0.163   0.253   0.000   0.973     185
 CV(psi)    Ave, Sdv, Min, Max, # :    0.174   0.255   0.000   0.984     185
 PostScript file written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

19.6 MSidechain (multiple-model chi1/chi2 analysis for NCS/NMR models)

Same as MDihedral but for side-chain CHI1/CHI2 torsions.

Example of an MSidechain plot.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > ms m1 a q4.plt 20
 Command > (ms m1 a q4.plt 20)
 Multiple side-chain torsion analysis
 Reference chain : (A)
 Residue range :     1 -   764
 Cut-off for printing : (  20.000)
   Residue      <c1>  S(c1)    Min    Max      <c2>  S(c2)    Min    Max     #c1  #c2     V(c1)  V(c2)
 ASP    63 |    59.1    1.3   57.7   60.4 |    -0.9   88.6  -89.5   87.7 |     2    2 |   0.000  0.976
 GLN   101 |  -141.3   13.1 -154.4 -128.1 |    89.1    0.3   88.7   89.4 |     2    2 |   0.026  0.000
 PHE   146 |   -76.2    0.5  -76.7  -75.7 |     0.7   89.0  -88.4   89.7 |     2    2 |   0.000  0.983
 TYR   167 |   -69.6    0.2  -69.7  -69.4 |    -1.9   86.1  -88.0   84.3 |     2    2 |   0.000  0.933
 TYR   274 |   178.2    0.1  178.1  178.3 |   179.7   88.6   91.1  268.3 |     2    2 |   0.000  0.975
 PRO   276 |    12.3   19.5   -7.2   31.8 |    -3.3   32.3  -35.6   29.0 |     2    2 |   0.057  0.155
 PHE   280 |   -72.5    0.7  -73.2  -71.8 |     0.5   89.9  -89.4   90.4 |     2    2 |   0.000  0.999
 ASN   315 |   -93.5    2.4  -95.9  -91.1 |    79.4   23.4   56.0  102.8 |     2    2 |   0.001  0.082
 TYR   321 |   172.9    0.3  172.6  173.2 |    -1.4   86.7  -88.2   85.3 |     2    2 |   0.000  0.943
 ASP   328 |   -74.2    6.7  -81.0  -67.5 |   -20.2   17.9  -38.1   -2.2 |     2    2 |   0.007  0.049
 ASP   345 |   -77.6    4.5  -82.1  -73.0 |   -31.7   10.0  -41.7  -21.6 |     2    2 |   0.003  0.015
 ASN   413 |   -74.4    6.2  -80.6  -68.2 |    24.5   18.2    6.4   42.7 |     2    2 |   0.006  0.050
 Nr of residues found : (        434)
 Nr of residues shown : (         12)
 SIGMA(chi1) Ave, Sdv, Min, Max, # :     1.34    1.66    0.01   19.52     356
 RANGE(chi1) Ave, Sdv, Min, Max, # :     2.68    3.31    0.02   39.04     356
 SIGMA(chi2) Ave, Sdv, Min, Max, # :     4.53   14.50    0.01   89.91     218
 RANGE(chi2) Ave, Sdv, Min, Max, # :     9.02   28.94    0.00  179.83     219
 CV(chi1)    Ave, Sdv, Min, Max, # :    0.001   0.003   0.000   0.057     356
 CV(chi2)    Ave, Sdv, Min, Max, # :    0.029   0.158   0.000   0.999     219
 Plot file written
 CPU total/user/sys :       4.7       4.7       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

19.7 MTorsion (multiple-model chi1, chi2 plot for NCS/NMR models)

Same as MRamachandran but for side-chain CHI1/CHI2 torsions.
If you add P(olar) as the last parameter, you will get a plot in a polar coordinate frame.

Example of a normal MTorsion plot.

Example of a polar MTorsion plot.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > mt m2 a q5.ps 150 c
 Command > (mt m2 a q5.ps 150 c)
 Multiple torsion plot
 Reference chain : (A)
 Residue range :     5 -   191
 Cut-off for printing : ( 150.000)
 => XPS_GRAF - GJK (19981216/3.1.2)
 Opened PostScript file : (q5.ps)
 Date    : (Wed Nov 22 19:32:35 2000)
 User    : (gerard)
 Program : (LSQMAN)
   Residue      <c1>  S(c1)    Min    Max      <c2>  S(c2)    Min    Max     #c1  #c2     V(c1)  V(c2)
 HIS    17 |   -10.3   46.9  -57.2   36.5 |    -7.3   80.0  -87.3   72.6 |     2    2 |   0.316  0.826
 TYR    25 |    23.3   83.1  -59.8  106.4 |   -64.9   13.1  -77.9  -51.8 |     2    2 |   0.880  0.026
 ASP    28 |   -83.7   23.8 -107.5  -59.9 |     0.7   81.6  -80.9   82.3 |     2    2 |   0.085  0.854
[...]
 ASN   182 |   -80.5    5.4  -85.9  -75.2 |  -138.9   78.0 -216.9  -61.0 |     2    2 |   0.004  0.791
 VAL   186 |    39.0   88.6  -49.6  127.6 |     0.0    0.0    0.0    0.0 |     2    0 |   0.975 -1.000
 GLU   189 |   -75.7   14.0  -89.8  -61.7 |    18.9   88.3  -69.3  107.2 |     2    2 |   0.030  0.969
 Nr of residues found : (        186)
 Nr of residues shown : (         25)
 SIGMA(chi1) Ave, Sdv, Min, Max, # :    31.62   25.88    0.20   88.61     151
 RANGE(chi1) Ave, Sdv, Min, Max, # :    63.25   51.76    0.40  177.22     151
 SIGMA(chi2) Ave, Sdv, Min, Max, # :    37.00   28.17    0.01   88.25     113
 RANGE(chi2) Ave, Sdv, Min, Max, # :    74.01   56.33    0.02  176.50     113
 CV(chi1)    Ave, Sdv, Min, Max, # :    0.228   0.274   0.000   0.976     151
 CV(chi2)    Ave, Sdv, Min, Max, # :    0.291   0.314   0.000   0.969     113
 PostScript file written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

19.8 MPlot (multi-RMS (distance) plot for multiple models)

This command will plot the multi-RMS (distance) of central atoms as a function of residue number. The multi-RMS is simply the RMS value of the distances between all unique pairs of central atoms belonging to one particular residue. So, if you have 3 superimposed chains or models, and the central atom is CA, then:
multi-RMS = SQRT ( [d12**2 + d13**2 + d23**2] / 3 )

This kind of plot will reveal regions where there is a lot of structural variation in Cartesian space (not necessarily in torsion angle space, e.g. in the case of domain movements).
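The multi-RMS formula generalises to any number of superimposed models. A minimal Python sketch (the function name is hypothetical); for 8 chains there are 8*7/2 = 28 unique pairs, which presumably explains the "8 28" printed next to each residue in the MPlot example in this section.

```python
import math
from itertools import combinations

def multi_rms(positions):
    """RMS of the distances between all unique pairs of superimposed positions."""
    pairs = list(combinations(positions, 2))
    if not pairs:
        raise ValueError("need at least two models")
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in pairs) / len(pairs))

# Three models: d12 = 1, d13 = 1, d23 = sqrt(2), so multi-RMS = sqrt(4/3)
print(multi_rms([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]))
print(len(list(combinations(range(8), 2))))  # 28 unique pairs for 8 chains
```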

Note: don't forget to superimpose the models first, e.g. using the MCentral and MAlign commands !

Example of an MPlot plot.

From version 8.6 on, this command also produces a so-called "CD plot" (as a PostScript file). CD plots were first introduced in the evaluation of the comparative modelling part of CASP3:

Jones, T.A. and Kleywegt, G.J. (1999). CASP3 comparative modelling evaluation. Proteins: Struct. Funct. Genet. Suppl. 3, 30-46.

Consider the following example:

Example of an MPlot "CD plot".

This is the CD plot for 1LDN (after the 8 chains had been superimposed onto the central chain). The bottom band (looks a bit like a gel - yeuch) shows the multi-RMS again, but now the values are mapped onto a grey scale that runs from white to black. White maps to a distance of 0.0 Å and black maps to a (user-definable) maximum distance (default is 3.5 Å); distances in between these values are mapped to corresponding grey tones by linear interpolation. The other bands show the distances between corresponding central atoms (for proteins typically CA atoms) in pairs of chains. Again, large distances will show up as dark regions.

CD plots are very useful for analysing an ensemble of structures. In the example of 1LDN, one notices immediately that chain A in the region 103-105 has a conformation which is different from that of all the other chains. Similarly, chain C differs from all other chains around residue 220. The C-terminus (residue 330) occurs in two conformations: chains A and G on the one hand, and chains B, C, D, E, F, and H on the other.
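The grey-scale mapping described above can be sketched in Python as follows. Following the PostScript setgray convention, 1.0 is white and 0.0 is black; clamping distances beyond the maximum to black is an assumption (the MPlot output below merely warns when DMAX_BLACK is smaller than the largest observed distance). The function names and the input layout (one list of matched, superimposed CA coordinates per chain) are hypothetical.

```python
import math
from itertools import combinations

def grey_level(distance, dmax=3.5):
    """Grey value for a distance: 1.0 (white) at 0 A, 0.0 (black) at >= dmax."""
    if dmax <= 0.0:
        raise ValueError("dmax must be positive")
    fraction = min(max(distance / dmax, 0.0), 1.0)  # clamp to [0, 1]
    return 1.0 - fraction

def cd_bands(ca_by_chain, dmax=3.5):
    """One band per pair of chains: a grey value per residue from the CA-CA distance.

    ca_by_chain maps a chain name to a list of CA coordinates, matched by residue
    across chains and already superimposed.
    """
    bands = {}
    for a, b in combinations(sorted(ca_by_chain), 2):
        bands[(a, b)] = [grey_level(math.dist(p, q), dmax)
                         for p, q in zip(ca_by_chain[a], ca_by_chain[b])]
    return bands

chains = {"A": [(0.0, 0.0, 0.0)], "B": [(1.75, 0.0, 0.0)], "C": [(0.0, 0.0, 0.0)]}
print(cd_bands(chains))
```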

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > read m1 1ldn.pdb
 [...]
 LSQMAN > mcentral m1 1-999 explicit
 [...]
 ==> Central chain is D
   
 Average RMSD between chains =    0.442 A
 LSQMAN > malign m1 1-999 explicit D
 Align all to one chain/model
 Chain/model A - RMSD (A) to the selected one =    0.497
 Applying operator to chain ...
 Chain/model B - RMSD (A) to the selected one =    0.329
 Applying operator to chain ...
 Chain/model C - RMSD (A) to the selected one =    0.539
 Applying operator to chain ...
 Chain/model E - RMSD (A) to the selected one =    0.322
 Applying operator to chain ...
 Chain/model F - RMSD (A) to the selected one =    0.324
 Applying operator to chain ...
 Chain/model G - RMSD (A) to the selected one =    0.405
 Applying operator to chain ...
 Chain/model H - RMSD (A) to the selected one =    0.319
 Applying operator to chain ...
 CPU total/user/sys :       7.3       7.3       0.0
 LSQMAN > mp m1 a m1_multi_rms_dist.plt m1_cdplot.ps 3.0
 Multiple chain/model RMS distance plot
 Reference chain : (A)
 Residue range :    15 -   330
 Central atom type : ( CA)
 Max dist on grey-scale (A) : (   3.000)
 Cut-off for printing   (A) : (   1.000)
 => XPS_GRAF - GJK (19981216/3.1.2)
 Opened PostScript file : (m1_cdplot.ps)
 Date    : (Fri Oct 12 18:28:12 2001)
 User    : (gerard)
 Program : (LSQMAN)
 GLY   103 |     1.13 A |   8 28
 GLU   104 |     1.41 A |   8 28
 THR   105 |     1.33 A |   8 28
 GLY   219 |     2.51 A |   8 28
 GLU   220 |     2.85 A |   8 28
 THR   330 |     3.27 A |   8 28
 Point size : (  14.000)
 Max observed distance (A) : (   5.956)
 NOTE: DMAX_BLACK = 3.000 A is smaller than max observed distance = 5.956 A !
 PostScript file written
 Nr of residues found : (        316)
 Nr of residues shown : (          6)
 Multi-RMS dist (A) Ave, Sdv, Min, Max :     0.36    0.28    0.17    3.27
 Plot file written
 CPU total/user/sys :       2.4       2.4       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

19.9 VMain_chain (phi, psi circular variance plot for multiple models)

This command will plot the circular variance of phi and psi for a set of models (NCS or NMR). The circular variance of N observations of a dihedral angle is defined as:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  CV = 1.0 - (1/N)*SQRT ( (SUM cos)**2 + (SUM sin)**2 )
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Hence, the circular variance lies between 0 and 1, with low values indicating a high degree of similarity between the various observations of the angle.

NMR-ists sometimes like to work with order parameters. These are defined as:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  OP = (1/N)*SQRT ( (SUM cos)**2 + (SUM sin)**2 )
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

In other words: CV = 1 - OP.
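Both quantities take only a few lines of Python. In this sketch an empty set of observations returns -1.0, mirroring how LSQMAN lists dihedrals without observations; the function names are hypothetical.

```python
import math

def order_parameter(angles):
    """OP = (1/N) * SQRT((SUM cos)**2 + (SUM sin)**2), angles in degrees."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.sqrt(c * c + s * s) / len(angles)

def circular_variance(angles):
    """CV = 1 - OP; -1.0 flags a dihedral that has no observations."""
    if not angles:
        return -1.0
    return 1.0 - order_parameter(angles)

# SER 99 psi from the VMain example (two observations, -130.7 and -112.1)
print(round(circular_variance([-130.7, -112.1]), 3))  # 0.013
```

For two observations the resultant reduces to |cos(half the angular difference)|, so SER 99's psi values, 18.6 degrees apart, give CV = 1 - cos(9.3 degrees), about 0.013, as listed.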

Example of a VMain plot.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > vm m1 a q3.plt 0.01
 Command > (vm m1 a q3.plt 0.01)
 Main-chain circular variance plot
 Reference chain : (A)
 Residue range :     1 -   764
 Cut-off for printing : (   0.010)
   Residue     <phi> S(phi)    Min    Max     <psi> S(psi)    Min    Max    #phi #psi    V(phi) V(psi)
 SER    99 |  -126.4    7.7 -134.0 -118.7 |  -121.4    9.3 -130.7 -112.1 |     2    2 |   0.009  0.013
 ALA   100 |   -78.2    9.7  -87.9  -68.6 |   -20.4    5.7  -26.1  -14.7 |     2    2 |   0.014  0.005
 GLY   434 |   -94.1   11.1 -105.3  -83.0 |     0.0    0.0    0.0    0.0 |     2    0 |   0.019 -1.000
 Nr of residues found : (        434)
 Nr of residues shown : (          3)
 SIGMA(phi) Ave, Sdv, Min, Max, # :     1.19    1.17    0.00   11.13     433
 RANGE(phi) Ave, Sdv, Min, Max, # :     2.37    2.34    0.00   22.25     433
 SIGMA(psi) Ave, Sdv, Min, Max, # :     1.13    1.04    0.00    9.32     433
 RANGE(psi) Ave, Sdv, Min, Max, # :     2.26    2.09    0.00   18.64     433
 CV(phi)    Ave, Sdv, Min, Max, # :    0.000   0.001   0.000   0.019     433
 CV(psi)    Ave, Sdv, Min, Max, # :    0.000   0.001   0.000   0.013     433
 Plot file written
 CPU total/user/sys :       2.4       2.4       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Dihedrals for which there are no observations are listed as having a circular variance of -1.0 (to recognise them easily), but in the plot they will have values of 0.0. The plot file may look as follows:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
REMARK Created by LSQMAN V. 970722/5.5 at Tue Jul 22 19:22:45 1997 for user gerard
XLABEL Residue number
YLABEL CV(phi) and CV(psi)
REMARK Plot of CV(phi) (solid blue) and CV(psi) (dotted red) as a function of residue nr
REMARK Values are for mol M1 = file /nfs/pdb/full/1ldn.pdb
XYVIEW    14.00  331.00    0.00    1.00
NPOINT      316
COLOUR        4
XVALUE *
    15    16    17    18    19    20    21    22    23    24    25    26
...
   327   328   329   330
REMARK SIGMA(phi) ave, sdv, min, max     9.02    5.12    2.92   58.45
REMARK RANGE(phi) ave, sdv, min, max    28.52   17.95    9.94  189.73
REMARK CV(phi) ave, sdv, min, max     0.02    0.03    0.00    0.28
! MOLNAM_RESIDUE_CVPHI R 316 (9f8.3)
YVALUE *
   0.000   0.019   0.014   0.019   0.016   0.027   0.023   0.007   0.009
   0.007   0.006   0.006   0.014   0.013   0.010   0.007   0.006   0.014
...
   0.011   0.007   0.009   0.015   0.013   0.020   0.016   0.103   0.010
   0.241
MORE
COLOUR        1
REMARK SIGMA(psi) ave, sdv, min, max     8.91    6.06    2.54   73.71
REMARK RANGE(psi) ave, sdv, min, max    28.02   18.49    6.98  192.70
REMARK CV(psi) ave, sdv, min, max     0.02    0.04    0.00    0.50
! MOLNAM_RESIDUE_CVPSI R 316 (9f8.3)
YVALUE *
   0.015   0.011   0.010   0.025   0.011   0.020   0.012   0.009   0.007
   0.008   0.005   0.009   0.017   0.006   0.008   0.004   0.003   0.023
   0.011   0.006   0.007   0.008   0.008   0.010   0.007   0.007   0.011
...
   0.005   0.022   0.021   0.004   0.017   0.030   0.036   0.031   0.497
   0.000
END
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Convert plot files to PostScript with O2D.

19.10 VSide_chain (chi1, chi2 circular variance plot for multiple models)

This command will plot the circular variance of chi1 and chi2 for a set of models (NCS or NMR). See the VM command for further information.

Example of a VSide plot.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > vs m1 a q6.plt 0.1
 Command > (vs m1 a q6.plt 0.1)
 Side-chain circular variance plot
 Reference chain : (A)
 Residue range :     1 -   764
 Cut-off for printing : (   0.100)
   Residue      <c1>  S(c1)    Min    Max      <c2>  S(c2)    Min    Max     #c1  #c2     V(c1)  V(c2)
 ASP    63 |    59.1    1.3   57.7   60.4 |    -0.9   88.6  -89.5   87.7 |     2    2 |   0.000  0.976
 PHE   146 |   -76.2    0.5  -76.7  -75.7 |     0.7   89.0  -88.4   89.7 |     2    2 |   0.000  0.983
 TYR   167 |   -69.6    0.2  -69.7  -69.4 |    -1.9   86.1  -88.0   84.3 |     2    2 |   0.000  0.933
 TYR   274 |   178.2    0.1  178.1  178.3 |   179.7   88.6   91.1  268.3 |     2    2 |   0.000  0.975
 PRO   276 |    12.3   19.5   -7.2   31.8 |    -3.3   32.3  -35.6   29.0 |     2    2 |   0.057  0.155
 PHE   280 |   -72.5    0.7  -73.2  -71.8 |     0.5   89.9  -89.4   90.4 |     2    2 |   0.000  0.999
 TYR   321 |   172.9    0.3  172.6  173.2 |    -1.4   86.7  -88.2   85.3 |     2    2 |   0.000  0.943
 Nr of residues found : (        434)
 Nr of residues shown : (          7)
 SIGMA(chi1) Ave, Sdv, Min, Max, # :     1.34    1.66    0.01   19.52     356
 RANGE(chi1) Ave, Sdv, Min, Max, # :     2.68    3.31    0.02   39.04     356
 SIGMA(chi2) Ave, Sdv, Min, Max, # :     4.53   14.50    0.01   89.91     218
 RANGE(chi2) Ave, Sdv, Min, Max, # :     9.02   28.94    0.00  179.83     219
 CV(chi1)    Ave, Sdv, Min, Max, # :    0.001   0.003   0.000   0.057     356
 CV(chi2)    Ave, Sdv, Min, Max, # :    0.029   0.158   0.000   0.999     219
 Plot file written
 CPU total/user/sys :       4.7       4.7       0.0
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

20 VRML COMMANDS

From version 6.0 on, LSQMAN can produce VRML files of molecules in memory (VRML = Virtual Reality Modelling Language). Using VRML, rather than writing a dedicated set of routines to display molecules, is trivial for the programmer and easy for the user.

Some things you may need to know:
- VRML files have the extension ".wrl"
- use your favourite browser with a VRML viewer plug-in to inspect the displays (you can launch it from inside LSQMAN, e.g.: "$ netscape test.wrl &")
- colours can be defined by name or by RGB values (red-green-blue, three numbers in the range 0-1). The VRml COlour_list command will list all (> 400) predefined colour names and their RGB values

A typical series of commands would be:
- read mol1 ..
- read mol2 ..
- superimpose mol2 onto mol1 (EX, BR)
- apply the operator to mol2
- VRml SEtup (if necessary)
- VRml INit filename
- VRml ADd mol1 [chain]
- VRml ADd mol2 [chain]

The VRML interface was written for two purposes:
- quick inspection of superimposed structures without the need to write the structures to new PDB files, fire up a separate graphics program, read the molecules and display them; for this purpose you probably want to re-use the same file over and over (just hit the RELOAD button of your VRML browser when you have written new molecules to the file)
- creating files with superimposed molecules which you can include in your web pages (in this case, don't forget to compress the files with the "gzip" command to reduce their size !)

The options have been kept simple and fast. For fancier pictures (e.g., all-atom ball-and-stick models) you should use dedicated software.

20.1 VRml SEtup (define some parameters)

With this command you can define the following parameters:

- the central atom type (" CA " for proteins)
- the maximum allowed distance between two consecutive central atoms for them to be connected on the display (4.5 Å is a reasonable cut-off for CA-CA distances in proteins)
- the background colour (default is black)
- the default colour for molecules (default is white)

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > vr set " CA " 4.5 white "0.35 0.87 1.0"
 Central atom type : ( CA)
 Max central atom distance : (   4.500)
 Background colour : (1.000000 1.000000 1.000000)
 Default colour : (0.3500000 0.8700000 1.000000)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > vr se
 Central atom type ? ( CA)
 Central atom type : ( CA)
 Max central atom distance ? (4.5)
 Max central atom distance : (   4.500)
 Background colour ? (1.000000 1.000000 1.000000) grey
 Background colour : (0.5000000 0.5000000 0.5000000)
 Default colour ? (0.3500000 0.8700000 1.000000) red
 Default colour : (1.000000 0.0000000E+00 0.0000000E+00)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
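The SEtup parameters map directly onto a CA trace drawn as lines. The following Python sketch shows one way to write such a trace. It is not LSQMAN's code. It emits VRML97 (VRML 2.0), although we do not know which VRML version LSQMAN itself writes, and the name ca_trace_vrml is hypothetical. As with the maximum central-atom distance, consecutive atoms that lie further apart than the cut-off are left unconnected.

```python
import math


def ca_trace_vrml(coords, max_dist=4.5, colour=(1.0, 1.0, 1.0),
                  background=(0.0, 0.0, 0.0)):
    """Return a VRML97 scene that draws a CA trace as an IndexedLineSet.

    coords is a list of (x, y, z) tuples for the central atoms, in order.
    Consecutive atoms further apart than max_dist (chain breaks) are not joined.
    """
    # Split the trace into polylines wherever the CA-CA distance exceeds the cut-off.
    polylines = []
    current = [0] if coords else []
    for i in range(1, len(coords)):
        if math.dist(coords[i - 1], coords[i]) <= max_dist:
            current.append(i)
        else:
            polylines.append(current)
            current = [i]
    if current:
        polylines.append(current)

    # Each polyline is a list of point indices terminated by -1.
    index = []
    for line in polylines:
        if len(line) >= 2:  # an isolated atom has nothing to connect to
            index.extend(line + [-1])

    points = ",\n      ".join(f"{x:.3f} {y:.3f} {z:.3f}" for x, y, z in coords)
    rgb = " ".join(f"{c:.3f}" for c in colour)
    bg = " ".join(f"{c:.3f}" for c in background)
    return (
        "#VRML V2.0 utf8\n"
        f"Background {{ skyColor [ {bg} ] }}\n"
        "Shape {\n"
        # Lines are unlit in VRML97, so their colour comes from emissiveColor.
        f"  appearance Appearance {{ material Material {{ emissiveColor {rgb} }} }}\n"
        "  geometry IndexedLineSet {\n"
        f"    coord Coordinate {{ point [\n      {points} ] }}\n"
        f"    coordIndex [ {', '.join(map(str, index))} ]\n"
        "  }\n"
        "}\n"
    )


print(ca_trace_vrml([(0, 0, 0), (3.8, 0, 0), (20, 0, 0), (23.8, 0, 0)]))
```

In this example the jump from the second to the third atom (16.2 Å) exceeds the 4.5 Å cut-off, so the trace is written as two separate line segments.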

20.2 VRml INit (open a new VRML file)

This command opens a new VRML file; if no file name is provided, the same file name as before is used. To actually write molecules to it, use the VRml ADd command.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > vr in test.wrl
 Open VRML file : (test.wrl)
 Opened VRML file
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

20.3 VRml COlour_list (list predefined colour names)

To help you find colours, more than 400 colour names have been predefined. This command will list their names and their RGB values.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > vr co
 Nr of colours : (        411)
 #   1 (black                ) =     12595212 RGB    0.000   0.000   0.000
 #   2 (red                  ) =     12596212 RGB    1.000   0.000   0.000
 #   3 (green                ) =     13619212 RGB    0.000   1.000   0.000
 #   4 (blue                 ) =   1061171212 RGB    0.000   0.000   1.000
 #   5 (yellow               ) =     13620212 RGB    1.000   1.000   0.000
 #   6 (magenta              ) =   1061172212 RGB    1.000   0.000   1.000
 #   7 (cyan                 ) =   1062195212 RGB    0.000   1.000   1.000
 #   8 (light_grey           ) =    852276012 RGB    0.800   0.800   0.800
 #   9 (grey                 ) =    537395712 RGB    0.500   0.500   0.500
 #  10 (dark_grey            ) =    222515412 RGB    0.200   0.200   0.200
 #  11 (white                ) =   1062196212 RGB    1.000   1.000   1.000
 #  12 (gainsboro            ) =    917351274 RGB    0.862   0.862   0.862
 #  13 (honeydew             ) =   1000330169 RGB    0.941   1.000   0.941
 #  14 (mistyrose            ) =    938355700 RGB    1.000   0.894   0.882
 ...
 # 407 (dodgerblue2          ) =    991454330 RGB    0.110   0.525   0.933
 # 408 (lightsteelblue3      ) =    855328391 RGB    0.635   0.709   0.803
 # 409 (green3               ) =     13417484 RGB    0.000   0.803   0.000
 # 410 (orangered4           ) =     12745261 RGB    0.545   0.146   0.000
 # 411 (mediumorchid1        ) =   1061582714 RGB    0.878   0.401   1.000
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
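Colours can be given either as one of these names or as three RGB numbers in the range 0-1. The Python sketch below shows that lookup. The table is a small excerpt of the list above, with values copied from it. Case-insensitive matching is an assumption, and parse_colour is a hypothetical name.

```python
# Excerpt of the predefined colours listed by VRml COlour_list (RGB in 0-1).
COLOURS = {
    "black": (0.0, 0.0, 0.0),
    "red": (1.0, 0.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
    "yellow": (1.0, 1.0, 0.0),
    "magenta": (1.0, 0.0, 1.0),
    "cyan": (0.0, 1.0, 1.0),
    "light_grey": (0.8, 0.8, 0.8),
    "grey": (0.5, 0.5, 0.5),
    "dark_grey": (0.2, 0.2, 0.2),
    "white": (1.0, 1.0, 1.0),
}


def parse_colour(spec, table=COLOURS):
    """Accept a colour name or an "R G B" string with components in 0-1."""
    key = spec.strip().strip('"').lower()  # case-insensitivity is assumed
    if key in table:
        return table[key]
    parts = key.split()
    if len(parts) != 3:
        raise ValueError(f"unknown colour: {spec!r}")
    rgb = tuple(float(p) for p in parts)
    if not all(0.0 <= c <= 1.0 for c in rgb):
        raise ValueError(f"RGB components must lie in the range 0-1: {spec!r}")
    return rgb


print(parse_colour("grey"))
print(parse_colour('"0.35 0.87 1.0"'))
```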

20.4 VRml ADd (add a molecule to the current VRML file)

Add a trace of the central atoms of a molecule or chain to the current VRML file. You can also provide the colour to use. If you don't provide a chain identifier (e.g., A, B, ...), all chains in the molecule will be drawn (chain = "*").

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 /nfs/pdb/full/1azu.pdb
 ...
 LSQMAN > re m2 /nfs/pdb/full/2aza.pdb
 ...
 LSQMAN > br m1 a m2 a 50 25 100
 Brute-force fit of M1 A
 And                M2 A
 ...
 LSQMAN > im m1 * m2 *
 Improve fit of  M1 *
 And             M2 *
 ...
 LSQMAN > ap m1 m2
 Bring Mol 2 on top of Mol 1 ...
 Molecule 1 : (M1)
 Molecule 2 : (M2)
 Apply to mol 2 chain : (*)
 Nr of atoms moved : (       2263)
 Resetting ALL operators of mol 2 ...
 LSQMAN > vr in test.wrl
 Closed VRML file
 Open VRML file : (test.wrl)
 Opened VRML file
 LSQMAN > vr ad m1 a "0.705   0.322   0.803"
 VRML - Add mol M1         chain A
 Nr of central atoms written : (        126)
 LSQMAN > vr ad m2 a cornsilk3
 VRML - Add mol M2         chain A
 Nr of central atoms written : (        129)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

20.5 VRml ALl_chains (add each chain/model of a molecule to the current VRML file)

Add a trace of the central atoms of each chain/model of a molecule to the current VRML file. Each chain/model will be drawn in a different colour.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > vr in
 Open VRML file : (lsqman.wrl)
 Opened VRML file
 LSQMAN > vr all m1
 VRML traces for mol : (M1)
 VRML - Add chain A colour yellow
 Nr of central atoms written : (        106)
 VRML - Add chain B colour green
...
 VRML - Add chain Y colour mint_cream
 Nr of central atoms written : (        106)
 VRML - Add chain Z colour bisque
 Nr of central atoms written : (        106)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

21 EXAMPLE

The following illustrates a simple LSQMAN session in which you want to superimpose 1AZU on top of 2AZA, improve the operator, create an O macro and execute this macro in O:

21.1 (1) read the molecules

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > read 2aza 2aza.pdb
 Nr of lines read from file : (       1211)
 Nr of atoms in molecule    : (        987)
 Nr of chains or models     : (          1)
 LSQMAN > read 1azu 1azu.pdb
 Nr of lines read from file : (       1041)
 Nr of atoms in molecule    : (        931)
 Nr of chains or models     : (          1)
 LSQMAN > li *
   
 List    : (2AZA)
 File    : (2aza.pdb)
 Comment : (Read from 2aza.pdb)
 Nr of atoms in mol  : (        987)
 Multiple NMR models ? (F)
 Nr of chains/models : (          1)
   
 List    : (1AZU)
 File    : (1azu.pdb)
 Comment : (Read from 1azu.pdb)
 Nr of atoms in mol  : (        931)
 Multiple NMR models ? (F)
 Nr of chains/models : (          1)
 LSQMAN > sh 2aza 1azu
 Operator bringing : (1AZU)
 on top of         : (2AZA)
 Last command was  : (none)
 The      0 atoms have an RMS distance of 1000.000 A
 SI = RMS * Nmin / Nmatch             =   1000.00000
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =   1000.00000
 Rotation     :   1.00000000  0.00000000  0.00000000
                  0.00000000  1.00000000  0.00000000
                  0.00000000  0.00000000  1.00000000
 Translation  :       0.0000      0.0000      0.0000
 Determinant of rotation matrix =   1.000000
 Rotation angle                 =   0.000000
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

21.2 (2) do the initial, explicit superposition

For example, using an alignment obtained from DEJAVU:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > atom_types define " ca " " n  " " c  " " co " " cb " " cg " " cd "
 Nr of atom types : (       7)
 Type : ( CA)
 Type : ( N)
 Type : ( C)
 Type : ( CO)
 Type : ( CB)
 Type : ( CG)
 Type : ( CD)
 LSQMAN > ex 2aza
 Range 1 ? (A1-10) "a4-10 a19-23 a28-36 a44-51 a53-66 a91-97 a106-111 a123:126"
 Mol 2 ? (2AZA) 1azu
 Range 2 ? (A1) "a4 a19 a28 a44 a53 a91 a106 a123"
 Explicit fit of 2AZA "A4-10 A19-23 A28-36 A44-51 A53-66 A91-97 A106-111 A123:126"
 And             1AZU "A4 A19 A28 A44 A53 A91 A106 A123"
 Atom types     | CA | N  | C  | CO | CB | CG | CD |
 Nr of atoms to match : (        265)
 The    265 atoms have an RMS distance of    0.893 A
 Rotation    :  -0.956401  0.130562 -0.261248
                 0.169145 -0.481603 -0.859912
                -0.238090 -0.866610  0.438522
 Translation :     13.816    26.765    38.539
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
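The explicit fit produces a rotation matrix R and a translation vector t. It also reports the RMS distance of the matched atoms once that operator has been applied. The Python sketch below assumes the convention x' = R x + t, with R given row by row as printed. Which way round LSQMAN and O actually store the matrix is the subject of the section on O's "transpose" matrix notation, so treat this convention as an assumption. The function names are hypothetical.

```python
import math


def apply_operator(rotation, translation, coords):
    """Return R.x + t for each point; R is given as three rows (assumed convention)."""
    return [tuple(sum(rotation[i][k] * x[k] for k in range(3)) + translation[i]
                  for i in range(3))
            for x in coords]


def rms_distance(coords1, coords2):
    """RMS distance between two equally long lists of matched atoms."""
    if len(coords1) != len(coords2) or not coords1:
        raise ValueError("need two non-empty lists of matched atoms")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords1, coords2))
    return math.sqrt(total / len(coords1))


# Toy operator: 90 degrees about z, then a shift of (1, 2, 3)
rot_z90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
moved = apply_operator(rot_z90, (1, 2, 3), [(1, 0, 0), (0, 1, 0)])
print(moved)
print(rms_distance(moved, [(1, 3, 3), (0, 2, 3)]))
```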

21.3 (3) play with the improve option until you're happy

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > set dist 4
 Max matching distance (A) : (   4.000)
 LSQMAN > set decay 0.95
 Decay factor : (   0.950)
 LSQMAN > set min 10
 Min fragment length (res) : (      10)
 LSQMAN > set frag 0
 Fragment length decay (res) : (       0)
 LSQMAN > set max 25
 Max nr of improve cycles : (      25)
 LSQMAN > set opt mi
 Criterion : (MI)
 LSQMAN > set seq on
 Sequential hits : (ON)
 LSQMAN > set rms 0.5
 RMS weight : (   0.500)
 LSQMAN > im 2aza a* 1azu a*
 Improve fit of  2AZA A*
 And             1AZU A*
 Atom type      | CA |
 Nr of atoms in mol1 : (        129)
 Nr of atoms in mol2 : (        126)
   
 Found fragment of length : (     126)
   
 Cycle : (          1)
 Distance cut-off (A)      : (   4.000)
 Min fragment length (res) : (      10)
 The    126 atoms have an RMS distance of    0.906 A
 SI = RMS * Nmin / Nmatch             =      0.90630
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.68816
 Rotation     :  -0.95771867  0.11354620 -0.26435241
                  0.17979124 -0.48112354 -0.85801822
                 -0.22461088 -0.86926830  0.44036639
 Translation  :      13.5455     27.2343     38.5374
   
...
   
 Found fragment of length : (     102)
 Found fragment of length : (      23)
   
 Cycle : (         13)
 Distance cut-off (A)      : (   2.161)
 Min fragment length (res) : (      10)
 The    125 atoms have an RMS distance of    0.880 A
 SI = RMS * Nmin / Nmatch             =      0.88718
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.68894
 Rotation     :  -0.95701498  0.11345596 -0.26692718
                  0.18237905 -0.48019871 -0.85799015
                 -0.22552218 -0.86979133  0.43886536
 Translation  :      13.4690     27.2135     38.5830
   
 Found fragment of length : (      34)
 Found fragment of length : (      67)
 Found fragment of length : (      23)
   
 Cycle : (         14)
 Distance cut-off (A)      : (   2.053)
 Min fragment length (res) : (      10)
 The    124 atoms have an RMS distance of    0.862 A
 SI = RMS * Nmin / Nmatch             =      0.87625
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.68773
 Rotation     :  -0.95704645  0.11349409 -0.26679796
                  0.18196884 -0.48126674 -0.85747868
                 -0.22571975 -0.86919582  0.43994224
 Translation  :      13.4796     27.2430     38.5840
   
 Fit deteriorated in this cycle !
 Alignment based on previous operator !
   
 Fragment CYS-A   3 <===> CYS-A   3 @     2.00 A
          GLU-A   4 <===> SER-A   4 @     1.15 A
          ALA-A   5 <===> VAL-A   5 @     0.64 A
...
          PRO-A 104 <===> GLU-A 104 @     1.13 A
 Fragment GLU-A 106 <===> GLU-A 106 @     1.80 A
          ALA-A 107 <===> GLN-A 107 @     1.82 A
...
          SER-A 128 <===> LYS-A 128 @     1.76 A
   
 Nr of residues in mol1   : (     129)
 Nr of residues in mol2   : (     126)
 Nr of matched residues   : (     125)
 Nr of identical residues : (      77)
 % identical of matched   : (  61.600)
 % matched   of mol1      : (  96.899)
 % identical of mol1      : (  59.690)
 % matched   of mol2      : (  99.206)
 % identical of mol2      : (  61.111)
   
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

21.4 (4) save the operator (just in case), create an O macro file

(and quit or go to another window to run O)

 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > sav 2aza 1azu
 Operator bringing : (1AZU)
 on top of         : (2AZA)
 File name ? (rt_1azu_to_2aza.odb)
 Save in file : (rt_1azu_to_2aza.odb)
 Datablock name : (.lsq_rt_1azu_to_2aza)
 LSQMAN > omac init 2aza
 File name ? (lsq_2aza.omac)
 O macro initialised
 LSQMAN > omac appe 1azu
 O macro extended
 LSQMAN > omac close
 O macro file closed
 LSQMAN > quit

 *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN ***

 Version - 931022/0.4
 Started - Fri Oct 22 23:28:21 1993
 Stopped - Fri Oct 22 23:36:37 1993

 CPU-time taken :
 User - 2.9 Sys - 0.6 Total - 3.5

 *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN ***

 >>> This program is (C) 1993, GJ Kleywegt & TA Jones <<<
 E-mail: "gerard@xray.bmc.uu.se" or "alwyn@xray.bmc.uu.se"

 *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN *** LSQMAN ***
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

21.5 (5) the O macro

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
! O macro lsq_2aza.omac
! Created by LSQMAN V. 931022/0.4 at Fri Oct 22 23:36:05 1993 for user gerard
!
o_setup off off on
!
print ... Analysing 2AZA
print ... From file 2aza.pdb
!
sam_at_in 2aza.pdb 2AZA
mol 2AZA obj c2AZA ca ; end_obj
!
paint_colour red
!
!
print ==========================================
!
print ... Comparing 1AZU
print ... From file 1azu.pdb
!
print ... Nr of matched residues          125
print ... RMS distance of these       0.88014
print ... Similarity index            0.88718
print ... Match index                 0.68894
!
sam_at_in 1azu.pdb 1AZU
!
db_create .lsq_rt_1AZU_to_2AZA 12 R
!
db_set_data .lsq_rt_1AZU_to_2AZA  1  1    -0.95701498
db_set_data .lsq_rt_1AZU_to_2AZA  2  2     0.11345596
db_set_data .lsq_rt_1AZU_to_2AZA  3  3    -0.26692718
db_set_data .lsq_rt_1AZU_to_2AZA  4  4     0.18237905
db_set_data .lsq_rt_1AZU_to_2AZA  5  5    -0.48019871
db_set_data .lsq_rt_1AZU_to_2AZA  6  6    -0.85799015
db_set_data .lsq_rt_1AZU_to_2AZA  7  7    -0.22552218
db_set_data .lsq_rt_1AZU_to_2AZA  8  8    -0.86979133
db_set_data .lsq_rt_1AZU_to_2AZA  9  9     0.43886536
db_set_data .lsq_rt_1AZU_to_2AZA 10 10    13.46904564
db_set_data .lsq_rt_1AZU_to_2AZA 11 11    27.21351051
db_set_data .lsq_rt_1AZU_to_2AZA 12 12    38.58302689
!
lsq_mol 1AZU_to_2AZA 1AZU ;
mol 1AZU obj c1AZU ca ; end_obj
paint_object c1AZU
!
! del_obj c1AZU ;
! db_kill *1AZU*
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

21.6 (6) run O and execute the macro

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 % 428 gerard rigel 19:30:13 progs/lsq > ono general.o
   
... Run 4d_ono
   
... Executing /nfs/taj/alwyn/o/bin/4d_ono general.o
... For gerard on rigel at Fri Oct 22 23:38:31 MDT 1993
   
  O > Use of this program implies acceptance of conditions
  O > described in Appendix 1 of the O manual
  O > O version 5.9.1 , Tue Aug 10 18:33:27 MET DST
  O > Loading general.o
  O > Maximum inter-residue link distance = 2.00
  O >  There were   23 residues.
  O >              175 atoms.
  O > Do you want to use the display? [Yes]:
  O > Graphics board GL4DXG-4.0
  O >   O > Trackball on (F7KEY)
  O > @lsq_2aza.omac
  O > Macro in computer file-system.
  O >   O >  As4> ... Analysing 2AZA
  O >  As4> ... From file 2aza.pdb
  O >   O >  Sam> File type is PDB
 Sam>  Database compressed.
 Sam> Space for     47589 atoms
 Sam> Space for     10000 residues
 Sam> Molecule 2AZA contained 129 residues and 987 atoms
  O >   O >   O >   O >   O >   O >
 As4> ==========================================
  O >   O >  As4> ... Comparing 1AZU
  O >  As4> ... From file 1azu.pdb
  O >   O >  As4> ... Nr of matched residues          125
  O >  As4> ... RMS distance of these       0.88014
  O >  As4> ... Similarity index            0.88718
  O >  As4> ... Match index                 0.68894
  O >   O >  Sam> File type is PDB
 Sam>  Database compressed.
 Sam> Space for     46151 atoms
 Sam> Space for     10000 residues
 Sam> Molecule 1AZU contained 127 residues and 931 atoms
  O >   O >   O >   O >   O >   O >   O >   O >   O >   O >
  O >   O >   O >   O >   O >   O >   O >   O >   O >  Paint> C1AZU
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

21.7 (7) centre on one of the atoms and admire a beautiful fit !

22 IMPROVING OPERATORS

22.1 differences with Lsq_Improve in O

This section describes some of the niceties of the improve option in LSQMAN.
With the default settings, this option works much like the LSQ_IMPROVE command in O. However, it is considerably faster, and its performance can be improved further by playing around with its parameters. Some of the "extras" compared to O:

* you may use residues from just ONE chain without knowing the names of the first and last residues

* you may impose "sequentiality", i.e., if residue A12 in molecule 1 was matched with A34 in molecule 2, then residue A88 in molecule 1 may only be matched with a residue in molecule 2 which is > A34

* residues in the second molecule are NEVER used twice

* a choice of optimisation criteria (which also give better control over the convergence behaviour of the algorithm)

* optional use of decaying parameters

* informative output regarding, for instance, the percentage of matched and identical residues in both molecules

22.2 (1) defining which atoms to use

- the FIRST of the selected ATOM_TYPES is used for all selected residues (i.e., make sure not to use ALl or NOnh atoms !)
- one or more ranges of residues can be defined, e.g.:
    "A1-45 A60:132" - defines two ranges of residues
    A3-130          - defines a single range of residues
    A*              - defines all residues in chain or NMR model A
    C               - defines all residues in chain or NMR model C (new in version 9.3)
    *               - defines all residues in all chains or NMR models
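The selection syntax above can be illustrated with a hypothetical Python parser. The parser makes several assumptions. Chain identifiers are single letters. "-" and ":" are treated as equivalent separators, since both appear in examples in this manual. Input is upper-cased, as LSQMAN's echoed ranges suggest. Any form not listed above is rejected. LSQMAN's own parser may well accept more.

```python
import re

# Assumed pattern: one-letter chain, then optionally "first-last", "first:last" or "*".
_RANGE = re.compile(r"^([A-Z])(?:(\d+)[-:](\d+)|\*)?$")


def parse_ranges(spec):
    """Split a residue selection into (chain, first, last) tuples.

    first and last are None when a whole chain (or everything, "*") is selected.
    """
    ranges = []
    for token in spec.strip('"').upper().split():
        if token == "*":
            ranges.append(("*", None, None))  # all chains or NMR models
            continue
        match = _RANGE.match(token)
        if match is None:
            raise ValueError(f"unrecognised residue range: {token!r}")
        chain, first, last = match.groups()
        if first is None:
            ranges.append((chain, None, None))  # "A*" or a bare chain "C"
        else:
            ranges.append((chain, int(first), int(last)))
    return ranges


print(parse_ranges('"A1-45 A60:132"'))
print(parse_ranges("A* C"))
```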

22.3 (2) sequentiality constraint

This constraint can be switched ON and OFf with the SEt SEq option. You would use it, for example, when comparing two very similar molecules; if you have two very different molecules which just happen to share a common motif, you could switch it OFf.

22.4 (3) no double use of residues

In O, one residue in "the second molecule" is sometimes matched to two different residues in the first; this is confusing and screws up your statistics. Therefore, LSQMAN enforces a strict one-to-one correspondence.
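A simple greedy matcher illustrates both the sequentiality constraint and the one-to-one rule. This is a hypothetical Python sketch, not the algorithm LSQMAN actually uses. After superposition, it takes the residues of molecule 1 in order. Each one is paired with the closest still-unused residue of molecule 2 within the distance cut-off. If sequentiality is on, only residues after the last matched one are considered. Contiguous runs shorter than the minimum fragment length are dropped afterwards.

```python
import math


def match_residues(coords1, coords2, cutoff, sequential=True):
    """Greedy, one-to-one pairing of superimposed CA atoms (illustration only).

    Returns (i, j, distance) tuples; i indexes molecule 1, j molecule 2.
    """
    pairs, used, last_j = [], set(), -1
    for i, a in enumerate(coords1):
        best_j, best_d = None, cutoff
        for j, b in enumerate(coords2):
            if j in used:                   # never use a residue of mol 2 twice
                continue
            if sequential and j <= last_j:  # mol 2 partners must keep increasing
                continue
            d = math.dist(a, b)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            pairs.append((i, best_j, best_d))
            used.add(best_j)
            last_j = best_j
    return pairs


def fragments(pairs, min_length):
    """Split pairs into runs contiguous in both molecules; keep runs >= min_length."""
    runs = []
    for pair in pairs:
        prev = runs[-1][-1] if runs else None
        if prev and pair[0] == prev[0] + 1 and pair[1] == prev[1] + 1:
            runs[-1].append(pair)
        else:
            runs.append([pair])
    return [run for run in runs if len(run) >= min_length]


mol1 = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
mol2 = [(10.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(match_residues(mol1, mol2, 3.8))                    # order enforced
print(match_residues(mol1, mol2, 3.8, sequential=False))  # order ignored
```

With sequentiality on, the second residue of molecule 1 finds no partner, because its only close neighbour comes earlier in molecule 2.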

22.5 (4) optimisation criteria

The following criteria may be used to decide whether an operator has really improved or not:

 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 - NM - the number of matched residues (not normally used)

 - RM - the RMS distance of the matched atoms (not normally used)

 - SI - the similarity index; this is calculated as follows:

              RMSD * min(N1,N2)
        SI = -------------------
                  Nmatched

        where: N1,2     = nr of selected residues in molecule 1,2
               Nmatched = nr of matched residues
               RMSD     = their RMS distance

        SI assumes values >= 0.0; the lower the value of SI, the better
        the fit and the more similar the two molecules are

 - MI - the match index; this is calculated as follows:

                      (1 + Nmatched)
        MI = -----------------------------------
             (1 + W * RMSD) * (1 + min(N1,N2))

        W is a weight factor (SEt RMs_weight) which is > 0; the larger the
        weight, the bigger the influence of the RMSD on the value of MI
        (suggested values for W are between 0.1 and 1)

        MI assumes values between 0 and 1, where "0" indicates a "perfect
        mis-match" and "1" a perfect match
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
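Both formulas are easy to check against the program output. In the Python sketch below (the function names are hypothetical), cycle 1 of the 1AZU/2AZA example in section 21.3 is used as input. That cycle matched 126 residues out of 129 and 126, with an RMSD of 0.9063 Å and W = 0.5. The sketch gives SI = 0.9063 and MI = 0.6882, as printed there.

```python
def similarity_index(rmsd, n1, n2, n_matched):
    """SI = RMSD * min(N1, N2) / Nmatched; >= 0, lower is better."""
    return rmsd * min(n1, n2) / n_matched


def match_index(rmsd, n1, n2, n_matched, w):
    """MI = (1 + Nmatched) / ((1 + W * RMSD) * (1 + min(N1, N2))); between 0 and 1."""
    return (1 + n_matched) / ((1 + w * rmsd) * (1 + min(n1, n2)))


print(f"SI = {similarity_index(0.9063, 129, 126, 126):.5f}")
print(f"MI = {match_index(0.9063, 129, 126, 126, 0.5):.5f}")
```

The cycle 1 values of the lipase run in section 22.6 (85 matched residues out of 317 and 526, RMSD 1.959 Å) are reproduced with W = 1.0.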

As of version 8.5, two more dimensionless criteria can be used:

- the Maiorov-Crippen "rho" value (not the scaled one) as defined in Proteins 22, pp. 273-283 (note: equation (16), the definition of rho, contains an error: R^2(B) should be 2*R^2(B)).

- the relative RMSD as defined by MR Betancourt & J Skolnick (Biopolymers 59, pp. 305-309 (2001)). Identical structures have an RRMSD of zero; a value around one means that two structures are as different as two random proteins of the same sizes.

More criteria have been added since:

- the normalised RMSD (100) as defined by O Carugo & S Pongor, Protein Sci 10, 1470-3 (2001), i.e. the RMSD normalised to an alignment length of 100 residues.

- the geometric measures SAS(n) (where n=1, 2, 3 or 4) as defined by Kolodny et al., J Mol Biol XXX, ??-?? (2005). SAS(n) is defined as: RMSD * (100/Nmatch)**n. With n=4 the program will tend to align more residues; with n=1 it will tend towards lower RMSD values.
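The trade-off described for SAS(n) follows directly from the formula. This Python sketch (the function name is hypothetical) compares two made-up alignments. The first has an RMSD of 1.0 Å over 80 residues; the second has an RMSD of 2.0 Å over 100 residues.

```python
def sas(rmsd, n_matched, n):
    """SAS(n) = RMSD * (100 / Nmatch)**n for n = 1, 2, 3 or 4 (lower is better)."""
    if n not in (1, 2, 3, 4):
        raise ValueError("n must be 1, 2, 3 or 4")
    return rmsd * (100.0 / n_matched) ** n


for n in (1, 4):
    short_tight = sas(1.0, 80, n)   # fewer residues, lower RMSD
    long_loose = sas(2.0, 100, n)   # more residues, higher RMSD
    print(f"SAS({n}): {short_tight:.3f} vs {long_loose:.3f}")
```

SAS(1) prefers the first alignment (1.25 against 2.0), while SAS(4) prefers the second (about 2.44 against 2.0).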

22.6 (5) decaying parameters

The two parameters used in O are also employed here:

- SEt DIst = define the maximum distance between two atoms in order for them to be considered matched (default 3.8 A)

- SEt MIni = the minimum length of contiguous matching fragments of residues (default 5 residues)

In addition, however, both parameters may be allowed to "decay", i.e. their value may be changed slightly in every iteration:

- SEt DEcay = a factor by which the maximum distance is MULTIPLIED after every cycle (suggested values in the range 0.9 to 1.1)

- SEt FRagm = a value which is ADDED to the minimum fragment length after every cycle

Hence, for example, you may start with a fairly small set of closely matched atoms and allow the distance to become larger in every cycle, provided that this leads to larger fragments:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > set dis 3.5
 Max matching distance (A) : (   3.500)
 LSQMAN > set dec 1.1
 Decay factor : (   1.100)
 LSQMAN > set min 4
 Min fragment length (res) : (       4)
 LSQMAN > set fra 1
 Fragment length decay (res) : (       1)
 LSQMAN > se opt mi
 Criterion : (MI)
 LSQMAN > im lipa a* 1ace a*
 Improve fit of  LIPA A*
 And             1ACE A*
 Atom type      | CA |
 Nr of atoms in mol1 : (        317)
 Nr of atoms in mol2 : (        526)
...
 Cycle : (          1)
 Distance cut-off (A)      : (   3.500)
 Min fragment length (res) : (       4)
 The     85 atoms have an RMS distance of    1.959 A
 SI = RMS * Nmin / Nmatch             =      7.30601
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.09140
 Rotation     :  -0.77263975 -0.28095514 -0.56929082
                  0.05460826  0.86400366 -0.50051534
                  0.63249171 -0.41780603 -0.65222114
 Translation  :     -24.2804    -17.6301    105.6828
...
 Cycle : (          2)
 Distance cut-off (A)      : (   3.850)
 Min fragment length (res) : (       5)
 The    135 atoms have an RMS distance of    2.061 A
 SI = RMS * Nmin / Nmatch             =      4.83898
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.13973
 Rotation     :  -0.76273268 -0.28683752 -0.57962328
                  0.05579499  0.86373103 -0.50085485
                  0.64430255 -0.41435844 -0.64279181
 Translation  :     -25.1862    -17.7377    105.2635
...
 Cycle : (          3)
 Distance cut-off (A)      : (   4.235)
 Min fragment length (res) : (       6)
 The    129 atoms have an RMS distance of    2.043 A
 SI = RMS * Nmin / Nmatch             =      5.01958
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.13436
 Rotation     :  -0.75659031 -0.31367165 -0.57374316
                  0.03643082  0.85584819 -0.51594251
                  0.65287358 -0.41125903 -0.63609910
 Translation  :     -24.4675    -17.0481    105.8078
   
 Fit deteriorated in this cycle !
 Alignment based on previous operator !
...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
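Suppose the decay factor and the fragment increment are each applied once after every cycle. Cycle k then uses a cut-off of d0 * decay**(k-1) and a minimum fragment length of m0 + frag * (k-1). The Python sketch below (the function name is hypothetical) reproduces the values printed above: cut-offs of 3.500, 3.850 and 4.235 Å, with minimum fragment lengths of 4, 5 and 6. It also reproduces cycles 13 and 14 of the 1AZU/2AZA example in section 21.3, which start from 4.0 Å with decay 0.95 and reach 2.161 and 2.053 Å.

```python
def improve_schedule(dist0, decay, min_frag0, frag_step, cycles):
    """List (cycle, distance cut-off, minimum fragment length) for each cycle."""
    return [(cycle,
             dist0 * decay ** (cycle - 1),         # SEt DEcay multiplies
             min_frag0 + frag_step * (cycle - 1))  # SEt FRagm adds
            for cycle in range(1, cycles + 1)]


for cycle, dist, min_frag in improve_schedule(3.5, 1.1, 4, 1, 3):
    print(f"Cycle {cycle:3d}  cut-off {dist:7.3f} A  min fragment {min_frag}")
```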

22.7 (6) informative output

At the end of an optimisation run, a list of matched residues is printed, together with information concerning the number and percentage of matched and identical residues in both molecules:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
...
 Fragment PRO-A  33 <===> VAL-A 111 @     3.09 A
          ILE-A  34 <===> MET-A 112 @     1.84 A
          LEU-A  35 <===> VAL-A 113 @     0.75 A
          LEU-A  36 <===> TRP-A 114 @     0.65 A
          VAL-A  37 <===> ILE-A 115 @     0.27 A
          PRO-A  38 <===> TYR-A 116 @     0.60 A
          GLY-A  39 <===> GLY-A 117 @     0.81 A
          THR-A  40 <===> GLY-A 118 @     1.03 A
 Fragment GLY-A  41 <===> SER-A 122 @     1.58 A
          THR-A  42 <===> GLY-A 123 @     1.54 A
          THR-A  43 <===> SER-A 124 @     0.95 A
          GLY-A  44 <===> SER-A 125 @     1.04 A
          PRO-A  45 <===> THR-A 126 @     1.71 A
          GLN-A  46 <===> LEU-A 127 @     2.01 A
...
 Nr of residues in mol1   : (     317)
 Nr of residues in mol2   : (     526)
 Nr of matched residues   : (     135)
 Nr of identical residues : (      11)
 % identical of matched   : (   8.148)
 % matched   of mol1      : (  42.587)
 % identical of mol1      : (   3.470)
 % matched   of mol2      : (  25.665)
 % identical of mol2      : (   2.091)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
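The percentages in this summary are simple ratios. The Python sketch below (the function name and dictionary keys are hypothetical) takes matched residue-name pairs as input and recomputes them. With 317 and 526 residues, 135 matched and 11 identical, it gives 8.148, 42.587, 3.470, 25.665 and 2.091 %, as above.

```python
def match_statistics(n1, n2, matched_pairs):
    """Summary percentages; matched_pairs holds (name in mol1, name in mol2) tuples."""
    n_match = len(matched_pairs)
    n_ident = sum(1 for name1, name2 in matched_pairs if name1 == name2)
    return {
        "identical_of_matched": 100.0 * n_ident / n_match,
        "matched_of_mol1": 100.0 * n_match / n1,
        "identical_of_mol1": 100.0 * n_ident / n1,
        "matched_of_mol2": 100.0 * n_match / n2,
        "identical_of_mol2": 100.0 * n_ident / n2,
    }


# 11 identical pairs plus 124 non-identical ones, as in the lipase run above
pairs = [("GLY", "GLY")] * 11 + [("PRO", "VAL")] * 124
for key, value in match_statistics(317, 526, pairs).items():
    print(f"{key:22s} {value:8.3f}")
```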

23 IMPROVING ROUGH DEJAVU ALIGNMENTS

Gerard's recipe for improving rough DEJAVU alignments, tested using CBHI (1CEL) and 1LTE:

 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 chain_mode original
 atom_types extended_main_chain
 set reset
 !
 read cbh1 /nfs/taj/gerard/progs/secs/cbh1/a16.pdb
 omacro init cbh1
 !
 read 1lte /nfs/pdb/pre/1lte.pdb
 !
 explicit cbh1 a118-122 a140-143 a287-295 a299-306 a359-365 a414-423 1lte 31 55 165 178 207 225
 show cbh1 1lte
 !
 set coarse
 imp cbh1 * 1lte *
 show cbh1 1lte
 !
 set intermediate
 imp cbh1 * 1lte *
 show cbh1 1lte
 !
 set fine
 imp cbh1 * 1lte *
 show cbh1 1lte
 !
 omacro append 1lte
 !
 omacro close
 !
 quit
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

The initial, explicit fit (using 43 residues !) gives:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 The    212 atoms have an RMS distance of   10.609 A
 SI = RMS * Nmin / Nmatch             =     10.60882
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.08614
 Rotation     :  -0.69291258 -0.10764210 -0.71294129
                  0.03076268 -0.99230647  0.11992305
                 -0.72036505  0.06116420  0.69089299
 Translation  :      49.1372     64.3310     50.8504
 Determinant of rotation matrix =   1.000000
 Rotation angle                 = 175.683167
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
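The "Determinant of rotation matrix" and "Rotation angle" lines can be recomputed from the matrix itself. A proper rotation has determinant +1, and its rotation angle follows from the trace: cos(angle) = (trace - 1) / 2. The trace does not change under transposition, so this works whether the matrix is written out by rows or by columns. The Python sketch below (the function names are hypothetical) gives 175.68 degrees for the operator printed above.

```python
import math


def determinant(r):
    """Determinant of a 3x3 matrix given as three rows (cofactor expansion)."""
    return (r[0][0] * (r[1][1] * r[2][2] - r[1][2] * r[2][1])
            - r[0][1] * (r[1][0] * r[2][2] - r[1][2] * r[2][0])
            + r[0][2] * (r[1][0] * r[2][1] - r[1][1] * r[2][0]))


def rotation_angle(r):
    """Rotation angle in degrees of a proper rotation matrix, from its trace."""
    trace = r[0][0] + r[1][1] + r[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against round-off
    return math.degrees(math.acos(cos_angle))


# Operator after the initial explicit fit of 1LTE onto CBHI (see above)
r = [[-0.69291258, -0.10764210, -0.71294129],
     [0.03076268, -0.99230647, 0.11992305],
     [-0.72036505, 0.06116420, 0.69089299]]
print(f"Determinant of rotation matrix = {determinant(r):10.6f}")
print(f"Rotation angle                 = {rotation_angle(r):10.6f}")
```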

After coarse improvement (note the big changes in the operator !):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 The    159 atoms have an RMS distance of    3.199 A
 SI = RMS * Nmin / Nmatch             =      4.80841
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.25646
 Rotation     :  -0.57405788 -0.15695305 -0.80363131
                  0.22625503 -0.97364998  0.02853779
                 -0.78693473 -0.16544329  0.59444284
 Translation  :      52.3044     70.6993     52.1137
 Determinant of rotation matrix =   1.000000
 Rotation angle                 = 167.589386
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
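
The last two lines of each fit are checks on the rotation matrix R: a proper rotation has determinant +1, and its rotation angle kappa satisfies trace(R) = 1 + 2 cos(kappa). Assuming that is how the printed angle is derived (it reproduces the values shown to within rounding), both can be recomputed from the printed matrix. Trace and determinant do not change under transposition, so this check does not depend on whether the matrix is printed row- or column-wise. Function names are illustrative:

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def determinant(r: Matrix) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (r[0][0] * (r[1][1] * r[2][2] - r[1][2] * r[2][1])
            - r[0][1] * (r[1][0] * r[2][2] - r[1][2] * r[2][0])
            + r[0][2] * (r[1][0] * r[2][1] - r[1][1] * r[2][0]))


def rotation_angle_deg(r: Matrix) -> float:
    """Rotation angle kappa in degrees, from trace(R) = 1 + 2 cos(kappa)."""
    trace = r[0][0] + r[1][1] + r[2][2]
    cos_kappa = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_kappa))


# Rotation matrix printed after the coarse improvement step.
coarse = [[-0.57405788, -0.15695305, -0.80363131],
          [0.22625503, -0.97364998, 0.02853779],
          [-0.78693473, -0.16544329, 0.59444284]]
print(f"det = {determinant(coarse):.6f}, angle = {rotation_angle_deg(coarse):.3f}")
```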

After intermediate improvement:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 The    108 atoms have an RMS distance of    2.106 A
 SI = RMS * Nmin / Nmatch             =      4.66147
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.22120
 Rotation     :  -0.57900333 -0.11083046 -0.80775720
                  0.21638666 -0.97607797 -0.02118141
                 -0.78608650 -0.18705200  0.58913463
 Translation  :      52.4746     71.2050     51.4827
 Determinant of rotation matrix =   1.000000
 Rotation angle                 = 169.411850
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

After fine-tuning:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 The     76 atoms have an RMS distance of    1.743 A
 SI = RMS * Nmin / Nmatch             =      5.48040
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.17144
 Rotation     :  -0.57834578 -0.10955796 -0.80840164
                  0.19439344 -0.98090446 -0.00613647
                 -0.79229248 -0.16069698  0.58859926
 Translation  :      52.5953     70.5723     51.8284
 Determinant of rotation matrix =   1.000000
 Rotation angle                 = 170.172287
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

NOTE: with the BRute_force command (from version 5.0 onwards) you can get the same result without knowing anything beforehand about which residues align in the two molecules.

24 LSQMAN AND MACROMOLECULES OTHER THAN PROTEINS

Since the atom types used for alignment can be defined by the user, LSQMAN can easily be used for other types of macromolecule, such as RNA, DNA and oligosaccharides. The exceptions are commands that are specific to proteins, such as MRama.

The following example shows how to analyse the NCS relation between two DNA molecules (PDB entry 130D); in this case the phosphorus atoms are used as "central atoms" (i.e., analogous to CA atoms in proteins):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > re m1 /nfs/pdb/full/130d.pdb
 ...
 LSQMAN > at de " P  "
 Nr of atom types : (       1)
 Type : ( P)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
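
In PDB files the atom name occupies the fixed columns 13-16, and a one-letter element such as phosphorus is normally written there as " P  " (a space, P, two spaces), which is why the atom type above is quoted together with its blanks. Assuming LSQMAN compares atom types against this four-character field, the hypothetical sketch below shows how such an exact match picks out the central atoms while ignoring other phosphate atoms (O1P, OP1 and the like):

```python
from typing import Iterable, List, Tuple

Atom = Tuple[str, int, Tuple[float, float, float]]


def select_atoms(pdb_lines: Iterable[str], atom_name: str = " P  ") -> List[Atom]:
    """Return (chain, residue number, xyz) for ATOM/HETATM records whose
    atom-name field (columns 13-16) equals atom_name exactly."""
    hits: List[Atom] = []
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        if line[12:16] != atom_name:      # columns 13-16, blanks included
            continue
        chain = line[21]                  # column 22
        resnum = int(line[22:26])         # columns 23-26
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        hits.append((chain, resnum, xyz))
    return hits


# Hypothetical usage:
# with open("130d.pdb") as handle:
#     p_atoms = select_atoms(handle)
```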

Now use the BRute_force command to align the two chains:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > br m1 a m1 b 5 3 5
 WARNING - mol1 == mol2 !
 Brute-force fit of M1 A
 And                M1 B
 Atom types     | P  |
 B-factor range used  -1000.00 - 10000.00 A2
 Fragment length             5
 Fragment step size          3
 Min matched residues        5
 Mol 1 zone to try : (A1-12)
 Mol 2 zone to try : (B13-24)
   
 Try zone : (A1-5)
 Max match so far : (         10)
 RMSD (A)         : (   0.938)
   
 Max match : (         10)
 RMSD (A)  : (   0.938)
 Mol 1 res : (          1)
 Mol 2 res : (         13)
   
 Regenerating best alignment ...
 The     10 atoms have an RMS distance of    0.938 A
 SI = RMS * Nmin / Nmatch             =      1.03201
 MI = (1+Nmatch)/{(1+W*RMS)*(1+Nmin)} =      0.47295
 MC = Maiorov-Crippen RHO (0-2)       =      0.07081
 RMS delta B for matched atoms        =     3.246 A2
 Corr. coefficient matched atom Bs    =       -0.801
 Rotation     :   0.75101489 -0.66026980 -0.00451462
                 -0.65996760 -0.75084811  0.02588229
                 -0.02047909 -0.01645848 -0.99965483
 Translation  :      18.1981     48.0588     16.0850
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
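
Each fit reports an RMS distance, a rotation matrix and a translation vector obtained by least-squares superposition of the matched atoms (here the 10 P atoms). This section does not say which algorithm LSQMAN uses internally; the sketch below substitutes the standard SVD-based Kabsch solution, assuming numpy is available and using hypothetical names, to show what those numbers mean: the rotation R and translation t that minimise the summed squared distance between R x + t and the corresponding target atoms, and the resulting RMS distance.

```python
import numpy as np


def superpose(mobile, target):
    """Kabsch least-squares fit of `mobile` onto `target` (both N x 3, paired rows).

    Returns (R, t, rmsd) such that mobile @ R.T + t best matches target.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    if mobile.shape != target.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("need two N x 3 arrays of paired coordinates")
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mc).T @ (target - tc)           # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # -1 would mean a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T       # proper rotation, det(R) = +1
    t = tc - r @ mc
    diff = mobile @ r.T + t - target
    rmsd = float(np.sqrt((diff ** 2).sum(axis=1).mean()))
    return r, t, rmsd
```

Which of the two molecules LSQMAN moves, and how it prints R, is not assumed by this sketch; the diagonal sign correction is what guarantees the "Determinant of rotation matrix = 1.000000" property seen in the fits above.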

Plot the distances between corresponding P atoms in the two chains:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > di m1 a1-12 m1 b13 m1_m1_di.plt
 WARNING - mol1 == mol2 !
 Central-atom distance plot
 Central atom type : ( P)
 Plot of M1 A1-12
 And     M1 B13
 Nr of residues matched : (         11)
 Average distance : (   1.103)
 Minimum distance : (   0.550)
 Maximum distance : (   2.981)
 Plot file written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
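
The summary lines of the distance plot are simple statistics over the per-residue distances between corresponding central atoms after superposition. A minimal sketch, assuming the two coordinate lists are already in the superimposed frame and paired in order (A1 with B13, A2 with B14 and so on, which is how the single start residue b13 appears to be interpreted); the function name is illustrative:

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def distance_stats(mol1: Sequence[Point], mol2: Sequence[Point]) -> Tuple[float, float, float]:
    """Return (average, minimum, maximum) distance between paired atoms.

    mol1[i] and mol2[i] are assumed to be corresponding central atoms
    (e.g. P atoms) that have already been superimposed.
    """
    if len(mol1) != len(mol2) or not mol1:
        raise ValueError("need two non-empty coordinate lists of equal length")
    dists = [math.dist(p, q) for p, q in zip(mol1, mol2)]
    return sum(dists) / len(dists), min(dists), max(dists)


# Toy data: two pairs that are 1 A and 3 A apart.
print(distance_stats([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                     [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0)]))  # -> (2.0, 1.0, 3.0)
```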

Make a plot of the differences between the pseudo-angles and pseudo-torsions defined by consecutive P atoms in the two chains:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 LSQMAN > dd m1 a1-12 m1 b13 m1_m1_dd.plt
 WARNING - mol1 == mol2 !
 Central-atom delta-dihedral plot
 Central atom type : ( P)
 Plot of M1 A1-12
 And     M1 B13
 Nr of residues matched : (          8)
 RMS delta DIH       : (  34.455)
 Average |delta DIH| : (  29.385)
 Nr |delta DIH| > 10 : (       7)
 Percentage          : (  87.500)
 RMS |delta ANG|     : (   9.706)
 Average |delta ANG| : (   7.767)
 Nr |delta ANG| > 5  : (       6)
 Percentage          : (  75.000)
 Plot file written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
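
The delta-dihedral plot compares the pseudo-angles and pseudo-torsions traced out by consecutive central atoms. A run of n consecutive central atoms defines n - 3 pseudo-torsions, so 11 matched P atoms in one consecutive run would give the 8 residues reported here, and the percentages above are 7/8 = 87.5 % and 6/8 = 75 %. The plain-Python sketch below rests on two stated assumptions: the two traces are paired atom by atom, and one pseudo-angle and one pseudo-torsion are evaluated per window of four atoms (chosen so that both statistics share one denominator, as the output suggests). The cut-offs of 10 and 5 degrees are taken from the printed lines; all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pseudo_angle(p1: Point, p2: Point, p3: Point) -> float:
    """Angle p1-p2-p3 in degrees (0..180)."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    c = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def pseudo_torsion(p1: Point, p2: Point, p3: Point, p4: Point) -> float:
    """Dihedral p1-p2-p3-p4 in degrees (-180..180, IUPAC sign convention)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def wrap180(d: float) -> float:
    """Map an angular difference onto [-180, 180)."""
    return (d + 180.0) % 360.0 - 180.0


def _stats(deltas: List[float], cutoff: float) -> Tuple[float, float, int, float]:
    """RMS delta, average |delta|, count with |delta| > cutoff, that count in %."""
    n = len(deltas)
    n_over = sum(1 for d in deltas if abs(d) > cutoff)
    return (math.sqrt(sum(d * d for d in deltas) / n),
            sum(abs(d) for d in deltas) / n, n_over, 100.0 * n_over / n)


def compare_chains(c1: Sequence[Point], c2: Sequence[Point],
                   dih_cut: float = 10.0, ang_cut: float = 5.0) -> Dict[str, tuple]:
    """Compare pseudo-torsions and pseudo-angles of two paired central-atom traces."""
    if len(c1) != len(c2) or len(c1) < 4:
        raise ValueError("need two paired traces of at least four atoms")
    idx = range(len(c1) - 3)
    d_dih = [wrap180(pseudo_torsion(*c1[i:i + 4]) - pseudo_torsion(*c2[i:i + 4])) for i in idx]
    d_ang = [pseudo_angle(*c1[i:i + 3]) - pseudo_angle(*c2[i:i + 3]) for i in idx]
    return {"dih": _stats(d_dih, dih_cut), "ang": _stats(d_ang, ang_cut)}
```

Because angles and torsions are internal coordinates, this comparison does not depend on the superposition, unlike the distance plot.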

25 KNOWN BUGS

(1) With atom types All or NOnh, explicit alignment sometimes used only a fraction of the atoms. Fixed in version 4.7.

(2) The MAlign command sometimes did not work. Fixed in version 4.8.

If you run into these bugs, you are probably using an older version of the program.

Created at Fri Nov 28 16:47:31 2008 by MAN2HTML version 070111/2.0.8 . This manual describes LSQMAN, a program of the Uppsala Software Factory (USF), written and maintained by Gerard Kleywegt. © 1992-2007.