Uppsala Software Factory

Uppsala Software Factory - ZPROF Manual


1 ZPROF - GENERAL INFORMATION

Program : ZPROF
Version : 001121
Author : Gerard J. Kleywegt, Dept. of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Box 596, SE-751 24 Uppsala, SWEDEN
E-mail : gerard@xray.bmc.uu.se
Purpose : calculate Z-scores for profile/database scan results
Package : SBIN


2 REFERENCES

Reference(s) for this program:

* 1 * G.J. Kleywegt & T.A. Jones (1998). Databases in protein crystallography. Acta Cryst D54, 1119-1131. [http://xray.bmc.uu.se/gerard/papers/databases.html] [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=10089488&dopt=Citation] [http://scripts.iucr.org/cgi-bin/paper?ba0001]

* 2 * G.J. Kleywegt, J.Y. Zou, M. Kjeldgaard & T.A. Jones (2001). Around O. In: "International Tables for Crystallography, Vol. F. Crystallography of Biological Macromolecules" (M.G. Rossmann & E. Arnold, Editors). Chapter 17.1, pp. 353-356, 366-367. Dordrecht: Kluwer Academic Publishers, The Netherlands.


3 VERSION HISTORY

970813 - 0.1 - first version
971103 - 1.0 - cleaned up code and manual
001121 - 1.1 - increased max nr of sequences from 100,000 to 1,000,000


4 DESCRIPTION

ZPROF is a simple non-interactive program that reads a list of profile/sequence scores (calculated with the pftools program "pfsearch"), *sorted* by decreasing score, and calculates Z-scores for them.

Usage: ZPROF [Z-score cut-off] < sorted_list > log_file

The value for the Z-score cut-off is optional (defaults to 4.0).

Typical example:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
pfsearch -a aligned.prf /home/gerard/lib/sprot.dat | & tee pfsearch_all.log
sort -nr pfsearch_all.log > pfsearch_all.sorted
ZPROF 3.5 < pfsearch_all.sorted > zprof.top
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

The output may look as follows:

(1) The Z-score cut-off is first set to its default value (4.0). If a command-line argument is found that can be interpreted as a real or integer number, that value is used instead:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF ***

 Version  - 971103/1.0
 (C) 1992-1999 Gerard J. Kleywegt, Dept. Cell Mol. Biol., Uppsala (S)
 User I/O - routines courtesy of Rolf Boelens, Univ. of Utrecht (NL)
 Others   - T.A. Jones, G. Bricogne, Rams, W.A. Hendrickson
 Others   - W. Kabsch, CCP4, PROTEIN, E. Dodson, etc. etc.

 Started  - Mon Oct 11 19:32:25 1999
 User     - gerard
 Mode     - interactive
 Host     - sarek
 ProcID   - 13383
 Tty      - /dev/ttyq12

*** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF ***

Reference(s) for this program:

* 1 * G.J. Kleywegt & T.A. Jones (1998). Databases in protein crystallography. Acta Cryst D54, 1119-1131. [http://xray.bmc.uu.se/gerard/papers/databases.html] [http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=10089488&form=6&db=m&Dopt=b] [http://www.iucr.org/iucr-top/journals/acta/tocs/actad/1998/actad5406_1.html]

* 2 * G.J. Kleywegt, J.Y. Zou, M. Kjeldgaard & T.A. Jones (1999 ?). Chapter 25.2.6. Around O. Int. Tables for Crystallography, Volume F. Submitted.

 ==> For manuals and up-to-date references, visit:
 ==> http://xray.bmc.uu.se/usf

*** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF ***

 Z-score cut-off : (   4.000)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

(2) The sorted score file is read, and statistics are calculated and printed. An iterative process (max. 10 cycles) then determines the average score and its standard deviation for the sequences whose score is less than Average + Z_cut-off * St_deviation. When the number of such sequences no longer changes from one cycle to the next, the calculation has converged, and the average and standard deviation of that cycle are used.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Working ...

 Nr of sequences scored : (      59021)
 Average                : ( 178.673)
 St.dev.                : (  44.654)
 Minimum                : (  26.000)
 Maximum                : ( 594.000)

 Remove "outliers" and re-calc ...
 Nr of sequences left   : (      58944)
 Average                : ( 178.324)
 St.dev.                : (  43.513)
 Minimum                : (  26.000)
 Maximum                : ( 334.000)

 Remove "outliers" and re-calc ...
 Nr of sequences left   : (      58939)
 Average                : ( 178.311)
 St.dev.                : (  43.491)
 Minimum                : (  26.000)
 Maximum                : ( 330.000)

 Remove "outliers" and re-calc ...
 Nr of sequences left   : (      58939)
 Average                : ( 178.311)
 St.dev.                : (  43.491)
 Minimum                : (  26.000)
 Maximum                : ( 330.000)

 Converged !
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
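
The iteration described in step (2) can be sketched in Python as follows. This is a minimal sketch and not ZPROF's actual source: the function names `mean_sd` and `converge` are hypothetical, the population standard deviation is assumed (the manual does not say which estimator ZPROF uses), and each cycle is assumed to filter the sequences kept by the previous cycle.

```python
import math


def mean_sd(values):
    """Return the average and (population) standard deviation of values."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(variance)


def converge(scores, z_cut=4.0, max_cycles=10):
    """Iteratively drop scores at or above mean + z_cut * sd.

    Stops when a cycle removes nothing (converged) or after max_cycles.
    Returns (mean, sd, number_of_scores_kept).
    """
    kept = list(scores)
    mean, sd = mean_sd(kept)
    for _ in range(max_cycles):
        limit = mean + z_cut * sd
        trimmed = [s for s in kept if s < limit]
        # Same count as before: converged. Empty list: nothing sensible left.
        if len(trimmed) == len(kept) or not trimmed:
            break
        kept = trimmed
        mean, sd = mean_sd(kept)
    return mean, sd, len(kept)


if __name__ == "__main__":
    # Toy data: 100 background scores plus one obvious outlier.
    toy = [10] * 50 + [11] * 50 + [100]
    print(converge(toy, z_cut=4.0))
```

On the toy data the single outlier is removed in the first cycle and the second cycle leaves the count unchanged, which mirrors the "Converged !" message in the example output.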

(3) A bit of "profile code" is printed which can be cut and pasted into the profile file:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
MA   /NORMALIZATION: MODE=1; FUNCTION=LINEAR;
MA     R1=    -4.09993267; R2=     0.02299311; TEXT ='Z-score';
MA   /CUT_OFF: LEVEL=0; SCORE=     331; N_SCORE=     3.50000000; MODE=1;
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
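
The R1 and R2 values above appear to be the intercept and slope of the linear function that turns a raw score into a Z-score, i.e. Z = R1 + R2 * raw, with R2 = 1 / St_deviation and R1 = -Average / St_deviation. The Python sketch below (the function name `linear_normalization` is hypothetical) checks that reading against the converged statistics printed in step (2). Small differences in the last digits are to be expected, because the printed average and standard deviation are rounded to three decimals.

```python
def linear_normalization(mean, sd):
    """Return (r1, r2) such that z = r1 + r2 * raw equals (raw - mean) / sd."""
    r2 = 1.0 / sd
    r1 = -mean / sd
    return r1, r2


if __name__ == "__main__":
    # Converged average and standard deviation from the example output.
    r1, r2 = linear_normalization(178.311, 43.491)
    print(f"R1 = {r1:.8f}; R2 = {r2:.8f}")
    # Z-score of the raw cut-off score 331 from the /CUT_OFF line.
    print(f"Z(331) = {r1 + r2 * 331:.2f}")
```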
   

(4) In case you want to use a different cut-off, a number of Z-score cut-off values and the corresponding raw score values are printed:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Z-score of  0.0 requires raw score      178
 Z-score of  0.5 requires raw score      200
 Z-score of  1.0 requires raw score      222
 Z-score of  1.5 requires raw score      244
 Z-score of  2.0 requires raw score      265
 Z-score of  2.5 requires raw score      287
 Z-score of  3.0 requires raw score      309
 Z-score of  3.5 requires raw score      331
 Z-score of  4.0 requires raw score      352
 Z-score of  4.5 requires raw score      374
 Z-score of  5.0 requires raw score      396
 Z-score of  5.5 requires raw score      418
 Z-score of  6.0 requires raw score      439
 Z-score of  6.5 requires raw score      461
 Z-score of  7.0 requires raw score      483
 Z-score of  7.5 requires raw score      504
 Z-score of  8.0 requires raw score      526
 Z-score of  8.5 requires raw score      548
 Z-score of  9.0 requires raw score      570
 Z-score of  9.5 requires raw score      591
 Z-score of 10.0 requires raw score      613
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
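
The table above is consistent with raw = Average + Z * St_deviation, rounded to the nearest integer. A minimal Python sketch, assuming that rounding rule (ZPROF's own rounding is not documented here) and using the converged statistics from step (2); the function name `raw_score_for` is hypothetical:

```python
def raw_score_for(z, mean, sd):
    """Raw score that corresponds to Z-score z, rounded to the nearest integer."""
    return round(mean + z * sd)


if __name__ == "__main__":
    mean, sd = 178.311, 43.491  # converged values from the example output
    for i in range(21):
        z = 0.5 * i
        print(f" Z-score of {z:4.1f} requires raw score {raw_score_for(z, mean, sd):8d}")
```

To pick a cut-off that is not in the table, call `raw_score_for` with the desired Z-score and use the result as the SCORE value in the /CUT_OFF line of the profile.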
   

(5) The top-scoring entries are listed with their rank, Z-score and raw score. (Note that this assumes that the input file was already sorted!) The listing ends after the first entry that scores below the Z-score cut-off has been listed.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
        1     9.56    594 P29257|LEC2_CYTSC 2-ACETAMIDO-2-DEOXY-D-GALACTOSE-BINDING SEED LECTIN II
        2     9.24    580 P45797|GUB_BACPO BETA-GLUCANASE PRECURSOR (EC 3.2.1.73) (ENDO-BETA-1,3-1,
        3     9.03    571 P27051|GUB_BACLI BETA-GLUCANASE PRECURSOR (EC 3.2.1.73) (ENDO-BETA-1,3-1,

[...]

       80     3.56    333 P36851|HEX_ADE07 HEXON PROTEIN (LATE PROTEIN 2).
       81     3.56    333 P36849|HEX_ADE03 HEXON PROTEIN (LATE PROTEIN 2).
       82     3.53    332 P32491|MKK2_YEAST PROTEIN KINASE MKK2/SSP33 (EC 2.7.1.-).
       83     3.49    330 P38419|LOXC_ORYSA LIPOXYGENASE, CHLOROPLAST PRECURSOR (EC 1.13.11.12).
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
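
The listing logic of step (5) can be sketched as follows. This is an illustration under stated assumptions, not ZPROF's code: the function name `list_hits` is hypothetical, and each input line is assumed to start with an integer raw score followed by the entry description, already sorted by decreasing score. The last line of the sample data is an invented placeholder to show where the listing stops.

```python
def list_hits(sorted_lines, mean, sd, z_cut):
    """Return (rank, z, raw, label) rows, stopping after the first row below z_cut."""
    rows = []
    for rank, line in enumerate(sorted_lines, start=1):
        fields = line.split(maxsplit=1)
        raw = int(fields[0])  # assumption: the score is the first field
        label = fields[1] if len(fields) > 1 else ""
        z = (raw - mean) / sd
        rows.append((rank, z, raw, label))
        if z < z_cut:
            # The first entry below the cut-off is still listed, then we stop.
            break
    return rows


if __name__ == "__main__":
    sample = [
        "594 P29257|LEC2_CYTSC",
        "333 P36851|HEX_ADE07",
        "332 P32491|MKK2_YEAST",
        "330 P38419|LOXC_ORYSA",
        "300 PLACEHOLDER_ENTRY",  # hypothetical, never reached
    ]
    for rank, z, raw, label in list_hits(sample, 178.311, 43.491, 3.5):
        print(f"{rank:9d} {z:8.2f} {raw:6d} {label}")
```

Because the loop stops at the first entry below the cut-off, an unsorted input would truncate the listing too early, which is why the sort step in the usage example matters.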

(6) Finally, a brief summary is printed:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Z-score cut-off : (   3.500)
 Nr of "hits"    : (         82)
 % of database   : (   0.139)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
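
The summary figures follow from the earlier output: 83 entries were listed in step (5), the last of which is the first one below the cut-off, leaving 82 hits, and the percentage is taken relative to all 59021 sequences scored in step (2). A short Python check of that arithmetic (the function name `percent_of_database` is hypothetical):

```python
def percent_of_database(n_hits, n_scored):
    """Percentage of all scored sequences that count as hits."""
    return 100.0 * n_hits / n_scored


if __name__ == "__main__":
    print(f" % of database   : ({percent_of_database(82, 59021):8.3f})")
```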
   


5 KNOWN BUGS

None, at present ("peppar, peppar").


6 UNKNOWN BUGS

Does not compute.


Uppsala Software Factory

Created at Fri Jan 14 20:12:44 2005 by MAN2HTML version 050114/2.0.6. This manual describes ZPROF, a program of the Uppsala Software Factory (USF), written and maintained by Gerard Kleywegt. © 1992-2005.