Program : ZPROF
Version : 001121
Author : Gerard J. Kleywegt, Dept. of Cell and Molecular Biology,
Uppsala University, Biomedical Centre, Box 596,
SE-751 24 Uppsala, SWEDEN
E-mail : gerard@xray.bmc.uu.se
Purpose : calculate Z-scores for profile/database scan results
Package : SBIN
Reference(s) for this program:
* 1 * G.J. Kleywegt & T.A. Jones (1998). Databases in protein crystallography. Acta Cryst D54, 1119-1131. [http://xray.bmc.uu.se/gerard/papers/databases.html] [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=10089488&dopt=Citation] [http://scripts.iucr.org/cgi-bin/paper?ba0001]
* 2 * Kleywegt, G.J., Zou, J.Y., Kjeldgaard, M. & Jones, T.A. (2001). Around O. In: "International Tables for Crystallography, Vol. F. Crystallography of Biological Macromolecules" (Rossmann, M.G. & Arnold, E., Editors). Chapter 17.1, pp. 353-356, 366-367. Dordrecht: Kluwer Academic Publishers, The Netherlands.
970813 - 0.1 - first version
971103 - 1.0 - cleaned up code and manual
001121 - 1.1 - increased max nr of sequences from 100,000 to 1,000,000
ZPROF is a simple non-interactive program which reads a *sorted* list of profile/sequence scores (calculated with the pftools-program "pfsearch") and calculates Z-scores.
Usage: ZPROF [Z-score cut-off] < sorted_list > log_file
The value for the Z-score cut-off is optional (defaults to 4.0).
Typical example:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
pfsearch -a aligned.prf /home/gerard/lib/sprot.dat | & tee pfsearch_all.log
sort -nr pfsearch_all.log > pfsearch_all.sorted
ZPROF 3.5 < pfsearch_all.sorted > zprof.top
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The output may look as follows:
(1) The Z-score cut-off value is set to the default; if a command-line argument is found which can be interpreted as a real or integer number, that cut-off is used instead:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
*** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF ***

Version  - 971103/1.0
(C) 1992-1999 Gerard J. Kleywegt, Dept. Cell Mol. Biol., Uppsala (S)
User I/O - routines courtesy of Rolf Boelens, Univ. of Utrecht (NL)
Others   - T.A. Jones, G. Bricogne, Rams, W.A. Hendrickson
Others   - W. Kabsch, CCP4, PROTEIN, E. Dodson, etc. etc.

Started  - Mon Oct 11 19:32:25 1999
User     - gerard
Mode     - interactive
Host     - sarek
ProcID   - 13383
Tty      - /dev/ttyq12

*** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF ***

Reference(s) for this program:

* 1 * G.J. Kleywegt & T.A. Jones (1998). Databases in protein crystallography. Acta Cryst D54, 1119-1131. [http://xray.bmc.uu.se/gerard/papers/databases.html] [http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=10089488&form=6&db=m&Dopt=b] [http://www.iucr.org/iucr-top/journals/acta/tocs/actad/1998/actad5406_1.html]

* 2 * G.J. Kleywegt, J.Y. Zou, M. Kjeldgaard & T.A. Jones (1999 ?). Chapter 25.2.6. Around O. Int. Tables for Crystallography, Volume F. Submitted.

==> For manuals and up-to-date references, visit:
==> http://xray.bmc.uu.se/usf

*** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF ***

Z-score cut-off : ( 4.000)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
(2) The sorted score file is read and statistics are calculated and printed. Then an iterative process starts (max. 10 cycles) to determine the average score and its standard deviation for the sequences whose score is less than Average + Z_cut-off * St_deviation. When the number of such sequences no longer changes from one cycle to the next, the calculations have converged, and the average and standard deviation of that cycle are used.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Working ...
Nr of sequences scored : ( 59021)
Average                : ( 178.673)
St.dev.                : ( 44.654)
Minimum                : ( 26.000)
Maximum                : ( 594.000)

Remove "outliers" and re-calc ...
Nr of sequences left   : ( 58944)
Average                : ( 178.324)
St.dev.                : ( 43.513)
Minimum                : ( 26.000)
Maximum                : ( 334.000)

Remove "outliers" and re-calc ...
Nr of sequences left   : ( 58939)
Average                : ( 178.311)
St.dev.                : ( 43.491)
Minimum                : ( 26.000)
Maximum                : ( 330.000)

Remove "outliers" and re-calc ...
Nr of sequences left   : ( 58939)
Average                : ( 178.311)
St.dev.                : ( 43.491)
Minimum                : ( 26.000)
Maximum                : ( 330.000)

Converged !
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
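The iteration described in step (2) can be sketched in a few lines of Python. This is only an illustration of the idea, not ZPROF's actual source: the function name `converge` is made up, the use of the population standard deviation is an assumption (the manual does not say which estimator ZPROF uses), and outliers are assumed to be removed cumulatively from one cycle to the next.

```python
from statistics import mean, pstdev


def converge(scores, z_cut=4.0, max_cycles=10):
    """Return (average, st_dev, n_kept) after iterative outlier removal.

    Hypothetical helper: keeps only scores below avg + z_cut * sd and
    repeats until the number of kept scores stops changing.
    """
    kept = list(scores)
    avg, sd = mean(kept), pstdev(kept)  # assumption: population st.dev.
    for _ in range(max_cycles):
        limit = avg + z_cut * sd
        survivors = [s for s in kept if s < limit]
        if not survivors or len(survivors) == len(kept):
            break  # nothing left to fit, or count unchanged: converged
        kept = survivors
        avg, sd = mean(kept), pstdev(kept)
    return avg, sd, len(kept)


# Made-up scores with one obvious outlier, just to exercise the loop.
demo = [8, 9, 10, 11, 12] * 20 + [1000]
avg, sd, n = converge(demo)
print(f"{n} sequences left, average {avg:.3f}, st.dev. {sd:.3f}")
```

With tens of thousands of database sequences, as in the example output, the choice between the population and the sample standard deviation makes no practical difference.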
(3) A bit of "profile code" is printed which can be cut and pasted into the profile file:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
MA /NORMALIZATION: MODE=1; FUNCTION=LINEAR;
MA R1= -4.09993267; R2= 0.02299311; TEXT ='Z-score';
MA /CUT_OFF: LEVEL=0; SCORE= 331; N_SCORE= 3.50000000; MODE=1;
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
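The R1 and R2 values in this example appear to be the coefficients of the linear function Z = R1 + R2 * raw_score, with R1 = -Average / St.dev. and R2 = 1 / St.dev. That reading is inferred from the numbers in the example output rather than stated in the manual, so the following Python sketch (with made-up function names) should be taken as a plausibility check only.

```python
def normalization_params(avg, sd):
    """Coefficients of Z = r1 + r2 * raw (assumed form of the linear function)."""
    r2 = 1.0 / sd
    r1 = -avg / sd
    return r1, r2


def z_score(raw, r1, r2):
    """Apply the linear normalization to a raw profile score."""
    return r1 + r2 * raw


# Converged average and st.dev. as printed in the example output (3 decimals).
r1, r2 = normalization_params(178.311, 43.491)
print(f"R1= {r1:.8f}; R2= {r2:.8f};")

# Z-score of the top hit (raw score 594), using the coefficients ZPROF printed.
print(f"{z_score(594, -4.09993267, 0.02299311):.2f}")
```

The coefficients computed here differ from the printed ones in the fifth significant digit, presumably because the average and standard deviation are only shown to three decimals in the log.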
(4) In case you want to use a different cut-off, a number of Z-score cut-off values and the corresponding raw score values are printed:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Z-score of  0.0 requires raw score 178
Z-score of  0.5 requires raw score 200
Z-score of  1.0 requires raw score 222
Z-score of  1.5 requires raw score 244
Z-score of  2.0 requires raw score 265
Z-score of  2.5 requires raw score 287
Z-score of  3.0 requires raw score 309
Z-score of  3.5 requires raw score 331
Z-score of  4.0 requires raw score 352
Z-score of  4.5 requires raw score 374
Z-score of  5.0 requires raw score 396
Z-score of  5.5 requires raw score 418
Z-score of  6.0 requires raw score 439
Z-score of  6.5 requires raw score 461
Z-score of  7.0 requires raw score 483
Z-score of  7.5 requires raw score 504
Z-score of  8.0 requires raw score 526
Z-score of  8.5 requires raw score 548
Z-score of  9.0 requires raw score 570
Z-score of  9.5 requires raw score 591
Z-score of 10.0 requires raw score 613
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
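The table above is consistent with raw = Average + Z * St.dev., rounded to the nearest integer, using the converged values from step (2). The Python sketch below reproduces it under that assumption; the rounding rule and the names `AVG`, `SD` and `raw_score_for` are illustrative guesses, not taken from the ZPROF source.

```python
AVG = 178.311  # converged average from the example output
SD = 43.491    # converged standard deviation from the example output


def raw_score_for(z, avg=AVG, sd=SD):
    """Raw score needed for a given Z-score, rounded to the nearest integer.

    Assumes positive scores, so adding 0.5 and truncating rounds to nearest.
    """
    return int(avg + z * sd + 0.5)


# Z-scores 0.0, 0.5, ..., 10.0, as in the printed table.
table = [(i * 0.5, raw_score_for(i * 0.5)) for i in range(21)]
for z, raw in table:
    print(f"Z-score of {z:4.1f} requires raw score {raw}")
```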
(5) The top-scoring entries are listed with their rank, Z-score and raw score. (Note that this assumes that the input file was already sorted !!!) After the first entry which scores below the Z-score cut-off has been listed, the listing ends.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 1 9.56 594 P29257|LEC2_CYTSC 2-ACETAMIDO-2-DEOXY-D-GALACTOSE-BINDING SEED LECTIN II
 2 9.24 580 P45797|GUB_BACPO BETA-GLUCANASE PRECURSOR (EC 3.2.1.73) (ENDO-BETA-1,3-1,
 3 9.03 571 P27051|GUB_BACLI BETA-GLUCANASE PRECURSOR (EC 3.2.1.73) (ENDO-BETA-1,3-1,
[...]
80 3.56 333 P36851|HEX_ADE07 HEXON PROTEIN (LATE PROTEIN 2).
81 3.56 333 P36849|HEX_ADE03 HEXON PROTEIN (LATE PROTEIN 2).
82 3.53 332 P32491|MKK2_YEAST PROTEIN KINASE MKK2/SSP33 (EC 2.7.1.-).
83 3.49 330 P38419|LOXC_ORYSA LIPOXYGENASE, CHLOROPLAST PRECURSOR (EC 1.13.11.12).
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
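The listing logic of step (5) can be sketched as follows: walk down the scores in descending order, compute each Z-score, and stop right after the first entry that falls below the cut-off (which is why entry 83, with Z = 3.49, still appears in the example). The function name `top_hits` and the short score list are hypothetical; the average and standard deviation are the converged values from the example output.

```python
def top_hits(sorted_scores, avg, sd, z_cut):
    """Return (rank, z, raw) rows for scores sorted in descending order.

    The first entry scoring below z_cut is still listed; then listing stops.
    """
    rows = []
    for rank, raw in enumerate(sorted_scores, start=1):
        z = (raw - avg) / sd
        rows.append((rank, round(z, 2), raw))
        if z < z_cut:
            break
    return rows


# A few raw scores from the example plus one made-up lower score (300).
for rank, z, raw in top_hits([594, 580, 333, 332, 330, 300], 178.311, 43.491, 3.5):
    print(f"{rank:3d} {z:5.2f} {raw}")
```

If the input were not sorted, the loop would stop at the first low score it met and silently skip any higher scores further down, which is why the sorting step matters.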
(6) Finally, a brief summary is printed:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Z-score cut-off : ( 3.500)
Nr of "hits"    : ( 82)
% of database   : ( 0.139)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Known bugs : none, at present ("peppar, peppar", Swedish for "touch wood").
Does not compute.