Example of FASTA search output

This example is an excerpt of FASTA search output. Links are provided to explain the definitions and expressions that appear in the output. More information can be found in the description of the FASTA algorithm. Other examples of reading FASTA results and searching their corresponding bibliographies can also be consulted. In these examples the results are marked up with the fasta2html script, so that moving between description lines and their corresponding alignments, as well as querying database servers (Swiss-Prot, GenBank, EMBL, PDB, NR3D, PROSITE, etc.), can be done just by "pointing and clicking".
fasta - SEARCH DATABASE Swissprot WITH FILE: yee_C.seq
Run on: Mon Oct 3 16:32:30 MET 1994 on mendel.sis.pasteur.
. yee_C.seq : 287 aa
 >YEEC_ECOLI Length: 347, 287 bases, 52DE1AA3 che: 287 aa
 vs Swiss-Prot Release 29 library
searching Swiss-Prot library
Distribution of initial scores with ktup=2.
  |
  v
     initn init1
 < 2     2     2:=
   4     0     0:
   6     4     4:==
   8    18    18:=========
  10    73    73:=====================================
  12   326   326:==================================================
  14   370   370:==================================================
  16  1248  1248:==================================================
  18  1354  1354:==================================================
  20  2746  2746:==================================================
  22  3151  3151:==================================================
  24  6401  6401:==================================================
  26  5110  5110:==================================================
  28  5724  5763:==================================================
  30  3887  4303:==================================================
  32  2238  2682:==================================================
  34  1401  1735:==================================================
  36   863  1144:==================================================
  38   533   690:==================================================
  40   346   411:==================================================
  42   481   250:==================================================
  44   419   166:==================================================
  46   346   110:==================================================
  48   295    84:------------------------------------------++++++++
  50   203    47:------------------------++++++++++++++++++++++++++
  52   184    46:-----------------------+++++++++++++++++++++++++++
  54   110    39:--------------------++++++++++++++++++++++++++++++
  56    82     9:-----++++++++++++++++++++++++++++++++++++
  58    69     8:----+++++++++++++++++++++++++++++++
  60    71     3:--++++++++++++++++++++++++++++++++++
  62    75     1:-+++++++++++++++++++++++++++++++++++++
  64    36     1:-+++++++++++++++++
  66    31     0:++++++++++++++++
  68    17     2:-++++++++
  70    12     0:++++++
  72    12     0:++++++
  74     6     0:+++
  76    10     1:-++++
  78    28     0:++++++++++++++
  80     2     0:+
> 80    19     5:---+++++++
13464008 residues in 38303 sequences
 statistics exclude scores greater than 73
 mean initn score: 26.8 (7.79)
 mean init1 score: 26.0 (6.05)
 5349 scores better than 33 saved, ktup: 2, variable pamfact
 joining threshold: 28 scan time: 0:00:30
The best scores are:                                   initn init1   opt
sp|P33013|YEEC_ECOLI HYPOTHETICAL 38.9 KD PROTEIN IN S  1422  1422  1422
sp|P04287|DACA_ECOLI PENICILLIN-BINDING PROTEIN 5 PREC   789   789   797
sp|P08506|DACC_ECOLI PENICILLIN-BINDING PROTEIN 6 PREC   738   738   766
sp|Q05523|DACA_BACST D-ALANYL-D-ALANINE CARBOXYPEPTIDA   176   116   129
......................................................................
Alignments

sp|P33013|YEEC_ECOLI HYPOTHETICAL 38.9 KD PROTEIN IN S  1422  1422  1422
100.0% identity in 287 aa overlap

                                             10        20        30
YEEC_E                               FVGSSLMFLKEGDRVSVRDLSRGLIVDSGN
                                     X:::::::::::::::::::::::::::::
sp|P33 VVDRAIDSHRITPDDIVTVGRDAWAKDNPVFVGSSLMFLKEGDRVSVRDLSRGLIVDSGN
               40        50        60        70        80        90

               40        50        60        70        80        90
YEEC_E DACVALADYIAGGQRQFVEMMNNYAEKLHLKDTHFETVHGLDAPGQHSSAYDLAVLSRAI
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sp|P33 DACVALADYIAGGQRQFVEMMNNYAEKLHLKDTHFETVHGLDAPGQHSSAYDLAVLSRAI
              100       110       120       130       140       150

              100       110       120       130       140       150
YEEC_E IHGEPEFYHMYSEKSLTWNGITQQNRNGLLWDKTMNVDGLKTGHTSGAGFNLIASAVDGQ
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sp|P33 IHGEPEFYHMYSEKSLTWNGITQQNRNGLLWDKTMNVDGLKTGHTSGAGFNLIASAVDGQ
              160       170       180       190       200       210

              160       170       180       190       200       210
YEEC_E RRLIAVVMGADSAKGREEEARKLLRWGQQNFTTVQILHRGKKVGTERIWYGDKENIDLGT
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sp|P33 RRLIAVVMGADSAKGREEEARKLLRWGQQNFTTVQILHRGKKVGTERIWYGDKENIDLGT
              220       230       240       250       260       270

              220       230       240       250       260       270
YEEC_E EQEFWMVLPKAEIPHIKAKYTLDGKELTAPISAHQRVGEIELYDRDKQVAHWPLVTLESV
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
sp|P33 EQEFWMVLPKAEIPHIKAKYTLDGKELTAPISAHQRVGEIELYDRDKQVAHWPLVTLESV
              280       290       300       310       320       330

              280
YEEC_E GEGSMFSRLSDYFHHKA
       ::::::::::::::::X
sp|P33 GEGSMFSRLSDYFHHKA
              340

sp|P04287|DACA_ECOLI PENICILLIN-BINDING PROTEIN 5 PREC   789   789   797
50.2% identity in 283 aa overlap

                                             10        20        30
YEEC_E                               FVGSSLMFLKEGDRVSVRDLSRGLIVDSGN
                                     : X::::::: : .:.:..: ::. ..:::
sp|P04 VIGQAMKAGKFKETDLVTIGNDAWATGNPVFKGSSLMFLKPGMQVPVSQLIRGINLQSGN
              90       100       110       120       130       140

               40        50        60        70        80        90
YEEC_E DACVALADYIAGGQRQFVEMMNNYAEKLHLKDTHFETVHGLDAPGQHSSAYDLAVLSRAI
       :::::.::. ::.: .::..::.:.. : ::.:::.::::::: ::.::: :.:....:.
sp|P04 DACVAMADFAAGSQDAFVGLMNSYVNALGLKNTHFQTVHGLDADGQYSSARDMALIGQAL
             150       160       170       180       190       200

              100       110       120       130       140       150
YEEC_E IHGEPEFYHMYSEKSLTWNGITQQNRNGLLWDKTMNVDGLKTGHTSGAGFNLIASAVDGQ
       :.. :. : .:.::..:.::: : ::::::::...::::.:::::. ::.::.:::..::
sp|P04 IRDVPNEYSIYKEKEFTFNGIRQLNRNGLLWDNSLNVDGIKTGHTDKAGYNLVASATEGQ
             210       220       230       240       250       260

              160       170       180       190       200       210
YEEC_E RRLIAVVMGADSAKGREEEARKLLRWGQQNFTTVQILHRGKKVGTERIWYGDKENIDLGT
       .:::..:::. . ::::.:..::: :: . :.::. :. ::. ..:..:.::... .::.
sp|P04 MRLISAVMGGRTFKGREAESKKLLTWGFRFFETVNPLKVGKEFASEPVWFGDSDRASLGV
             270       280       290       300       310       320

              220       230       240       250       260       270
YEEC_E EQEFWMVLPKAEIPHIKAKYTLDGKELTAPISAHQRVGEIELYDRDKQVAHWPLVTLESV
       ... ....:.. . ..::.:.:...:: ::.  .: ::.:..   .: ....:::.:...
sp|P04 DKDVYLTIPRGRMKDLKASYVLNSSELHAPLQKNQVVGTINFQLDGKTIEQRPLVVLQEI
             330       340       350       360       370       380

              280
YEEC_E GEGSMFSRLSDYFHHKA
        ::..:... :X..
sp|P04 PEGNFFGKIIDYIKLMFHHWFG
             390       400

sp|P08506|DACC_ECOLI PENICILLIN-BINDING PROTEIN 6 PREC   738   738   766
49.3% identity in 280 aa overlap

                                           10        20        30
YEEC_E                             FVGSSLMFLKEGDRVSVRDLSRGLIVDSGNDA
                                     X::.:::: ::.::: ::..:.:..:::::
sp|P08 GQALKADKIKLTDMVTVGKDAWATGNPALRGSSVMFLKPGDQVSVADLNKGVIIQSGNDA
         80        90       100       110       120       130

             40        50        60        70        80        90
YEEC_E CVALADYIAGGQRQFVEMMNNYAEKLHLKDTHFETVHGLDAPGQHSSAYDLAVLSRAIIH
       :.:::::.::.:  :...::.::.:: :..: :.:::::::::: :.: :.:.:..:.::
sp|P08 CIALADYVAGSQESFIGLMNGYAKKLGLTNTTFQTVHGLDAPGQFSTARDMALLGKALIH
        140       150       160       170       180       190

            100       110       120       130       140       150
YEEC_E GEPEFYHMYSEKSLTWNGITQQNRNGLLWDKTMNVDGLKTGHTSGAGFNLIASAVDGQRR
       . :: : ...::..:.: : :.::: :::....: ::.::: :.:::.::.:::..:..:
sp|P08 DVPEEYAIHKEKEFTFNKIRQPNRNRLLWSSNLNEDGMKTGTTAGAGYNLVASATQGDMR
        200       210       220       230       240       250

            160       170       180       190       200       210
YEEC_E LIAVVMGADSAKGREEEARKLLRWGQQNFTTVQILHRGKKVGTERIWYGDKENIDLGTEQ
       ::.::.::.... : .:. ::: :: . :.::  .... .  :.:.:.:::....::...
sp|P08 LISVVLGAKTDRIRFNESEKLLTWGFRFFETVTPIKPDATFVTQRVWFGDKSEVNLGAGE
        260       270       280       290       300       310

            220       230       240       250       260       270
YEEC_E EFWMVLPKAEIPHIKAKYTLDGKELTAPISAHQRVGEIELYDRDKQVAHWPLVTLESVGE
       .  ...:.... ..::.:::.. .::::..  : ::.:..  ..: ....::...:.:.:
sp|P08 AGSVTIPRGQLKNLKASYTLTEPQLTAPLKKGQVVGTIDFQLNGKSIEQRPLIVMENVEE
        320       330       340       350       360       370

            280
YEEC_E GSMFSRLSDYFHHKA
       X..:.:. :.
sp|P08 GGFFGRVWDFVMMKFHQWFGSWFS
        380       390       400

sp|Q05523|DACA_BACST D-ALANYL-D-ALANINE CARBOXYPEPTIDA   176   116   129
41.3% identity in 63 aa overlap

                                     10        20        30
YEEC_E                       FVGSSLMFLKEGDRVSVRDLSRGLIVDSGNDACVALAD
                                     :..... .X:.:  .. . :.:.: ::.:.
sp|Q05 KAKRVKWDQMYTPSDYVYRLSQDRALSNVPLRKDGKYTVRELYEAMAIYSANGATVAIAE
               90       100       110       120       130       140

       40        50        60        70        80        90
YEEC_E YIAGGQRQFVEMMNNYAEKLHLKDTHFETVHGLDAPGQHSSAYDLAVLSRAIIHGEPEFY
        :::....::.:::. :..: ::: .: .. :X.
sp|Q05 IIAGSEKNFVKMMNDKAKELGLKDYKFVNATGLSNKDLKGFHPEGTSTNEENVMSARAMA
              150       160       170       180       190       200

      100       110       120       130       140       150
YEEC_E HMYSEKSLTWNGITQQNRNGLLWDKTMNVDGLKTGHTSGAGFNLIASAVDGQRRLIAVVM

sp|Q05 MLAYRLLKDHPEVLKTASIPHKVFREGTKDEIKMDNWNWMLPGLVYGYEGVDGLKTGYTE
              210       220       230       240       250       260
..................................................................
Library scan: 0:00:30 total CPU time: 0:00:31
interval: The first column of the histogram gives the upper bound of each score interval. "< 2" represents scores in the interval < 2; "4" represents scores in the interval ]2-4]; "6" represents scores in the interval ]4-6]; and so on.
initn: This column gives the number of library sequences whose initial similarity score (initn) falls in the corresponding histogram interval.
init1: This column gives the number of library sequences whose best single initial region similarity score (init1) falls in the corresponding histogram interval.
"=": When the numbers of sequences reported in the initn and init1 columns are equal, they are represented by "=" signs in the histogram.
"+" and "-": When the numbers of sequences reported in the initn and init1 columns are not equal, the initn values are graphed with "+" and the init1 values with "-".
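As an illustration, the bars in the histogram above appear to be drawn at one symbol per two sequences, rounded up and capped at 50 symbols. This scale is inferred from the rows shown here (18 sequences give 9 symbols, 84 give 42); it is not taken from the FASTA source code, and `histogram_bar` is a hypothetical helper name. A minimal Python sketch under those assumptions:

```python
import math


def histogram_bar(initn, init1, per_symbol=2, width=50):
    """Draw one histogram bar from the initn and init1 counts of an interval.

    Assumed scale (inferred from the excerpt, not from FASTA itself):
    one symbol per `per_symbol` sequences, rounded up, at most `width` symbols.
    """
    n = min(width, math.ceil(initn / per_symbol))
    m = min(width, math.ceil(init1 / per_symbol))
    if n == m:
        # Both bars have the same length: a single run of "=".
        return "=" * n
    # init1 part drawn with "-", the remainder of the initn bar with "+".
    # The case m > n does not occur in the excerpt; here it yields only "-".
    return "-" * m + "+" * max(0, n - m)


print(histogram_bar(18, 18))    # interval 8
print(histogram_bar(295, 84))   # interval 48
print(histogram_bar(19, 5))     # interval > 80
```

Under this reading, an interval such as 42 (481 and 250 sequences) is drawn with "=" because both bars reach the 50-symbol cap, even though the two counts differ.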
mean and standard deviation: The mean score, followed in parentheses by its corresponding standard deviation, for example 26.8 (7.79) for the initn scores.
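One way to read these two numbers is to express a score in standard-deviation units above the mean. The Python sketch below does this for the initn score of DACA_BACST (176), using the mean and standard deviation printed above. The function name `sd_units` is hypothetical, and this simple ratio is only a rough guide, not a value printed in this output.

```python
def sd_units(score, mean, sd):
    """Return how many standard deviations `score` lies above `mean`."""
    return (score - mean) / sd


# Values copied from the output above: mean initn score 26.8 (7.79),
# and the initn score of DACA_BACST, 176.
z = sd_units(176, 26.8, 7.79)
print(f"{z:.1f} standard deviations above the mean")
```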
initn and init1: The initn score, which is used to rank the library sequences, provides a very sensitive measure of protein sequence similarity, whereas the relationship between the init1 score and the optimized score (opt) provides a more selective perspective.
":" and ".": In the comparison with DACA_ECOLI, the initial score is 789 (initn), the score of the best single initial region is 789 (init1), and the score of the aligned amino acids in the optimized region, which are marked by ":" and ".", is 797 (opt). Aligned amino acid identities are denoted by ":"; substitutions with PAM250 scores of zero or greater are denoted by ".".
"X": The amino-terminal boundary of the init1 initial region is denoted by an "X". In the alignments above, a second "X" also appears at the other end of that region.
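To compare the three scores programmatically, the lines of the "best scores" list can be split from the right, since the last three fields are initn, init1 and opt. The following is a minimal Python sketch, assuming each line has the layout shown in this excerpt; `parse_best_score` is a hypothetical helper and is not part of FASTA or fasta2html.

```python
def parse_best_score(line):
    """Split one 'best scores' line into its identifier, title and scores."""
    # The description may contain spaces and digits (e.g. "38.9 KD"),
    # so the three score fields are taken from the right-hand side.
    description, initn, init1, opt = line.rsplit(None, 3)
    seq_id, _, title = description.partition(" ")
    return {
        "id": seq_id,
        "title": title,
        "initn": int(initn),
        "init1": int(init1),
        "opt": int(opt),
    }


# Lines copied from the output above.
lines = [
    "sp|P33013|YEEC_ECOLI HYPOTHETICAL 38.9 KD PROTEIN IN S 1422 1422 1422",
    "sp|P04287|DACA_ECOLI PENICILLIN-BINDING PROTEIN 5 PREC 789 789 797",
    "sp|P08506|DACC_ECOLI PENICILLIN-BINDING PROTEIN 6 PREC 738 738 766",
    "sp|Q05523|DACA_BACST D-ALANYL-D-ALANINE CARBOXYPEPTIDA 176 116 129",
]
hits = [parse_best_score(line) for line in lines]

for hit in hits:
    # Difference between the optimized score and the init1 score.
    print(hit["id"], hit["initn"], hit["init1"], hit["opt"], hit["opt"] - hit["init1"])
```

Splitting from the right keeps the sketch independent of the exact column widths, which differ between a flattened web page and the original fixed-width output.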
Comments and suggestions are welcome. Fredj Tekaia tekaia@pasteur.fr.