Example of BLAST output


This example is an excerpt of blastp program output. Links are provided to definitions, expressions, parameters, and other terms used in the BLAST family of programs. More information can be found in the BLAST manual. Other examples of reading BLAST results and searching their corresponding bibliographies may also be consulted. In these examples the results are marked up with the blast2html script, so that moving between the descriptive lines and their corresponding alignments, as well as querying database servers (Swiss-Prot, GenBank, EMBL, PDB, NR3D, PROSITE, etc.), can be done just by "pointing and clicking".
A. Program Introduction

BLASTP 1.4.7 [6-Oct-94] [Build 12:00:56 Oct 13 1994]

Reference:  Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers,
and David J. Lipman (1990).  Basic local alignment search tool.
J. Mol. Biol. 215:403-10.

Query=  P1;YEEC_ECOLI Length: 347
        (347 letters)

Database:  SwissProt version 30 Sat Oct 15 12:04:49 MET 1994
           40,292 sequences; 14,147,368 total letters.
Searching..................................................done

B. Histogram of Expectations

     Observed Numbers of Database Sequences Satisfying
    Various EXPECTation Thresholds (E parameter values)

        Histogram units:      = 20 Sequences     : less than 20 sequences

 EXPECTation Threshold
 (E parameter)
    |
    V   Observed Counts-->
  10000 4492 1088 |======================================================
   6310 3404 1186 |===========================================================
   3980 2218  569 |============================
   2510 1649  678 |=================================
   1580  971  294 |==============
   1000  677  241 |============
    631  436  145 |=======
    398  291  118 |=====
    251  173   56 |==
    158  117   34 |=
    100   83   23 |=
   63.1   60   13 |:
   39.8   47   15 |:
   25.1   32    8 |:
   15.8   24    6 |:
 >>>>>>>>>>>>>>>>>>>>>  Expect = 10.0, Observed = 18  <<<<<<<<<<<<<<<<<
   10.0   18    5 |:
   6.31   13    1 |:
   3.98   12    2 |:
   2.51   10    0 |
   1.58   10    1 |:

C. One-line Summaries

                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N
...............................................................................
sp|P08506|DACC_ECOLI  PENICILLIN-BINDING PROTEIN 6 PRECURS...   894  5.0e-120  1
sp|P38422|DACF_BACSU  PENICILLIN-BINDING DACF PROTEIN PREC...   209  5.0e-47   3
...............................................................................
sp|P28271|IREB_MOUSE  IRON-RESPONSIVE ELEMENT BINDING PROT...    59  0.9996    1
sp|P31571|CAIA_ECOLI  PROBABLE CARNITINE OPERON OXIDOREDUC...    48  0.9998    2

D. Alignments
..............................................................................
>sp|P08506|DACC_ECOLI PENICILLIN-BINDING PROTEIN 6 PRECURSOR (D-ALANYL-D-ALANINE CARBOXYPEPTIDASE FRACTION C) (EC 3.4.16.4) (DD-PEPTIDASE) (DD-CARBOXYPEPTIDASE) (PBP-6). Length = 400 Score = 894 (409.5 bits), Expect = 5.0e-120, P = 5.0e-120 Identities = 169/342 (49%), Positives = 237/3 42 (69%) Query: 1 MDYTTGQILTAGNEHQQRNPASLTKLMTGYVVDRAIDSHRITPDDIVTVGRDAWAKDNPV 60 MDY +G++L GN ++ +PASLTK+MT YVV +A+ + +I D+VTVG+DAWA NP Sbjct: 45 MDYASGKVLAEGNADEKLDPASLTKIMTSYVVGQALKADKIKLTDMVTVGKDAWATGNPA 104 Query: 61 FVGSSLMFLKEGDRVSVRDLSRGLIVDSGNDACVALADYIAGGQRQFVEMMNNYAEKLHL 120 GSS+MFLK GD+VSV DL++G+I+ SGNDAC+ALADY+AG Q F+ +MN YA+KL L Sbjct: 105 LRGSSVMFLKPGDQVSVADLNKGVIIQSGNDACIALADYVAGSQESFIGLMNGYAKKLGL 164 Query: 121 KDTHFETVHGLDAPGQHSSAYDLAVLSRAIIHGEPEFYHMYSEKSLTWNGITQQNRNGLL 180 +T F+TVHGLDAPGQ S+A D+A+L +A+IH PE Y ++ EK T+N I Q NRN LL Sbjct: 165 TNTTFQTVHGLDAPGQFSTARDMALLGKALIHDVPEEYAIHKEKEFTFNKIRQPNRNRLL 224 Query: 181 WDKTMNVDGLKTGHTSGAGFNLIASAVDGQRRLIAVVMGADSAKGREEEARKLLRWGQQN 240 W +N DG+KTG T+GAG+NL+ASA G RLI+VV+GA + + R E+ KLL WG + Sbjct: 225 WSSNLNEDGMKTGTTAGAGYNLVASATQGDMRLISVVLGAKTDRIRFNESEKLLTWGFRF 284 Query: 241 FTTVQILHRGKKVGTERIWYGDKENIDLGTEQEFWMVLPKAEIPHIKAKYTLDGKELTAP 300 F TV + T+R+W+GDK ++LG + + +P+ ++ ++KA YTL +LTAP Sbjct: 285 FETVTPIKPDATFVTQRVWFGDKSEVNLGAGEAGSVTIPRGQLKNLKASYTLTEPQLTAP 344 Query: 301 ISAHQRVGEIELYDRDKQVAHWPLVTLESVGEGSMFSRLSDY 342 + Q VG I+ K + PL+ +E+V EG F R+ D+ Sbjct: 345 LKKGQVVGTIDFQLNGKSIEQRPLIVMENVEEGGFFGRVWDF 386 >sp|P38422|DACF_BACSU PENICILLIN-BINDING DACF PROTEIN PRECURSOR (D-ALANYL-D-ALANINE CARBOXYPEPTIDASE) (EC 3.4.16.4) (DD-PEPTIDASE) (DD-CARBOXYPEPTIDASE). 
Length = 389 Score = 209 (95.7 bits), Expect = 5.0e-47, Sum P(3) = 5.0e-47 Identities = 37/93 (39%), Positives = 68/93 (73%) Query: 62 VGSSLMFLKEGDRVSVRDLSRGLIVDSGNDACVALADYIAGGQRQFVEMMNNYAEKLHLK 121 +G S +FL+ G+ ++V+++ +G+ + SGNDA VA+A++I+G + +FV+ MN A++L LK Sbjct: 98 MGGSQIFLEPGEEMTVKEMLKGIAIASGNDASVAMAEFISGSEEEFVKKMNKKAKELGLK 157 Query: 122 DTHFETVHGLDAPGQHSSAYDLAVLSRAIIHGE 154 +T F+ GL G +SSAYD+A++++ ++ E Sbjct: 158 NTSFKNPTGLTEEGHYSSAYDMAIMAKELLKYE 190 Score = 161 (73.8 bits), Expect = 5.0e-47, Sum P(3) = 5.0e-47 Identities = 45/153 (29%), Positives = 73/153 (47%) Query: 187 VDGLKTGHTSGAGFNLIASAVDGQRRLIAVVMGADSAKGREEEARKLLRWGQQNFTTVQI 246 VDG+KTG+T A + L ASA G R IAVV GA + K R + K+L + + T + Sbjct: 226 VDGVKTGYTGEAKYCLTASAKKGNMRAIAVVFGASTPKERNAQVTKMLDFAFSQYETHPL 285 Query: 247 LHRGKKVGTERIWYGDKENIDLGTEQEFWMVLPKAEIPHIKAKYTLDGKELTAPISAHQR 306 R + V ++ G ++ I+L T + ++ K E + K ++API Q Sbjct: 286 YKRNQTVAKVKVKKGKQKFIELTTSEPISILTKKGEDMNDVKKEIKMKDNISAPIQKGQE 345 Query: 307 VGEIELYDRDKQVAHWPLVTLESVGEGSMFSRL 339 +G + L + +A P+ E + + S L Sbjct: 346 LGTLVLKKDGEVLAESPVAAKEDMKKAGFISFL 378 Score = 77 (35.3 bits), Expect = 5.0e-47, Sum P(3) = 5.0e-47 Identities = 17/49 (34%), Positives = 28/49 (57%) Query: 5 TGQILTAGNEHQQRNPASLTKLMTGYVVDRAIDSHRITPDDIVTVGRDA 53 TG++L N +++ PAS+TK+MT ++ A+D +I D V A Sbjct: 47 TGKVLYNKNSNERLAPASMTKIMTMLLIMEALDKGKIKMSDKVRTSEHA 95 .............................................................. >sp|P28271|IREB_MOUSE IRON-RESPONSIVE ELEMENT BINDING PROTEIN (IRE-BP) (FERRITIN REPRESSOR PROTEIN) (ACONITATE HYDRATASE) (EC 4.2.1.3) (CITRATE HYDRO-LYASE) (ACONITASE). Length = 889 Score = 59 (27.0 bits), Expect = 7.9, P = 1.0 Identities = 14/47 (29%), Positives = 23/47 (48%) Query: 207 VDGQRRLIAVVMGADSAKGREEEARKLLRWGQQNFTTVQILHRGKKV 253 VD RR ++ D R +E + L+WG Q F ++I+ G + Sbjct: 130 VDFNRRADSLQKNQDLEFERNKERFEFLKWGSQAFCNMRIIPPGSGI 176 >sp|P31571|CAIA_ECOLI PROBABLE CARNITINE OPERON OXIDOREDUCTASE CAIA (EC 1.3.99.-). 
Length = 380 Score = 48 (22.0 bits), Expect = 8.7, Sum P(2) = 1.0 Identities = 11/31 (35%), Positives = 17/31 (54%) Query: 46 IVTVGRDAWAKDNPVFVGSSLMFLKEGDRVS 76 IV + RD + D PV+ G + K G +V+ Sbjct: 165 IVVMARDGASPDKPVYTGWFVDMSKPGIKVT 195 Score = 43 (19.7 bits), Expect = 8.7, Sum P(2) = 1.0 Identities = 11/35 (31%), Positives = 17/35 (48%) Query: 197 GAGFNLIASAVDGQRRLIAVVMGADSAKGREEEAR 231 G GFN + D +R L+A+ + E+ AR Sbjct: 227 GNGFNRVKEEFDHERFLVALTNYGTAMCAFEDAAR 261 Parameters: H=1 V=500 B=250 -ctxfactor=1.00 E=10 Query ----- As Used ----- ----- Computed ---- Frame MatID Matrix name Lambda K H Lambda K H +0 0 BLOSUM62 0.318 0.135 0.399 same same same Query Frame MatID Length Eff.Length E S W T X E2 S2 +0 0 347 347 10. 59 3 11 22 0.21 34 Statistics: Query Expected Observed HSPs HSPs Frame MatID High Score High Score Reportable Reported +0 0 63 (28.9 bits) 1826 (836.5 bits) 42 42 Query Neighborhd Word Excluded Failed Successful Overlaps Frame MatID Words Hits Hits Extensions Extensions Excluded +0 0 9047 10061369 2135661 7914281 11427 5 Database: SwissProt version 30 Sat Oct 15 12:04:49 MET 1994 Release date: unknown Posted date: 12:08 PM MET Oct 15, 1994 # of letters in database: 14,147,368 # of sequences in database: 40,292 # of database sequences satisfying E: 18 No. of states in DFA: 573 (56 KB) Total size of DFA: 149 KB (192 KB) Time to generate neighborhood: 0.10u 0.01s 0.11t Real: 00:00:00 Time to search database: 113.33u 3.65s 116.98t Real: 00:03:11 Total cpu time: 113.51u 3.80s 117.31t Real: 00:03:12
>up B. Histogram of Expectations
     Shown in the output above is a histogram of the lowest (most
     significant) Expect values obtained with each database
     sequence.  This information is useful in determining the
     numbers of database sequences that achieved a particular
     level of statistical significance.  It indicates the number
     of database matches that would be reportable at various
     settings for the expectation threshold (E parameter).
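
As a rough illustration of how the histogram columns relate to one another, the sketch below rebuilds the per-bin counts and the bars from the cumulative counts. It assumes (this is inferred from the histogram legend and the example rows, not stated in the manual) that each "=" stands for 20 sequences using integer division and that ":" marks a non-empty bin of fewer than 20 sequences. The function names bin_counts and bar are hypothetical.

```python
def bin_counts(cumulative):
    """Per-bin counts from cumulative counts listed from the largest E downwards."""
    return [a - b for a, b in zip(cumulative, cumulative[1:])]


def bar(count, unit=20):
    """Render one histogram bar (assumed rule: '=' per full unit, ':' if below one unit)."""
    if count >= unit:
        return "=" * (count // unit)
    return ":" if count > 0 else ""


# First cumulative counts from the example output (E = 10000, 6310, 3980, 2510, 1580).
cumulative = [4492, 3404, 2218, 1649, 971]
for n in bin_counts(cumulative):
    print(f"{n:5d} |{bar(n)}")
```

Each cumulative count is the number of database sequences that would be reported at that E setting, so the difference between two neighbouring rows is the number of sequences falling in the bin between them.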

>up Searching periods
     The "Searching..." indicator shows the progress that the
     program made in searching the database.  A complete database
     search will yield 50 periods (.), or one period per database
     sequence,  whichever  number  is  smaller.  When searching a
     database consisting of 50 sequences or more, if  fewer  than
     50  periods  are  displayed and the program aborted for some
     reason, dividing the number of periods by 0.5 will yield the
     approximate  percentage  (0-100%)  of  the database that was
     searched before the program died.  If the program had diffi-
     culty  making  progress  through  the  database, one or more
     asterisks (*) may be interspersed  between  the  periods  at
     one-minute intervals.
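
The arithmetic above can be sketched as follows. This is a minimal illustration, assuming a database of at least 50 sequences and a progress line that was captured as text; the function name percent_searched is hypothetical.

```python
def percent_searched(progress_line: str) -> float:
    """Approximate percentage of the database searched, from the period count.

    Asterisks (slow-progress markers) are ignored; only periods are counted.
    """
    periods = progress_line.count(".")
    return min(periods / 0.5, 100.0)


# A run that died after printing 25 periods got roughly halfway through.
print(percent_searched("Searching" + "." * 25))
```

Dividing by 0.5 is the same as multiplying by 2: each of the 50 periods stands for about 2% of the database.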

>up One-line Summaries
     The one-line sequence descriptions and summaries of  results
     are useful for identifying biologically interesting database
     matches and correlating this interest with  the  statistical
     significance  estimates.   Unless  otherwise  requested, the
     database sequences are sorted by increasing P-value  (proba-
     bility).   Identifiers  for the database sequences appear in
     the first column;  then  come  brief  descriptions  of  each
     sequence,  which may need to be truncated in order to fit in
     the available space.
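
For readers who want to process these summaries in a script, here is a minimal sketch that splits a one-line summary into its columns. It assumes the layout shown in the example (identifier, description, score, P(N), N, separated by whitespace); the function name parse_summary and the dictionary keys are hypothetical, and this is not part of any BLAST distribution.

```python
def parse_summary(line: str) -> dict:
    """Split a one-line summary into identifier, description, score, P(N) and N."""
    # The last three whitespace-separated fields are the numeric columns.
    head, score, p_value, n = line.strip().rsplit(None, 3)
    # The first field is the identifier; the remainder is the description.
    identifier, description = head.split(None, 1)
    return {
        "id": identifier,
        "description": description,
        "score": int(score),
        "p": float(p_value),
        "n": int(n),
    }


lines = [
    "sp|P38422|DACF_BACSU PENICILLIN-BINDING DACF PROTEIN PREC... 209 5.0e-47 3",
    "sp|P08506|DACC_ECOLI PENICILLIN-BINDING PROTEIN 6 PRECURS... 894 5.0e-120 1",
]
# Sort by increasing P-value, as the program does by default.
for hit in sorted((parse_summary(l) for l in lines), key=lambda h: h["p"]):
    print(hit["id"], hit["score"], hit["p"], hit["n"])
```

Splitting from the right keeps digits inside the description (such as the "6" in "PROTEIN 6") from being mistaken for the score.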

>up HSP score
     The High Score column contains the score of the highest-scoring
     HSP found with each database sequence.  Note that the
     highest-scoring HSP whose score is reported in the "High Score"
     column is not necessarily a member of the set of HSPs which
     yields the lowest P-value; the highest-scoring HSP may be
     excluded from this set on the basis of consistency rules
     governing the grouping of HSPs (see the -consistency option).

>up P(N) column
     The "P(N)" column  contains  the  lowest  P-value
     ascribed  to any set of HSPs for each database sequence.
     The P-values are a function of N,
     as used in Karlin-Altschul  Sum  statistics  or
     Poisson  statistics, to treat situations where multiple HSPs
     are found.  It should be noted that the highest-scoring  HSP
     whose  score  is  reported in the "High Score" column is not
     necessarily a member of the set of  HSPs  which  yields  the
     lowest P-value; the highest-scoring HSP may be excluded from
     this set on the basis of  consistency  rules  governing  the
     grouping  of HSPs (see the -consistency option).  Numbers of
     the form "7.7e-160" are in scientific notation.  In this
     particular example, the number being represented is 7.7
     times 10 to the minus 160th power, or 7.7 x 10^-160, which
     is astronomically close to zero.

>up N column
The N column displays the number of HSPs in 
the set which was ascribed the lowest P-value.

>up Alignments : 
     Alignments found with the BLAST algorithm are ungapped.
     Several statistics are used to describe each HSP.

>up Score : For the segment pair display, the score is the sum 
     of the scoring matrix values in the segment pair 
     being displayed;

>up Bit : The raw score is converted to bits of information
     by multiplying by lambda (see the Statistics output);

>up Expect : The number of times one might expect to see such a match (or
     a better one) merely by chance;

>up P or Sum P : The P-value (probability in
     the range 0-1) of observing such a  match;

>up Identities : The number and fraction of total residues 
     in the HSP which are identical;

>up Positives : The number and fraction of residues for 
     which the  alignment scores  have positive values.

     When Sum statistics have been
     used to calculate the Expect and P-values,  the  P-value  is
     qualified  with  the  word "Sum" and the N parameter used in
     the Sum statistics is provided in  parentheses  to  indicate
     the  number of HSPs in the set; when Poisson statistics have
     been used to calculate the Expect and P-values, the  P-value
     is qualified with the word "Poisson".  
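
The statistics listed above can be checked against the example output with a short sketch. Three assumptions are made here and are not stated in the manual text: that lambda is expressed in natural-log units, so the product is divided by ln 2 to obtain bits; that P and Expect are related by P = 1 - exp(-E), the usual relation when chance matches follow a Poisson distribution; and that percentages are truncated rather than rounded, as the example values suggest. The function names are hypothetical.

```python
import math


def bits(raw_score: int, lam: float) -> float:
    """Raw score converted to bits, assuming lambda is in natural-log units."""
    return raw_score * lam / math.log(2)


def p_from_expect(expect: float) -> float:
    """Probability of at least one chance match, given the expected number."""
    return 1.0 - math.exp(-expect)


def percent(matches: int, length: int) -> int:
    """Percentage truncated to an integer, as the example output appears to do."""
    return 100 * matches // length


# Lambda is printed as 0.318 (rounded), so the result only approximates 409.5 bits.
print(round(bits(894, 0.318), 1))
# Expect = 7.9 and 8.7 correspond to the P(N) values 0.9996 and 0.9998 in the summaries.
print(round(p_from_expect(7.9), 4), round(p_from_expect(8.7), 4))
# Identities = 37/93 is reported as 39%, not 40%.
print(percent(37, 93))
```

The small gap between the computed and the reported bit score comes from using the three-digit lambda shown in the Statistics output.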

>up Between the two lines
     Between the two lines
     of Query and Subject (database) sequence is a line  indicat-
     ing  the  specific  residues which are identical, as well as
     those which are non-identical but nevertheless have positive
     alignment scores defined in the scoring matrix that was used
     (the BLOSUM62 matrix in this case).   Identical  letters  or
     residues,  when  paired with each other, are not highlighted
     if their alignment score is negative or zero.   Examples  of
     this  would  be  an X juxtaposed with an X in two amino acid
     sequences, or an N juxtaposed with another N in two  nucleo-
     tide sequences.  Such ambiguous residue-residue pairings may
     be uninformative and thus lend no  support  to  the  overall
     alignment being either real or random; however, the informa-
     tiveness of these pairings is left up to  the  user  of  the
     BLAST  programs to decide, because any values desired can be
     specified in a scoring matrix of the user's own making.
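
The rule for building the middle line can be sketched as below. This is an illustration only: the function names midline and excerpt_score are hypothetical, and EXCERPT holds just the handful of BLOSUM62 entries needed for the example rather than the full matrix.

```python
# A few BLOSUM62 entries, enough for the first nine aligned residues of the
# example plus the ambiguous X/X pairing.
EXCERPT = {
    ("M", "M"): 5, ("D", "D"): 6, ("Y", "Y"): 7, ("T", "A"): 0,
    ("T", "S"): 1, ("G", "G"): 6, ("Q", "K"): 1, ("I", "V"): 3,
    ("L", "L"): 4, ("X", "X"): -1,
}


def excerpt_score(a: str, b: str) -> int:
    """Look up a pair in either order; raises KeyError for pairs not in the excerpt."""
    key = (a, b) if (a, b) in EXCERPT else (b, a)
    return EXCERPT[key]


def midline(query: str, subject: str, score) -> str:
    """Letter for a positive-scoring identity, '+' for a positive mismatch, else blank."""
    out = []
    for q, s in zip(query, subject):
        if score(q, s) > 0:
            out.append(q if q == s else "+")
        else:
            out.append(" ")
    return "".join(out)


print(repr(midline("MDYTTGQIL", "MDYASGKVL", excerpt_score)))
print(repr(midline("X", "X", excerpt_score)))
```

Passing the scoring function as an argument mirrors the point made above: a user-supplied matrix with different values would change which pairings are highlighted.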

Presented by Fredj Tekaia tekaia@pasteur.fr