Example of BLAST output
This example is an excerpt of the output of the blastp program.
Links are provided to definitions, expressions, parameters, etc.
used in the BLAST family of programs.
More information can be found in the blast manual.
Other examples of reading BLAST results and searching their corresponding
bibliographies may also be consulted. In these examples, results are marked
up with the blast2html script, so that moving between descriptive lines and
their corresponding alignments, as well as querying database servers
(Swiss-Prot, GenBank, EMBL, PDB, NR3D, PROSITE, etc.), can be done just
by "pointing and clicking".
A. Program Introduction
BLASTP 1.4.7 [6-Oct-94] [Build 12:00:56 Oct 13 1994]
Reference: Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers,
and David J. Lipman (1990). Basic local alignment search tool. J. Mol. Biol.
215:403-10.
Query= P1;YEEC_ECOLI Length: 347
(347 letters)
Database: SwissProt version 30 Sat Oct 15 12:04:49 MET 1994
40,292 sequences; 14,147,368 total letters.
Searching..................................................done
B. Histogram of Expectations
Observed Numbers of Database Sequences Satisfying
Various EXPECTation Thresholds (E parameter values)
Histogram units: = 20 Sequences : less than 20 sequences
EXPECTation Threshold
(E parameter)
|
V Observed Counts-->
10000 4492 1088 |======================================================
6310 3404 1186 |===========================================================
3980 2218 569 |============================
2510 1649 678 |=================================
1580 971 294 |==============
1000 677 241 |============
631 436 145 |=======
398 291 118 |=====
251 173 56 |==
158 117 34 |=
100 83 23 |=
63.1 60 13 |:
39.8 47 15 |:
25.1 32 8 |:
15.8 24 6 |:
>>>>>>>>>>>>>>>>>>>>> Expect = 10.0, Observed = 18 <<<<<<<<<<<<<<<<<
10.0 18 5 |:
6.31 13 1 |:
3.98 12 2 |:
2.51 10 0 |
1.58 10 1 |:
C. One-line Summaries
                                                       Smallest
                                                         Sum
                                                High  Probability
Sequences producing High-scoring Segment Pairs: Score    P(N)      N
...............................................................................
sp|P08506|DACC_ECOLI PENICILLIN-BINDING PROTEIN 6 PRECURS... 894 5.0e-120 1
sp|P38422|DACF_BACSU PENICILLIN-BINDING DACF PROTEIN PREC... 209 5.0e-47 3
...............................................................................
sp|P28271|IREB_MOUSE IRON-RESPONSIVE ELEMENT BINDING PROT... 59 0.9996 1
sp|P31571|CAIA_ECOLI PROBABLE CARNITINE OPERON OXIDOREDUC... 48 0.9998 2
D. Alignments
..............................................................................
>sp|P08506|DACC_ECOLI PENICILLIN-BINDING PROTEIN 6 PRECURSOR
(D-ALANYL-D-ALANINE CARBOXYPEPTIDASE FRACTION C) (EC 3.4.16.4)
(DD-PEPTIDASE) (DD-CARBOXYPEPTIDASE) (PBP-6).
Length = 400
Score = 894 (409.5 bits), Expect = 5.0e-120, P = 5.0e-120
Identities = 169/342 (49%), Positives = 237/342 (69%)
Query: 1 MDYTTGQILTAGNEHQQRNPASLTKLMTGYVVDRAIDSHRITPDDIVTVGRDAWAKDNPV 60
MDY +G++L GN ++ +PASLTK+MT YVV +A+ + +I D+VTVG+DAWA NP
Sbjct: 45 MDYASGKVLAEGNADEKLDPASLTKIMTSYVVGQALKADKIKLTDMVTVGKDAWATGNPA 104
Query: 61 FVGSSLMFLKEGDRVSVRDLSRGLIVDSGNDACVALADYIAGGQRQFVEMMNNYAEKLHL 120
GSS+MFLK GD+VSV DL++G+I+ SGNDAC+ALADY+AG Q F+ +MN YA+KL L
Sbjct: 105 LRGSSVMFLKPGDQVSVADLNKGVIIQSGNDACIALADYVAGSQESFIGLMNGYAKKLGL 164
Query: 121 KDTHFETVHGLDAPGQHSSAYDLAVLSRAIIHGEPEFYHMYSEKSLTWNGITQQNRNGLL 180
+T F+TVHGLDAPGQ S+A D+A+L +A+IH PE Y ++ EK T+N I Q NRN LL
Sbjct: 165 TNTTFQTVHGLDAPGQFSTARDMALLGKALIHDVPEEYAIHKEKEFTFNKIRQPNRNRLL 224
Query: 181 WDKTMNVDGLKTGHTSGAGFNLIASAVDGQRRLIAVVMGADSAKGREEEARKLLRWGQQN 240
W +N DG+KTG T+GAG+NL+ASA G RLI+VV+GA + + R E+ KLL WG +
Sbjct: 225 WSSNLNEDGMKTGTTAGAGYNLVASATQGDMRLISVVLGAKTDRIRFNESEKLLTWGFRF 284
Query: 241 FTTVQILHRGKKVGTERIWYGDKENIDLGTEQEFWMVLPKAEIPHIKAKYTLDGKELTAP 300
F TV + T+R+W+GDK ++LG + + +P+ ++ ++KA YTL +LTAP
Sbjct: 285 FETVTPIKPDATFVTQRVWFGDKSEVNLGAGEAGSVTIPRGQLKNLKASYTLTEPQLTAP 344
Query: 301 ISAHQRVGEIELYDRDKQVAHWPLVTLESVGEGSMFSRLSDY 342
+ Q VG I+ K + PL+ +E+V EG F R+ D+
Sbjct: 345 LKKGQVVGTIDFQLNGKSIEQRPLIVMENVEEGGFFGRVWDF 386
>sp|P38422|DACF_BACSU PENICILLIN-BINDING DACF PROTEIN PRECURSOR
(D-ALANYL-D-ALANINE CARBOXYPEPTIDASE) (EC 3.4.16.4) (DD-PEPTIDASE)
(DD-CARBOXYPEPTIDASE).
Length = 389
Score = 209 (95.7 bits), Expect = 5.0e-47, Sum P(3) = 5.0e-47
Identities = 37/93 (39%), Positives = 68/93 (73%)
Query: 62 VGSSLMFLKEGDRVSVRDLSRGLIVDSGNDACVALADYIAGGQRQFVEMMNNYAEKLHLK 121
+G S +FL+ G+ ++V+++ +G+ + SGNDA VA+A++I+G + +FV+ MN A++L LK
Sbjct: 98 MGGSQIFLEPGEEMTVKEMLKGIAIASGNDASVAMAEFISGSEEEFVKKMNKKAKELGLK 157
Query: 122 DTHFETVHGLDAPGQHSSAYDLAVLSRAIIHGE 154
+T F+ GL G +SSAYD+A++++ ++ E
Sbjct: 158 NTSFKNPTGLTEEGHYSSAYDMAIMAKELLKYE 190
Score = 161 (73.8 bits), Expect = 5.0e-47, Sum P(3) = 5.0e-47
Identities = 45/153 (29%), Positives = 73/153 (47%)
Query: 187 VDGLKTGHTSGAGFNLIASAVDGQRRLIAVVMGADSAKGREEEARKLLRWGQQNFTTVQI 246
VDG+KTG+T A + L ASA G R IAVV GA + K R + K+L + + T +
Sbjct: 226 VDGVKTGYTGEAKYCLTASAKKGNMRAIAVVFGASTPKERNAQVTKMLDFAFSQYETHPL 285
Query: 247 LHRGKKVGTERIWYGDKENIDLGTEQEFWMVLPKAEIPHIKAKYTLDGKELTAPISAHQR 306
R + V ++ G ++ I+L T + ++ K E + K ++API Q
Sbjct: 286 YKRNQTVAKVKVKKGKQKFIELTTSEPISILTKKGEDMNDVKKEIKMKDNISAPIQKGQE 345
Query: 307 VGEIELYDRDKQVAHWPLVTLESVGEGSMFSRL 339
+G + L + +A P+ E + + S L
Sbjct: 346 LGTLVLKKDGEVLAESPVAAKEDMKKAGFISFL 378
Score = 77 (35.3 bits), Expect = 5.0e-47, Sum P(3) = 5.0e-47
Identities = 17/49 (34%), Positives = 28/49 (57%)
Query: 5 TGQILTAGNEHQQRNPASLTKLMTGYVVDRAIDSHRITPDDIVTVGRDA 53
TG++L N +++ PAS+TK+MT ++ A+D +I D V A
Sbjct: 47 TGKVLYNKNSNERLAPASMTKIMTMLLIMEALDKGKIKMSDKVRTSEHA 95
..............................................................
>sp|P28271|IREB_MOUSE IRON-RESPONSIVE ELEMENT BINDING PROTEIN (IRE-BP)
(FERRITIN REPRESSOR PROTEIN) (ACONITATE HYDRATASE) (EC 4.2.1.3)
(CITRATE HYDRO-LYASE) (ACONITASE).
Length = 889
Score = 59 (27.0 bits), Expect = 7.9, P = 1.0
Identities = 14/47 (29%), Positives = 23/47 (48%)
Query: 207 VDGQRRLIAVVMGADSAKGREEEARKLLRWGQQNFTTVQILHRGKKV 253
VD RR ++ D R +E + L+WG Q F ++I+ G +
Sbjct: 130 VDFNRRADSLQKNQDLEFERNKERFEFLKWGSQAFCNMRIIPPGSGI 176
>sp|P31571|CAIA_ECOLI PROBABLE CARNITINE OPERON OXIDOREDUCTASE CAIA (EC
1.3.99.-).
Length = 380
Score = 48 (22.0 bits), Expect = 8.7, Sum P(2) = 1.0
Identities = 11/31 (35%), Positives = 17/31 (54%)
Query: 46 IVTVGRDAWAKDNPVFVGSSLMFLKEGDRVS 76
IV + RD + D PV+ G + K G +V+
Sbjct: 165 IVVMARDGASPDKPVYTGWFVDMSKPGIKVT 195
Score = 43 (19.7 bits), Expect = 8.7, Sum P(2) = 1.0
Identities = 11/35 (31%), Positives = 17/35 (48%)
Query: 197 GAGFNLIASAVDGQRRLIAVVMGADSAKGREEEAR 231
G GFN + D +R L+A+ + E+ AR
Sbjct: 227 GNGFNRVKEEFDHERFLVALTNYGTAMCAFEDAAR 261
Parameters:
H=1
V=500
B=250
-ctxfactor=1.00
E=10
Query ----- As Used ----- ----- Computed ----
Frame MatID Matrix name Lambda K H Lambda K H
+0 0 BLOSUM62 0.318 0.135 0.399 same same same
Query
Frame MatID Length Eff.Length E S W T X E2 S2
+0 0 347 347 10. 59 3 11 22 0.21 34
Statistics:
Query Expected Observed HSPs HSPs
Frame MatID High Score High Score Reportable Reported
+0 0 63 (28.9 bits) 1826 (836.5 bits) 42 42
Query Neighborhd Word Excluded Failed Successful Overlaps
Frame MatID Words Hits Hits Extensions Extensions Excluded
+0 0 9047 10061369 2135661 7914281 11427 5
Database: SwissProt version 30 Sat Oct 15 12:04:49 MET 1994
Release date: unknown
Posted date: 12:08 PM MET Oct 15, 1994
# of letters in database: 14,147,368
# of sequences in database: 40,292
# of database sequences satisfying E: 18
No. of states in DFA: 573 (56 KB)
Total size of DFA: 149 KB (192 KB)
Time to generate neighborhood: 0.10u 0.01s 0.11t Real: 00:00:00
Time to search database: 113.33u 3.65s 116.98t Real: 00:03:11
Total cpu time: 113.51u 3.80s 117.31t Real: 00:03:12
>up B. Histogram of Expectations
Shown in the output is a histogram of the lowest (most
significant) Expect values obtained with each database
sequence. This information is useful in determining the
numbers of database sequences that achieved a particular
level of statistical significance. It indicates the number
of database matches that would be reportable at various
settings for the expectation threshold (E parameter).
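The histogram rows can be read programmatically. The following is a minimal sketch, not part of the BLAST distribution. It assumes that the second column is the cumulative number of sequences satisfying the threshold and that the third column is the number falling in that row alone; this reading is inferred from the sample numbers above (for example 18 = 13 + 5) and is not stated in the text. The function names are hypothetical.

```python
def parse_histogram_row(line):
    """Return (threshold, cumulative_count, count_in_row) for one histogram row."""
    # Drop the bar drawn after '|' and keep the three numeric columns.
    fields = line.split("|")[0].split()
    return float(fields[0]), int(fields[1]), int(fields[2])


def check_cumulative(rows):
    """Check that each cumulative count equals the next row's plus this row's own count."""
    return all(cur[1] == nxt[1] + cur[2] for cur, nxt in zip(rows, rows[1:]))


# Rows copied from the sample histogram (assumed column meaning, see above).
sample = [
    "10.0 18 5 |:",
    "6.31 13 1 |:",
    "3.98 12 2 |:",
    "2.51 10 0 |",
    "1.58 10 1 |:",
]
parsed = [parse_histogram_row(row) for row in sample]
consistent = check_cumulative(parsed)
```

Under this reading, the first sample row says that 18 database sequences would be reportable with E=10, which agrees with the "# of database sequences satisfying E: 18" line of the output.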
>up Searching periods
The "Searching..." indicator shows the progress that the
program made in searching the database. A complete database
search will yield 50 periods (.), or one period per database
sequence, whichever number is smaller. When searching a
database consisting of 50 sequences or more, if fewer than
50 periods are displayed and the program aborted for some
reason, dividing the number of periods by 0.5 (that is,
doubling it) will yield the approximate percentage (0-100%)
of the database that was searched before the program died.
If the program had difficulty making progress through the
database, one or more asterisks (*) may be interspersed
between the periods at one-minute intervals.
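The rule of thumb above can be written as a short helper. This is only a sketch under the stated condition that the database holds 50 sequences or more; the function name is hypothetical, and asterisks are simply ignored.

```python
def percent_searched(progress_line):
    """Approximate percentage of the database searched, from a 'Searching...' line."""
    # Count only periods; asterisks mark slow progress and are not counted.
    periods = min(progress_line.count("."), 50)
    return periods / 0.5


complete = percent_searched("Searching" + "." * 50 + "done")
aborted = percent_searched("Searching" + "." * 12 + "*" + "." * 8)
```

Here `complete` is 100.0 and `aborted` is 40.0, since 20 periods were printed before the run stopped.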
>up One-line Summaries
The one-line sequence descriptions and summaries of results
are useful for identifying biologically interesting database
matches and correlating this interest with the statistical
significance estimates. Unless otherwise requested, the
database sequences are sorted by increasing P-value
(probability). Identifiers for the database sequences appear in
the first column; then come brief descriptions of each
sequence, which may need to be truncated in order to fit in
the available space.
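As an illustration, a one-line summary can be split into its columns as follows. This is a sketch that assumes the layout of the sample output (identifier, description, then Score, P(N) and N as the last three fields); the function name and dictionary keys are hypothetical.

```python
def parse_summary_line(line):
    """Split a one-line summary into identifier, description, score, P(N) and N."""
    # The last three whitespace-separated fields are Score, P(N) and N;
    # everything before them is the identifier followed by the description.
    head, score, p_value, n = line.rsplit(None, 3)
    identifier, _, description = head.partition(" ")
    return {
        "id": identifier,
        "description": description,
        "score": int(score),
        "p": float(p_value),
        "n": int(n),
    }


# Lines copied from the sample output, deliberately out of order.
lines = [
    "sp|P31571|CAIA_ECOLI PROBABLE CARNITINE OPERON OXIDOREDUC... 48 0.9998 2",
    "sp|P08506|DACC_ECOLI PENICILLIN-BINDING PROTEIN 6 PRECURS... 894 5.0e-120 1",
    "sp|P38422|DACF_BACSU PENICILLIN-BINDING DACF PROTEIN PREC... 209 5.0e-47 3",
]

# Reproduce the default ordering: increasing P-value.
records = sorted((parse_summary_line(l) for l in lines), key=lambda r: r["p"])
```

After sorting, the DACC_ECOLI entry comes first, as in the sample output.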
>up HSP score
The High Score column contains the score of the highest-scoring HSP
found with each database sequence. It should be noted that the
highest-scoring HSP whose score is reported in the "High Score"
column is not necessarily a member of the set of HSPs which yields the
lowest P-value; the highest-scoring HSP may be excluded from
this set on the basis of consistency rules governing the
grouping of HSPs (see the -consistency option).
>up P(N) column
The "P(N)" column contains the lowest P-value
ascribed to any set of HSPs for each database sequence.
The P-values are a function of N,
as used in Karlin-Altschul Sum statistics or
Poisson statistics, to treat situations where multiple HSPs
are found. It should be noted that the highest-scoring HSP
whose score is reported in the "High Score" column is not
necessarily a member of the set of HSPs which yields the
lowest P-value; the highest-scoring HSP may be excluded from
this set on the basis of consistency rules governing the
grouping of HSPs (see the -consistency option). Numbers of
the form "7.7e-160" are in scientific notation. In this
particular example, the number being represented is 7.7
times 10 to the minus 160th power, or 7.7 x 10^-160,
which is astronomically close to zero.
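The sample output reports both an Expect value and a P-value for each match. The sketch below assumes the usual Poisson relation P = 1 - exp(-E); that relation is not stated in this text, but it is consistent with the sample (Expect = 7.9 corresponds to 0.9996 and Expect = 8.7 to 0.9998 in the one-line summaries). The function name is hypothetical.

```python
import math


def p_from_expect(expect):
    """Probability of at least one chance match, assuming P = 1 - exp(-E)."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E,
    # where 1 - exp(-E) would round to 0.0 in floating point.
    return -math.expm1(-expect)


weak_hit = round(p_from_expect(7.9), 4)
weaker_hit = round(p_from_expect(8.7), 4)
strong_hit = p_from_expect(float("5.0e-120"))
```

For very small values the two quantities are practically equal, which is why the best match in the sample shows Expect = 5.0e-120 and P = 5.0e-120.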
>up N column
The N column displays the number of HSPs in
the set which was ascribed the lowest P-value.
>up Alignments :
Alignments found with the BLAST algorithm are ungapped.
Several statistics are used to describe each HSP.
>up Score : For the segment pair display, the score is the sum
of the scoring matrix values in the segment pair
being displayed;
>up Bit : The raw score is converted to bits of information
by multiplying by lambda (see the Statistics output) and
dividing by the natural logarithm of 2;
>up Expect : The number of times one might Expect to see such a match (or
a better one) merely by chance;
>up P or Sum P : The P-value (probability in
the range 0-1) of observing such a match;
>up Identities : The number and fraction of total residues
in the HSP which are identical;
>up Positives : The number and fraction of residues for
which the alignment scores have positive values.
When Sum statistics have been
used to calculate the Expect and P-values, the P-value is
qualified with the word "Sum" and the N parameter used in
the Sum statistics is provided in parentheses to indicate
the number of HSPs in the set; when Poisson statistics have
been used to calculate the Expect and P-values, the P-value
is qualified with the word "Poisson".
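The statistics line of an HSP can be picked apart with a regular expression. This is a sketch that only covers the two forms seen in the sample output ("P = ..." and "Sum P(N) = ..."). The bit conversion assumes bits = lambda * score / ln 2, which reproduces the sample values only approximately because the printed lambda (0.318) is rounded. All names are hypothetical.

```python
import math
import re

# Matches e.g. "Score = 209 (95.7 bits), Expect = 5.0e-47, Sum P(3) = 5.0e-47"
HSP_HEADER = re.compile(
    r"Score = (?P<score>\d+) \((?P<bits>[\d.]+) bits\), "
    r"Expect = (?P<expect>[\d.eE+-]+), "
    r"(?:Sum P\((?P<n>\d+)\)|P) = (?P<p>[\d.eE+-]+)"
)


def parse_hsp_header(line):
    """Extract score, bits, Expect, P-value and N from an HSP statistics line."""
    m = HSP_HEADER.search(line)
    if m is None:
        return None
    return {
        "score": int(m.group("score")),
        "bits": float(m.group("bits")),
        "expect": float(m.group("expect")),
        "p": float(m.group("p")),
        # A plain "P" means a single HSP; "Sum P(N)" carries N explicitly.
        "n": int(m.group("n")) if m.group("n") else 1,
    }


def raw_to_bits(score, lam):
    """Convert a raw score to bits (assumed formula: lambda * score / ln 2)."""
    return score * lam / math.log(2)


single = parse_hsp_header("Score = 894 (409.5 bits), Expect = 5.0e-120, P = 5.0e-120")
grouped = parse_hsp_header("Score = 209 (95.7 bits), Expect = 5.0e-47, Sum P(3) = 5.0e-47")
approx_bits = raw_to_bits(single["score"], 0.318)
```

With the rounded lambda, `approx_bits` comes out within one bit of the reported 409.5.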
>up Between the two lines
Between the two lines
of Query and Subject (database) sequence is a line
indicating the specific residues which are identical, as well as
those which are non-identical but nevertheless have positive
alignment scores defined in the scoring matrix that was used
(the BLOSUM62 matrix in this case). Identical letters or
residues, when paired with each other, are not highlighted
if their alignment score is negative or zero. Examples of
this would be an X juxtaposed with an X in two amino acid
sequences, or an N juxtaposed with another N in two
nucleotide sequences. Such ambiguous residue-residue pairings may
be uninformative and thus lend no support to the overall
alignment being either real or random; however, the
informativeness of these pairings is left up to the user of the
BLAST programs to decide, because any values desired can be
specified in a scoring matrix of the user's own making.
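The middle line can be used to recount the Identities and Positives figures. The sketch below assumes the blastp convention seen in the sample: a letter marks an identical pair and a "+" marks a non-identical pair with a positive score. Identical pairs that are not highlighted (such as X with X) are therefore not counted here. The function name is hypothetical.

```python
def count_identities_positives(midline):
    """Count identical pairs (letters) and positive-scoring pairs (letters plus '+')."""
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives


# Middle lines copied from the IREB_MOUSE and CAIA_ECOLI alignments above.
ireb = count_identities_positives("VD RR ++ D R +E + L+WG Q F ++I+ G +")
caia = count_identities_positives("IV + RD + D PV+ G + K G +V+")
```

These counts give 14 and 23 for the first alignment and 11 and 17 for the second, matching the "Identities = 14/47, Positives = 23/47" and "Identities = 11/31, Positives = 17/31" lines of the output.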
Presented by Fredj Tekaia tekaia@pasteur.fr