[Bioperl-l] new directions
Geer, Lewis (NLM/NCBI)
lewisg@mail.nih.gov
Wed, 7 Mar 2001 12:00:13 -0500
Hi,
Just in case you haven't seen it, XML output is an option for the NCBI
public blast servers (it's been an option in standalone blast for a while).
Here's a sample:
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN"
"NCBI_BlastOutput.dtd"><BlastOutput>
<BlastOutput_program>blastp</BlastOutput_program>
<BlastOutput_version>blastp 2.1.2 [Nov-13-2000]</BlastOutput_version>
<BlastOutput_reference>~Reference: Altschul, Stephen F., Thomas L. Madden,
Alejandro A. Schaffer, ~Jinghui Zhang, Zheng Zhang, Webb Miller, and David
J. Lipman (1997), ~"Gapped BLAST and PSI-BLAST: a new generation of
protein database search~programs", Nucleic Acids Res.
25:3389-3402.</BlastOutput_reference>
<BlastOutput_db>nr</BlastOutput_db>
<BlastOutput_query-ID>lcl|1_20397</BlastOutput_query-ID>
<BlastOutput_query-def>gi|7291680|gb|AAF47102.1| </BlastOutput_query-def>
<BlastOutput_query-len>1020</BlastOutput_query-len>
<BlastOutput_param>
<Parameters>
<Parameters_matrix>BLOSUM62</Parameters_matrix>
<Parameters_expect>10</Parameters_expect>
<Parameters_include>0</Parameters_include>
<Parameters_sc-match>0</Parameters_sc-match>
<Parameters_sc-mismatch>0</Parameters_sc-mismatch>
<Parameters_gap-open>11</Parameters_gap-open>
<Parameters_gap-extend>1</Parameters_gap-extend>
<Parameters_filter>L;</Parameters_filter>
</Parameters>
</BlastOutput_param>
<BlastOutput_iterations>
<Iteration>
<Iteration_iter-num>1</Iteration_iter-num>
<Iteration_hits>
<Hit>
<Hit_num>1</Hit_num>
<Hit_id>gi|280603|pir||A36691</Hit_id>
<Hit_def>Ca2+-transporting ATPase (EC 3.6.1.38), sarcoplasmic
reticulum - fruit fly (Drosophila melanogaster) >gi|158416|gb|AAB00735.1|
(M62892) sarco/endoplasmic reticulum-type Ca-2+-ATPase [Drosophila
melanogaster]</Hit_def>
<Hit_accession>A36691</Hit_accession>
<Hit_len>1002</Hit_len>
<Hit_hsps>
<Hsp>
<Hsp_num>1</Hsp_num>
<Hsp_bit-score>1792.44</Hsp_bit-score>
<Hsp_score>4642</Hsp_score>
<Hsp_evalue>0</Hsp_evalue>
<Hsp_query-from>1</Hsp_query-from>
<Hsp_query-to>993</Hsp_query-to>
<Hsp_hit-from>1</Hsp_hit-from>
<Hsp_hit-to>993</Hsp_hit-to>
<Hsp_pattern-from>0</Hsp_pattern-from>
<Hsp_pattern-to>0</Hsp_pattern-to>
<Hsp_query-frame>1</Hsp_query-frame>
<Hsp_hit-frame>1</Hsp_hit-frame>
<Hsp_identity>992</Hsp_identity>
<Hsp_positive>993</Hsp_positive>
<Hsp_gaps>0</Hsp_gaps>
<Hsp_align-len>993</Hsp_align-len>
<Hsp_density>0</Hsp_density>
<Hsp_qseq>MEDGHSKTVEQSLNFFGTDPERGLTLDQIKANQKKYGPNELPTEEGKSIWQLVLEQFDDLLVKILL
LAAIISFVLALFEEHEETFTAFVEPLVILLILIANAVVGVWQERNAESAIEALKEYEPEMGKVVRQDKSGIQKVRA
KEIVPGDLVEVSVGDKIPADIRITHIYSTTLRIDQSILTGESVSVIKHTDAIPDPRAVNQDKKNILFSGTNVAAGK
ARGVVIGTGLSTAIGKIRTEMSETEEIKTPLQQKLDEFGEQLSKVISVICVAVWAINIGHFNDPAHGGSWIKGAIY
YFKIAVALAVAAIPEGLPAVITTCLALGTRRMAKKNAIVRSLPSVETLGCTSVICSDKTGTLTTNQMSVSRMFIFD
KVEGNDSSFLEFEMTGSTYEPIGEVFLNGQRIKAADYDTLQELSTICIMCNDSAIDYNEFKQAFEKVGEATETALI
VLAEKLNSFSVNKSGLDRRSAAIACRGEIETKWKKEFTLEFSRDRKSMSSYCTPLKASRLGTGPKLFVKGAPEGVL
ERCTHARVGTTKVPLTSALKAKILALTGQYGTGRDTLRCLALAVADSPMKPDEMDLGDSTKFYQYEVNLTFVGVVG
MLDPPRKEVFDSIVRCRAAGIRVIVITGDNKATAEAICRRIGVFAEDEDTTGKSYSGREFDDLSPTEQKAAVARSR
LFSRVEPQHKSKIVEFLQSMNEISAMTGDGVNDAPALKKAEIGIAMGSGTAVAKSAAEMVLADDNFSSIVSAVEEG
RAIYNNMKQFIRYLISSNIGEVVSIFLTAALGLPEALIPVQLLWVNLVTDGLPATALGFNPPDLDIMEKPPRKADE
GLISGWLFFRYMAIGFYVGAATVGAAAWWFVFSDEGPKLSYWQLTHHLSCLGGGDEFKGVDCKIFSDPHAMTMALS
VLVTIEMLNAMNSLSENQSLITMPPWCNLWLIGSMALSFTLHFVILYVDVLSTVFQVTPLSAEEWITVMKFSIPVV
LLDETLKFVARKIAD</Hsp_qseq>
<Hsp_hseq>MEDGHSKTVEQSLNFFGTDPERGLTLDQIKANQKKYGPNELPTEEGKSIWQLVLEQFDDLLVKILL
LAAIISFVLALFEEHEETFTAFVEPLVILLILIANAVVGVWQERNAESAIEALKEYEPEMGKVVRQDKSGIQKVRA
KEIVPGDLVEVSVGDKIPADIRITHIYSTTLRIDQSILTGESVSVIKHTDAIPDPRAVNQDKKNILFSGTNVAAGK
ARGVVIGTGLSTAIGKIRTEMSETEEIKTPLQQKLDEFGEQLSKVISVICVAVWAINIGHFNDPAHGGSWIKGAIY
YFKIAVAVAVAAIPEGLPAVITTCLALGTRRMAKKNAIVRSLPSVETLGCTSVICSDKTGTLTTNQMSVSRMFIFD
KVEGNDSSFLEFEMTGSTYEPIGEVFLNGQRIKAADYDTLQELSTICIMCNDSAIDYNEFKQAFEKVGEATETALI
VLAEKLNSFSVNKSGLDRRSAAIACRGEIETKWKKEFTLEFSRDRKSMSSYCTPLKASRLGTGPKLFVKGAPEGVL
ERCTHARVGTTKVPLTSALKAKILALTGQYGTGRDTLRCLALAVADSPMKPDEMDLGDSTKFYQYEVNLTFVGVVG
MLDPPRKEVFDSIVRCRAAGIRVIVITGDNKATAEAICRRIGVFAEDEDTTGKSYSGREFDDLSPTEQKAAVARSR
LFSRVEPQHKSKIVEFLQSMNEISAMTGDGVNDAPALKKAEIGIAMGSGTAVAKSAAEMVLADDNFSSIVSAVEEG
RAIYNNMKQFIRYLISSNIGEVVSIFLTAALGLPEALIPVQLLWVNLVTDGLPATALGFNPPDLDIMEKPPRKADE
GLISGWLFFRYMAIGFYVGAATVGAAAWWFVFSDEGPKLSYWQLTHHLSCLGGGDEFKGVDCKIFSDPHAMTMALS
VLVTIEMLNAMNSLSENQSLITMPPWCNLWLIGSMALSFTLHFVILYVDVLSTVFQVTPLSAEEWITVMKFSIPVV
LLDETLKFVARKIAD</Hsp_hseq>
<Hsp_midline>MEDGHSKTVEQSLNFFGTDPERGLTLDQIKANQKKYGPNELPTEEGKSIWQLVLEQFDDLLVK
ILLLAAIISFVLALFEEHEETFTAFVEPLVILLILIANAVVGVWQERNAESAIEALKEYEPEMGKVVRQDKSGIQK
VRAKEIVPGDLVEVSVGDKIPADIRITHIYSTTLRIDQSILTGESVSVIKHTDAIPDPRAVNQDKKNILFSGTNVA
AGKARGVVIGTGLSTAIGKIRTEMSETEEIKTPLQQKLDEFGEQLSKVISVICVAVWAINIGHFNDPAHGGSWIKG
AIYYFKIAVA+AVAAIPEGLPAVITTCLALGTRRMAKKNAIVRSLPSVETLGCTSVICSDKTGTLTTNQMSVSRMF
IFDKVEGNDSSFLEFEMTGSTYEPIGEVFLNGQRIKAADYDTLQELSTICIMCNDSAIDYNEFKQAFEKVGEATET
ALIVLAEKLNSFSVNKSGLDRRSAAIACRGEIETKWKKEFTLEFSRDRKSMSSYCTPLKASRLGTGPKLFVKGAPE
GVLERCTHARVGTTKVPLTSALKAKILALTGQYGTGRDTLRCLALAVADSPMKPDEMDLGDSTKFYQYEVNLTFVG
VVGMLDPPRKEVFDSIVRCRAAGIRVIVITGDNKATAEAICRRIGVFAEDEDTTGKSYSGREFDDLSPTEQKAAVA
RSRLFSRVEPQHKSKIVEFLQSMNEISAMTGDGVNDAPALKKAEIGIAMGSGTAVAKSAAEMVLADDNFSSIVSAV
EEGRAIYNNMKQFIRYLISSNIGEVVSIFLTAALGLPEALIPVQLLWVNLVTDGLPATALGFNPPDLDIMEKPPRK
ADEGLISGWLFFRYMAIGFYVGAATVGAAAWWFVFSDEGPKLSYWQLTHHLSCLGGGDEFKGVDCKIFSDPHAMTM
ALSVLVTIEMLNAMNSLSENQSLITMPPWCNLWLIGSMALSFTLHFVILYVDVLSTVFQVTPLSAEEWITVMKFSI
PVVLLDETLKFVARKIAD</Hsp_midline>
</Hsp>
</Hit_hsps>
</Hit>
[...]
</Iteration_hits>
<Iteration_stat>
<Statistics>
<Statistics_db-num>807597</Statistics_db-num>
<Statistics_db-len>-1431139411</Statistics_db-len>
<Statistics_hsp-len>0</Statistics_hsp-len>
<Statistics_eff-space>0</Statistics_eff-space>
<Statistics_kappa>0.041</Statistics_kappa>
<Statistics_lambda>0.267</Statistics_lambda>
<Statistics_entropy>4.94066e-324</Statistics_entropy>
</Statistics>
</Iteration_stat>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>
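
A minimal sketch of reading this format, assuming Python's standard xml.etree.ElementTree module rather than any bioperl or NCBI code. The element names (Hit, Hit_id, Hsp_bit-score and so on) come from the sample above; the function name parse_hits, the dictionary keys, and the trimmed SAMPLE string are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample above: sequences and most fields are left out.
SAMPLE = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_db>nr</BlastOutput_db>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>gi|280603|pir||A36691</Hit_id>
          <Hit_accession>A36691</Hit_accession>
          <Hit_len>1002</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>1792.44</Hsp_bit-score>
              <Hsp_evalue>0</Hsp_evalue>
              <Hsp_identity>992</Hsp_identity>
              <Hsp_align-len>993</Hsp_align-len>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def parse_hits(xml_text):
    """Return one dict per <Hit>, each holding a list of its <Hsp> records.

    Hypothetical helper: the name and the returned structure are
    assumptions made for this sketch, not part of any existing library.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for hit in root.iter("Hit"):
        hsps = []
        for hsp in hit.iter("Hsp"):
            hsps.append({
                "bit_score": float(hsp.findtext("Hsp_bit-score")),
                "evalue": float(hsp.findtext("Hsp_evalue")),
                "identity": int(hsp.findtext("Hsp_identity")),
                "align_len": int(hsp.findtext("Hsp_align-len")),
            })
        hits.append({
            "id": hit.findtext("Hit_id"),
            "accession": hit.findtext("Hit_accession"),
            "length": int(hit.findtext("Hit_len")),
            "hsps": hsps,
        })
    return hits


if __name__ == "__main__":
    for hit in parse_hits(SAMPLE):
        best = hit["hsps"][0]
        pct = 100.0 * best["identity"] / best["align_len"]
        print(hit["accession"], best["bit_score"], f"{pct:.1f}% identity")
```

Because every value sits in its own named element, no regular expressions or column counting are needed, which is the main attraction over parsing the plain-text report. A real report can hold many hits and iterations, so a streaming parser may be a better fit than loading the whole tree at once.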
> -----Original Message-----
> From: Jason Stajich [mailto:jason@chg.mc.duke.edu]
> Sent: Wednesday, March 07, 2001 11:45 AM
> To: Bioperl
> Subject: [Bioperl-l] new directions
>
>
> So very happy to have 0.7 out. I know there are some minor
> issues that
> have begun to be resolved; once these reach a suitable number
> or enough
> time has passed, we can think about a point release. Not for
> at least 3
> weeks though.
>
> The branching gives us a chance to take stock and look at
> where we want to
> go next. Interest has been expressed in expanding outside of
> the sequence
> analysis realm bioperl has pretty much occupied. I'm all for
> it. The new
> projects I hint at below should go on the main trunk, only
> bug fixes, and
> minor feature changes should go on the branch. We're probably
> flexible here so when in doubt we can discuss on the list.
>
> I'd like to throw some ideas out there and encourage people
> on the list
> who maybe haven't felt comfortable jumping in while we were
> churning on
> the release to think about picking up a project. Especially if any of
> these (or your own project ideas) scratch a particular itch
> you have.
> Some of these don't have to be part of bioperl-live but can
> be satellite
> projects which utilize the bioperl core objects.
>
> These are just some ideas I have bouncing around, perhaps you
> have your
> own ideas and would like to contribute:
>
> This is also in wiki at
> http://www.bioperl.org/wiki/html/BioPerl/BioperlProjects.html - so any
> critiques or additions could be added there as well, just CC
> the list so
> we know to check.
>
> o perl is not an ideal language for doing something like
> huge microarray
> clustering, but it is ideal for dealing with formatting issues.
> Perhaps code that can deal with converting different
> microarray formats
> would be helpful.
>
> o Expansion into other expression data, code to help link
> expression data
> for genes (sometimes unknown genes) to available
> information in IGI,
> NCBI Unigene, etc. All in software so that it can be automated.
>
> o The Blast issues. I think the pluggable features to
> BPlite would be
> ideal, but I don't know how well it will work (wanting to
> parse more or
> less of the report -- runtime plugging of 'adaptors'?). I like the
> html features of Bio::Tools::Blast. What about parsing
> NCBI Blast XML?
>
> o Fasta parsing. We should find a way to support this, either with a
> formal grammar or just some perl code.
>
> o Speaking of grammars, what about a grammar for parsing
> EMBL/Genbank?
> Would this be more/less efficient? We seem kind of kludgy
> in parts of
> the feature table parsing and it has gotten pretty heavy
> down there,
> are there ways to simplify this code?
>
> o Bio::Index::Blast which can read, fetch (and store?) seqs
> from a blast
> index.
>
> o Map data - genetic, RH maps and their markers. Adopting code for
> manipulating this information. A simple ePCR parser would
> fit in here
> too.
>
> o visualization - perhaps visualization is best done in java, but the
> bioperl-gui modules provide a nice way to look at a sequence with
> annotation. Is there interest in a png/gif/ps renderer as well,
> adopting existing code -- perhaps something similar to gff2ps?
>
> o Tree drawing - plugging into PHYLIP or something similar
> to provide
> some nice drawings of phylogenetic trees.
>
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/
>