[Biopython] [Biopython-dev] Upcoming NCBI BLAST XML2 format
Jan Kim
jttkim at googlemail.com
Fri May 8 10:58:01 UTC 2015
Dear All,
for what it's worth, I agree that spewing out multiple files is a really
bad idea in the context of scripted / automated processing using pipes.
As the XInclude file is designed to be "used to generate a single XML document
that contains results from all the queries in a search", I specifically
would argue that there should be an option (command line switch etc.) for
BLAST to emit that single document, rather than a collection of files.
The coding overhead to provide that must be marginal, and it will save many
of us a substantially larger overhead resulting from generating temporary
directories and cleaning them up.
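
For illustration, here is a minimal sketch of the stitching step a caller
would otherwise have to do by hand. It uses only the Python standard library
(xml.etree.ElementInclude) to resolve the xi:include entries of a manifest
into one in-memory tree. The element names Results and QueryResult and the
file names are made up for the demo; they are not taken from the NCBI XML2
schema, and I have not tried this against real BLAST output.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def merge_xinclude(manifest_path):
    """Return the manifest's root element with xi:include entries resolved."""
    base_dir = os.path.dirname(os.path.abspath(manifest_path))

    def loader(href, parse, encoding=None):
        # Resolve hrefs relative to the manifest, not the working directory.
        path = os.path.join(base_dir, href)
        if parse == "xml":
            return ET.parse(path).getroot()
        with open(path, encoding=encoding or "utf-8") as handle:
            return handle.read()

    root = ET.parse(manifest_path).getroot()
    ElementInclude.include(root, loader=loader)
    return root


def demo():
    """Build a toy manifest plus two per-query files, then merge them."""
    # Hypothetical layout: one file per query and a manifest that includes them.
    tmp_dir = tempfile.mkdtemp()
    for number in (1, 2):
        with open(os.path.join(tmp_dir, "query_%d.xml" % number), "w") as handle:
            handle.write('<QueryResult id="%d"/>' % number)
    manifest = os.path.join(tmp_dir, "manifest.xml")
    with open(manifest, "w") as handle:
        handle.write(
            '<Results xmlns:xi="http://www.w3.org/2001/XInclude">'
            '<xi:include href="query_1.xml"/>'
            '<xi:include href="query_2.xml"/>'
            '</Results>'
        )
    return merge_xinclude(manifest)
```

Calling demo() returns a Results element whose children are the two included
QueryResult elements. Note that this still needs all the per-query files on
disk, so it does not help with reading BLAST output from a pipe.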
Best regards, Jan
On Thu, May 07, 2015 at 12:14:13AM +0000, Fields, Christopher J wrote:
> I agree, it's worth asking NCBI about this. Now, whether we get an answer or not is another issue...
>
> chris
>
> > On May 6, 2015, at 3:44 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> >
> > Hi Travis,
> >
> > I've no idea what the rationale is for this bit of the change
> > (other than the existing blast XML abuses the <iteration>
> > tag for multiple queries), but haven't yet tried looking at
> > the example output so I'm not panicking yet.
> >
> > However, we may want to lobby the NCBI about this...
> >
> > Peter
> >
> > On Wed, May 6, 2015 at 5:50 PM, Travis Wrightsman <twrig002 at ucr.edu> wrote:
> >> Peter,
> >>
> >> It seems that if support for the original single XML output for multiple
> >> queries is dropped then BioPython will need to either stitch together all
> >> the XML files using the base Xinclude file or iterate through all the files
> >> and concatenate them in an object.
> >>
> >> Does anyone know why NCBI is changing to a multi-file output instead of a
> >> single-file output that is easier to work with programmatically? There must
> >> be someone or some software suite benefiting from this change and it's not
> >> BioPython.
> >>
> >> Travis
> >>
> >> On Wed, May 6, 2015 at 7:49 AM, Peter Cock <p.j.a.cock at googlemail.com>
> >> wrote:
> >>>
> >>> On Wed, May 6, 2015 at 3:22 PM, Martin Mokrejs
> >>> <mmokrejs at fold.natur.cuni.cz> wrote:
> >>>> Hi,
> >>>> are you aware of new changes in BLAST's XML format? Time for feedback
> >>>> before it emerges. ;-)
> >>>>
> >>>> ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/xml2.pdf
> >>>>
> >>>> Martin
> >>>
> >>> Yes, but thanks for double checking:
> >>>
> >>> http://lists.open-bio.org/pipermail/biopython-dev/2015-May/020923.html
> >>>
> >>> I'm a little nervous about the idea that BLAST+ will not provide single
> >>> (large) XML files for multiple-query searches, and instead appears to
> >>> be going to produce one file per query and a manifest xinclude file.
> >>>
> >>> This sounds problematic for things like parsing via stdout.
> >>>
> >>> What have you noticed?
> >>>
> >>> Peter
> >>> _______________________________________________
> >>> Biopython-dev mailing list
> >>> Biopython-dev at mailman.open-bio.org
> >>> http://mailman.open-bio.org/mailman/listinfo/biopython-dev
> >>
> >>
> > _______________________________________________
> > Biopython mailing list - Biopython at mailman.open-bio.org
> > http://mailman.open-bio.org/mailman/listinfo/biopython
>
>
--
+- Jan T. Kim -------------------------------------------------------+
| email: jttkim at gmail.com |
| WWW: http://www.jtkim.dreamhosters.com/ |
*-----=< hierarchical systems are for files, not for humans >=-----*