[Bioperl-l] Getting refseq (contigs) from Genbank

Jason Eric Stajich jason@cgt.mc.duke.edu
Thu, 1 Nov 2001 13:35:33 -0500 (EST)


Any changes for this should happen either in the DB:: module or in the
SeqIO::genbank/SeqIO::fasta parser. If we are just looking at <pre> tags,
this could be handled in the sequence parser rather than in PrimarySeq (as
much as I don't want to add this type of stuff to the parser either, it
would be a seemingly trivial addition). Do you think there is any other way
to intercept the data from LWP before sending it as a stream to the SeqIO
system?  I would still like to preserve our streaming of data rather than
intercepting the whole data as a string, processing that, and then
sending it along as an IO::String or tempfile to the SeqIO system.
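
To make the streaming idea concrete, here is a rough sketch of a filter that
sits between the chunks arriving from the HTTP response and the sequence
parser. It is written in Python purely as an illustration, not as BioPerl or
LWP code. The names iter_lines and pre_text are hypothetical, and it assumes
that an HTML tag never spans a line break and that at most one <pre> block
starts on any given line. Only the current partial line is held in memory,
so the whole response is never collected into a single string.

```python
import html
import re
from typing import Iterable, Iterator

# Hypothetical helpers for illustration only.
PRE_OPEN = re.compile(r"<pre\b[^>]*>", re.IGNORECASE)
PRE_CLOSE = re.compile(r"</pre\s*>", re.IGNORECASE)
TAG = re.compile(r"<[^>]*>")


def iter_lines(chunks: Iterable[str]) -> Iterator[str]:
    """Reassemble complete lines from arbitrarily sized chunks."""
    pending = ""
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is complete; keep the rest.
        *complete, pending = pending.split("\n")
        yield from complete
    if pending:
        yield pending


def pre_text(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the non-blank text lines found inside <pre>...</pre> blocks."""
    inside = False
    for line in iter_lines(chunks):
        if not inside:
            opened = PRE_OPEN.search(line)
            if opened is None:
                continue  # ordinary HTML outside any <pre> block
            inside = True
            line = line[opened.end():]
        closed = PRE_CLOSE.search(line)
        if closed is not None:
            inside = False
            line = line[:closed.start()]
        # Strip remaining tags first, then decode entities such as &gt;.
        text = html.unescape(TAG.sub("", line)).rstrip("\r")
        if text.strip():
            yield text
```

The lines it yields could then be handed to whatever parser reads the
sequence format, one line at a time, which keeps the stream intact.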

Anyone else have thoughts?


-jason

On Thu, 1 Nov 2001, Wiepert, Mathieu wrote:

> Jason,
>
> If I wanted to parse the sequence from the HTML, where would you do
> that?  Is it possible to make a sort of fuzzy validate_seq in PrimarySeq,
> such that if the validation fails, it then does some sort of match to see
> if it can find a string that starts with <pre>>gi or something?  Chop out
> the pre sections and see if they are in fasta format?  Otherwise, how do
> we avoid the error from PrimarySeq and see if it's still ok?
>
> I am guessing this is not the best way to go.  Suggestions?
>
> -Mat
>
>
>

-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu