[Bioperl-l] Is bio-perl right?
Ewan Birney
birney@ebi.ac.uk
Fri, 29 Sep 2000 08:21:03 +0100 (GMT)
On 28 Sep 2000, John S. J. Anderson wrote:
> Greetings --
>
> I'm trying to decide if bio-perl provides the tools I need to do
> something, or if I'm better off rolling my own custom solution.
>
> Basically, I want to retrieve a large number (hundreds) of sequence
> files from Genbank (via Entrez, so there are a number of potential
> formats) and then parse each file according to the header
> information. I need to split the sequence in each file into coding and
> non-coding, and I would like to map each segment back onto a genome
> (probably by tracking location relative to ORF starts and stops).
>
> I know there's been some traffic on the list recently about the
> difficulty of sufficiently generalizing the GenBank format via a
> bio-perl parser, but I haven't played around with the code at all. (To
> cross with another thread, the documentation (and lack of time) has
> been the biggest barrier to my picking up bio-perl.)
>
I think bioperl should get you 75% of the way there:
Picking up hundreds of sequence files - ok
Parsing GenBank (the latest 0.6.2 candidate release is the one to go for)
- ok.
Then the basic loop is going to go something like this:
# script for looping over genbank entries, printing out
# start-end of CDS exons
use strict;
use Bio::SeqIO;
use Bio::Seq; # don't really need this, because Bio::SeqIO uses it

# INPUT_STREAM is assumed to be a filehandle you have already opened
my $seqio = Bio::SeqIO->new('-format' => 'GenBank', '-fh' => \*INPUT_STREAM);
while( my $seq = $seqio->next_seq ) {
    foreach my $feat ( $seq->top_SeqFeatures ) {
        if( $feat->primary_tag eq 'CDS_span' ) {
            # feature is a CDS line with a join statement
            foreach my $sub ( $feat->sub_SeqFeature ) {
                print "start ",$sub->start," end ",$sub->end,"\n";
                # do what you like
            }
        } elsif ( $feat->primary_tag eq 'CDS' ) {
            # feature is a CDS line without a join statement
            # yes - this part is potentially badly designed in bioperl!
            print "start ",$feat->start," end ",$feat->end,"\n";
        }
    }
}
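Once you have the CDS start/end pairs from the loop above, the non-coding
part is just the complement of those spans along the entry. A minimal sketch
in plain Perl follows. It is not a bioperl call: the helper name
noncoding_spans is hypothetical, the coordinates in the example are made up,
and it assumes 1-based inclusive positions and ignores strand.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl): given the sequence length and a
# list of [start, end] coding spans (1-based, inclusive), return the
# complementary non-coding spans in the same coordinate system.
sub noncoding_spans {
    my ($length, @coding) = @_;
    my @gaps;
    my $next = 1;    # first position not yet covered by a coding span
    foreach my $span (sort { $a->[0] <=> $b->[0] } @coding) {
        my ($start, $end) = @$span;
        # anything between the last covered position and this span is a gap
        push @gaps, [$next, $start - 1] if $start > $next;
        # advance past this span (overlapping spans never move us backwards)
        $next = $end + 1 if $end + 1 > $next;
    }
    # trailing non-coding region after the last coding span
    push @gaps, [$next, $length] if $next <= $length;
    return @gaps;
}

# Example with made-up coordinates: two exons on a 1000 bp entry.
foreach my $gap (noncoding_spans(1000, [101, 250], [400, 620])) {
    print "non-coding ", $gap->[0], " ", $gap->[1], "\n";
}
```

In the loop above you would collect [ $sub->start, $sub->end ] (or the
feature's own start/end for an unjoined CDS) into an array per entry and
pass it in along with the sequence length.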
> So, is bio-perl the Right Thing for this job, or should I look into
> developing my own stuff?
>
I would hope that Bioperl is "the right thing".
Give it a whirl; I'd be interested to hear about your experiences. Feel
free to write them up directly in the Wiki docs as well, at
http://bio.perl.org/wiki/html/BioPerl/FrontPage.html
(choose perhaps BioperlGettingStarted and then just click "edit page" and
you are away).
I will add this mini-script to the wiki docs myself... ;)
> Thanks for any advice,
> john.
>
> --
> ------------------------------------------------------------------------
> John S J Anderson NCBI,NLM,NIH
> IRTA Fellow Bldg 38A, B2N14
> janderso@ncbi.nlm.nih.gov 301.594.6087
>
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------