[Bioperl-l] Bio::SeqIO HOWTO

Barry Moore bmoore at genetics.utah.edu
Wed Nov 2 23:25:29 EST 2005


Li-

The script is working correctly.  You are giving it a fasta file and
then asking it to print the accession number.  While you and I can
plainly see that the accession number NM_021308.1 is in the fasta
header, BioPerl makes no attempt to parse accession numbers from a fasta
header, so accession_number falls back to its default value, "unknown".
The reason is that there is no uniformity in how fasta headers are
written: every fasta file could use a different header format and still
be valid.
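
If you do want the accession from this particular fasta file, you have to
parse the header yourself.  Here is a minimal sketch, assuming the
NCBI-style "gi|number|db|accession|" layout used in your file.  The helper
name accession_from_ncbi_id and the short list of database tags are my own
assumptions, not part of BioPerl, and other header layouts simply return
undef:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): extract the accession from an
# NCBI-style identifier such as "gi|10946609|ref|NM_021308.1|".
# Returns undef when the identifier does not follow that layout.
sub accession_from_ncbi_id {
    my ($header) = @_;
    return unless defined $header;
    $header =~ s/^>//;                               # tolerate a raw ">..." header line
    (my ($id) = $header =~ /^\s*(\S+)/) or return;   # first whitespace-delimited token
    my @fields = split /\|/, $id;

    # "gi|<number>|<db>|<accession>|": the accession is the fourth field.
    return $fields[3] if @fields >= 4 && $fields[0] eq 'gi';

    # "<db>|<accession>|": assumed list of database tags, not exhaustive.
    return $fields[1] if @fields >= 2 && $fields[0] =~ /^(?:ref|gb|emb|dbj)$/;

    return;
}

my $header = '>gi|10946609|ref|NM_021308.1| Mus musculus piwi like homolog 2 (Drosophila) (Piwil2), mRNA';
my $acc = accession_from_ncbi_id($header);
print defined $acc ? $acc : 'unknown', "\n";
```

Inside the getaccs.pl loop you could pass $seq->display_id to the helper
instead of a raw header line, since for fasta input the display id is the
first whitespace-delimited token of the header.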

If you just want to see the script work correctly for learning purposes,
change the line:
print $seq->accession_number,"\n";
to any or all of these lines:
print $seq->alphabet,"\n";
print $seq->description,"\n";
print $seq->display_name,"\n";
print $seq->length,"\n";
print $seq->seq,"\n";

If you want the script to print the accession number, try downloading
the full GenBank-formatted sequence and running your script like this:
perl getaccs.pl mouse.gb genbank
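
The GenBank format works where fasta does not because every record carries
an explicit ACCESSION line, so nothing has to be guessed from free text.
The sketch below only illustrates that point in plain Perl and is not how
Bio::SeqIO is implemented.  The helper name accessions_from_genbank is
hypothetical, and the record text is an abbreviated stand-in built from the
identifiers in your fasta header, not the actual downloaded file:

```perl
use strict;
use warnings;

# Hypothetical helper: collect the primary accession from every
# ACCESSION line read from an open filehandle.
sub accessions_from_genbank {
    my ($fh) = @_;
    my @accessions;
    while (my $line = <$fh>) {
        # The ACCESSION keyword starts in column 1; the primary accession
        # is the first token after it.
        push @accessions, $1 if $line =~ /^ACCESSION\s+(\S+)/;
    }
    return @accessions;
}

# Abbreviated stand-in for a GenBank record (illustration only).
my $record = <<'END';
LOCUS       NM_021308
DEFINITION  Mus musculus piwi like homolog 2 (Drosophila) (Piwil2), mRNA.
ACCESSION   NM_021308
VERSION     NM_021308.1  GI:10946609
//
END

# Read the text through an in-memory filehandle.
open my $fh, '<', \$record or die "Cannot open in-memory record: $!";
print "$_\n" for accessions_from_genbank($fh);
close $fh;
```

Note that the ACCESSION line holds the unversioned accession, while the
versioned form (NM_021308.1) lives on the VERSION line.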

Barry

> -----Original Message-----
> From: chen li [mailto:chen_li3 at yahoo.com]
> Sent: Wednesday, November 02, 2005 8:36 PM
> To: Barry Moore
> Subject: RE: [Bioperl-l] Bio::SeqIO HOWTO
> 
> Barry,
> 
> Thank you very much.
> 
> Here are the results. 1) If I type "perl getaccs.pl" I
> get this result "getaccs.pl file format" on the
> screen. 2) If I type "perl getaccs.pl mouse.fasta
> fasta" I get "unknown" on the screen. It seems that
> no accession number is printed out after the script
> is executed.
> 
> So what is the problem here?
> 
> Li
> 
> here is part of my file:
> 
> >gi|10946609|ref|NM_021308.1| Mus musculus piwi like
> homolog 2 (Drosophila) (Piwil2), mRNA
> AGTGTGTGGGAGGAACGCAGGGGCTGGAATAGGAGGGAAAGGAGGTGGCTCCAGGAGAGAGCGAGAGAGG
> GAGCGCTCGCATCGGGGCTCAGTGGCACCAGACCTAAAAAGAAATCTAGGCAAGGCTCCGGCACAGTCCA....
> ....
> 
> --- Barry Moore <bmoore at genetics.utah.edu> wrote:
> 
> > Li-
> >
> > You don't need to modify the script.  It is written
> > to accept the filename and format on the command
> > line, like this:
> > perl getaccs.pl mouse.fasta fasta
> >
> > Barry
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at portal.open-bio.org
> > > [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of chen li
> > > Sent: Tuesday, November 01, 2005 10:30 PM
> > > To: bioperl-l at bioperl.org
> > > Subject: [Bioperl-l] Bio::SeqIO HOWTO
> > >
> > > Hi folks,
> > >
> > > Here is one script copied from the Bio::SeqIO HOWTO:
> > >
> > >      use Bio::SeqIO;
> > >      my $usage = "getaccs.pl file format\n";
> > >      my $file = shift or die $usage;
> > >      my $format = shift or die $usage;
> > >
> > >      my $inseq = Bio::SeqIO->new('-file'   => "<$file",
> > >                                  '-format' => $format );
> > >      while (my $seq = $inseq->next_seq) {
> > >            print $seq->accession_number,"\n";
> > >      }
> > >      exit;
> > >
> > >
> > > I have a small file called mouse.fasta kept in the
> > > same directory. My question is: how does the
> > > script know to read in mouse.fasta? Where should I
> > > make a small modification in the script?
> > >
> > > Thanks,
> > >
> > > Li