[Bioperl-l] Bio::SeqIO HOWTO

Hilmar Lapp hlapp at gnf.org
Thu Nov 3 01:48:02 EST 2005


$seq->display_id will give you the full composite ID that follows the
greater-than character, up to the first whitespace (here,
gi|10946609|ref|NM_021308.1|).

It's trivial to split that with a regular expression to obtain only
the part you're interested in, so for the reasons Barry mentions
Bioperl doesn't do this for you.
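
A minimal sketch of such a split, assuming the NCBI-style layout
gi|<number>|<db>|<accession>| seen in the file quoted below. Note that
accession_from_id is a hypothetical helper, not a Bioperl method, and the
hard-coded ID stands in for what $seq->display_id would return:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): pull the accession out of an
# NCBI-style composite ID such as "gi|10946609|ref|NM_021308.1|".
sub accession_from_id {
    my ($id) = @_;
    # Assumed layout: gi|<number>|<db>|<accession.version>|
    if ($id =~ /^gi\|\d+\|[^|]+\|([^|]+)/) {
        return $1;
    }
    return;    # undef in scalar context when the ID has another layout
}

# Stand-in for $seq->display_id inside the while loop of getaccs.pl.
my $id  = 'gi|10946609|ref|NM_021308.1|';
my $acc = accession_from_id($id);
print defined $acc ? $acc : 'unknown', "\n";
```

Headers written in any other style will not match this pattern, so the
regular expression has to be adapted to whatever convention your own
files follow.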


On Nov 2, 2005, at 8:25 PM, Barry Moore wrote:

> Li-
>
> The script is working correctly.  You are giving it a fasta file and
> then asking it to print the accession number.  While you and I can
> plainly see that the accession number NM_021308.1 is in the fasta
> header, bioperl makes no attempt to parse accession numbers from a
> fasta header.  The reason for this is there is no uniformity in how
> fasta headers are written, so every fasta file could use a different
> header format and be valid.
>
> If you just want to see the script work correctly for learning
> purposes, change the line:
> print $seq->accession_number,"\n";
> to any or all of these lines:
> print $seq->alphabet,"\n";
> print $seq->description,"\n";
> print $seq->display_name,"\n";
> print $seq->length,"\n";
> print $seq->seq,"\n";
>
> If you want the script to print the accession number, try downloading
> the full GenBank formatted sequence and run your script something like:
> perl getaccs.pl mouse.gb genbank
>
> Barry
>
>> -----Original Message-----
>> From: chen li [mailto:chen_li3 at yahoo.com]
>> Sent: Wednesday, November 02, 2005 8:36 PM
>> To: Barry Moore
>> Subject: RE: [Bioperl-l] Bio::SeqIO HOWTO
>>
>> Barry,
>>
>> Thank you very much.
>>
>> Here are the results. 1) If I type "perl getaccs.pl" I
>> get this result "getaccs.pl file format" on the
>> screen. 2) If I type "perl getaccs.pl mouse.fasta
>> fasta" I get "unknown" on the screen. It seems that
>> no accession number is printed out after the script is
>> executed.
>>
>> So what is the problem here?
>>
>> Li
>>
>> here is part of my file:
>>
>> >gi|10946609|ref|NM_021308.1| Mus musculus piwi like homolog 2 (Drosophila) (Piwil2), mRNA
>> AGTGTGTGGGAGGAACGCAGGGGCTGGAATAGGAGGGAAAGGAGGTGGCTCCAGGAGAGAGCGAGAGAGG
>> GAGCGCTCGCATCGGGGCTCAGTGGCACCAGACCTAAAAAGAAATCTAGGCAAGGCTCCGGCACAGTCCA....
>> ....
>>
>> --- Barry Moore <bmoore at genetics.utah.edu> wrote:
>>
>>> Li-
>>>
>>> You don't need to modify the script.  It is written to accept the
>>> filename and format on the command line like this:
>>> perl getaccs.pl mouse.fasta fasta
>>>
>>> Barry
>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at portal.open-bio.org
>>>> [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of chen li
>>>> Sent: Tuesday, November 01, 2005 10:30 PM
>>>> To: bioperl-l at bioperl.org
>>>> Subject: [Bioperl-l] Bio::SeqIO HOWTO
>>>>
>>>> Hi folks,
>>>>
>>>>  Here is one script copied from the Bio::SeqIO HOWTO:
>>>>
>>>>      use Bio::SeqIO;
>>>>      my $usage = "getaccs.pl file format\n";
>>>>      my $file = shift or die $usage;
>>>>      my $format = shift or die $usage;
>>>>
>>>>      my $inseq = Bio::SeqIO->new('-file'   => "<$file",
>>>>                                  '-format' => $format );
>>>>      while (my $seq = $inseq->next_seq) {
>>>>            print $seq->accession_number,"\n";
>>>>      }
>>>>      exit;
>>>>
>>>>
>>>> I have a small file called mouse.fasta kept in the
>>>> same directory. My question is: how does the
>>>> script know to read in mouse.fasta? Where should I
>>>> make a small modification in the script?
>>>>
>>>> Thanks,
>>>>
>>>> Li
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at portal.open-bio.org
>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------
