[Bioperl-l] Bio::SeqIO issue

Kevin Brown Kevin.M.Brown at asu.edu
Wed Aug 5 21:45:03 UTC 2009


I'm not sure, but I think the module is fasta, not Fasta. So it should
be -format=>'fasta', unless you're on a case-insensitive system that is
forgiving the capital...

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Chris Fields
> Sent: Wednesday, August 05, 2009 2:38 PM
> To: Hilgert, Uwe
> Cc: BioPerl List
> Subject: Re: [Bioperl-l] Bio::SeqIO issue
> 
> Uwe,
> 
> Please keep replies on the list.
> 
> It's very possible that's the issue; IIRC the fasta parser pulls out
> the full sequence in chunks (based on local $/ = "\n>") and splits the
> header off as the first line in that chunk.  You could probably try
> leaving the format out and letting SeqIO guess it, or passing the file
> into Bio::Tools::GuessSeqFormat directly, but it's probably better to
> go through the files and add a file extension that corresponds to the
> format.
> 
> chris
> 
> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote:
> 
> > Thanks, Chris. The files have no extension, but we indicate what
> > format to use, like in the manual:
> >
> > $in  = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta');
> >
> > I wonder now whether this could be exactly what causes the problem: as
> > we are telling SeqIO that the input files are in fasta format, they are
> > treated as such (i.e. the first line is removed), regardless of whether
> > they really are fasta?
> >
> > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> > Uwe Hilgert, Ph.D.
> > Dolan DNA Learning Center
> > Cold Spring Harbor Laboratory
> >
> > C: (516) 857-1693
> > V: (516) 367-5185
> > E: hilgert at cshl.edu
> > F: (516) 367-5182
> > W: http://www.dnalc.org
> >
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Wednesday, August 05, 2009 5:04 PM
> > To: Hilgert, Uwe
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Bio::SeqIO issue
> >
> > On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote:
> >
> >> Is my impression correct that Bio::SeqIO just assumes that sequences
> >> are being submitted in FASTA format?
> >
> > No. See:
> >
> > http://www.bioperl.org/wiki/HOWTO:SeqIO
> >
> > SeqIO tries to guess at the format using the file extension, and if
> > one isn't present makes use of Bio::Tools::GuessSeqFormat.  It's
> > possible that the extension is causing the problem, or that
> > GuessSeqFormat is guessing wrong (it's apt to do that, as it's forced
> > to guess).  In any case, it's always advisable to explicitly indicate
> > the format when possible.
> >
> > Relevant lines:
> >
> >    return 'fasta'   if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
> > ...
> >    return 'raw'     if /\.(txt)$/i;
> >
> >> In our experience, implementing Bio::SeqIO led to the first line of
> >> files being cut off, regardless of whether the files were indeed
> >> fasta files or files that only contained sequence.
> >
> > Files that only contain sequence are 'raw'.  Ones in FASTA are 'fasta'.
> >
> >> In the latter case, this led to sequence submissions that had the
> >> first line of nucleotides removed. Has anyone tried to write a fix
> >> for this?
> >
> > This sounds like a bug, but we have very little to go on beyond your
> > description.  What version of bioperl are you using, OS, etc?  What
> > does your data look like?  File extension?
> >
> > chris
> >
> >> Thanks,
> >>
> >> Uwe
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 

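Chris's description of the fasta reader is enough to reproduce the symptom Uwe reports. The Perl sketch below is a simplified, hypothetical imitation of that chunking (the sub names parse_fasta_like and parse_raw_like are made up for illustration, and this is not the actual Bio::SeqIO::fasta code): records are read with local $/ = "\n>", the first line of each chunk becomes the header, and only the remaining lines are kept as sequence. Feeding it a sequence-only file shows the first line of nucleotides disappearing into the "header".

```perl
use strict;
use warnings;

# Simplified, hypothetical imitation of the chunking Chris describes for the
# fasta reader. This is NOT the real Bio::SeqIO::fasta implementation.
sub parse_fasta_like {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory string: $!";
    local $/ = "\n>";               # each read returns one "\n>"-terminated chunk
    my @records;
    while (my $chunk = <$fh>) {
        chomp $chunk;               # chomp strips the trailing "\n>" (the current $/)
        $chunk =~ s/^>//;           # the very first record still begins with '>'
        next unless length $chunk;
        # The first line of the chunk is taken as the header, the rest as sequence.
        my ($header, $seq) = split /\n/, $chunk, 2;
        $seq = '' unless defined $seq;
        $seq =~ s/\s+//g;
        push @records, { header => $header, seq => $seq };
    }
    close $fh;
    return @records;
}

# Hypothetical contrast: keep every line as sequence, which is what Uwe expected
# for sequence-only files. Not a claim about how Bio::SeqIO::raw splits records.
sub parse_raw_like {
    my ($text) = @_;
    (my $seq = $text) =~ s/\s+//g;
    return $seq;
}

# A sequence-only "file" read as if it were FASTA:
my ($rec) = parse_fasta_like("ACGTACGT\nTTTTGGGG\nCCCC\n");
print "header: $rec->{header}\n";    # header: ACGTACGT  (first line swallowed)
print "seq:    $rec->{seq}\n";       # seq:    TTTTGGGGCCCC
print "raw:    ", parse_raw_like("ACGTACGT\nTTTTGGGG\nCCCC\n"), "\n";
```

Under these assumptions the remedy is on the input side, as suggested in the thread: declare sequence-only files as 'raw' (or give them an extension SeqIO maps to the right format) rather than passing -format => 'Fasta' for every file.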

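The two regular expressions Chris quotes can be wrapped into a small stand-alone helper to see which file names would be recognised. guess_format_from_extension is a hypothetical name, and only the 'fasta' and 'raw' rules from the quoted lines are included; the real SeqIO code covers many more formats and, when no extension matches, falls back to Bio::Tools::GuessSeqFormat. Extension-less files like Uwe's get no guess from this step at all.

```perl
use strict;
use warnings;

# Hypothetical helper that reproduces only the two rules quoted from SeqIO's
# extension-based guessing. The real code knows many more formats and falls
# back to Bio::Tools::GuessSeqFormat when nothing matches.
sub guess_format_from_extension {
    my ($filename) = @_;
    local $_ = $filename;
    return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
    return 'raw'   if /\.(txt)$/i;
    return;        # no recognised extension (e.g. files with no extension at all)
}

for my $name (qw(reads.fa notes.TXT sample_42)) {
    my $fmt = guess_format_from_extension($name);
    printf "%-10s => %s\n", $name, defined $fmt ? $fmt : '(no guess)';
}
```

Since Uwe passes -format explicitly, guessing presumably never ran for his files, and each one was handed to the fasta reader regardless of its contents.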

