[Bioperl-l] SeqIO

Marc Logghe Marc.Logghe at ablynx.com
Thu Mar 6 16:22:10 UTC 2008


Hi Nick,
I don't think you should leave out the -format option. Keep it, but let
a Bio::Tools::GuessSeqFormat object supply the format name.
Something like:

#!/usr/bin/perl
use strict;
use Bio::SeqIO;
use Bio::Tools::GuessSeqFormat;

$| = 1;    # unbuffered output
my $number_of_files = @ARGV;
if (!$number_of_files) { print "no files entered\n"; exit; }
foreach my $file (@ARGV) {
  # let the guesser inspect the file and report a format name
  my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file);
  # hand the guessed format to SeqIO instead of hard-coding it
  my $seqio_object = Bio::SeqIO->new(-file   => $guesser->file,
                                     -format => $guesser->guess);
  my $seq_object = $seqio_object->next_seq;
  my $sequence = $seq_object->seq;
  print "$sequence\n";
}
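
The header pattern Heikki quotes further down the thread can also be tried
on its own against the header line from Nick's file. This is only a minimal
sketch: looks_like_gcg_header is a made-up helper name, not a BioPerl
function, and the real guesser tests many other formats as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): applies the GCG header
# pattern quoted in this thread to a single line of text.
sub looks_like_gcg_header {
    my ($line) = @_;
    $line =~ s/\s+\z//;    # drop the newline and any trailing blanks
    return $line =~ /Length: .*Type: .*Check: .*\.\.$/ ? 1 : 0;
}

# Header and first data line as they appear in Nick's example file.
my $header = "   NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..\n";
my $data   = "       1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT\n";

print "header line: ", (looks_like_gcg_header($header) ? "GCG-like" : "no match"), "\n";
print "data line:   ", (looks_like_gcg_header($data)   ? "GCG-like" : "no match"), "\n";
```

Only the header line should be reported as GCG-like; the numbered data
line does not match the pattern.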

HTH,
Marc
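
P.S. If the position numbers still end up embedded in the sequence, a
stop-gap is to strip them yourself. The sketch below is plain Perl and does
not go through Bio::SeqIO at all; gcg_text_to_sequence is a hypothetical
helper name, and it assumes the header ends with a line terminated by ".."
as in Nick's file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): takes the text of a GCG-style
# file and returns the bare sequence. Everything up to and including the
# header line that ends in ".." is skipped; digits and whitespace are
# dropped from the remaining lines.
sub gcg_text_to_sequence {
    my ($text) = @_;
    my $in_body  = 0;
    my $sequence = '';
    foreach my $line (split /\n/, $text) {
        if (!$in_body) {
            $in_body = 1 if $line =~ /\.\.\s*$/;    # end of header reached
            next;
        }
        $line =~ s/[\d\s]//g;    # remove position numbers and spaces
        $sequence .= $line;
    }
    return $sequence;
}

# First lines of Nick's example file, without the "!!NA_SEQUENCE" marker.
my $text = <<'END';
   NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..

       1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
END

print gcg_text_to_sequence($text), "\n";
```

The same helper also copes with the "!!NA_SEQUENCE 1.0" line being present,
since that line does not end in ".." and is skipped as part of the header.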


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS)
> Sent: Thursday, 6 March 2008 16:24
> To: Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org
> Cc: Chris Fields
> Subject: Re: [Bioperl-l] SeqIO
> 
> Here's the scoop:
> When I use Jason's suggestion, (-format => 'gcg'),
> my program works without complaint on the original file that looks like:
> !!NA_SEQUENCE 1.0
>    NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..
> 
>        1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
> etc.
> 
> BUT if I remove the first line to test Bio::Tools::GuessSeqFormat,
> (which should be retro-gcg format (before version 11?)),
> my program runs, but there IS a complaint:
> Use of uninitialized value in scalar chomp at
> /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, <GEN0> line 1.
> BUT
> If I remove (-format => 'gcg'), I get no complaint, but the sequence
> returned still has its numbers embedded. This affects my calculations.
> 
> Thanks, at least I know what my options are.
> 
> 
> 
> Nick Staffa
> Telephone: 919-316-4569  (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
> 
> 
> On 3/6/08 7:20 AM, "Heikki Lehvaslaiho" <heikki at sanbi.ac.za> wrote:
> 
> >
> > Nick,
> >
> > This is the regex that Bio::Tools::GuessSeqFormat uses to identify a
> > gcg file:
> >
> > /Length: .*Type: .*Check: .*\.\.$/
> >
> > It is the second line in a GCG file. If the first line matches some
> > other format regex, this will not be evaluated.
> >
> > Let us know,
> >
> > -Heikki
> >
> > On Thursday 06 March 2008 05:09:11 Staffa, Nick (NIH/NIEHS) wrote:
> >> Verily,
> >> One interpretation of the docs might be: will read any format if the
> >> format is specified.
> >> I was hoping that I could write a program that one needn't specify
> >> format.
> >> It'd be more user-friendly and useful.
> >>
> >> On 3/5/08 9:33 PM, "Jason Stajich" <jason at bioperl.org> wrote:
> >>> probably you should try specifying the format explicitly first- as in
> >>> (-format => 'gcg')
> >>>
> >>> -j
> >>>
> >>> On Mar 5, 2008, at 6:22 PM, Chris Fields wrote:
> >>>> I thought GCG format changed somewhere along the way, but maybe
> >>>> I'm wrong?  Regardless, you'll have to post this as a bug (along
> >>>> with an example file).
> >>>>
> >>>> Also, kind of odd that the sequence data wasn't checked...
> >>>>
> >>>> chris
> >>>>
> >>>> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote:
> >>>>> So the Howto says that Bio::SeqIO will read almost any known format
> >>>>> including GCG.
> >>>>> So I create a GCG file with Seqlab and try to print out its
> >>>>> sequence as a string. (I did guess at the way to get the sequence
> >>>>> string.)
> >>>>>
> >>>>> #!/usr/bin/perl -w
> >>>>> use strict;
> >>>>> $| = 1;
> >>>>> use Bio::SeqIO;
> >>>>> my $number_of_files = @ARGV;
> >>>>> if(!$number_of_files){print "no files entered\n";exit;}
> >>>>> foreach my $file (@ARGV){
> >>>>> my $seqio_object = Bio::SeqIO->new(-file => $file);
> >>>>> my $seq_object = $seqio_object->next_seq;
> >>>>> my $sequence = $seq_object->seq;
> >>>>> print "$sequence\n";
> >>>>> my $status = &windowscore($sequence);
> >>>>> }
> >>>>>
> >>>>> But what it returned was the entire contents of the file with no
> >>>>> format decoding. Have I been deluded?
> >>>>>
> >>>>> NewDNALength:810March5,200818:26Type:NCheck:3368..1TGTTCGAATTCCGTGCGGTCCACCT
> >>>>> CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGAGGGGGCGAAGGT
> >>>>> T101GCCCGGGGCCGTGCGCTGGGTGCTGCTGCTGCGGTGGCGGCGGCGGTGCC151TGCGGTTGCAGCGGCTGCT
> >>>>> GGGGTTGCGCGTGGAAACCGCGCCCCGCACT201TGCGGCGGGCGAGCCCATCGCGCCGTAGTACAGGTGCAGAGC
> >>>>> GCTGGGGG251GCGCCAGGATCCCCGGCATCGCAGGGCCCGAGGGGTCCGGCCCCACTCGC301ATGGGGCCAGCG
> >>>>> GGCGGCTCTACGGACACTGCATAGTCCGAGACTGGAGC351GTAAGTGTAGGTGCCGGCCGCCGGGCAGTCCCCTG
> >>>>> GCAGCGGGGCTGCAA401AGAAAGCCGGGTCCTGCTCCACGCCATCCAGCGGGGATGTGTCCGGAGTG451GGCAG
> >>>>> AGGGTAGCCGTCGAGCGCGGGAGCGCCCAGTCCCTGGCAGTCCCG501ATAGTGGGGGCCCATGTGCGGAGACATC
> >>>>> AGCGGAGGACCGGCCGGATAGC551CCGGCTCCGGGAAAGGCAGACCCAGGCCATCCATGGCCACGCGGCCGCCC6
> >>>>> 01TCGGGACCAAGCGCGCCGGCCTGGGGCTCGACGAGAGCGTGCAGGAAGCC651TCCCTCCACCCGCTTCATGCG
> >>>>> CTTCACCTGCTTGCGCCGCCGCGGCCGGT701ACTTGTAGTTGGGGTGGTCCTGCATATGCTGCACGCGCAGCCGC
> >>>>> TCGGCC751TCTTCCACGAAGGGCCGCTTCTCTGCCAAGGTCAACGCCTTCCAAGACTT801GCCTGCAGGG
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioperl-l mailing list
> >>>>> Bioperl-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>
> >>>> Christopher Fields
> >>>> Postdoctoral Researcher
> >>>> Lab of Dr. Robert Switzer
> >>>> Dept of Biochemistry
> >>>> University of Illinois Urbana-Champaign
> >>>>



