[Bioperl-l] guessing sequence format
Heikki Lehvaslaiho
heikki at nildram.co.uk
Tue Dec 2 15:25:37 EST 2003
Andreas Kähäri has written a module that gives SeqIO and AlignIO the ability
to look into input files and guess the format of the sequence:
Bio::Tools::GuessSeqFormat. See the POD docs in the module for the supported
formats and details.
Bio::SeqIO::new() and Bio::AlignIO::new() have received initial modifications
so that they try to determine the format in this order:
1. given in argument (-format)
2. based on the file name extension
3. looking into file by calling Bio::Tools::GuessSeqFormat
No verification of the format is done if conditions 1 or 2 are met. I think it
would be neat to have an option to do that. It could, for example, be linked
to verbosity. Suggestions or implementations are welcome.
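
To make the order above concrete, here is a minimal sketch of the fallback
chain in plain Perl. It is not the code in Bio::SeqIO: resolve_format(),
sniff_format() and the %FORMAT_BY_EXT table are hypothetical names made up
for illustration, and the toy sniffer only looks at the start of the text,
standing in for what Bio::Tools::GuessSeqFormat does properly.

```perl
use strict;
use warnings;

# Hypothetical extension table; the real mapping lives inside BioPerl.
my %FORMAT_BY_EXT = (
    fasta => 'fasta',
    fa    => 'fasta',
    gb    => 'genbank',
    embl  => 'embl',
);

# Toy stand-in for Bio::Tools::GuessSeqFormat: checks only how the text starts.
sub sniff_format {
    my ($text) = @_;
    return 'fasta'   if $text =~ /\A>/;
    return 'genbank' if $text =~ /\ALOCUS\s/;
    return 'embl'    if $text =~ /\AID\s/;
    return;    # format could not be guessed
}

# Resolve the format in the order: argument, file extension, content.
sub resolve_format {
    my (%args) = @_;

    # 1. An explicit -format argument wins and is not verified.
    return lc $args{'-format'} if defined $args{'-format'};

    # 2. Otherwise trust a known file name extension, again unverified.
    if ( defined $args{'-file'} && $args{'-file'} =~ /\.([^.\/]+)\z/ ) {
        my $by_ext = $FORMAT_BY_EXT{ lc $1 };
        return $by_ext if defined $by_ext;
    }

    # 3. Only now look at the content itself.
    return sniff_format( $args{'-text'} ) if defined $args{'-text'};
    return;
}
```

Note that in this sketch a call such as resolve_format(-file => 'x.embl',
-text => ">seq1\nacgt\n") is resolved from the extension alone, even though
the content looks like FASTA. That is exactly the missing verification
mentioned above.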
Tests have been written for reading all formats from files, and even reading
from a file handle works, which is really cool:
----------------- snip --------------------
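use strict;
use warnings;
use IO::String;
use Bio::SeqIO;

# Two FASTA records held in memory; there is no file name to take a hint from.
my $string = ">test1 no comment
agtgctagctagctagctagct
>test2 no comment
gtagttatgc
";
# Wrap the string in a file handle and let SeqIO guess the format (no -format).
my $stringfh = IO::String->new($string);
my $seqio = Bio::SeqIO->new(-fh => $stringfh);
while ( my $seq = $seqio->next_seq ) {
    print $seq->id, "\n";
}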
----------------- snip --------------------
It would be really good if people could try this out now so that any bugs
can be ironed out before the 1.4 release.
-Heikki
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________