[Bioperl-l] Bio::SeqIO::tinyseq
Heikki Lehvaslaiho
heikki at nildram.co.uk
Wed Jan 28 02:58:02 EST 2004
Donald,
The best way to do this is to ignore the root level of the XML, use Perl to
split the individual entries out of the file, and pass only one entry's XML
at a time to the parser. This keeps memory usage down, so you can parse
files as large as you want.
I've done it using XML::Twig in Bio::Variation::IO::xml, but the same
approach can be applied to PerlSAX.
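sub next {
    my( $self ) = @_;
    # read one whole entry at a time: the record separator is the closing tag
    local $/ = "</seqDiff>\n";
    return unless my $entry = $self->_readline;
    # stop if the chunk does not begin with an entry
    return unless $entry =~ /^\W*<seqDiff/;
    # $seqdiff is assumed to be declared at file scope so that the
    # _seqDiff handler can fill it in
    $seqdiff = Bio::Variation::SeqDiff->new;
    # create new parser object
    my $twig_handlers = { 'seqDiff' => \&_seqDiff };
    my $t = XML::Twig->new( TwigHandlers => $twig_handlers,
                            KeepEncoding => 1 );
    # parse only this entry, never the whole file
    $t->parse($entry);
    return $seqdiff;
}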
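
The splitting step on its own, without XML::Twig, could look roughly like the
sketch below. It only uses core Perl. The sample document, the root element
name seqDiffCollection and the variable names are made up for illustration;
they are not taken from Bio::Variation::IO::xml. It also assumes, like the
code above, that every closing </seqDiff> tag is followed by a newline.

```perl
use strict;
use warnings;

# Hypothetical sample document: a root element wrapping two entries.
my $xml = <<'XML';
<?xml version="1.0"?>
<seqDiffCollection>
<seqDiff id="a">
<variant/>
</seqDiff>
<seqDiff id="b">
</seqDiff>
</seqDiffCollection>
XML

# Read from an in-memory filehandle; a real file works the same way.
open my $fh, '<', \$xml or die "Cannot open in-memory file: $!";

my @entries;
{
    # Make readline return one chunk per closing entry tag.
    local $/ = "</seqDiff>\n";
    while ( my $chunk = <$fh> ) {
        # Skip the trailing chunk that holds only the root closing tag.
        next unless $chunk =~ /<seqDiff\b/;
        # Drop anything before the entry (XML declaration, root opening tag).
        $chunk =~ s/\A.*?(?=<seqDiff\b)//s;
        push @entries, $chunk;
    }
}
close $fh;

printf "%d entries\n", scalar @entries;    # prints "2 entries"
```

Each element of @entries is then a small, self-contained piece of XML that can
be handed to the parser one at a time, so only a single entry is held in
memory by the parser at any moment.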
-Heikki
On Wednesday 28 Jan 2004 01:54, Donald Jackson wrote:
> Hi,
>
> I've uploaded Bio::SeqIO::tinyseq, a SeqIO module for parsing the NCBI
> TinySeq XML format to the bioperl cvs repository. Currently, it only reads
> and writes 'fasta-like' information (accession, description, sequence,
> sequence type). The DTD supports additional info (such as organism/TaxID)
> which I'm working on adding to formats that support this info.
>
> Please let me know if you have questions, suggestions, and especially
> bugs...
>
> Don Jackson
> BMS Bioinformatics
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________