[Bioperl-l] Bio::SeqIO::tinyseq

Wed Jan 28 02:58:02 EST 2004

Donald,

The best way to do this is to ignore the root level of the XML, use Perl to 
split the file into individual entries, and pass only one entry's XML at a 
time to the parser. This keeps memory usage down, so you can parse files of 
any size.

I've done it using XML::Twig in Bio::Variation::IO::xml, but the same 
approach can be applied to PerlSAX.

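sub next {
    my( $self ) = @_;

    # read one complete <seqDiff>...</seqDiff> entry at a time
    local $/ = "</seqDiff>\n";
    return unless my $entry = $self->_readline;
    return unless $entry =~ /^\W*<seqDiff/;

    # $seqdiff is declared outside this sub and shared with the
    # _seqDiff handler, which fills it in during parsing
    $seqdiff = Bio::Variation::SeqDiff->new;

    # create a new parser object for this entry only
    my $twig_handlers = { 'seqDiff' => \&_seqDiff };
    my $t = XML::Twig->new( TwigHandlers => $twig_handlers,
                            KeepEncoding => 1 );
    $t->parse($entry);

    return $seqdiff;
}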
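
For anyone who wants to try the splitting step on its own, here is a minimal 
sketch that uses only core Perl. The helper name next_entry, the root element 
seqDiffCollection, and the id attribute in the sample data are made up for 
illustration and are not taken from Bio::Variation::IO::xml. A regex stands in 
for the real parser call so that the example runs without XML::Twig.

```perl
use strict;
use warnings;

# Hypothetical helper: return the next <$tag>...</$tag> chunk from $fh,
# or nothing at end of input. Assumes entries are not nested.
sub next_entry {
    my ($fh, $tag) = @_;

    # Read up to and including the closing tag instead of up to a newline.
    local $/ = "</$tag>";

    while (defined(my $chunk = <$fh>)) {
        # Drop whatever precedes the opening tag: the XML declaration,
        # the root element's start tag, or whitespace between entries.
        # A chunk with no opening tag (such as the root's closing tag
        # at the end of the file) is skipped.
        return $1 if $chunk =~ /(<\Q$tag\E\b.*<\/\Q$tag\E>)\z/s;
    }
    return;
}

# Made-up sample data; element and attribute names are illustrative only.
my $xml = <<'XML';
<?xml version="1.0"?>
<seqDiffCollection>
<seqDiff id="A1"><note>first</note></seqDiff>
<seqDiff id="B2"><note>second</note></seqDiff>
</seqDiffCollection>
XML

open my $fh, '<', \$xml or die "Cannot open in-memory file: $!";

my @ids;
while (my $entry = next_entry($fh, 'seqDiff')) {
    # A real reader would hand $entry to an XML parser at this point.
    # Here a regex pulls out the id attribute as a stand-in.
    push @ids, $1 if $entry =~ /<seqDiff\s+id="([^"]*)"/;
}
close $fh;

print "@ids\n";    # prints "A1 B2"
```

Unlike the method above, this sketch leaves the newline out of the record 
separator, so it does not depend on each closing tag being followed by a 
line break. Only one entry is held in memory at a time either way.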

	-Heikki

On Wednesday 28 Jan 2004 01:54, Donald Jackson wrote:
> Hi,
>
> I've uploaded Bio::SeqIO::tinyseq, a SeqIO module for parsing the NCBI
> TinySeq XML format to the bioperl cvs repository.  Currently, it only reads
> and writes 'fasta-like' information (accession, description, sequence,
> sequence type).  The DTD supports additional info (such as organism/TaxID)
> which I'm working on adding to formats that support this info.
>
> Please let me know if you have questions, suggestions, and especially
> bugs...
>
> Don Jackson
> BMS Bioinformatics

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________