[Bioperl-l] Problems parsing swiss-prot files
Jason Stajich
jason at cgt.duhs.duke.edu
Fri Jul 2 20:49:02 EDT 2004
I've fixed it in CVS. I also fixed a number of other things in the swissprot
parsing, which I hope makes the parser cleaner. This involved improving the
'new' function in Bio::Annotation::Reference, so you'll want to get that as
well if you are getting code from CVS.
Multi-line RP lines are now all put into the rp field of the
Annotation::Reference object. The parser takes care of splitting it back
into multi-line fields upon writing (although I didn't test this case
specifically).
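To make the join-on-read / split-on-write idea concrete, here is a minimal
standalone sketch. It is not the code that went into CVS: collect_rp and
format_rp are hypothetical helper names, the 75-column default width is an
assumption, and the sample lines are made up for illustration.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper (not part of BioPerl): join the text of every RP line
# in one reference block into a single string.
sub collect_rp {
    my @lines = @_;
    my @parts;
    for my $line (@lines) {
        # Swiss-Prot lines start with a two-letter code followed by three spaces.
        push @parts, $1 if $line =~ /^RP\s{3}(.*?)\s*$/;
    }
    return join ' ', @parts;
}

# Hypothetical helper: split a combined RP string back into RP-prefixed
# lines that are at most $width characters long (75 is an assumed default).
sub format_rp {
    my ($rp, $width) = @_;
    $width ||= 75;
    # Text::Wrap emits lines of at most columns - 1 characters.
    local $Text::Wrap::columns  = $width + 1;
    # Keep plain spaces; do not let Text::Wrap convert any of them to tabs.
    local $Text::Wrap::unexpand = 0;
    my $wrapped = wrap('RP   ', 'RP   ', $rp);
    return split /\n/, $wrapped;
}

# Made-up reference block with two RP lines.
my @reference = (
    "RN   [1]",
    "RP   SEQUENCE FROM N.A., AND VARIANTS",
    "RP   ALD.",
    "RA   Doe J.;",
);

my $rp = collect_rp(@reference);
print "$rp\n";                      # the two RP lines joined by a single space
print "$_\n" for format_rp($rp, 30);
```

With the CVS version, the combined text should be what the rp method returns
on each reference object, e.g. for the objects you get from
$seq->annotation->get_Annotations('reference').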
PVH and our code auditors: as happy as I am about the code audit for
SeqIO and the like, and about making sure that things can roundtrip, I really
feel like the guts of these parsers could use just a few weeks of someone's
time to clean them up first. Of course, a few others and I would want
to simplify the sequence/annotation/feature object model first, so who
knows what the best starting point is...
-jason
On Fri, 2 Jul 2004, Jessica Dantzer wrote:
> Most of the references in most of the files have only one RP
> line. Occasionally, there are two. I haven't seen more than two,
> though. One of the files that had more than one line in at least one
> reference was for P33897. I'm parsing information on the mutation/variant
> data and their references, and so need some of the information on those
> second lines.
>
> At 03:55 PM 7/2/2004, Jason Stajich wrote:
> >Is there more than one RP line per reference? The data structures and
> >parsers currently assume there is only one.
> >Can you send an accession so we can add it to the tests?
> >
> >-jason
> >On Thu, 1 Jul 2004, Jessica Dantzer wrote:
> >
> > > I'm working on parsing swiss-prot files for use in another database, and
> > > I've managed to work out where all the information I need is stored for
> > > the most part. The only problems I'm encountering are with the reference
> > > parsing-- Some of the files have multiple "RP" lines, and I only seem to
> > > be able to get one. The code seems to indicate that this is how the files
> > > are parsed. Is there any other way to access the second line?
> > >
> > > Thanks,
> > > Jessica
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> >--
> >Jason Stajich
> >Duke University
> >jason at cgt.mc.duke.edu
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu