[Bioperl-l] Problems parsing swiss-prot files

Jessica Dantzer jdantzer at cs.iupui.edu
Tue Jul 6 15:00:14 EDT 2004


We added both of the files to our current version of Bioperl, and things
seem to be working as they should.  Thanks for the help!

Jessica


> I've fixed it in CVS.  I also fixed a bunch of other things in swissprot
> parsing to make the parser cleaner, I hope.  This involved improving the
> 'new' function in Bio::Annotation::Reference, so you'd want to get that
> as well if you are getting code from CVS.
>
> Multi-line RP lines are now all put into the rp field of the
> Annotation::Reference object.  The parser takes care of splitting it
> back into multi-line fields upon writing (although I didn't test this
> case specifically).
>
> PVH and our code auditors: as happy as I am about the code audit for
> SeqIO and the like, and about making sure that things can roundtrip, I
> really feel like the guts of these parsers could use a few weeks of
> someone's time to clean them up first.  Of course, a few others and I
> would want to simplify the sequence/annotation/feature object model
> first, so who knows what the best starting point is...
>
> -jason
>
> On Fri, 2 Jul 2004, Jessica Dantzer wrote:
>
>> Most of the references in most of the files have only one RP
>> line.  Occasionally, there are two.  I haven't seen more than two,
>> though.  One of the files that had more than one line in at least one
>> reference was for P33897.  I'm parsing information on the mutation/
>> variant data and their references, and so need some of the information
>> on those second lines.
>>
>> At 03:55 PM 7/2/2004, Jason Stajich wrote:
>> >Is there more than one RP line per reference?  The data structures
>> >and parsers currently assume there is only one.
>> >can you send an acc so we can add it to the tests?
>> >
>> >-jason
>> >On Thu, 1 Jul 2004, Jessica Dantzer wrote:
>> >
>> > > I'm working on parsing swiss-prot files for use in another
>> > > database, and for the most part I've managed to work out where all
>> > > the information I need is stored.  The only problems I'm
>> > > encountering are with the reference parsing: some of the files
>> > > have multiple "RP" lines, and I only seem to be able to get one.
>> > > The code seems to indicate that this is how the files are parsed.
>> > > Is there any other way to access the second line?
>> > >
>> > > Thanks,
>> > > Jessica
>> > >
>> > > _______________________________________________
>> > > Bioperl-l mailing list
>> > > Bioperl-l at portal.open-bio.org
>> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
>> > >
>> >
>> >--
>> >Jason Stajich
>> >Duke University
>> >jason at cgt.mc.duke.edu
>>
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu

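The multi-line RP handling discussed in this thread can be sketched roughly
as follows.  This is a stand-alone Python illustration, not the BioPerl
implementation: the function names collect_rp and write_rp, the 75-column
width, and the sample record are all hypothetical, and it assumes the usual
flat-file layout of a two-letter line code followed by three spaces.

```python
import textwrap


def collect_rp(lines):
    """Return one joined RP string per reference (each reference starts at RN)."""
    refs = []
    for line in lines:
        code, text = line[:2], line[5:].strip()
        if code == "RN":
            refs.append([])        # start of a new reference block
        elif code == "RP" and refs:
            refs[-1].append(text)  # keep every RP line, not only the first
    return [" ".join(parts) for parts in refs]


def write_rp(rp, width=75):
    """Split a joined RP string back into 'RP   ...' lines of at most `width`."""
    chunks = textwrap.wrap(rp, width - 5,
                           break_long_words=False, break_on_hyphens=False)
    return ["RP   " + chunk for chunk in chunks]


# Invented sample lines, NOT taken from a real Swiss-Prot entry.
record = [
    "RN   [1]",
    "RP   SEQUENCE FROM N.A., AND",
    "RP   VARIANTS ALPHA AND BETA.",
    "RN   [2]",
    "RP   VARIANT GAMMA.",
]

joined = collect_rp(record)
rewritten = write_rp(joined[0], width=40)
```

Joining on read and re-wrapping on write keeps the in-memory value a single
string while still producing multi-line output, which is the behaviour
described for the rp field above.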