[Bioperl-l] getting pubmed id from genbank files
Nathan Haigh
n.haigh at sheffield.ac.uk
Thu Jul 28 07:36:56 EDT 2005
Big Oops!
I wasn't using bioperl-live! Things now seem to be OK - well, at least with
that one GenBank file!
Thanks for the input anyway! :o)
Nathan
-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Nathan Haigh
Sent: 27 July 2005 09:10
To: Barry Moore
Cc: Hilmar Lapp; bioperl-l
Subject: RE: [Bioperl-l] getting pubmed id from genbank files
Yeah, I'm pretty sure I was using bioperl-live updated that morning. Your
explanation of the problem seems feasible from what I was looking at in the
Perl debugger. I'll look into this a bit more later this morning.
Nathan
Quoting Barry Moore <bmoore at genetics.utah.edu>:
> Nathan-
>
> That sounds like you are using bioperl 1.4? The error is in
> Bio/SeqIO/genbank.pm and was fixed by Jason in cvs version 1.102 of
> that file. However the current code still looks a bit odd to me.
> Starting at line 1068 of the current cvs version (1.119) of genbank.pm
> we have:
>
> 1068     if (/^\s{2}JOURNAL\s+(.*)/o) {
> 1069         push(@loc, $1);
> 1070         while ( defined($_ = $self->_readline) ) {
> 1071             # we only match when there are at least 4 spaces
> 1072             # there is probably a better way to match this
> 1073             # as it assumes that the describing tag is short enough
> 1074             /^\s{4,}(.*)/o && do { push(@loc, $1);
> 1075                                    next;
> 1076                                  };
> 1077             last;
> 1078         }
> 1079         $ref->location(join(' ', @loc));
>
> This is all dealing with parsing the JOURNAL line, which is handled fine
> by lines 1068-69. The while loop at 1070 looks at successive lines to
> find something to add to the JOURNAL line. The regex at line 1074 used
> to read /^\s{3,}(.*)/o, which would not match if the next line after
> JOURNAL began with '  MEDLINE' (two leading spaces), but would match
> '   PUBMED' (three leading spaces; Nathan's situation), causing that
> line to be added to the JOURNAL line. Is there ever a JOURNAL entry with
> more than one line? If so, shouldn't the following lines always be
> untagged and thus indented 12 spaces, making the regex /^\s{12}(.*)/o
> safer? The current situation would add any line to the JOURNAL line if
> its tag is shorter than 6 characters, and I don't think that's what we
> want.
>
> Barry
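
The indentation argument above can be tried out in isolation. What follows
is only a rough sketch, not the code in genbank.pm: journal_location() is a
hypothetical helper that imitates the quoted loop with a configurable
minimum indent, and the second sample record is invented purely to show a
wrapped JOURNAL value.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): take the JOURNAL line, then
# append following lines while they start with at least $min_indent
# whitespace characters, as the quoted while loop does.
sub journal_location {
    my ($min_indent, @lines) = @_;
    my $continuation = qr/^\s{${min_indent},}(.*)/;
    my @loc;
    while (defined(my $line = shift @lines)) {
        next unless $line =~ /^\s{2}JOURNAL\s+(.*)/;
        push @loc, $1;
        while (@lines && $lines[0] =~ $continuation) {
            push @loc, $1;
            shift @lines;
        }
        last;
    }
    return join ' ', @loc;
}

# Lines as they appear in a GenBank flat file: PUBMED is indented by three
# spaces, untagged continuation lines by twelve.
my @record = (
    '  JOURNAL   Genetics 166 (3), 1419-1436 (2004)',
    '   PUBMED   15082560',
);

# Invented record, only to illustrate a JOURNAL value that wraps.
my @wrapped = (
    '  JOURNAL   Submitted (01-JAN-2000) Hypothetical Institute,',
    '            Hypothetical Street, Sample City',
    '   PUBMED   15082560',
);

for my $min (3, 4, 12) {
    print "min indent $min: ", journal_location($min, @record), "\n";
    print "min indent $min: ", journal_location($min, @wrapped), "\n";
}
```

With a minimum of 3 the PUBMED line is swallowed into the location; with 4
or 12 it is left alone, and the twelve-space continuation line is still
picked up in every case.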
>
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Hilmar Lapp
> Sent: Tuesday, July 26, 2005 11:05 AM
> To: n.haigh at sheffield.ac.uk
> Cc: 'bioperl-l'
> Subject: Re: [Bioperl-l] getting pubmed id from genbank files
>
>
> On Jul 26, 2005, at 7:49 AM, Nathan Haigh wrote:
>
> > -- snip --
> > $VAR1 = bless( {
> >     'authors' => 'Clauss,M.J. and Mitchell-Olds,T.',
> >     'location' => 'Genetics 166 (3), 1419-1436 (2004) PUBMED 15082560',
> >     'title' => 'Functional divergence in tandemly duplicated Arabidopsis thaliana trypsin inhibitor genes',
> >     'tagname' => 'reference'
> > }, 'Bio::Annotation::Reference' );
> > -- snip --
>
> This is odd. The PUBMED line should not be concatenated with the
> JOURNAL line. I wonder where this happens and why. Can you download the
> record from NCBI (using the web interface, format 'GenBank', 'Send all
> to file') and then parse it with Bio::SeqIO? If it works then the
> problem must be in the code that deals with the HTTP-response.
>
> -hilmar
>
>
> >
> > -----Original Message-----
> > From: Jason Stajich [mailto:jason.stajich at duke.edu]
> > Sent: 26 July 2005 15:28
> > To: Bioperl-l at portal.open-bio.org
> > Cc: Nathan Haigh
> > Subject: [Bioperl-l] getting pubmed id from genbank files
> >
> >
> >
> > Here is part of the synopsis in Bio::Seq:
> >
> > foreach my $ref ( $ann->get_Annotations('reference') ) {
> > print "Reference ",$ref->title,"\n";
> > }
> >
> > so do $ref->pubmed instead of $ref->title.
> >
> >
> > -jason
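
Put together with reading a saved GenBank file, the suggestion above might
look roughly like the sketch below. The helper name pubmed_ids() and the
script name in the usage comment are made up for illustration; Bio::SeqIO is
loaded lazily only so that the helper can be exercised without a file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the PubMed IDs of all 'reference'
# annotations on a sequence object, skipping references without one.
sub pubmed_ids {
    my ($seq) = @_;
    return grep { defined($_) && length($_) }
           map  { $_->pubmed }
           $seq->annotation->get_Annotations('reference');
}

# Usage (file name is a placeholder): perl pubmed_ids.pl record.gb
if (@ARGV) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        # One line per reference: accession, then PubMed ID.
        print join("\t", $seq->accession_number, $_), "\n"
            for pubmed_ids($seq);
    }
}
```

If the parser has folded the PUBMED line into the JOURNAL value, as in the
dump further up the thread, $ref->pubmed will presumably come back empty
for that reference, so an empty result here is worth checking against
$ref->location.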
> >> On Jul 26, 2005, at 6:02 AM, Nathan Haigh wrote:
> >>
> >>> I want to be able to supply a list of GIs, retrieve the GenBank
> >>> files, and parse out the PubMed IDs.
> >>>
> >>> I know I can do the first step of retrieving the GenBank files
> >>> directly, but how do I get the PubMed IDs? I've been playing around
> >>> with things and haven't yet found out if this can be done.
> >>>
> >>> Cheers,
> >>> Nathan
> >>>
> >>> ----------------------------------
> >>> Nathan Haigh
> >>> Bioinformatics PostDoctoral Research Associate
> >>>
> >>> Room B2 211
> >>> Department of Animal and Plant Sciences
> >>> University of Sheffield
> >>> Western Bank
> >>> Sheffield
> >>> S10 2TN
> >>>
> >>> Tel: +44 (0)114 22 20112
> >>> Mob: +44 (0)7742 533 569
> >>> Fax: +44 (0)114 22 20002
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at portal.open-bio.org
> >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >> --
> >> Jason Stajich
> >> http://www.duke.edu/~jes12
> >> jason.stajich -at- duke.edu
> >>
> >>
> --
> -------------------------------------------------------------
> Hilmar Lapp email: lapp at gnf.org
> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
>
>