[Bioperl-l] Extract field from Medline

Wed Dec 7 17:44:07 EST 2005

Andrej-

I hadn't run that script; it was just meant to give you a start.  I've
cleaned it up and it works for me.  Without knowing your file formats,
that's about all I can do to help, so you'll need to debug it and adapt
it to your needs.  If you are very new to Perl and unfamiliar with
debugging scripts, you will certainly want to consult one of the many
fine texts on the subject, such as "Learning Perl" from O'Reilly.  You
can use the script below as a starting point.  It assumes one
tab-separated record (PMID, title, abstract) per line, and it takes the
terms file and the Medline file, in that order, as command-line
arguments.

Barry

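#!/usr/bin/perl

use strict;
use warnings;

# Both file names come from the command line, terms file first.
my $usage = "Usage: $0 terms_file medline_file\n";
my $file_terms   = shift or die $usage;
my $file_medline = shift or die $usage;
open(my $term_fh, '<', $file_terms)   or die "Can't open $file_terms: $!";
open(my $medl_fh, '<', $file_medline) or die "Can't open $file_medline: $!";

# One term per line; strip the newlines and drop empty lines.
chomp(my @terms = <$term_fh>);
@terms = grep { length $_ } @terms;

# Assumes one tab-separated record per line: PMID, title, abstract.
LINE: while (my $line = <$medl_fh>) {
    chomp $line;
    my ($pmid, $ti, $ab) = split /\t/, $line, 3;
    next unless defined $ab;    # skip lines without all three fields
    for my $term (@terms) {
        # \Q...\E makes the term match literally, not as a regex.
        if (grep { /\Q$term\E/ } $pmid, $ti, $ab) {
            print "$pmid\t$ti\t$ab\n";
            next LINE;          # print each record only once
        }
    }
}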
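
The script above tests each term separately against each record.  If
your term list gets long, one possible variation is to fold all the
terms into a single precompiled regular expression and test each line
once.  The sketch below is only an illustration under my own
assumptions: build_term_regex and matching_records are made-up helper
names, the sample terms and records are invented, and the
case-insensitive match is a guess about what you want.

```perl
use strict;
use warnings;

# Hypothetical helper: combine a list of terms into one regex.
sub build_term_regex {
    my @terms = grep { length $_ } @_;
    die "No search terms given\n" unless @terms;
    # quotemeta escapes characters such as "(" or "-" so that each
    # term is matched literally rather than as regex syntax.
    my $alternation = join '|', map { quotemeta($_) } @terms;
    return qr/$alternation/i;    # /i is an assumption: ignore case
}

# Hypothetical helper: return the records that contain any term.
# Matching the whole line is equivalent to matching any field here,
# because the terms themselves contain no tabs.
sub matching_records {
    my ($regex, @records) = @_;
    return grep { $_ =~ $regex } @records;
}

# Invented sample data, for illustration only.
my @terms   = ('BRCA1', 'IL-2', 'p53 (TP53)');
my @records = (
    "1001\tBRCA1 mutations in a cohort\tAbstract text one.",
    "1002\tAn unrelated title\tAbstract text two.",
    "1003\tCytokine signalling\tRole of il-2 in T cells.",
);

my $regex = build_term_regex(@terms);
print "$_\n" for matching_records($regex, @records);
```

With the sample data this prints records 1001 and 1003.  Note that the
match is still a plain substring match, so a short term can also hit
inside a longer word; whether that matters depends on your term list.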

> -----Original Message-----
> From: Andrej Kastrin [mailto:andrej.kastrin at siol.net]
> Sent: Wednesday, December 07, 2005 7:57 AM
> To: Barry Moore
> Cc: bioperl-l at portal.open-bio.org
> Subject: Re: [Bioperl-l] Extract field from Medline
> 
> Barry Moore wrote:
> 
> >Andrej-
> >
> >Doesn't really sound like you need Bioperl for this one - just some
> >loops and regular expressions.  Can't offer too much help without seeing
> >your file formats, but a boiler plate might look like this:
> >
> >#!/usr/bin/perl
> >
> >use strict;
> >use warnings;
> >
> >my $file_terms = shift;
> >my $file_medline = shift;
> >open (TERM, $file_term) or die "Can't open TERM";
> >open (MEDL, $file_medline) or die "Can't open MEDL";
> >
> >my @terms = <TERM>;
> >
> >while (my ($pmid, $ti, $ab) = split <MEDL>) {
> >	for my $term (@terms) {
> >		if (/$term/ for ($pmid, $ti, $ab)) {
> >			print "$pmid\t$ti\t$ab";
> >		}
> >	}
> >}
> >
> >-----Original Message-----
> >From: bioperl-l-bounces at portal.open-bio.org
> >[mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Andrej
> >Kastrin
> >Sent: Wednesday, December 07, 2005 5:40 AM
> >To: bioperl-l at portal.open-bio.org
> >Subject: [Bioperl-l] Extract field from Medline
> >
> >Hello all,
> >
> >big problem for me, small for you (while I'm noob in perl). I have a
> >list of terms (i.e. genes, gene products) in row data format. Now I have
> >to parse Medline (standard Medline format) and extract PMID, TI and AB
> >(ID number, Title and Abstract) fields which involve any term in my term
> >list. I already transform Medline "multiline" format to "single" line,
> >so there is only one line per each field.
> >
> >How to start? Thanks for any suggestion.
> >Best, Andrej
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at portal.open-bio.org
> >http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> Hi,
> I tried this, but something is wrong: it fails with compilation errors
> (lines 15 and 19). Also, how do I include the input files?