[Bioperl-l] Parsing the accession numbers in Refseq

Tue, 08 May 2001 15:32:58 +0100

Suraj,

If you have the entry (or the interesting part of it)
in variable $s, then the following line will put the accession number
into variable $acc (safer than $a, which sort() uses internally):

my ($acc) = $s =~ /ACCESSION +(\w+)\W+COMMENT +REVIEWED/;
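
If $s holds several records at once, the same pattern with the /g
modifier returns every matching accession in list context. A minimal
sketch, assuming input shaped like the example in the quoted question;
the sub name reviewed_accessions and the sample text are only
illustrative:

```perl
use strict;
use warnings;

# Hypothetical helper: return every accession that is directly
# followed by a "COMMENT REVIEWED" line in the given text.
sub reviewed_accessions {
    my ($text) = @_;
    # With /g in list context, the match returns all captured groups.
    return $text =~ /ACCESSION +(\w+)\W+COMMENT +REVIEWED/g;
}

# Sample adapted from the example in the quoted question.
my $sample = <<'END';
ACCESSION   NM_021640
ACCESSION   NM_001158
COMMENT     REVIEWED REFSEQ: This record has been
curated by NCBI staff. The
ACCESSION   NM_018607
COMMENT     REVIEWED REFSEQ: This record has been
curated by NCBI staff. The
END

print "$_\n" for reviewed_accessions($sample);
```

On this sample, NM_021640 is skipped because no COMMENT line follows it.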

If you want to parse this straight from the file input, you'll have to
play with $INPUT_RECORD_SEPARATOR (available under that name with
"use English"), more commonly known as $/. Set it to the entry
delimiter (a line containing only //) and write something like:

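$/ = "//\n";    # an entry ends with a line containing only //
while (<>) {
    # $acc stays undefined when the entry is not a REVIEWED one
    my ($acc) = /ACCESSION +(\w+)\W+COMMENT +REVIEWED/;
    print "$acc\n" if defined $acc;
}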
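
If the input is already filtered down to ACCESSION and COMMENT lines,
as in the quoted example, a line-by-line scan is an alternative to
changing $/. This is a different technique from the regex above, shown
only as a sketch; the sub name reviewed_from_lines is hypothetical, and
it assumes the COMMENT line comes directly after its ACCESSION line
(blank lines aside):

```perl
use strict;
use warnings;

# Hypothetical helper: read a filehandle line by line and return the
# accessions whose next non-blank line is a "COMMENT REVIEWED" line.
sub reviewed_from_lines {
    my ($fh) = @_;
    my ($last, @found);
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;          # ignore blank lines
        if ($line =~ /^\s*ACCESSION\s+(\w+)/) {
            $last = $1;                     # remember the latest accession
        }
        elsif ($line =~ /^\s*COMMENT\s+REVIEWED/) {
            push @found, $last if defined $last;
            undef $last;                    # report each accession once
        }
        else {
            undef $last;                    # any other line breaks adjacency
        }
    }
    return @found;
}

# Usage with an in-memory filehandle; a real file handle works the same way.
my $text = "ACCESSION   NM_021640\n"
         . "ACCESSION   NM_001158\n"
         . "COMMENT     REVIEWED REFSEQ: This record has been\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
print "$_\n" for reviewed_from_lines($fh);
close $fh;
```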

	-Heikki

Suraj Peri wrote:
> 
> hi ,
>    I took the refseq database and parsed  only the
> Accession numbers and the entries with Reviewed Refseq
> by using PERL RegEX.  now i want only the accession
> number preceding the REVIEWED.
> like
> ACCESSION NM_021640
> COMMENT REVIEWED REFSEQ:
> lines only and not the accession numbers followed with
> out comment line.
> 
>  how can i do this using Regular Expressions. Please
> help me ASAP.
> Thank you in advance.
> 
> Example:
>  ACCESSION   NM_021640
> ACCESSION   NM_001158
> COMMENT     REVIEWED REFSEQ: This record has been
> curated by NCBI staff. The
>  ACCESSION   NM_018607
> COMMENT     REVIEWED REFSEQ: This record has been
> curated by NCBI staff. The
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________