Bioperl: NCBI 'Query' email server

Lincoln Stein lstein@cshl.org
Tue, 18 May 1999 20:15:35 -0400


The Boulder::Genbank library provides a nice interface to NetEntrez.

For example, if you want GenBank accession M12345, the gb_get script
(10 lines of core code) fetches it from the command line in pre-parsed
Boulder format (sample output below).  You can pull the sequence out
just by grepping for the Sequence tag, or use the Boulder parse
functions to retrieve and manipulate individual fields.
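Grepping for the Sequence tag just means keeping the unindented lines
that start with "Sequence=" (from the shell, something like
gb_get M12345 | grep '^Sequence='). Here is the same idea as a minimal
Python sketch. It assumes the gb_get output has already been captured
as a string, and grep_tag is a hypothetical helper, not part of Boulder:

```python
def grep_tag(boulder_text, tag):
    """Return the values of every top-level 'Tag=value' line, like grep '^Tag='.

    Hypothetical helper for illustration only; it does not use the Boulder
    library and knows nothing about nesting beyond indentation.
    """
    prefix = tag + "="
    values = []
    for line in boulder_text.splitlines():
        # Top-level tags start in column 0. Tags inside nested {...}
        # records are indented, so startswith() skips them.
        if line.startswith(prefix):
            values.append(line[len(prefix):])
    return values
```

On the gb_get listing in this message, grep_tag(text, "Sequence") should
return a one-element list holding the sequence string.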

Entrez queries are also supported, as are local mirrors of GenBank.

Lincoln

PS: http://stein.cshl.org/software/boulder/

(~) 51% gb_get M12345
Organism=Mus musculus Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata; Vertebrata; Eutheria; Rodentia; Sciurognathi; Myomorpha; Muridae; Murinae; Mus.
Title=Transposition of the immunoglobulin heavy chain enhancer to the myc oncogene in a murine plasmacytoma
Basecount={
  a=366
  c=392
  t=356
  g=426
}
Authors=Corcoran,L.M., Cory,S. and Adams,J.M.
Authors=Corcoran,L.M.
Version=M12345.1  GI:199964
Locus=MUSMYCN      1540 bp    DNA             ROD       19-MAR-1992
Definition=Mouse (ST4) c-myc proto-oncogene, promoter region.
Reference=1  (bases 1 to 1502)
Reference=2  (bases 1 to 1540)
Source=Mouse (cell line ST4, from S-MuLV infected BALB/c mouse) DNA.
Comment=A printed copy of the sequence in [1],[2] was kindly provided by L.M.Corcoran 21-OCT-1985.
Keywords=myc proto-oncogene; proto-oncogene.
Accession=M12345
Sequence=tctagaaccaatgcacagagcaaaagactcatgtttctggttggttaataagctagattatcgtgtatatataaagtgtgtatgtatacgtttggggattgtacagaatgcacagcgtagtattcaggaaaaaggaaactgggaaattaatgtataaattaaaatcagcttttaattagcttaacacacacatacgaaggcaaaaatgtaacgttactttgatctgatcagggccgacttttttttttaagtgcataattacgattccagtaataaaaggggaaagcttgggtttgtcctgggaggaaggggttaacggttttctttattctagggtctctgcaggctccccagatctgggttggcaattcactcctccccctttctgggaagtccgggttttccccaaccccccaattcatggcatattctcgcgtctagccttgattttccccaccccagctcctaaaccagagtctgctgcaaactggctccacaggggcaaagaggatttgcctcttgtgaaaaccgactgtggccctggaactgtgtggaggtgtatggggtgtagaccggcagagactcctcccggaggagccggtagagcgcacccgccgccactttactggactgcgcagggagacctacaggggaaagagccgcctccacaccacccgccggtggaagtccgaaccggaggtgctggagtgtgtgtgtgggggggggggggggaatctgccttttggcagcaaattggggggggggtcgttctggaaagaatgtgcccagtcaacataactgtacgaccaaaggcaaaatacacaatgccttccccgcgagatggagtggctgtttatccctaagtggctctccaagtatacgtggcagtgagttgctgagcaattttaataaaattccagacatcgtttttcctgcatagacctcatctgcggttgatcaccctctatcactccacacactgagcgggggctcctagataactcattcgttcgtccttccccctttctaaattctgttttccccagccttagagagacgcctggccgcccgggacgtgcgtgacgcggtccagggtacatggcgtattgtgtggagcgaggcagctgttccacctgcggtgactgatatacgcagggcaagaacacagttcagccgagcgctgcgcccgaacaaccgtacagaaagggaaaggactagcgcgcgagaagagaaaatggtcgggcgcgcagttaattcatgctgcgctattactgtttacaccccggagccggagtactggactgcgggctgaggctcctcctcctctttccccggctccccactagccccctcccgagttcccaaagcagagggcggggaaacgagaggaaggaaaaaaatagagagaggtggggaagggagaaagagaggttctctggctaatccccgcccacccgccctttatattccgggggtctgcgcggccgaggacccctggctgcgctgctctcagctgccgggtccgactcgcctcactcag
Nid=g199964
Journal=Cell 40, 71-79 (1985)
Journal=Unpublished (1986)
Medline=85099331
Features={
  Source={
    Organism=Mus musculus
    Db_xref=taxon:10090
    Position=1..1540 
  }
  Mrna={
    Position=1491..>1540 
    Note=myc mRNA
  }
  Allele={
    Position=1124 
    Note=g in ST4; a in ABPC17
  }
  Misc_feature={
    Position=72..73 
    Note=ST4 proviral insertion site
  }
  Misc_feature={
    Position=96..97 
    Note=Tikaut proviral insertion site
  }
  Misc_feature={
    Position=876..877 
    Note=ST1 proviral insertion site
  }
  Misc_feature={
    Position=1129..1130 
    Note=ABPC17 Ig H-chain enhancer insertion site
  }
}
=
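For the nested fields (Basecount, Features), a line-oriented grep is not
enough. Going only by what the listing above shows (Tag=value lines,
"Tag={" opening a sub-record, "}" closing it, and a lone "=" ending the
record), a rough Python reader might look like this. It is a sketch, not
the Boulder library's own parser. It ignores any escaping rules the real
format may define, and parse_boulder is a hypothetical name:

```python
def parse_boulder(lines):
    """Read one Boulder-style record into a dict of tag -> list of values.

    Plain values are strings; a 'Tag={' line starts a nested record that is
    stored as a dict. Repeated tags keep their order. Reading stops at a
    line holding a lone '=', the record terminator seen in gb_get output.
    """
    root = {}
    stack = [root]              # innermost open record is stack[-1]
    for raw in lines:
        line = raw.strip()      # drop indentation and trailing blanks
        if not line:
            continue
        if line == "=":         # end of this record
            break
        if line == "}":         # close the innermost nested record
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            stack.pop()
            continue
        # Split at the first '=' only, so values may themselves contain '='.
        tag, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not a Tag=value line: %r" % raw)
        if value == "{":        # open a nested record under this tag
            child = {}
            stack[-1].setdefault(tag, []).append(child)
            stack.append(child)
        else:
            stack[-1].setdefault(tag, []).append(value)
    return root
```

Repeated tags such as Authors, Journal and Misc_feature come back as
lists in the order they appear. As a sanity check on the listing,
record["Basecount"][0] holds a=366, c=392, t=356 and g=426, which sum to
1540, the length given on the Locus line.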


Simon Twigger writes:
 > Hi there,
 > 
 > I'm in the process of planning a perl-based application which will need
 > to grab nucleotide sequences from NCBI based on accession numbers and
 > then take those sequences and BLAST them against one of the databases at
 > NCBI, parse the results and then deal with them further. In my initial
 > exploration of these ideas, I came across the NCBI query email server
 > which I am thinking will be a pretty decent way to grab sequences off
 > NCBI in the absence of a more direct route. (is there a more direct
 > route short of mirroring the database locally?)
 > 
 > http://www.ncbi.nlm.nih.gov/Web/Search/email.html
 > 
 > Consequently, I'm looking at writing some perl to handle the 'Query'
 > related code - formulating the query from supplied accession numbers,
 > parsing the results of a 'Query' query (this can get confusing) and
 > passing these on to downstream applications. I'll be writing this as a
 > perl module and I was wondering if this sort of thing is of interest to
 > bio-perl and if so how it might fit in with other projects you have
 > going. I was thinking of using the Bio::Seq objects for the returned
 > sequences and then going on to use Bio::Tools::Blast for dealing with
 > the blast side of things rather than reinventing the wheel.
 > 
 > I would be grateful for any ideas you all may have on this and if this
 > can assist the overall bio-perl goal, I'd welcome any suggestions on how
 > best to achieve this.
 > 
 > Many thanks,
 > 
 > 	Simon.
 > 
 > 
 > -- 
 > --------------------------------------------------
 > Simon Twigger, Ph.D.
 > Laboratory for Genetic Research,
 > Cardiovascular Research Center,
 > Medical College of Wisconsin,
 > 8701 Watertown Plank Road, 
 > Milwaukee, WI, 53226
 > 
 > tel. 414-456-4409               fax. 414-456-6516
 > --------------------------------------------------
 > =========== Bioperl Project Mailing List Message Footer =======
 > Project URL: http://bio.perl.org/
 > For info about how to (un)subscribe, where messages are archived, etc:
 > http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
 > ====================================================================
-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org			                  Cold Spring Harbor, NY
========================================================================