Bioperl: NCBI 'Query' email server
Lincoln Stein
lstein@cshl.org
Tue, 18 May 1999 20:15:35 -0400
The Boulder::Genbank library provides a nice interface to NetEntrez.
For example, if you want GenBank accession #M12345, the gb_get script
(10 lines of core code) provides a simple command-line function to get
it in pre-parsed format. Note that you can pull the sequence out just
by grepping for the Sequence tag, or use the Boulder parse functions
to retrieve and manipulate fields.
Entrez queries are also supported, as are local mirrors of GenBank.
Lincoln
PS: http://stein.cshl.org/software/boulder/
(~) 51% gb_get M12345
Organism=Mus musculus Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata; Vertebrata; Eutheria; Rodentia; Sciurognathi; Myomorpha; Muridae; Murinae; Mus.
Title=Transposition of the immunoglobulin heavy chain enhancer to the myc oncogene in a murine plasmacytoma
Basecount={
  a=366
  c=392
  t=356
  g=426
}
Authors=Corcoran,L.M., Cory,S. and Adams,J.M.
Authors=Corcoran,L.M.
Version=M12345.1 GI:199964
Locus=MUSMYCN 1540 bp DNA ROD 19-MAR-1992
Definition=Mouse (ST4) c-myc proto-oncogene, promoter region.
Reference=1 (bases 1 to 1502)
Reference=2 (bases 1 to 1540)
Source=Mouse (cell line ST4, from S-MuLV infected BALB/c mouse) DNA.
Comment=A printed copy of the sequence in [1],[2] was kindly provided by L.M.Corcoran 21-OCT-1985.
Keywords=myc proto-oncogene; proto-oncogene.
Accession=M12345
Sequence=tctagaaccaatgcacagagcaaaagactcatgtttctggttggttaataagctagattatcgtgtatatataaagtgtgtatgtatacgtttggggattgtacagaatgcacagcgtagtattcaggaaaaaggaaactgggaaattaatgtataaattaaaatcagcttttaattagcttaacacacacatacgaaggcaaaaatgtaacgttactttgatctgatcagggccgacttttttttttaagtgcataattacgattccagtaataaaaggggaaagcttgggtttgtcctgggaggaaggggttaacggttttctttattctagggtctctgcaggctccccagatctgggttggcaattcactcctccccctttctgggaagtccgggttttccccaaccccccaattcatggcatattctcgcgtctagccttgattttccccaccccagctcctaaaccagagtctgctgcaaactggctccacaggggcaaagaggatttgcctcttgtgaaaaccgactgtggccctggaactgtgtggaggtgtatggggtgtagaccggcagagactcctcccggaggagccggtagagcgcacccgccgccactttactggactgcgcagggagacctacaggggaaagagccgcctccacaccacccgccggtggaagtccgaaccggaggtgctggagtgtgtgtgtgggggggggggggggaatctgccttttggcagcaaattggggggggggtcgttctggaaagaatgtgcccagtcaacataactgtacgaccaaaggcaaaatacacaatgccttccccgcgagatggagtggctgtttatccctaagtggctctccaagtatacgtggcagtgagttgctgagcaattttaataaaattccagacatcgtttttcctgcatagacctcatctgcggttgatcaccctctatcactccacacactgagcgggggctcctagataactcattcgttcgtccttccccctttctaaattctgttttccccagccttagagagacgcctggccgcccgggacgtgcgtgacgcggtccagggtacatggcgtattgtgtggagcgaggcagctgttccacctgcggtgactgatatacgcagggcaagaacacagttcagccgagcgctgcgcccgaacaaccgtacagaaagggaaaggactagcgcgcgagaagagaaaatggtcgggcgcgcagttaattcatgctgcgctattactgtttacaccccggagccggagtactggactgcgggctgaggctcctcctcctctttccccggctccccactagccccctcccgagttcccaaagcagagggcggggaaacgagaggaaggaaaaaaatagagagaggtggggaagggagaaagagaggttctctggctaatccccgcccacccgccctttatattccgggggtctgcgcggccgaggacccctggctgcgctgctctcagctgccgggtccgactcgcctcactcag
Nid=g199964
Journal=Cell 40, 71-79 (1985)
Journal=Unpublished (1986)
Medline=85099331
Features={
  Source={
    Organism=Mus musculus
    Db_xref=taxon:10090
    Position=1..1540
  }
  Mrna={
    Position=1491..>1540
    Note=myc mRNA
  }
  Allele={
    Position=1124
    Note=g in ST4; a in ABPC17
  }
  Misc_feature={
    Position=72..73
    Note=ST4 proviral insertion site
  }
  Misc_feature={
    Position=96..97
    Note=Tikaut proviral insertion site
  }
  Misc_feature={
    Position=876..877
    Note=ST1 proviral insertion site
  }
  Misc_feature={
    Position=1129..1130
    Note=ABPC17 Ig H-chain enhancer insertion site
  }
}
=
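The record above is plain tag=value text, so it can be handled without any special tooling. What follows is only a rough sketch in Python (not Perl, and not the Boulder library's own parser) of the two approaches mentioned earlier: "grepping" for a single tag, and walking the nested {...} blocks to get at individual fields. The format rules are inferred from the sample output alone, the function names grep_tag and parse_boulder are hypothetical, and the Sequence value in the sample is shortened for illustration.

```python
def grep_tag(lines, tag):
    """Return the values of every top-level-looking line that starts with tag=."""
    prefix = tag + "="
    return [ln.strip()[len(prefix):] for ln in lines if ln.strip().startswith(prefix)]


def parse_boulder(lines):
    """Parse one tag=value record into nested dicts; every tag maps to a list.

    Assumed rules, inferred from the sample output only:
      Tag=value   a plain field (tags may repeat)
      Tag={       opens a nested block
      }           closes the current block
      =           on a line by itself ends the record
    """
    root = {}
    stack = [root]
    for raw in lines:
        line = raw.strip()          # indentation, if any, is ignored
        if not line:
            continue
        if line == "=":             # end of record
            break
        if line == "}":             # close the innermost open block
            if len(stack) > 1:
                stack.pop()
            continue
        tag, _, value = line.partition("=")
        if value == "{":            # open a nested block under this tag
            child = {}
            stack[-1].setdefault(tag, []).append(child)
            stack.append(child)
        else:
            stack[-1].setdefault(tag, []).append(value)
    return root


# A cut-down copy of the record above (sequence truncated for illustration).
sample = """\
Accession=M12345
Authors=Corcoran,L.M., Cory,S. and Adams,J.M.
Authors=Corcoran,L.M.
Sequence=tctagaaccaatgcacag
Features={
Allele={
Position=1124
Note=g in ST4; a in ABPC17
}
Misc_feature={
Position=72..73
Note=ST4 proviral insertion site
}
Misc_feature={
Position=96..97
Note=Tikaut proviral insertion site
}
}
=
"""

record = parse_boulder(sample.splitlines())
print(grep_tag(sample.splitlines(), "Sequence")[0])
for feature in record["Features"][0]["Misc_feature"]:
    print(feature["Position"][0], feature["Note"][0])
```

Every tag maps to a list because tags such as Authors, Journal and Misc_feature repeat within one record. The simple grep_tag filter is enough for a one-off field like Sequence, while the nested walk is needed once you want the positions and notes of individual features.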
Simon Twigger writes:
> Hi there,
>
> I'm in the process of planning a Perl-based application which will need
> to grab nucleotide sequences from NCBI based on accession numbers and
> then take those sequences and BLAST them against one of the databases at
> NCBI, parse the results and then deal with them further. In my initial
> exploration of these ideas, I came across the NCBI query email server
> which I am thinking will be a pretty decent way to grab sequences off
> NCBI in the absence of a more direct route. (Is there a more direct
> route, short of mirroring the database locally?)
>
> http://www.ncbi.nlm.nih.gov/Web/Search/email.html
>
> Consequently, I'm looking at writing some Perl to handle the 'Query'
> related code - formulating the query from supplied accession numbers,
> parsing the results of a 'Query' query (this can get confusing) and
> passing these on to downstream applications. I'll be writing this as a
> Perl module, and I was wondering if this sort of thing is of interest to
> Bioperl and, if so, how it might fit in with other projects you have
> going. I was thinking of using Bio::Seq objects for the returned
> sequences and then using Bio::Tools::Blast for dealing with
> the BLAST side of things rather than reinventing the wheel.
>
> I would be grateful for any ideas you all may have on this and if this
> can assist the overall bio-perl goal, I'd welcome any suggestions on how
> best to achieve this.
>
> Many thanks,
>
> Simon.
>
>
> --
> --------------------------------------------------
> Simon Twigger, Ph.D.
> Laboratory for Genetic Research,
> Cardiovascular Research Center,
> Medical College of Wisconsin,
> 8701 Watertown Plank Road,
> Milwaukee, WI, 53226
>
> tel. 414-456-4409 fax. 414-456-6516
> --------------------------------------------------
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
--
========================================================================
Lincoln D. Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
========================================================================