[Bioperl-l] get CDS start site for entry in NCBI

Matthew McCormack mccormack at molbio.mgh.harvard.edu
Mon Apr 22 23:11:05 UTC 2013


Ke, Chris and Christopher,

     While exploring the Ensembl Perl API, BioPerl with NCBI E-utilities,
and Gramene's BioMart, I have learned a lot, and I am sure I can find a
solution among them. Thank you very much for your help and suggestions.

Matthew

On 4/17/2013 7:08 PM, Matthew McCormack wrote:
> I am not much of a Perl coder, and I have a few questions.
>
>      First, I would like to write a script that goes to NCBI GenBank
> and gets the base number of the start of the CDS region, e.g. 235
> (given a particular accession number). I have looked at the HOWTOs
> and documentation for Bio::SeqIO and Bio::DB::GenBank, and I can cut
> and paste the examples and they work, but I cannot figure out how to
> get what I want: the CDS start site. I have difficulty knowing what
> all the methods and their options are for the SeqIO and Seq objects.
> Most of the examples seem to read the information from a local file
> rather than from a website.
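
A minimal sketch of the CDS-start lookup, written in Python only because
it is short and self-contained; it is not BioPerl's API. It assumes the
record is requested from the NCBI E-utilities efetch endpoint as
GenBank flat-file text (db=nuccore, rettype=gb, retmode=text) and that
the CDS location uses only plain ranges, join(), complement() and the
partial markers < and >; order(), single-base and remote-entry
locations are not handled. The function names, the placeholder e-mail
argument and the sample record are hypothetical. On the minus strand the
CDS begins at its highest coordinate, which cds_start() accounts for.
Within BioPerl, the same information comes from looping over
$seq->get_SeqFeatures, keeping features whose primary_tag is 'CDS', and
reading their start, end and strand.

```python
import re
import time
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_genbank(accession: str, email: str) -> str:
    """Download one nucleotide record as GenBank flat-file text (needs network)."""
    query = urllib.parse.urlencode({
        "db": "nuccore", "id": accession,
        "rettype": "gb", "retmode": "text", "email": email,
    })
    with urllib.request.urlopen(f"{EFETCH}?{query}") as resp:
        text = resp.read().decode("utf-8")
    time.sleep(0.4)  # pause so repeated calls stay under ~3 requests/second
    return text


def cds_locations(record: str) -> list[str]:
    """Return the location string of every CDS feature in a flat-file record."""
    locations: list[str] = []
    in_features = False
    collecting = False  # True while a CDS location may continue on the next line
    for line in record.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        if line[:1].strip():  # an unindented line such as ORIGIN ends the table
            break
        key_line = re.match(r" {5}(\S+)\s+(\S.*)$", line)
        body = line.strip()
        if key_line:  # feature keys start in column 6
            collecting = key_line.group(1) == "CDS"
            if collecting:
                locations.append(key_line.group(2).strip())
        elif collecting and body and not body.startswith("/"):
            locations[-1] += body  # wrapped location, e.g. a long join(...)
        else:
            collecting = False  # first /qualifier reached: location is complete
    return locations


def parse_location(loc: str) -> tuple[list[tuple[int, int]], int]:
    """Turn e.g. 'complement(join(<10..50,80..120))' into spans and a strand."""
    strand = -1 if "complement(" in loc else 1
    spans = [(int(a), int(b)) for a, b in re.findall(r"<?(\d+)\.\.>?(\d+)", loc)]
    return spans, strand


def cds_start(loc: str) -> int:
    """First coding base in record coordinates (highest one on the minus strand).

    A '<' or '>' marker means the record is partial, so this may not be a
    real start codon.
    """
    spans, strand = parse_location(loc)
    if strand == 1:
        return min(start for start, _ in spans)
    return max(end for _, end in spans)


# Hypothetical, heavily abbreviated record used only to exercise the parser.
SAMPLE = """\
LOCUS       DEMO0001                 900 bp    DNA     linear   PLN 01-JAN-2013
FEATURES             Location/Qualifiers
     gene            1..900
                     /locus_tag="DEMO_1"
     CDS             join(299..400,500..650,
                     700..820)
                     /locus_tag="DEMO_1"
ORIGIN
//
"""

if __name__ == "__main__":
    for loc in cds_locations(SAMPLE):
        print(loc, "-> CDS starts at", cds_start(loc))
```

With a live record, cds_locations(fetch_genbank(acc, "you@example.org"))
would replace SAMPLE; a record can carry more than one CDS feature, so
the result is a list.
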
>
>    Actually, what I have to start with is a TAIR locus number such as
> AT4g08500, but I cannot search on this at NCBI and come up with a
> unique entry. I may need a table that converts TAIR locus numbers to
> accession numbers.
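
If a conversion table turns out to be necessary, loading it once into a
dictionary keeps each lookup local instead of searching NCBI every
time. A minimal Python sketch, assuming a hypothetical tab-separated
file with the TAIR locus in the first column and an accession in the
second (you would build or download such a file; its layout and the
function names here are assumptions). Locus codes are upper-cased on
both load and lookup, so AT4g08500 and AT4G08500 find the same entry,
and each locus keeps a list because one locus can correspond to several
records.

```python
import csv
import io
from collections import defaultdict


def load_locus_table(handle) -> dict[str, list[str]]:
    """Read tab-separated 'locus<TAB>accession' rows into locus -> [accessions].

    The two-column layout is a hypothetical format; adjust the column
    indexes to match the real file.
    """
    table = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, short, and comment lines
        locus, accession = row[0].strip().upper(), row[1].strip()
        if accession not in table[locus]:  # one locus may map to several records
            table[locus].append(accession)
    return dict(table)


def accessions_for(table: dict[str, list[str]], locus: str) -> list[str]:
    """Look up a locus, ignoring case so AT4g08500 matches AT4G08500."""
    return table.get(locus.strip().upper(), [])


if __name__ == "__main__":
    # Placeholder accessions, not real NCBI identifiers.
    demo = io.StringIO("# locus\taccession\nAT4G08500\tACC_0001\nat4g08500\tACC_0002\n")
    table = load_locus_table(demo)
    print(accessions_for(table, "AT4g08500"))  # ['ACC_0001', 'ACC_0002']
```

For a real file, open it with open(path, newline="") and pass the
handle to load_locus_table().
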
>
>   I would also like some advice. I am getting data from another
> website. I have a script using the WWW::Mechanize module that takes a
> link, goes to that web page, and then follows a series of links (over
> 100), getting information from each one. Part of that information is
> the base number of a binding site, and I want to know whether that
> binding site is in the CDS. Numbering starts at the start of the gene,
> so if the binding site is at 235, I want to know whether position 235
> is in the CDS. The website does not provide this, which is why I want
> to go to NCBI and get the start of the CDS. The NCBI 'gene' record has
> the same length as the sequence on the first web page, but it also
> gives the beginning of the CDS, say 299, so with this information I
> can tell whether the binding site is in the CDS. Do you think the best
> approach is to extract the information from a link on the first web
> page, then go to NCBI and extract the CDS start, then go back to the
> original web page and the next link, and so on, for a couple of
> hundred links? Or is there a better way? I am concerned about a script
> that keeps going back to NCBI.
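
The comparison described in the question is a containment check: with
gene-relative numbering, a site at 235 and a CDS starting at 299 means
the site lies before the coding region. Checking only against the CDS
start (or start and end) is not quite enough on a genomic record,
though, because the CDS is usually a join() of exon pieces, and a site
between two pieces falls in an intron. On the worry about hitting NCBI
repeatedly: fetching each accession only once, caching the text, and
sending many accessions in one efetch request (its id parameter accepts
a comma-separated list) keeps the number of requests small; NCBI asks
for no more than three requests per second without an API key. A
minimal Python sketch, assuming positions and CDS segments already
share gene-relative, plus-strand numbering; the helper names are
hypothetical and the batch size of 100 is an arbitrary choice.

```python
def classify_position(pos: int, cds_spans: list[tuple[int, int]]) -> str:
    """Report where a position falls relative to the CDS segments.

    Assumes pos and cds_spans share one numbering (base 1 = first base of
    the gene) and that cds_spans lists the coding segments (exons) as
    inclusive (start, end) pairs.
    """
    spans = sorted(cds_spans)
    if any(start <= pos <= end for start, end in spans):
        return "CDS"
    if pos < spans[0][0]:
        return "before CDS"
    if pos > spans[-1][1]:
        return "after CDS"
    return "intron"  # between two coding segments


def to_gene_relative(record_pos: int, gene_start: int) -> int:
    """Renumber a plus-strand record coordinate so gene_start becomes base 1."""
    return record_pos - gene_start + 1


def id_batches(accessions: list[str], size: int = 100) -> list[str]:
    """Group accessions into comma-separated strings for efetch's id= parameter."""
    unique = list(dict.fromkeys(accessions))  # drop repeats, keep first-seen order
    return [",".join(unique[i:i + size]) for i in range(0, len(unique), size)]


def split_records(text: str) -> list[str]:
    """Split a multi-record GenBank download on its '//' terminator lines.

    Text after the last '//' (normally none) is ignored.
    """
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "//":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return [r for r in records if r.strip()]


if __name__ == "__main__":
    cds = [(299, 400), (500, 650)]  # made-up CDS segments, gene-relative
    for site in (235, 350, 450, 700):
        print(site, classify_position(site, cds))
    print(id_batches(["ACC_1", "ACC_2", "ACC_1", "ACC_3"], size=2))
```

Each batch string would go into a single efetch call (db=nuccore,
rettype=gb, retmode=text), and split_records() breaks the reply back
into one record per accession. Matching each record to its accession
through the record's own ACCESSION or VERSION line, rather than by
position in the reply, avoids relying on the order NCBI returns them.
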
>
> Matthew
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


