[Bioperl-l] get CDS start site for entry in NCBI
Matthew McCormack
mccormack at molbio.mgh.harvard.edu
Mon Apr 22 23:11:05 UTC 2013
Ke, Chris and Christopher,
After exploring the Ensembl Perl API, BioPerl with the NCBI E-utilities, and
Gramene's BioMart, I have learned a lot and am confident that I can find a
solution among them. Thank you very much for your help and suggestions.
Matthew
On 4/17/2013 7:08 PM, Matthew McCormack wrote:
> I am not much of a Perl coder and I have a few questions.
>
> First, I would like to write a script that queries NCBI GenBank and,
> given a particular accession number, returns the base position at
> which the CDS region starts, e.g. 235. I have looked at the HOWTOs
> and documentation for Bio::SeqIO and Bio::DB::GenBank, and I can cut
> and paste the examples and they work, but I cannot figure out how to
> get what I want: the CDS start site. I have difficulty finding out
> what all the methods and their options are for the seqio object and
> the seq_object. Most of the examples seem to read their information
> from a file rather than from a website.
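
The lookup asked about above comes down to finding the CDS entry in a record's feature table and reading its lowest coordinate. A minimal sketch of that step, written in plain Python so it runs without any library or network access: the function name `cds_start` and the record fragment `SAMPLE` are invented for illustration, and the parser assumes the standard GenBank flat-file layout (feature key indented 5 spaces, location in column 22) with no references to other accessions inside the location. In BioPerl the same information is exposed through the sequence object's features (those whose primary_tag is 'CDS') rather than by parsing text.

```python
import re


def cds_start(genbank_text):
    """Return the lowest base coordinate of the first CDS feature, or None.

    Assumes GenBank flat-file layout and a location that does not
    reference another accession (digits in an accession would be
    misread as coordinates).
    """
    lines = genbank_text.splitlines()
    for i, line in enumerate(lines):
        # Feature keys are indented 5 spaces; the location follows the key.
        match = re.match(r" {5}CDS\s+(\S+)", line)
        if not match:
            continue
        location = match.group(1)
        # Long join(...) locations wrap onto following lines, which are
        # indented 21 spaces, until the first /qualifier line.
        for extra in lines[i + 1:]:
            stripped = extra.strip()
            if stripped.startswith("/") or not extra.startswith(" " * 21):
                break
            location += stripped
        positions = [int(n) for n in re.findall(r"\d+", location)]
        return min(positions) if positions else None
    return None


# Invented record fragment, for illustration only.
SAMPLE = "\n".join([
    "FEATURES             Location/Qualifiers",
    "     gene            1..1500",
    "     CDS             join(299..450,",
    "                     600..1400)",
    '                     /product="hypothetical protein"',
])

print(cds_start(SAMPLE))  # 299
```

The sketch returns only the start of the first CDS it finds; a record with several CDS features would need a loop that collects all of them.
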
>
> Actually, what I have to start with is a TAIR locus number such as
> AT4g08500, but I cannot search on this at NCBI and come up with a
> unique entry. I may need a conversion table from TAIR locus numbers
> to accession numbers.
>
> Also, I am looking for a bit of advice. I am collecting data from
> another website. I have a script using the WWW::Mechanize module that
> takes a link, goes to that web page, and then follows a list of links
> (over 100), getting information from each one. Part of that
> information is the base position of a binding site, and I want to know
> whether that binding site is in the CDS. Positions are counted from
> the start of the gene, so if the binding site is at, say, 235, I want
> to know whether that position is in the CDS. The website does not
> provide this, which is why I want to go to NCBI and get the start of
> the CDS. The NCBI 'gene' record has the same length as the sequence on
> the first web page but also gives the beginning of the CDS, say 299,
> so with this information I can tell whether the binding site is in the
> CDS. Do you think the best way to do this is to extract the info from
> a link on the first web page, then go to NCBI and extract the CDS,
> then go back to the original web page for the next link, and so on,
> for a couple of hundred links? Or is there a better way? I am
> concerned about a script that keeps going back to NCBI.
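
One way to limit the repeated trips to NCBI is to look up each accession only once, remember the answer, and pause between the lookups that do go out. A minimal sketch under stated assumptions: `annotate_sites`, `fetch_cds_range`, and the accession names `ACC1` and `ACC2` are hypothetical, the fetcher is passed in so that any retrieval method can be plugged in, and the CDS is treated as one continuous range, so introns inside a join(...) location are ignored. The default delay reflects NCBI's guideline of at most three requests per second without an API key.

```python
import time


def annotate_sites(sites, fetch_cds_range, delay_seconds=0.4):
    """Flag whether each binding site lies inside its gene's CDS range.

    sites: iterable of (accession, binding_site_position) pairs.
    fetch_cds_range: hypothetical callable, accession -> (cds_start, cds_end).
    Returns a list of (accession, position, in_cds) tuples.
    """
    cache = {}
    results = []
    for accession, position in sites:
        if accession not in cache:
            if cache:
                # Pause before every remote lookup except the first.
                time.sleep(delay_seconds)
            cache[accession] = fetch_cds_range(accession)
        start, end = cache[accession]
        results.append((accession, position, start <= position <= end))
    return results


# Stand-in fetcher with invented coordinates; a real one would query NCBI.
def fake_fetch(accession):
    return {"ACC1": (299, 1400), "ACC2": (50, 900)}[accession]


for row in annotate_sites([("ACC1", 235), ("ACC1", 300), ("ACC2", 235)],
                          fake_fetch, delay_seconds=0):
    print(row)
```

With the numbers from the question, a binding site at 235 falls before a CDS that starts at 299, so it is reported as outside the CDS. Collecting all binding sites from the first website before contacting NCBI at all would let the second pass run over the distinct accessions only.
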
>
> Matthew