[Bioperl-l] How to extract promoter region seq from genbank or another source?

Sean Davis sdavis2 at mail.nih.gov
Sat Oct 15 09:57:01 EDT 2005


Sam,

One of the simplest ways to do this is to use the UCSC table browser.  Go
to:

http://genome.ucsc.edu/cgi-bin/hgTables

Choose your organism and assembly of choice.  Choose group "mRNA and EST
tracks" and then choose either track Human ESTs (if your species is human,
as an example) or Human mRNAs.  Click region "genome".  Click "paste list"
or "upload list" to give your accessions.  Then choose output format
"sequence".  Click "get output".  You will get a new page.  Choose
"promoter/Upstream by" and choose the number of bases you want.  You can
also deselect the other things on the page if you don't want other sequence.
You get the idea.  If you started with ESTs, then go back and do the same
with the mRNA table.  I don't think you need to split your accession
list--just use the same both times.

After this, you will have two fasta files (which you can combine into one).
Then use Bio::SeqIO to read them in.  Note that many accessions will align
to multiple regions of the genome and will, therefore, be represented by
multiple promoter regions.  You may want to filter these accessions out.
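
The filtering step can be sketched roughly as below. This is plain Python
standing in for the Bio::SeqIO read, not BioPerl itself. It assumes that the
first whitespace-delimited token of each fasta header identifies the
accession, which you should check against the headers the table browser
actually gives you. The function names read_fasta and unique_promoters are
made up for illustration.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts; emit the one collected so far.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None and line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def unique_promoters(records):
    """Keep only records whose ID occurs exactly once.

    ASSUMPTION: the ID is the first token of the header line.
    """
    by_id = defaultdict(list)
    for header, seq in records:
        tokens = header.split()
        rec_id = tokens[0] if tokens else ""
        by_id[rec_id].append((header, seq))
    # IDs with more than one hit aligned to several regions; drop them.
    return {rec_id: hits[0] for rec_id, hits in by_id.items() if len(hits) == 1}


if __name__ == "__main__":
    # Made-up records, only to show the call pattern.
    demo = [
        ">acc1 region-A", "ACGTACGTAC",
        ">acc2 region-B", "GGGGCCCC",
        ">acc2 region-C", "TTTTAAAA",
    ]
    for rec_id, (header, seq) in unique_promoters(read_fasta(demo)).items():
        print(rec_id, len(seq))
```

To use it on the combined file, pass an open file handle to read_fasta in
place of the demo list.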

Sean





You can download the tables or batch query them with

On 10/14/05 10:46 PM, "Stefan Kirov" <skirov at utk.edu> wrote:

> Sam,
> You can use MART to convert to ensembl IDs (in most cases). I don't think
> they support GenBank. You can try to use genekeydb
> (genereg.ornl.gov/gkdb), either download it or use the online converter,
> but my guess is you are not going to get too many ids. One thing I may
> fix in the future, but right now... Still may be worth a try. Look at
> seqhound too (http://www.blueprint.org/seqhound/index.html).
> Stefan
> 
> Brian Osborne wrote:
> 
>> ENSEMBL experts?
>> 
>> ------ Forwarded Message
>> From: Sam Al-Droubi <saldroubi at yahoo.com>
>> Date: Fri, 14 Oct 2005 14:05:38 -0700 (PDT)
>> To: Brian Osborne <brian_osborne at cognia.com>
>> Subject: Re: [Bioperl-l] How to extract promoter region seq from genbank or
>> another source?
>> 
>> Hi Brian,
>> 
>> Thank you for the response.  I looked at it but it seems that ensembl does
>> not use accession numbers.   It seems that they have their own numbering
>> scheme.  If so, how do I get the mapping between the two?  If I can't get the
>> promoter region sequence then do you know if there is a way I can get the
>> entire chromosome sequence?  If so, I can then try to find the gene within
>> it and then grab the promoter region.
>> I am new to all this so I am sorry if I sound ignorant in this area.
>> 
>> On the surface, it seems that one should be able to do this easily but it
>> has not been easy so far.
>> 
>> Thank you. 
>> 
>> 
>> Brian Osborne <brian_osborne at cognia.com> wrote:
>>  
>> 
>>> Sam,
>>> 
>>> ensembl may be one solution, I think it provides a good API for these sorts
>>> of queries. See the ensembl API documentation for more information
>>> (http://www.ensembl.org/info/software/core/core_tutorial.html).
>>> 
>>> Brian O.
>>> 
>>> 
>>> 
>>> On 10/13/05 11:25 AM, "Sam Al-Droubi" wrote:
>>> 
>>>    
>>> 
>>>>> Hello,
>>>>> 
>>>>> I am totally new to BioPerl. I was able to install it and retrieve data from
>>>>> GenBank. I have a list of accession numbers for genes but I want to use
>>>>> BioPerl to get the promoter region (1000 bp before the start of the gene).
>>>>> Can someone point me in the right direction on how to accomplish this.
>>>>> 
>>>>> Tech info: Using bioperl-1.5 on SuSE 9.3 professional machine.
>>>>> 
>>>>> Thank you.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> Sincerely, 
>>>>> Sam Al-Droubi, M.S.
>>>>> saldroubi at yahoo.com
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at portal.open-bio.org
>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>>>        
>>>>> 
>>>    
>>> 
>> 
>> 
>> Sincerely, 
>> Sam Al-Droubi, M.S.
>> saldroubi at yahoo.com
>> 
>> ------ End of Forwarded Message
>> 


