[Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs
Brian Osborne
osborne1 at optonline.net
Thu Feb 16 22:19:16 UTC 2006
Chris,
Yes. The question now is where to easily get the coordinates.
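
One possible source is the gene2accession file mentioned earlier in this
thread. The sketch below is only an illustration under stated assumptions:
that the file is tab-delimited with the GeneID in column 2, the genomic
accession in column 8, 0-based start and end positions in columns 10 and 11,
and the orientation in column 12 (check the header line of your own copy).
The sub names and the sample record are made up for the example. It uses core
Perl only and just computes the flanked, clamped window.

```perl
use strict;
use warnings;

# ASSUMPTION: gene2accession column layout as described above
# (GeneID = col 2, genomic accession = col 8, 0-based start/end = cols 10-11,
# orientation = col 12). Verify against the header of your own file.
sub parse_gene2accession_line {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line;
    return if @f < 12;                      # too few columns to be usable
    my ($gene_id, $acc, $start, $end, $orient) = @f[1, 7, 9, 10, 11];
    # Skip rows with no genomic placement ('-' is assumed to mean "missing")
    return if $acc eq '-' || $start !~ /^\d+$/ || $end !~ /^\d+$/;
    return {
        gene_id => $gene_id,
        acc     => $acc,
        start   => $start + 1,              # convert to 1-based coordinates
        end     => $end + 1,
        strand  => $orient eq '-' ? -1 : 1,
    };
}

# Widen the locus by $flank bases on each side and clamp to [1, $seq_len].
sub flanked_window {
    my ($rec, $flank, $seq_len) = @_;
    my $from = $rec->{start} - $flank;
    my $to   = $rec->{end} + $flank;
    $from = 1        if $from < 1;
    $to   = $seq_len if defined $seq_len && $to > $seq_len;
    return ($from, $to);
}

# Made-up record, not real NCBI data.
my $line = join "\t", 9606, 12345, 'REVIEWED', '-', '-', '-', '-',
                      'NC_000000.1', '-', 999, 1999, '-', '-';
my $rec = parse_gene2accession_line($line);
my ($from, $to) = flanked_window($rec, 500, 2200);
print "$rec->{acc}:$from-$to strand $rec->{strand}\n";
```

The resulting pair could then be passed to a call like
$db->seq($id, $from => $to) from the Bio::DB::Fasta synopsis quoted below,
assuming the IDs in the FASTA files match the accession; for a minus-strand
gene the two coordinates would be swapped, as in the $revseq line there.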
Brian O.
On 2/16/06 7:52 AM, "Chris Fields" <cjfields at uiuc.edu> wrote:
> I think a method was recently implemented in Bio::DB::GenBank to
> retrieve a segment of DNA, in GenBank format, given start and end
> coordinates; that should contain the features you need. I requested it
> on the mailing list around Nov-Dec but didn't get a chance to test it.
> Would that help?
>
> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
>
>> Harry,
>>
>> It's not clear to me that NCBI's eutils offers this capability directly.
>> You can probably download Entrez Gene entries and parse them for
>> coordinates, but I know of no way to remotely retrieve genomic sequences
>> like this from NCBI (ENSEMBL API perhaps?). What I had in mind uses the
>> local approach that some of us favor. To prove to myself that this is
>> simple to do, I wrote a script that I just added to examples/tools. It's
>> called extract_genes.pl and it's based on Bio::DB::Fasta. Download the
>> sequence files for a given species to some dir, download Entrez Gene's
>> gene2accession file, and run. It creates and stores a hash for lookups,
>> so it won't read gene2accession each time it runs.
>>
>> Brian O.
>>
>>
>> On 2/14/06 12:15 PM, "Harry Mangalam" <hjm at tacgi.com> wrote:
>>
>>> Hi Brian,
>>>
>>> Thanks very much for the pointers and the speed of your reply, and
>>> apologies for the speed of mine.
>>>
>>> This looks good, but what I was looking for was a BioPerl approach for
>>> hooking to an API at NCBI or EBI so I could get this info and the seqs
>>> from them. In this case, speed of retrieval is not critical and I'd
>>> rather not download the entirety of the sequences to a local disk to
>>> hack at them.
>>>
>>> I've determined a screen-scraping approach to get them and could script
>>> that, but I thought that BioPerl had a method for using NCBI's external
>>> APIs, though it may be that my memory is faulty or the approach is no
>>> longer supported due to overload.
>>>
>>> Does NCBI make such APIs available anymore? I searched a bit for docs on
>>> them but couldn't find anything (unless it's buried in the NCBI toolkit,
>>> which I haven't started to excavate).
>>>
>>> Failing that, would SEALS provide such a service? Any PerlPinipeds
>>> listening?
>>>
>>> Harry
>>>
>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>>> Harry,
>>>>
>>>> Hope you're doing well. The approach could be based on Bio::DB::Fasta.
>>>> So, from its documentation:
>>>>
>>>> use Bio::DB::Fasta;
>>>>
>>>> # create database from directory of fasta files
>>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>
>>>> # simple access (for those without Bioperl)
>>>> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>> my @ids = $db->ids;
>>>> my $length = $db->length('CHROMOSOME_I');
>>>> my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>> my $header = $db->header('CHROMOSOME_I');
>>>>
>>>> # Bioperl-style access
>>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>
>>>> my $obj = $db->get_Seq_by_id('CHROMOSOME_I');
>>>> my $seq = $obj->seq;
>>>> my $subseq = $obj->subseq(4_000_000 => 4_100_000);
>>>>
>>>> Do you already have the offsets?
>>>>
>>>> Brian O.
>>>>
>>>> On 2/12/06 1:46 AM, "Harry Mangalam" <hjm at tacgi.com> wrote:
>>>>> Hi All,
>>>>>
>>>>> After perusing the tutorial and other docs for an evening, I still
>>>>> can't find the answer to this. Forgive me if I've missed something
>>>>> obvious.
>>>>>
>>>>> This should not be a novel request, but I've not found it answered. If
>>>>> bioperl isn't the best way to do this, I'd be grateful for a pointer to
>>>>> a better way, especially if it includes an illuminating bit of code.
>>>>>
>>>>> The problem is to retrieve genomic sequences plus & minus some offset
>>>>> from a locus determined by HUGO keyword or GeneID. This would be a
>>>>> common followup chore for some extra analysis from a gene expression
>>>>> experiment. Or maybe this is in the DBFetch routines, but I've missed
>>>>> the sequence type to specify...?
>>>>>
>>>>>
>>>>> TIA!
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>