[Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs

Tue Feb 14 17:15:11 UTC 2006

Hi Brian,

Thanks very much for the pointers and the speed of your reply and apologies 
for the speed of mine.

This looks good, but what I was looking for was a BioPerl approach for hooking into 
an API at NCBI or EBI so I could get this info and the sequences from them.  In this 
case, speed of retrieval is not critical and I'd rather not download the 
entire set of sequences to local disk to hack at them.

I've worked out a screen-scraping approach to get them and could script that, 
but I thought that BioPerl had a method for using NCBI's external APIs, though it 
may be that my memory is faulty or that the approach is no longer supported 
because of the load it put on NCBI's servers.  

Does NCBI still make such APIs available?  I searched a bit for docs on them 
but couldn't find anything (unless it's buried in the NCBI toolkit, which I 
haven't started to excavate).

Failing that, would SEALS provide such a service? Any PerlPinipeds listening?

Harry

On Sunday 12 February 2006 08:37, Brian Osborne wrote:
> Harry,
>
> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
> from its documentation:
>
>   use Bio::DB::Fasta;
>
>   # create database from directory of fasta files
>   my $db       = Bio::DB::Fasta->new('/path/to/fasta/files');
>
>   # simple access (for those without Bioperl)
>   my $seq      = $db->seq('CHROMOSOME_I', 4_000_000 => 4_100_000);
>   my $revseq   = $db->seq('CHROMOSOME_I', 4_100_000 => 4_000_000);  # start > stop: reverse complement
>   my @ids      = $db->ids;
>   my $length   = $db->length('CHROMOSOME_I');
>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>   my $header   = $db->header('CHROMOSOME_I');
>
>   # Bioperl-style access (reusing the same $db handle)
>   my $obj      = $db->get_Seq_by_id('CHROMOSOME_I');
>   my $fullseq  = $obj->seq;
>   my $subseq   = $obj->subseq(4_000_000 => 4_100_000);
>
> Do you already have the offsets?
>
> Brian O.
>
> On 2/12/06 1:46 AM, "Harry Mangalam" <hjm at tacgi.com> wrote:
> > Hi All,
> >
> > After perusing the tutorial and other docs for an evening, I still
> > can't find the answer to this.  Forgive me if I've missed something
> > obvious.
> >
> > This should not be a novel request, but I've not found it answered.  If
> > BioPerl isn't the best way to do this, I'd be grateful for a pointer to a
> > better way, especially if it includes an illuminating bit of code.
> >
> > The problem is to retrieve genomic sequences extending plus and minus
> > some offset around a locus identified by HUGO gene symbol or GeneID.
> > This would be a common follow-up chore for extra analysis after a gene
> > expression experiment.  Or maybe this is in the DBFetch routines, but
> > I've missed the sequence type to specify...?
> >
> >
> > TIA!
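
Whichever source ends up supplying the sequence, the chore in the original question reduces to turning a locus (start, end, strand) plus an offset into the coordinate pair that Brian's Bio::DB::Fasta example passes to $db->seq. A minimal sketch in core Perl, assuming the locus coordinates are already known and are 1-based and inclusive; flank_coords is a hypothetical helper, not a BioPerl function, and the coordinates in the usage lines are made up:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): compute the (from, to) pair to
# pass to Bio::DB::Fasta's seq() for a locus padded by $offset bases on each
# side. Assumes 1-based inclusive coordinates; $chr_len is the chromosome
# length, e.g. from $db->length('CHROMOSOME_I').
sub flank_coords {
    my ($start, $end, $strand, $offset, $chr_len) = @_;
    ($start, $end) = ($end, $start) if $start > $end;   # normalise order
    my $from = $start - $offset;
    my $to   = $end + $offset;
    $from = 1        if $from < 1;          # clamp at chromosome start
    $to   = $chr_len if $to > $chr_len;     # clamp at chromosome end
    # Bio::DB::Fasta returns the reverse complement when start > stop,
    # so swap the pair for minus-strand loci.
    return $strand < 0 ? ($to, $from) : ($from, $to);
}

# Usage with made-up coordinates (not a real gene position):
my ($from, $to) = flank_coords(4_000_000, 4_010_000, -1, 2_000, 15_000_000);
print "$from => $to\n";   # 4012000 => 3998000
# With a Bio::DB::Fasta handle this would then be:
#   my $flanked = $db->seq('CHROMOSOME_I', $from => $to);
```

Swapping the pair for minus-strand loci leans on the behaviour shown in Brian's $revseq line: Bio::DB::Fasta hands back the reverse complement when the first coordinate is larger than the second, so the flanked sequence comes out in the gene's own orientation.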

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 
            <<plain text preferred>>