[Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs

Sun Feb 12 16:37:39 UTC 2006

Harry,

Hope you're doing well. One approach is to build on Bio::DB::Fasta, which
indexes a directory of FASTA files and returns subsequences by ID and
coordinates. From its documentation:

  use Bio::DB::Fasta;

  # create database from directory of fasta files
  my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');

  # simple access (for those without Bioperl)
  my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
  my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
  my @ids     = $db->ids;
  my $length   = $db->length('CHROMOSOME_I');
  my $alphabet = $db->alphabet('CHROMOSOME_I');
  my $header   = $db->header('CHROMOSOME_I');

  # Bioperl-style access (reusing $db from above; variables renamed so the
  # two snippets can run together without redeclaration warnings)
  my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
  my $fullseq = $obj->seq;
  my $subseq  = $obj->subseq(4_000_000 => 4_100_000);

Do you already have the offsets?
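
If you do have chromosome coordinates and strand for the locus, a minimal
sketch of the "plus & minus some offset" part might look like the following.
The helper names flank_window and fetch_with_flanks are made up for
illustration and are not part of Bioperl, and the coordinates in the example
call are placeholders. Mapping a HUGO name or GeneID to coordinates is not
covered here and is assumed to come from elsewhere. The only Bio::DB::Fasta
calls used are length() and seq(), as in the synopsis above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): widen a 1-based, inclusive
# interval by $offset bases on each side, clamped to the sequence bounds.
sub flank_window {
    my ($start, $end, $offset, $seq_len) = @_;
    ($start, $end) = ($end, $start) if $start > $end;   # normalise order
    my $from = $start - $offset;
    my $to   = $end + $offset;
    $from = 1        if $from < 1;
    $to   = $seq_len if $to > $seq_len;
    return ($from, $to);
}

# Hypothetical helper: fetch a locus plus flanks from a Bio::DB::Fasta
# handle. For a minus-strand locus the coordinates are passed in reverse
# order, which makes seq() return the reverse complement.
sub fetch_with_flanks {
    my ($db, $chr, $start, $end, $strand, $offset) = @_;
    my $len = $db->length($chr)
        or die "unknown sequence id '$chr'\n";
    my ($from, $to) = flank_window($start, $end, $offset, $len);
    return $strand < 0 ? $db->seq($chr, $to => $from)
                       : $db->seq($chr, $from => $to);
}

# Example call (placeholder coordinates, minus strand, 1 kb flanks):
#   use Bio::DB::Fasta;
#   my $db  = Bio::DB::Fasta->new('/path/to/fasta/files');
#   my $seq = fetch_with_flanks($db, 'CHROMOSOME_I',
#                               4_000_000, 4_100_000, -1, 1_000);
```

Clamping to the chromosome ends keeps a locus near a telomere from asking
for positions that do not exist. Whether you want the flanks reported in
genomic or in transcript orientation is a design choice; the sketch above
returns the minus-strand case reverse-complemented.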

Brian O.

On 2/12/06 1:46 AM, "Harry Mangalam" <hjm at tacgi.com> wrote:

> Hi All,
> 
> After perusing the tutorial and other docs for an evening, I still can't
> find the answer to this.  Forgive me if I've missed something obvious.
> 
> This should not be a novel request, but I've not found it answered.  If
> bioperl isn't the best way to do this, I'd be grateful for a pointer to a
> better way, especially if it includes an illuminating bit of code.
> 
> The problem is to retrieve genomic sequences plus & minus some offset from a
> locus determined by HUGO keyword or GeneID.  This would be a common followup
> chore for some extra analysis from a gene expression experiment.  Or maybe
> this is in the DBFetch routines, but I've missed the sequence type to
> specify...?
> 
> 
> TIA!