[Bioperl-l] whole genome annotation

Fri Jul 28 13:35:02 UTC 2006

Richard,

A good starting point is a FAQ page we created that describes various ways
of extracting genomic sequence:

http://www.bioperl.org/wiki/Getting_Genomic_Sequences

Check that out, along with Sean's suggestion, and write back to bioperl-l if
you have questions. One thing this page doesn't really address is the special
challenge of working with very large sequences; this is something you might
have to consider as well.
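
As a rough illustration of the large-sequence point (a sketch only, written in
Python rather than Perl to keep it free of dependencies; the function name
extract_region and the toy record are made up for the example), a region can
be pulled out of a FASTA file by reading it line by line instead of loading
the whole chromosome into memory:

```python
import io


def extract_region(handle, start, end):
    """Return bases start..end (1-based, inclusive) of the first FASTA record.

    The file is read line by line, so the full sequence is never held in
    memory; only the requested slice is collected.
    """
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    pieces = []
    offset = 0           # number of bases consumed so far
    seen_header = False
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break    # a second record begins; stop at the first one
            seen_header = True
            continue
        line_start = offset + 1          # 1-based position of first base on this line
        line_end = offset + len(line)    # 1-based position of last base on this line
        if line_end >= start and line_start <= end:
            lo = max(start, line_start) - line_start
            hi = min(end, line_end) - line_start + 1
            pieces.append(line[lo:hi])
        offset = line_end
        if offset >= end:
            break        # everything requested has been read
    return "".join(pieces)


# Toy in-memory record standing in for a chromosome-sized file (hypothetical data).
demo = io.StringIO(">chrDemo toy record\nACGTACGTAC\nGGGGCCCCTT\nAAAA\n")
region = extract_region(demo, 8, 13)
print(region)
```

On a real file you would pass an open file handle in place of the in-memory
demo. This version rescans the file on every call, so for repeated lookups an
indexed approach would be the better choice.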

You also asked about downloading the human genome and its annotations.
There's more than one way to do this as well. You'd have access to this
data if you used the ENSEMBL API, but you can also get the GenBank files at
ftp://ftp.ncbi.nih.gov/genomes/. Having said that, I should add that one of
the advantages of the ENSEMBL API approach is that you don't have to
download the entire genome. I don't know what machine you're working on but,
again, trying to manipulate very large sequences may tax your computer as
well as your patience.

Brian O.

On 7/28/06 5:39 AM, "Richard Birnie" <R.Birnie at leeds.ac.uk> wrote:

> Hello all,
> 
> I'm just trying to familiarise myself with BioPerl and I'm a little
> overwhelmed by the sheer volume of information available on the wiki. I'm
> hoping someone can point me in the right direction through the labyrinth. This
> may become a little long-winded but I'll try to get all the annoying newbie
> questions out of the way in one go.
> 
> Let me try and explain what I'm aiming for. I have some CGH data downloaded
> from the Progenetix database
> (http://www.progenetix.de/~pgscripts/progenetix/Aboutprogenetix.html). This
> data is simplified to record only gain/loss/amplification of whole
> chromosome bands at 862-band resolution, to facilitate combining data
> from multiple different studies.
> 
> What I'd like to be able to do is download a copy of the human genome sequence
> with annotation describing the locations of chromosome bands and preferably of
> known genes. I then want to be able to manipulate the genome data based on the
> CGH data to mimic deletions. The ultimate goal of this is to be able to feed
> the manipulated genome data into a program (metashark) that predicts the
> structure of metabolic networks based on genome annotation compared to a
> reference genome, in this case a complete 'normal' human genome, and see what
> effect the deletions have on the metabolic pathways.
> 
> I appreciate that is a bit vague but that's sort of my problem: I'm not
> really a bioinformatician, so I'm not sure of the details of what I want. I just
> happen to have a question to answer and bioperl seems the way to go (for this
> project and more generally). I've started looking at the HOWTOs and read the
> main bioperl tutorial. I also looked at the CGL comparative genomics library
> but I haven't penetrated far into that yet. I'm OK with basic Perl, although
> not much object-oriented stuff. I don't really have much experience with
> handling sequence data on a whole-genome scale either; a few GenBank files for
> my favourite genes is fine, but I need some guidance to work on this scale.
> 
> What I'm looking for is someone to give me a start. I'd greatly appreciate it
> if someone could spell out the general steps for downloading a complete copy
> of the human genome and its annotations (if this is even a feasible approach)
> and how to put it all together. Not actual code, just the general concept for
> each step and which tools from the bioperl set would be most appropriate for
> each step, so that I can focus on what I need to read about; even a little
> pseudo-code if I'm lucky. If I can get the genome data downloaded and set up
> properly, I'll work out how to apply the CGH data to it myself.
> 
> If example code for what I'm trying to describe is included somewhere, great;
> could someone point me to it?
> 
> Thanks for your patience.
> best regards,
> Richard
> 
> 
> 
> Dr Richard Birnie
> Scientific Officer
> Section of Pathology and Tumour Biology
> Welcome Brenner Building, LIMM
> St James University Hospital
> Beckett St, Leeds, LS9 7TF
> Tel:0113 3438624
> e-mail: r.birnie at leeds.ac.uk