[Bioperl-l] Homologene again...

Jason Stajich jason@cgt.mc.duke.edu
Fri, 15 Feb 2002 08:47:46 -0500 (EST)


Elia -

I wrote a parser for the 5-column data in the sqltable files that the
Sonnhammer group provides - this was a little easier than wading
through homologene, where some genes don't have accession numbers, and
where automating retrieval from the LocusLink ID was also not apparent to
me (I guess one could download all of LocusLink).

I wanted to see cDNA and protein alignments for all of the DM vs HS
orthologs - so I used Bio::DB::SwissProt and pulled in the proteins. They
provide a db of the protein seqs (no annotation), so this was only necessary
because I wanted to find the EMBL link and get the corresponding cDNA.  I
did a clustalw alignment of the proteins (could have used needle here I
guess, since I want global alignments and it is pairwise -- probably what
clustalw does anyway) - then built a cDNA alignment BASED on the protein
alignment.  This is done by inserting gaps in the cDNA sequences wherever
they occur in the protein alignment (three gap characters per gapped
residue) and building a SimpleAlign object with these sequences.  Have to
make sure that we handle UTRs okay (annotation helps here if we use it) or
else you end up in the wrong place.
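
A minimal sketch of that gap-threading step, in Python rather than the
BioPerl the script itself uses. The function name thread_cdna and the
cds_start parameter are made up for illustration, and it assumes the
coding region is contiguous in the cDNA and that you already know where it
starts (e.g. from the EMBL annotation):

```python
GAP = "-"


def thread_cdna(aligned_protein: str, cdna: str, cds_start: int = 0) -> str:
    """Return the coding part of `cdna` gapped to mirror `aligned_protein`.

    aligned_protein: one row of a protein alignment, with '-' for gaps.
    cdna:            the ungapped cDNA for the same protein.
    cds_start:       0-based offset of the first codon (skips a 5' UTR).
    """
    coding = cdna[cds_start:]
    n_residues = len(aligned_protein) - aligned_protein.count(GAP)
    if len(coding) < 3 * n_residues:
        raise ValueError("cDNA is too short to encode the aligned protein")

    pieces = []
    pos = 0  # position of the next unused codon in `coding`
    for residue in aligned_protein:
        if residue == GAP:
            # One gapped residue in the protein becomes three gapped bases.
            pieces.append(GAP * 3)
        else:
            pieces.append(coding[pos:pos + 3])
            pos += 3
    # Anything left over (stop codon, 3' UTR) is dropped.
    return "".join(pieces)
```

For example, thread_cdna("M-K", "ATGAAA") gives "ATG---AAA". A wrong
cds_start shifts every codon, which is the "end up in the wrong place"
problem.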

I also did the cDNA alignment with just clustalw to compare (still
curious whether my approach works in all cases - sometimes I was not
starting in the right place, and I am still trying to debug that).

In some cases the SwissProt ID had changed from the set they used - but
going through the web interface allowed the old IDs to still work.  There
were a couple of cases where the SwissProt record did not point to a valid
cDNA, so I couldn't get the cDNA for those either.

Happy to provide the script and/or check it in if you think it would be
useful for where you're going.  I'm not actually running any of the
InParanoid analysis myself, as I suspect you'll want to do with the Fugu
stuff.  InParanoid also provides bootstrap values for the orthologs, based
on the trees they built for each gene - so you get a group of genes where
1 is from one species (HS) and the rest are from the other species (DM),
and you only want to see the alignment of the 2 orthologs (the HS gene and
the DM gene with a bootstrap value of 100%). Easy to pick these out and do
the alignments.
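
A rough sketch of picking those pairs out, again in Python and not the
script itself. The column order is an assumption (group ID, group score,
species, bootstrap, protein ID, tab-separated), so check it against the
actual sqltable files; the function names and the IDs in the example are
invented:

```python
from collections import defaultdict


def parse_rows(lines):
    """Yield (group, species, bootstrap, protein) from 5-column rows.

    Assumed layout: group ID, group score, species, bootstrap, protein ID.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        group, _score, species, bootstrap, protein = line.split("\t")
        yield group, species, float(bootstrap.rstrip("%")), protein


def top_pairs(lines, species_a, species_b, min_bootstrap=100.0):
    """Return {group: (protein_a, protein_b)} for well-supported pairs.

    A group is kept only if exactly one protein from each species
    reaches min_bootstrap.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for group, species, bootstrap, protein in parse_rows(lines):
        if bootstrap >= min_bootstrap:
            groups[group][species].append(protein)

    pairs = {}
    for group, by_species in groups.items():
        a = by_species.get(species_a, [])
        b = by_species.get(species_b, [])
        if len(a) == 1 and len(b) == 1:
            pairs[group] = (a[0], b[0])
    return pairs


# Invented example rows, not real InParanoid data.
example = [
    "1\t500\tHS\t100%\tP1",
    "1\t500\tDM\t100%\tQ1",
    "1\t500\tDM\t45%\tQ2",
    "2\t300\tHS\t100%\tP2",
    "2\t300\tDM\t80%\tQ3",
]
print(top_pairs(example, "HS", "DM"))
```

Each pair that comes back can then go through the protein alignment and
the cDNA step described above.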

Hope that is useful/interesting.

-jason

On Fri, 15 Feb 2002, Elia Stupka wrote:

> > tried to write a basic parser for my own needs last month and sort of gave
> > up on the data - been happier with the InParanoid Orthologs for what I
> > needed in the end.
>
> Hey Jason,
>
> just read your mail, was wondering if you did anything around inparanoid
> in the meantime, any wrappers, objects, etc.?
>
> Elia
>

-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu