[Bioperl-l] HomoloGene

Karger, Amir AKarger@CuraGen.com
Tue, 19 Feb 2002 09:49:08 -0500


Just FYI for people who are trying to use HomoloGene: as Jason suggested in
his message on the 14th, the HomoloGene file is inconsistent. In particular, I
was looking at the hmlg.ftp file (not the triplet file Jason parsed). It
turns out that, at least in the version from January 11, a number of lines
have the LocusLink ID in the eighth column instead of the seventh. (The
majority of lines have it in the seventh column, where the README says it
should be. I can't tell if this is better or worse.) The eighth column is
supposed to hold UniGene identifiers, which are just numbers (since the
species is in the second column), so "LL" should never be found in that
column.

>perl -wan -F'\|' -e 'next unless $#F; print "$. $F[7]\n" if $F[7]=~/LL/;' hmlg.ftp
23 LL.42489 
27 LL.41094 
40 LL.42919 
57 LL.38254 
69 LL.42058 
83 LL.31837 
158 LL.37503 

(etc.)
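
To see how many lines are affected, rather than just listing them, something
like the following might help. It is only a sketch: ll_columns is a made-up
helper name, and it assumes the fields are '|'-delimited and that LocusLink
IDs always look like "LL.<number>", as in the output above.

```shell
# ll_columns is a hypothetical helper name, not part of HomoloGene or BioPerl.
# Assumes '|'-delimited lines, with the LocusLink ID written as "LL.<number>".
ll_columns() {
  perl -F'\|' -lane '
    next if @F < 8;                       # skip lines too short to have column 8
    my $col = $F[6] =~ /LL\./ ? 7         # ID where the README says it belongs
            : $F[7] =~ /LL\./ ? 8         # ID shifted into the UniGene column
            :                   "none";   # no LL ID in either column
    $count{$col}++;
    END { print "column $_: $count{$_}" for sort keys %count }
  ' "$@"
}
```

Running "ll_columns hmlg.ftp" should print one count per case (column 7,
column 8, or none), which gives a rough idea of how widespread the shift is.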

I wrote to the folks at NCBI but haven't gotten a response.

Amir Karger
CuraGen Corporation 
 