[Bioperl-l] HomoloGene

Wed, 20 Feb 2002 09:26:33 +1300

Hi Amir,

This is because the eighth column is meant to hold a UniGene identifier, 
as you say, but Dm (Drosophila melanogaster) has no UniGene numbers 
assigned to it, so the LocusLink ID is used there instead. So it is most 
definitely possible to find "LL" in the eighth column.
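
A minimal sketch of how a parser might cope with both cases, assuming 
the pipe-delimited layout your one-liner splits on. The sub name 
classify_col8 and the labels it returns are made up for illustration and 
are not from the README. The sample lines are invented too, apart from 
LL.42489, which is taken from your output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: classify the eighth pipe-delimited field of one
# hmlg.ftp line. Returns a two-element list (kind, number), where kind is
# 'locuslink' for values like "LL.42489", 'unigene' for a bare number,
# and 'unknown' for anything else (including a missing column).
sub classify_col8 {
    my ($line) = @_;
    chomp $line;
    my @f = split /\|/, $line, -1;    # -1 keeps trailing empty fields
    return ('unknown', undef) unless defined $f[7];
    (my $val = $f[7]) =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return ('locuslink', $1) if $val =~ /^LL\.(\d+)$/;
    return ('unigene',   $1) if $val =~ /^(\d+)$/;
    return ('unknown', undef);
}

# Usage on made-up lines (only the eighth field matters here).
for my $line ('a|b|c|d|e|f|g|LL.42489|i', 'a|b|c|d|e|f|g|12345|i') {
    my ($kind, $id) = classify_col8($line);
    print "$kind $id\n";
}
```

Returning the kind alongside the number lets the caller decide what to do 
with the Dm rows rather than silently treating an LL number as UniGene.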

I'll be posting my parsing effort soon.

Cheers, Andrew.

>Just FYI for people who are trying to use HomoloGene: as Jason suggested in
>his message on the 14th, the HomoloGene file is confused. In particular, I
>was looking at the hmlg.ftp file (not the triplet file Jason parsed). It
>turns out that -- at least in the version from January 11 -- there are a
>bunch of lines where the LocusLink ID is in the eighth column instead of the
>seventh. (The majority of lines have it in the seventh column where the
>README says it should be. I can't tell if this is better or worse.) The
>eighth column is supposed to hold UniGene identifiers, which are just numbers
>(since the species is in the second column), so "LL" should never be found in
>that column.
>
>>perl -wan -F'\|' -e 'next unless $#F; print "$. $F[7]\n" if $F[7]=~/LL/;' hmlg.ftp
>23 LL.42489
>27 LL.41094
>40 LL.42919
>57 LL.38254
>69 LL.42058
>83 LL.31837
>158 LL.37503
>
>(etc.)
>
>I wrote to the folks at NCBI but haven't gotten a response.
>
>Amir Karger
>CuraGen Corporation