[BioPython] EUtils strange behaviour
Bonis Sanz, Julio
JBonis at imim.es
Tue Nov 9 11:58:55 EST 2004
Hi!
I am playing again with EUtils and have detected a strange behaviour....
Let me focus in HTR2A receptor in humans' gene:
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=10835174
Well, what I want to do is to retrieve in a gb_record (GenBank python library) all the SNPs in that gene... And it is getting more complex than I guessed....
I have passed to EUtils efetch this parameters:
db=nucleotide
id=10835174 (is the gi for my mRNA: HTR2A)
retmode=html
rettype=gb (GenBank mode)
extrafeat=1 (to get the SNPs... if you dont send extrafeat=1 EUtils will not show you any SNP or variation)....
This is the URL:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=10835174&retmode=html&rettype=gb&extrafeat=1
Well, there are 7 variations.... 4 of them are outside the limits of the CDS (so are not transcribed). I am only interested in the SNPs of the exons....
Finally I get these 3 SNPs:
variation 219
/gene="HTR2A"
/replace="a"
/replace="c"
/db_xref="dbSNP:1805055"
variation 734
/gene="HTR2A"
/replace="a"
/replace="g"
/db_xref="dbSNP:6304"
variation 1407
/gene="HTR2A"
/replace="c"
/replace="t"
/db_xref="dbSNP:1058576"
BUT.... here is the problem....
If you go to LocusLink for HTR2A gene you will find it:
http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?locusId=3356&mrna=NM_000621&ctg=NT_024524&prot=NP_000612&orien=reverse&view+rs+=view+rs+&chooseRs=coding
This is the list of SNPs for the exons in that HTR2A mRNA....
rs6314
rs6308
rs1058576
rs6304
rs6305
rs6313
rs1805055
That compared with the list we get from EUtils is:
LocusLink EUtils
rs6314
rs6308
rs1058576 ========= 1058576
rs6304 ========= 6304
rs6305
rs6313
rs1805055 ========= 1805055
You can see that there are 5 missing SNPs.... why?
Well, I have explanation for two of them (rs6305 and rs6313) are synonymous SNPs (no change in the protein sequence), but what about 6308 and rs6314? where have they gone?
I need to solve this mistery, to build an script that retrieves on-the-fly (from EUtils) all the SNPs of a given mRNA. I dont want to take all the contigs or FTPing the hole RefSeq human genome database. :S.
Of course I need to retrieve synonymous SNPs also (in fact one of the most clinically relevant SNPs in HTR2A is that synonymous SNPs rs6313... and nobody seems to know why).
Any comment, suggestion, idea (apart from RTFM... I promise I have read and read!!!!)
More information about the BioPython
mailing list