[EMBOSS] tfextract does not work properly with newer transfac site.dat file
Mauleon, Ramil (IRRI)
R.MAULEON at CGIAR.ORG
Tue Jun 22 09:53:49 UTC 2010
Hello,
I ran tfextract on the Transfac 6.4 <site.dat> file so that I could use
the output with tfscan, but it does not parse the file properly. Some of
the problems I saw with the Transfac 6.4 site.dat file were:
1 - Many entries have more than one motif sequence (the SQ line); these
were not included in the parsed output. For example:
AC R00018
XX
ID MOUSE$ACRD_01
XX
DT 20.06.1990 (created); ewi.
DT 24.08.1995 (updated); hiwi.
CO Copyright (C), Biobase GmbH.
XX
TY D
XX
DE AChR delta (acetylcholine receptor, delta-subunit); Gene: G000457.
XX
SQ TGCCTGG.
SQ TGCCCTTG.
SQ TGCCCTAA.
SQ TGGCAAAC.
XX
SF -148
.
.
.
2 - Some motif sequences are split across two SQ lines. For example:
AC R00709
XX
ID HA$HMGCR_02
XX
DT 20.06.1990 (created); ewi.
DT 06.09.1995 (updated); ewi.
CO Copyright (C), Biobase GmbH.
XX
TY D
XX
DE HMGCOAR (HMG-CoA reductase); Gene: G000157.
XX
SQ TGCTGGAACTCGACCAGCTATTGGTTGGCTCGGCCGTGGTGAGAGATGGTGCGGTGCCCG
SQ TTCTCC.
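
To illustrate the handling that both cases seem to call for, here is a minimal Python sketch. It is not tfextract's code, and the function name parse_site_entries is made up for this example. It assumes, based only on the two entries above, that each entry starts with an AC line, that a motif sequence ends with a trailing "." and that an SQ line without a trailing "." continues on the next SQ line.

```python
def parse_site_entries(lines):
    """Collect the accession, ID and all motif sequences of each entry.

    Assumptions (inferred from the two example entries, not from the
    Transfac format documentation): an AC line starts a new entry, and
    a sequence is complete once an SQ line ends with '.'.
    """
    entries = []
    current = None
    partial = ""  # holds a sequence that is continued on the next SQ line
    for raw in lines:
        line = raw.strip()
        code, _, value = line.partition(" ")
        value = value.strip()
        if code == "AC":
            current = {"ac": value, "id": None, "sequences": []}
            entries.append(current)
            partial = ""  # an unterminated sequence is dropped here
        elif current is None:
            continue  # skip anything before the first AC line
        elif code == "ID":
            current["id"] = value
        elif code == "SQ":
            partial += value
            if partial.endswith("."):
                # keep every completed sequence, not only the first one
                current["sequences"].append(partial[:-1])
                partial = ""
    return entries


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py site.dat
    with open(sys.argv[1]) as handle:
        for entry in parse_site_entries(handle):
            print(entry["ac"], entry["id"], len(entry["sequences"]))
```

With this logic R00018 would yield four separate sequences, and R00709 would yield one 66-base sequence joined from its two SQ lines.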
Thanks in advance for fixing tfextract.
Ramil
---------------------------------
Ramil P. Mauleon
Bioinformatics Specialist
International Rice Research Institute
DAPO Box 7777, Metro Manila, Philippines
email: r.mauleon at cgiar.org
phone: 632-580-5600 ext 2508 ; fax: 632-580-5699
---------------------------------