[EMBOSS] tfextract does not work properly with newer transfac site.dat file

Mauleon, Ramil (IRRI) R.MAULEON at CGIAR.ORG
Tue Jun 22 09:53:49 UTC 2010


Hello, 

I ran tfextract on the Transfac 6.4 <site.dat> file so that I could use
the output with tfscan, but it does not parse the file properly. The
problems I saw with the Transfac 6.4 site.dat file were:

 

1 - Many entries had more than one motif sequence (SQ line); these
subsequently weren't included in the parsed output

 

AC  R00018
XX
ID  MOUSE$ACRD_01
XX
DT  20.06.1990 (created); ewi.
DT  24.08.1995 (updated); hiwi.
CO  Copyright (C), Biobase GmbH.
XX
TY  D
XX
DE  AChR delta (acetylcholine receptor, delta-subunit); Gene: G000457.
XX
SQ  TGCCTGG.
SQ  TGCCCTTG.
SQ  TGCCCTAA.
SQ  TGGCAAAC.
XX
SF  -148

 

.

.

.

 

 

2 - Some motif sequences were split across 2 lines, for example:

 

AC  R00709
XX
ID  HA$HMGCR_02
XX
DT  20.06.1990 (created); ewi.
DT  06.09.1995 (updated); ewi.
CO  Copyright (C), Biobase GmbH.
XX
TY  D
XX
DE  HMGCOAR (HMG-CoA reductase); Gene: G000157.
XX
SQ  TGCTGGAACTCGACCAGCTATTGGTTGGCTCGGCCGTGGTGAGAGATGGTGCGGTGCCCG
SQ  TTCTCC.
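
The two cases above could be handled along the following lines. This is
only a rough Python sketch, not how tfextract works internally. It assumes,
based solely on the two entries shown, that a complete motif sequence ends
with a "." and that an SQ line without a trailing "." is continued on the
next SQ line. The function name parse_site_sequences is made up for
illustration.

```python
def parse_site_sequences(lines):
    """Collect motif sequences per accession from site.dat-style lines.

    Assumption (inferred only from the two example entries): a sequence
    is complete when its SQ line ends with "."; an SQ line without the
    trailing "." continues on the following SQ line.
    """
    sites = {}
    accession = None
    pending = ""  # sequence text carried over from an unterminated SQ line

    def flush():
        # Store whatever has been accumulated, minus the terminating ".".
        nonlocal pending
        if pending and accession is not None:
            sites[accession].append(pending.rstrip("."))
        pending = ""

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between record lines
        tag, value = line[:2], line[2:].strip()
        if tag == "SQ" and accession is not None:
            pending += value
            if pending.endswith("."):
                flush()  # terminated: one complete motif sequence
            continue
        flush()  # any other line type ends a pending sequence
        if tag == "AC":
            accession = value
            sites[accession] = []
    flush()
    return sites


# The SQ lines from the two entries quoted above.
example = [
    "AC  R00018",
    "SQ  TGCCTGG.",
    "SQ  TGCCCTTG.",
    "SQ  TGCCCTAA.",
    "SQ  TGGCAAAC.",
    "XX",
    "AC  R00709",
    "SQ  TGCTGGAACTCGACCAGCTATTGGTTGGCTCGGCCGTGGTGAGAGATGGTGCGGTGCCCG",
    "SQ  TTCTCC.",
]
result = parse_site_sequences(example)
for acc, seqs in result.items():
    print(acc, len(seqs), seqs)
```

With this rule, R00018 would yield four separate sequences and R00709 a
single joined sequence. Whether the trailing "." is a reliable terminator
across the whole 6.4 file has not been checked here.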

 

Thanks in advance for fixing tfextract.

 

Ramil

---------------------------------

Ramil P. Mauleon

Bioinformatics Specialist

International Rice Research Institute

DAPO Box 7777, Metro Manila, Philippines

email: r.mauleon at cgiar.org

phone: 632-580-5600 ext 2508 ; fax: 632-580-5699

---------------------------------

 




