[Bioperl-l] TFBS databases, Bio::Matrix::PSM suitable?

Tue Aug 22 16:30:51 UTC 2006

>===== Original Message From Sendu Bala <bix at sendu.me.uk> =====
>skirov wrote:
>>> Stefan Kirov wrote:
>>> Sendu Bala wrote:
>>>
>>> Transfac is not an open database, so you cannot get the instance
>>> data anyway.
>>>
>>> You can. It is in the sites.dat file and often in the matrix.dat
>>> file. It is also available freely and publicly via at least 2
>>> websites.
>>>
>> Could you please post the urls? Last time I checked Transfac was
>> specifically forbidding people from providing the data files. This
>> may have changed.
>
>I meant the 'instance' data is freely available, not the .dat files
>specifically.
My point too...
>
>http://www.gene-regulation.com/pub/databases.html#transfac (free reg
>required)
>
>http://www.cbil.upenn.edu/cgi-bin/tess/tess
>
>
>>> how the rest of us can use it or debug/support it?
>>>
>>> It may be possible to include a small example subset of the data in
>>> t/data; there is after all already t/data/transfac.dat (which is a
>>> small matrix.dat file).
>>
>> The test files are good only if there is access to the full data set.
>> By their nature, test files can cover only a representative sample of
>> scenarios to check that the installation is valid; they can in no way
>> check that the code is in sync with the full data set.
>
>I'm not sure what you mean. Do you think that before a genbank parser
>can be released, all genbank files in existence must be supplied in the
>test suite to ensure it really does work on everyone's machine? 
Did I say that, or did you infer it? What I said is that the tests are
prepared to make sure the code works properly in a specific environment
(architecture, compiler, etc.). However, developers must make sure their code
does not choke on the full data set. This means that when you develop or debug
the transfac:sites.dat parser, you parse the whole file. Unless you make such a
test run, you can never be sure your parser is actually good for a release.
>The test
>data need only be representative, and if it isn't good enough and a user
>discovers a problem, a bug is reported and fixed as normal.
My point exactly: how can bioperl contributors fix a bug if they do not have
access to the data? It means the user would have to submit a test set, which
could be a license violation. I do not know to what extent Transfac would
pursue its rights, or what the implications would be, but I wouldn't like to
find out either.
>
>
>>> If someone is willing to develop and maintain a module that deals
>>> with a data source, it makes no difference if that source is open
>>> or not - it is useful either way to other people who also have
>>> access to that data. If there comes a time that the maintainer can
>>> no longer maintain it and it stops working because the data format
>>> changes, and no one knows the new format, it can be deprecated.
>>
>> In an ideal world this may work. Imagine a situation where the code is
>> out of sync with the data format and no one is really able to check
>> that. Then a user with access to the data source would get burned by
>> trying to use the bioperl module. A natural reaction is then to blame
>> bioperl (and probably a correct one too).
>
>Well, yes, of course. This is the problem faced by 100% of the parsers
>in bioperl. They work until the file format changes, and then hopefully
>there is someone around who will fix the problem.
>
>I don't see that the fear that it may not work in the future is a reason
>not to want it at all. Everything in bioperl may not work in the future.
>
>
>> The cost is usually much larger, both in support and maintenance.
>
>That cost is borne by the developer who chooses to maintain the module.
>That would be me in this case, and it isn't a problem for me.
For how long? I trust your best intentions, but things do change...
>
>
>> This is not the point. The core should not get cluttered with code
>> that is not maintained. In general more widely used modules are
>> better maintained, but the real disaster would be a poorly
>> maintained module with a large audience.
>
>Who says it won't be maintained? I will maintain it. The very second I
>can no longer maintain it and no one else can, it can be deprecated to
>avoid clutter. I don't see the problem. But in any case see below -
>anyone could probably maintain it.
I guess that if you decide you can no longer do that, you are going to remove
it? Go ahead, but do not forget what you are promising now.
>
>
>> I agree that, in general, a transfac module is necessary and useful
>> (this is why I started developing one in the first place), but I doubt
>> it is reasonable to support one without access to the underlying data
>> structure.
>
>I have access to the pro data files. Everyone has access to
>http://www.biobase-international.com/pages/index.php?id=117 which I
>think documents changes since the last version (in this case, there were
>no changes to the data format since 10.1). Everyone has access to the
>websites.