[Bioperl-l] TFBS databases, Bio::Matrix::PSM suitable?
Stefan Kirov
skirov at utk.edu
Tue Aug 22 11:39:53 UTC 2006
Sendu Bala wrote:
>I'm looking to extract data from some Transcription Factor Binding Site
>(TFBS) databases. For example, matrix, sequence and known position
>information out of Transfac flatfiles.
>
>Currently there is Bio::Matrix::PSM::IO::transfac, but it only gives you
>the PSM matrices, not the 'instance' sequences. Bio::Matrix::PSM also
>has this to say:
>
>
>
Sendu,
Transfac is not an open database, so you cannot get the instance data
anyway. There was a discussion on that recently. Since Bioperl is a
completely open project, I am not sure it makes sense to put effort
into supporting something that is not open. Even if you have access to
the data files (which I believe Transfac does not allow in general) and
can develop additional methods/modules, how can the rest of us use,
debug, or support them?
Stefan
>
>
>>=head1 DESCRIPTION
>>
>>To handle a combination of site matrices and/or their corresponding
>>sequence matches (instances). This object inherits from
>>Bio::Matrix::PSM::SiteMatrix, so you can use the respective
>>methods. It may also hold an array of Bio::Matrix::PSM::InstanceSite
>>objects, but you will have to retrieve these through the
>>Bio::Matrix::PSM::Psm-E<gt>instances method (see below). To some extent
>>this is an expanded SiteMatrix object, holding data from analyses that
>>also deal with sequence matches of a particular matrix.
>>
>>
>>=head2 DESIGN ISSUES
>>
>>This does not make too much sense to me. I am mixing PSM with PSM
>>sequence matches. Though they are very closely related, I am not
>>satisfied by the way this is implemented here. Heikki suggested
>>different objects when one has something like meme. But does this mean
>>we have to write different objects for mast, meme, transfac,
>>teiresias, etc.? To me the best way is to return a SiteMatrix object +
>>an array of InstanceSite objects, and then mast will return undef for
>>SiteMatrix and transfac will return undef for InstanceSite. Probably I
>>cannot see some other design issues that might arise from such an
>>approach, but it seems more straightforward. Hilmar does not like
>>this because it is an exception from the general BioPerl rules. Should
>>I leave this as an option? Also the header rightfully belongs to the
>>driver object, and could be retrieved as hashes. I do not think it
>>can be done any other way, unless we want to create even one more
>>object with very unclear content.
>>
>>
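The "SiteMatrix object + array of InstanceSite objects" idea in the quoted DESIGN ISSUES text can be illustrated with a small language-neutral sketch. It is written in Python rather than Perl, and every name in it (`SiteMatrix`, `InstanceSite`, `ParseResult`, `parse_matrix_only`, `parse_instances_only`) is a hypothetical stand-in, not the BioPerl API. It only shows the shape of a result in which either part may be absent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SiteMatrix:
    # Hypothetical stand-in for a position-specific matrix.
    accession: str
    counts: Dict[str, List[int]]  # base -> counts per column


@dataclass
class InstanceSite:
    # Hypothetical stand-in for one sequence match of a matrix.
    sequence: str
    start: int


@dataclass
class ParseResult:
    # Either part may be None, depending on what the source format carries.
    matrix: Optional[SiteMatrix] = None
    instances: Optional[List[InstanceSite]] = None


def parse_matrix_only(accession, counts):
    # Mimics a matrix-only source (the transfac case in the quoted text).
    return ParseResult(matrix=SiteMatrix(accession, counts), instances=None)


def parse_instances_only(hits):
    # Mimics an instance-only source (the mast case in the quoted text).
    sites = [InstanceSite(seq, start) for seq, start in hits]
    return ParseResult(matrix=None, instances=sites)
```

A caller would have to check both fields for `None` before use, which is the price of the single combined return type.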
>
>I actually want to get even more kinds of data out, so rather than
>extend Bio::Matrix::PSM::IO::transfac and related modules in some way,
>would it be more appropriate to have something like
>Bio::DB::TFBS::transfac which had a number of methods that gave specific
>kinds of objects? We could have get_psm() which gives a normal 'pure'
>Bio::Matrix::PSM with no InstanceSite objects, get_aln() which returns a
>Bio::SimpleAlign for the 'instance' sequences that were used to generate
>a given PSM, and get_map() which returns a new special kind of Bio::Map
>with binding site position information.
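A minimal sketch of the three-method interface proposed above, written in Python for brevity. The class name `TFBSDatabase`, the constructor arguments, and the plain dict/list return types are assumptions made for illustration; a real module would return Bio::Matrix::PSM, Bio::SimpleAlign and Bio::Map objects instead.

```python
class TFBSDatabase:
    """Hypothetical facade over already-parsed TFBS data (not a BioPerl class)."""

    def __init__(self, matrices, instances, positions):
        # Each argument is a dict keyed by matrix accession.
        self._matrices = matrices    # accession -> {base: [counts per column]}
        self._instances = instances  # accession -> [site sequences]
        self._positions = positions  # accession -> [(sequence id, start, end)]

    def get_psm(self, accession):
        # The 'pure' matrix only, with no instance data attached.
        return self._matrices[accession]

    def get_aln(self, accession):
        # The instance sequences behind a matrix, as equal-length strings.
        seqs = list(self._instances.get(accession, []))
        if len({len(s) for s in seqs}) > 1:
            raise ValueError("instance sequences differ in length")
        return seqs

    def get_map(self, accession):
        # Binding-site positions, sorted by sequence id and then by start.
        return sorted(self._positions.get(accession, []))
```

Keeping the three kinds of data behind separate methods means a caller that only needs matrices never touches the instance or position data.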
>
>Another way it makes a little more sense for this to be a 'DB' module
>and not an IO one is that there are multiple huge Transfac data files in
>the database, with related and cross-referenced information. To extract
>the complete information you would want to parse them all and create
>indexes for fast lookups later, not something you really expect of an IO
>module.
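The indexing step described above could look roughly like the following Python sketch. It assumes a simplified flat-file layout in which each record carries its accession on a line starting with `AC` and ends with a `//` line; that layout, and the function names `build_index` and `fetch_record`, are assumptions for illustration and are not taken from the Transfac files themselves.

```python
import io


def build_index(handle):
    """Map each accession to the byte offset where its record starts."""
    index = {}
    record_start = handle.tell()
    while True:
        line = handle.readline()
        if not line:
            break
        if line.startswith(b"AC"):
            index[line[2:].strip().decode("ascii")] = record_start
        elif line.startswith(b"//"):
            # The next record begins right after the terminator line.
            record_start = handle.tell()
    return index


def fetch_record(handle, index, accession):
    """Seek straight to one record and return its lines without the terminator."""
    handle.seek(index[accession])
    lines = []
    for line in iter(handle.readline, b""):
        if line.startswith(b"//"):
            break
        lines.append(line.decode("ascii").rstrip("\n"))
    return lines
```

With several cross-referenced files, the index value would presumably become a (file name, offset) pair so that one lookup can reach any of them.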
>
>
>Thoughts anyone?
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>