[Bioperl-l] TFBS databases, Bio::Matrix::PSM suitable?
Chris Fields
cjfields at uiuc.edu
Tue Aug 22 12:40:53 UTC 2006
Many Bio::DB* modules access the database to get the raw data, and
this is attached to a Bio::*IO stream class in some way (in most
cases). There are a few that get around this; for instance,
Bio::DB::Taxonomy* uses no specialized SeqIO-like class.
As you mentioned, you could extend Bio::Matrix::PSM::IO::transfac
specifically to encompass the 'instance' sequences (the other PSM::IO
modules wouldn't have the same methods available to them) and use
SimpleAlign or SeqFeature::SimilarityPair to hold them (I agree the
former is probably better). Or have the Bio::DB module set up to grab
either your 'instance' sequences by ID (where you could possibly
implement RandomAccessI) or a Transfac PSM (implementing a new
Matrix-based interface). TMTOWTDI.
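To make the "grab by ID" option a bit more concrete, here is a minimal
sketch of the index-then-fetch idea behind a random-access DB module. It is
written in Python purely for illustration and is not BioPerl code. It
assumes a flatfile whose records carry an "AC" accession line and end with
a "//" line; the function names (build_index, fetch_record) and the EX000N
accessions are made up for the example.

```python
import io

# Hypothetical two-record flatfile; accessions and fields are invented.
SAMPLE = (
    "AC  EX0001\n"
    "ID  example_one\n"
    "//\n"
    "AC  EX0002\n"
    "ID  example_two\n"
    "//\n"
)


def build_index(handle):
    """Scan the file once, mapping each accession to its record's offset."""
    index = {}
    record_start = handle.tell()
    while True:
        line = handle.readline()
        if not line:
            break
        if line.startswith("AC"):
            # Remember where the record containing this accession begins.
            index[line[2:].strip()] = record_start
        elif line.startswith("//"):
            # The next record starts right after the terminator line.
            record_start = handle.tell()
    return index


def fetch_record(handle, index, accession):
    """Seek straight to one record and return its lines (terminator excluded)."""
    handle.seek(index[accession])
    lines = []
    for line in iter(handle.readline, ""):
        if line.startswith("//"):
            break
        lines.append(line.rstrip("\n"))
    return lines


if __name__ == "__main__":
    handle = io.StringIO(SAMPLE)
    index = build_index(handle)
    print(fetch_record(handle, index, "EX0002"))
```

The point of the split is that the full parse happens once, when the index
is built, and later lookups by ID only read the one record they need. A real
module would also have to persist the index and follow cross-references
between files, which this sketch does not attempt.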
Does the TFBS package have any overlap here? I haven't used it
(it requires PDL, which is a pain to install on WinXP), but it is
supposed to be fully integrated with Bioperl.
http://forkhead.cgb.ki.se/TFBS/
Chris
On Aug 22, 2006, at 3:23 AM, Sendu Bala wrote:
> I'm looking to extract data from some Transcription Factor Binding Site
> (TFBS) databases. For example, matrix, sequence and known position
> information out of Transfac flatfiles.
>
> Currently there is Bio::Matrix::PSM::IO::transfac, but it only gives you
> the PSM matrices, not the 'instance' sequences. Bio::Matrix::PSM also
> has this to say:
>
>> =head1 DESCRIPTION
>>
>> To handle a combination of site matrices and/or their corresponding
>> sequence matches (instances). This object inherits from
>> Bio::Matrix::PSM::SiteMatrix, so you can use the respective
>> methods. It may hold also an array of Bio::Matrix::PSM::InstanceSite
>> object, but you will have to retrieve these through
>> Bio::Matrix::PSM::Psm-E<gt>instances method (see below). To some extent
>> this is an expanded SiteMatrix object, holding data from analysis that
>> also deal with sequence matches of a particular matrix.
>>
>>
>> =head2 DESIGN ISSUES
>>
>> This does not make too much sense to me. I am mixing PSM with PSM
>> sequence matches. Though they are very closely related, I am not
>> satisfied by the way this is implemented here. Heikki suggested
>> different objects when one has something like meme. But does this mean
>> we have to write a different object for mast, meme, transfac,
>> theiresias, etc.? To me the best way is to return SiteMatrix object +
>> array of InstanceSite objects and then mast will return undef for
>> SiteMatrix and transfac will return undef for InstanceSite. Probably I
>> cannot see some other design issues that might arise from such
>> approach, but it seems more straightforward. Hilmar does not like
>> this because it is an exception from the general BioPerl rules. Should
>> I leave this as an option? Also the header rightfully belongs to the
>> driver object, and could be retrieved as hashes. I do not think it
>> can be done any other way, unless we want to create even one more
>> object with very unclear content.
>
> I actually want to get even more kinds of data out, so rather than
> extend Bio::Matrix::PSM::IO::transfac and related modules in some way,
> would it be more appropriate to have something like
> Bio::DB::TFBS::transfac which had a number of methods that gave specific
> kinds of objects? We could have get_psm() which gives a normal 'pure'
> Bio::Matrix::PSM with no InstanceSite objects, get_aln() which returns a
> Bio::SimpleAlign for the 'instance' sequences that were used to generate
> a given PSM, and get_map() which returns a new special kind of Bio::Map
> with binding site position information.
>
> Another way it makes a little more sense for this to be a 'DB' module
> and not an IO one is that there are multiple huge Transfac data files in
> the database, with related and cross-referenced information. To extract
> the complete information you would want to parse them all and create
> indexes for fast lookups later, not something you really expect of an IO
> module.
>
>
> Thoughts anyone?
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign