[Bioperl-l] Bio::FPC

Jamie Hatfield jamie@genome.arizona.edu
Fri, 15 Nov 2002 09:15:20 -0700


Hello all, I need some advice.

I work at the Arizona Genomics Institute under Dr. Cari Soderlund.  (If
you don't know her, she used to work at the Sanger Centre, where she
developed FPC - FingerPrinted Contigs - probably the most used software
for physical map construction.  She's here in Tucson, AZ after a short
hiatus in Clemson, SC.)  Anyway, I've re-introduced our group to Bioperl
and we are starting to take advantage of it wherever possible.  Cari
had seen Bioperl before, but that was in the pre-1.0 days, when things
weren't stable enough (in her opinion) for a production environment, and
she never got around to looking into it again.

I noticed in some document from a presentation given by one of the
Bioperl bigwigs (might have been LStein) that an FPC parser was a common
request.  If that's true, we probably know FPC as well as anybody, so it
would make sense for us to develop and maintain it.

So now we would like to make a contribution.  Don't get too excited
yet... It's not programmed yet.  But we have found that in many, many
different areas we need to read a .fpc file (and corresponding .cor
file) and Do Something(c) with it.  At the same time, I want to get more
familiar with Bioperl.  I've done fairly simple things, like reading in
fasta/genbank/swissprot format files and working with alignments (as you
all saw in my EST Alignment questions).

The advice I want is as follows:
1) Where are the standards/guidelines for writing Bioperl modules?
2) Any ideas on what features/functionality Bio::FPC should have?
3) Any ideas on what (if any besides Bio::Root) I should inherit from?
4) Should this be an interface and separate implementation or just an
implementation?
   (i.e., are there other file formats/programs for physical maps?)
5) What Bioperl objects should I use in construction?

These are the ideas I have so far (after all of a day of thinking about
it, so feel free to laugh/scorn/suggest better implementations)
(all these classes should be prefixed with Bio::FPC)

1) ::Project
  This would be the main class.  It would contain the information parsed
from the top 8 or so lines of the .fpc file.  It would also contain the
rest of these objects.

2) ::Clone
  Obviously, this is the clone (or more properly - fingerprinted clone)
from the fpc file.  The attributes would include type (Clone, BAC, PAC),
name, bands[], sizes[] (if available), a few dates (creation,
modification), remarks (normal and fpc remarks), contig (and range),
matching clones (parents and children; approximate, exact, and pseudo),
markers, etc.  Basically anything you might find as the /^(\w+)/ of the
line in a .fpc file.

  In typing that out, it seems that maybe the contig and range that a
clone hits would best be implemented as a type of RangeI class, which is
more apparent now that I typed that sentence.  Moving on then...

3) ::Contig
  Contig number, datetime, status (Ok, NoCB, Avoid, NoAce, Dead), #Q's,
description.

4) ::Marker
  Type (STS, eMRK, whatever), date (create,mod), Global position (if
anchored to framework)
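
To make the "anything you might find as the /^(\w+)/ of the line" idea
from the ::Clone description concrete, here is a rough sketch in plain
Perl.  It only shows grouping lines by their leading word.  The sample
lines and the group_by_keyword name are made up for illustration and are
not claimed to match real .fpc syntax or any existing Bioperl API.

```perl
use strict;
use warnings;

# Hypothetical record text: invented for illustration only,
# NOT guaranteed to match the real .fpc file layout.
my @lines = (
    'Name cloneA',
    'Bands 12',
    'Remark first note',
    'Remark second note',
    '',
);

# Group the remainder of each line under its leading word (the /^(\w+)/ key).
sub group_by_keyword {
    my @input = @_;
    my %fields;
    for my $line (@input) {
        # Skip blank lines and lines that do not start with a word.
        next unless $line =~ /^(\w+)\s*(.*)$/;
        push @{ $fields{$1} }, $2;
    }
    return \%fields;
}

my $fields = group_by_keyword(@lines);
for my $key (sort keys %$fields) {
    print "$key: ", join(' | ', @{ $fields->{$key} }), "\n";
}
```

Keeping every keyword as a list means repeated keywords (several remarks,
say) are not overwritten; a real ::Clone class would presumably map each
known keyword to a proper accessor instead of a bare hash.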

That's basically it for the objects, although the contig range might
need to be an object inherited from RangeI.
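
As a rough idea of what such a contig range object could look like, here
is a minimal sketch in plain Perl so that it runs without Bioperl
installed.  The ContigRange package name and its constructor arguments
are hypothetical, and treating both end points as inclusive is an
assumption.  A real version would presumably implement Bio::RangeI (which
already defines start, end and overlaps) rather than hand-rolling these
methods.

```perl
use strict;
use warnings;

package ContigRange;   # hypothetical name; not an existing Bioperl class

sub new {
    my ($class, %args) = @_;
    die "start must not exceed end\n" if $args{start} > $args{end};
    my $self = { contig => $args{contig}, start => $args{start}, end => $args{end} };
    return bless $self, $class;
}

sub contig { return $_[0]{contig} }
sub start  { return $_[0]{start} }
sub end    { return $_[0]{end} }

# Two ranges overlap only if they sit on the same contig and their
# coordinates intersect (end points assumed inclusive).
sub overlaps {
    my ($self, $other) = @_;
    return 0 unless $self->contig eq $other->contig;
    return ($self->start <= $other->end && $other->start <= $self->end) ? 1 : 0;
}

package main;

my $first  = ContigRange->new(contig => 'ctg1', start => 10, end => 40);
my $second = ContigRange->new(contig => 'ctg1', start => 35, end => 60);
my $third  = ContigRange->new(contig => 'ctg2', start => 10, end => 40);

print "first/second overlap: ", $first->overlaps($second), "\n";
print "first/third overlap:  ", $first->overlaps($third), "\n";
```

The only thing this adds over a generic range is the contig name, so that
clones with identical coordinates on different contigs never count as
overlapping.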

So now I need some input, and we'll see if I can't get started coding
this.

Thanks!

----------------------------------------------------------------------
Jamie Hatfield                              Room 541H, Marley Building
Systems Programmer                          University of Arizona
Arizona Genomics Computational              Tucson, AZ  85721
  Laboratory (AGCoL)                        (520) 626-9598