[Bioperl-l] Bio::Assembly::IO::ace with CAP3 ace files

Robson Francisco de Souza rfsouza@citri.iq.usp.br
Thu, 26 Dec 2002 11:39:34 -0200 (BRST)


	Hi,

On Tue, 24 Dec 2002, Ewan Birney wrote:
> On Mon, 23 Dec 2002, Marc Logghe wrote:
> > Hi,
> > I had to hack into Bio::Assembly::IO::ace a little to make it work with ace
> > files generated by CAP3 and to allow handling of multiple assemblies per
> > file.
> > The diff is included in case somebody would need it also (revision 1.1 is
> > the initial commit on bioperl-live). I hope I did not break anything; did
> > not test it with phrap ace files.
> > Great module, I especially like the change_coord() method to change between
> > the 4 coordinate systems. Really cool !!! Thank you Robson F. de Souza !

	You are welcome. I'm glad to know this is useful :).

> Ideally, before applying this I would like to get it ok'd by someone who
> uses Bio::Assembly::IO::ace (ideally Robson himself I guess) as I don't
> know what is going on there. If not, I am tempted to leave this until
> 1.3/4 series

	I've tested Marc's patch with a few example scripts. It works
with phrap ace files as well as CAP3 ace files, but it changes the way
phrap contigs are identified by prepending the string 'Contig' to the
current IDs. As a result, the same contigs will get different IDs
depending on whether they are loaded from ace or phrap.out files. Since
phrap.out files are unstable and poorly documented, I guess the better
solution is to apply the following patch to phrap.pm, so that its IDs
match those loaded by ace.pm (sorry for the long lines, but they are in
the code):

--- phrap.pm    Mon Nov 11 12:38:30 2002     1.1
+++ phrap.pm    Thu Dec 26 11:25:00 2002     1.2
@@ -211,7 +211,7 @@
        # Loading contig information
        /^Contig (\d+)\.\s+(\d+) reads?; (\d+) bp \(untrimmed\), (\d+) \(trimmed\)\./ && do {
            my $nof_reads = $2; my $length = $3; my $trimmed_length = $4;
-           $contigOBJ = Bio::Assembly::Contig->new(-id=>$1, -source=>'phrap');
+           $contigOBJ = Bio::Assembly::Contig->new(-id=>"Contig$1", -source=>'phrap');
            my $feat   = Bio::SeqFeature::Generic->new(-start=>1,
                                                       -end=>$length,
                                                       -primary=>"_main_contig_feature:".$contigOBJ->id(),
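
To illustrate the ID mismatch the hunk above addresses, here is a minimal
sketch that reuses the same regex outside of BioPerl. The helper name
phrap_contig_id and the sample header line are made up for illustration;
they are not part of phrap.pm, and the sample numbers do not come from a
real phrap.out file.

```perl
use strict;
use warnings;

# Hypothetical helper (not in phrap.pm): build a contig ID from a phrap.out
# contig header line, using the same regex as the patched code.
# $prefix is '' for the revision 1.1 behaviour and 'Contig' for revision 1.2.
sub phrap_contig_id {
    my ($line, $prefix) = @_;
    if ($line =~ /^Contig (\d+)\.\s+(\d+) reads?; (\d+) bp \(untrimmed\), (\d+) \(trimmed\)\./) {
        return $prefix . $1;
    }
    return;    # not a contig header line
}

# Made-up sample line in the layout the regex expects.
my $header = 'Contig 1.  2 reads; 1000 bp (untrimmed), 950 (trimmed).';

my $old_id = phrap_contig_id($header, '');          # bare number, as in 1.1
my $new_id = phrap_contig_id($header, 'Contig');    # prefixed, as in 1.2

print "before patch: $old_id\n";
print "after patch:  $new_id\n";
```

With the prefix in place, the phrap.out ID has the same 'Contig' plus
number form that the patched ace.pm produces for phrap contigs, so a
lookup by ID should behave the same for both input formats.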

	With my patch applied to phrap.pm and Marc's patch applied to ace.pm,
everything is kept consistent and we get support for one more output
format, which is a Good Thing (tm) :).
						Robson