[Bioperl-l] Bio::Assembly::IO::phrap and Bio::Assembly::IO::ace with large files

Sofia Robb sofia at neuro.utah.edu
Thu Feb 9 18:00:05 UTC 2006


I am having trouble parsing large (2030 contigs) phrap.out and ace.1 
files.  I have no problem with small files (1 contig).  Here are the 
errors I get when I try the code at the end of my email.  My 
script fails on this line:  my $assembly = $in->next_assembly;  I think 
it may have something to do with BTREE in Collection.pm, but I have been 
unable to correct my errors.

-------

file with 2030 contigs
Bio::Assembly::IO::ace
Can't call method "get_dup" on an undefined value at 
/Library/Perl/5.8.6/Bio/SeqFeature/Collection.pm line 359, <GEN0> line 
17699.

line 17699 of my ace file is the last line of the record for Contig253

------

file with 2030 contigs
Bio::Assembly::IO::phrap
Can't call method "put" on an undefined value at 
/Library/Perl/5.8.6/Bio/SeqFeature/Collection.pm line 225, <GEN0> line 
39839. 

line 39839 of my phrap.out file is the first line of the record for Contig253

------

use Bio::Assembly::IO;

my $filename = $ARGV[0];

my $in = Bio::Assembly::IO->new(-file=>"$filename",
                                -format=>"phrap"    # or -format=>"ace" for ace.1 files
                                );
my $assembly = $in->next_assembly;
my @contigs = $assembly->all_contigs();
foreach my $contig ($assembly->all_contigs){
        my $id = $contig->id();
        print "contig id = $id ";
        my $seqObj = $contig->get_consensus_sequence();
        my $seq = $seqObj->seq();
        print "is $seq\n";
}
my $id = $assembly->id();
print "$id\n";       

-----

Thanks for any input,
Sofia

Sofia Robb
Molecular Biology Ph.D Program
Sanchez Laboratory
Department of Neurobiology and Anatomy
University of Utah
http://planaria.neuro.utah.edu





