[Bioperl-l] Out of memory errors running Bio::ASN1::EntrezGene against latest Homo_sapiens.ags file
Mingyi Liu
mingyi.liu at gpc-biotech.com
Sun Oct 14 03:28:09 UTC 2007
Hi, Susan,
Let us know how my suggestions worked for you. I replied to both you
and the bioperl mailing list last Friday, hoping my answer would help
the list discussion, but the mailing list server seems to have had
serious problems and dropped both of my emails. I'm therefore replying
again, with the content of both emails combined below. Hopefully this
one reaches the mailing list; if not, would one of you please forward
it? Thanks.
Mingyi Liu wrote:
> Hi, Susan,
>
> Mauricio is right: when there's a problem with Bio::ASN1::EntrezGene,
> it's better to contact me directly. I actually deleted a few messages
> in this discussion before one caught my eye. Nowadays I'm working in
> other areas and not tracking the bioperl mailing list closely, so a
> direct email to me usually works out better.
>
> As for the problem you mentioned, there are two possible causes:
>
> 1. It seems that you converted the file to XML instead of text ASN.
> My parser is designed for text ASN, so please use gene2xml to convert
> the downloaded file to ASN instead of XML. Most likely, the parser
> could not find the record start/end markers it expects in the XML
> file, so it tried to read the entire file into a single string.
>
> 2. A less likely possibility (which you may already have taken care
> of): is the perl on your system a 64-bit build, so that it can
> actually use the 256 GB of memory you have? If not, your >5 GB file
> is over the 4 GB limit of a 32-bit perl.
>
> Let me know if my suggestions work out for you.
>
> Best,
>
> Mingyi
>
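To tell which of the two causes applies, it can help to check both the input file and the perl build before parsing. The following core-Perl sketch is illustrative only: guess_format and perl_is_64bit are hypothetical helpers (not part of BioPerl), and the format test assumes that text ASN records begin with "Entrezgene ::=" and that gzip data begins with the bytes 0x1f 0x8b.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;    # core module exposing perl's build configuration

# Hypothetical helper: guess an input file's format from its first 4 KB.
# Assumptions: gzip starts with bytes 0x1f 0x8b, XML starts with "<",
# and text ASN.1 starts with "Entrezgene ::=" or "Entrezgene-Set ::=".
sub guess_format {
    my ($fh) = @_;
    binmode $fh;
    my $got = read($fh, my $head, 4096);    # never reads the whole file
    return 'empty' unless $got;             # 0 at EOF, undef on error
    return 'gzip' if substr($head, 0, 2) eq "\x1f\x8b";
    $head =~ s/^\s+//;                      # ignore leading whitespace
    return 'xml'      if $head =~ /^</;
    return 'asn-text' if $head =~ /^Entrezgene(?:-Set)?\s*::=/;
    return 'unknown';
}

# A perl with 4-byte pointers is a 32-bit build and cannot address more
# than 4 GB, however much RAM the machine has.
sub perl_is_64bit {
    return $Config{ptrsize} >= 8 ? 1 : 0;
}

# Usage: perl check_input.pl Homo_sapiens.xml
if (@ARGV) {
    my $file = $ARGV[0];
    open(my $in, '<', $file) or die "Cannot open $file: $!";
    print "format: ", guess_format($in), "\n";
    print "64-bit perl: ", (perl_is_64bit() ? 'yes' : 'no'), "\n";
    close $in;
}
```

Only the first few kilobytes are read, so the check itself stays cheap even on a multi-gigabyte file.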
BTW, here's the syntax from one of my messages last year showing how to
convert the compressed binary ASN format NCBI provides into the text ASN
format my module (or Stefan's SeqIO::entrezgene) expects. The -x switch
does the trick, overriding gene2xml's default of producing XML output:
my $parser = Bio::ASN1::EntrezGene->new('file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b |");
# Homo_sapiens.ags.gz is the gzipped binary file downloaded directly from NCBI
The same syntax applies when you're using SeqIO (i.e. SeqIO::entrezgene).
Text ASN is also both smaller and faster to parse than XML.
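Why reading text ASN through a pipe keeps memory bounded can be sketched in core Perl: setting the input record separator $/ to the record marker makes each readline return one record, so no more than one record is held at a time. This is a hypothetical illustration, not how Bio::ASN1::EntrezGene is implemented; count_asn_records is a made-up name, and the marker "Entrezgene ::= " is an assumption about the text ASN output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): run a command and count the
# text ASN.1 records in its output while holding only one record in memory.
# Assumption: every record starts with the literal text "Entrezgene ::= ".
# Returns the record count and the length of the longest chunk read.
sub count_asn_records {
    my @cmd = @_;
    # List-form pipe open runs the command directly, without a shell.
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    local $/ = 'Entrezgene ::= ';    # each read stops after the next marker
    my ($count, $longest) = (0, 0);
    while (defined(my $chunk = <$fh>)) {
        # chomp removes the trailing marker and returns >0 only if one was read
        $count++ if chomp($chunk);
        $longest = length($chunk) if length($chunk) > $longest;
    }
    close($fh) or die "$cmd[0] failed (exit status $?)";
    return ($count, $longest);
}

# Usage, assuming gene2xml is on your PATH:
#   my ($n, $longest) = count_asn_records(
#       'gene2xml', '-i', 'Homo_sapiens.ags.gz', '-c', '-x', '-b');
```

If the marker never appears, as with XML output, the loop gets the entire stream back as a single chunk, which is the whole-file-in-one-string failure in miniature.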
Best,
Mingyi
> Susan Wilson wrote:
>> Hi,
>>
>> I downloaded the latest
>> ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/Mammalia/Homo_sapiens.ags.gz
>> and ran gene2xml on it to generate Homo_sapiens.xml, which is 5821420628
>> bytes (about 5.8 GB). I cannot parse this file with Bio::ASN1::EntrezGene,
>> even on a machine with 256 GB of memory. Even the following minimal
>> code fails with a bare "Out of memory" message:
>>
>> #!/usr/bin/perl
>> use strict;
>> use warnings;
>> use Bio::ASN1::EntrezGene;
>>
>> my $parser = Bio::ASN1::EntrezGene->new('file' => "Homo_sapiens.xml");
>> while (my $result = $parser->next_seq) {
>>     # empty loop body: just parse every record
>> }
>>
>>
>>
>> Thanks.
>> Susan
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>