[BioPython] Big GenBank files
aurelie.bornot at free.fr
Mon Apr 25 05:42:29 EDT 2005
Hi !
I am trying to write a program that automatically BLASTs a set of sequences against the GenBank sequences, and I would like to retrieve (also automatically) the most interesting GenBank files, so that I can keep information about them in my database.
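For the BLAST step, what I have in mind is roughly the following (only a sketch: blastn against the nr database is just my guess at sensible parameters, and 'query.fasta' / 'query_blast.xml' are placeholder file names):

from Bio.Blast import NCBIWWW

# Read one query sequence as plain FASTA text ('query.fasta' is a placeholder).
query = open('query.fasta').read()

# Run a remote BLAST at NCBI; blastn against nr is only my assumption
# about reasonable parameters.
result_handle = NCBIWWW.qblast('blastn', 'nr', query)

# Save the XML results so I can look at the hits later.
out = open('query_blast.xml', 'w')
out.write(result_handle.read())
out.close()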
But I've got a problem (again... sorry! :'( ):
I have 2*512 MB of RAM, but it seems that my computer can't deal with 'big' GenBank files like 'BA000028.3' (7 MB) or 'AP008212' (37 MB).
For example:
from Bio import GenBank

fichier = open('AP008212.fasta', "w")
record_parser = GenBank.RecordParser()
ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser=record_parser)
gb_record = ncbi_dict['AP008212']   # this is the step that never finishes
fichier.write(str(gb_record))       # what I want to do with the record afterwards
fichier.close()
...never ends...
I suppose it is because the files are too big for the parsing step that transforms the downloaded data into a record (the registry machinery)...
For 'AP008212' (37 MB):
ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'fasta')
doesn't work either...
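So I was wondering whether I should skip the record-building step completely and just fetch the raw GenBank file from NCBI, streaming it straight to disk. Something like this is what I had in mind (only a sketch, assuming the standard efetch URL parameters db/id/rettype):

import urllib.request

# Fetch the raw GenBank flat file for AP008212 from NCBI efetch and copy it
# to disk in small chunks, so the whole 37 MB never sits in memory at once.
url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
       "?db=nucleotide&id=AP008212&rettype=gb&retmode=text")

response = urllib.request.urlopen(url)
out = open("AP008212.gb", "wb")
while True:
    chunk = response.read(64 * 1024)   # 64 kB at a time
    if not chunk:
        break
    out.write(chunk)
out.close()
response.close()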
I tried to understand how all this works so that I could look at the header of the connection (maybe there is a way to give up downloading these big files...), but I am not very used to Python or to anything concerning network connections...
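Ideally I would like to check the response headers first and give up if the record is too large, something like the sketch below (I am not sure the server always sends a Content-Length, so this is only the idea):

import urllib.request

# Same efetch URL as in the sketch above.
url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
       "?db=nucleotide&id=AP008212&rettype=gb&retmode=text")

MAX_BYTES = 10 * 1024 * 1024   # arbitrary 10 MB threshold

response = urllib.request.urlopen(url)
size = response.headers.get("Content-Length")   # may be absent
if size is not None and int(size) > MAX_BYTES:
    print("Record is %s bytes, giving up the download" % size)
else:
    out = open("AP008212.gb", "wb")
    out.write(response.read())
    out.close()
response.close()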
I have been stuck on this problem for 3 days and I am lost...
I don't know what to do...
Could someone help me?!
Thanks !
Aurelie