Bioperl: NCBI Entrez queries and Perl file handling
Matthew Pocock
mrp@sanger.ac.uk
Wed, 02 Jun 1999 16:04:50 +0100
Dear Simon,
Simon Twigger wrote:
>
> Hi there,
>
> <snip/>
>
> When perl reads in a file say using the normal code such as:
>
> open (FILE, "Hs.seq.uniq") or die "Cant open file: $!";
>
> while (<FILE>) {
> # deal with each line as it comes through
> # for example, to look for a specific Unigene ID
> if( /Hs.12345/) {
> # deal with the unigene information
> }
> }
>
> close FILE;
>
> does it keep the whole thing in memory as it reads through the file or
> does it just keep the current line (in $_) in memory? If it's the former
> then I'm not sure if reading in a 60 MB file is a good thing; if it's the
> latter, then file size shouldn't have too many adverse effects other than
> taking a while to go through the whole thing.
It should only keep $_ in memory. Of course, if you store this string
anywhere then it will be kept around. If you said @lines = <FILE> then
@lines would contain one string for each line in the file, so it would
hold all 60 MB at once. That is why line-by-line file parsing is good.
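
To make the difference concrete, here is a minimal sketch of the
line-by-line approach wrapped in a sub. The name grep_file and the
pattern in the usage comment are just for illustration and are not from
your script. Note the escaped dot: a bare "." in a regex matches any
character.

```perl
use strict;
use warnings;

# Return the lines of $path that match $pattern.
# Only the current line and the accumulated hits are held in memory.
sub grep_file {
    my ($path, $pattern) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        push @hits, $line if $line =~ $pattern;
    }
    close $fh;
    return @hits;
}

# Usage (assumes the file exists in the current directory):
#   my @hits = grep_file('Hs.seq.uniq', qr/Hs\.12345\b/);
```

Memory use here grows with the number of matching lines, not with the
size of the file.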
>
> I also thought of trying to grep out the sequence rather than going all
> the way through the file sequentially as this seems pretty fast from the
> command line.
You should try the excellent indexer modules. I don't know if there is a
Bio::Index::* module that will index your file type - there is one for
fasta - but I guess you can roll your own from Bio::Index::Abstract. The
fasta implementation is very cool.
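
If you want a feel for the general idea behind indexing before trying the
modules, here is a rough sketch in plain Perl. It is not how Bio::Index
is implemented and it does not use its API. It assumes fasta-style
records whose header lines start with ">id", and the names build_index
and fetch_record are made up for this example. It scans the file once,
remembers the byte offset of each header, and then seeks straight to a
record instead of reading everything before it.

```perl
use strict;
use warnings;

# Scan the file once, recording the byte offset of every ">id" header line.
sub build_index {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my %offset_of;
    my $pos = tell($fh);                 # offset of the line about to be read
    while (my $line = <$fh>) {
        $offset_of{$1} = $pos if $line =~ /^>(\S+)/;
        $pos = tell($fh);
    }
    close $fh;
    return \%offset_of;
}

# Seek directly to one record and return its header plus sequence lines.
sub fetch_record {
    my ($path, $index, $id) = @_;
    my $pos = $index->{$id};
    return unless defined $pos;          # unknown id
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    seek($fh, $pos, 0) or die "Can't seek in $path: $!";
    my $record = <$fh>;                  # the header line itself
    while (my $line = <$fh>) {
        last if $line =~ /^>/;           # next record starts here
        $record .= $line;
    }
    close $fh;
    return $record;
}
```

This sketch keeps the offsets in an in-memory hash, so the index has to
be rebuilt on every run. Saving it to disk would avoid the rescan.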
>
> Any suggestions on efficient ways to pull data out of large flat files
> like this?
>
> Thanks for any help you can give me!
>
> Simon.
>
> --
> --------------------------------------------------
> Simon Twigger, Ph.D.
> Laboratory for Genetic Research,
> Cardiovascular Research Center,
> Medical College of Wisconsin,
> 8701 Watertown Plank Road,
> Milwaukee, WI, 53226
>
> http://legba.ifrc.mcw.edu/~simont/
>
> tel. 414-456-4409 fax. 414-456-6516
> --------------------------------------------------
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================