[Bioperl-l] Query Unigene title from input a ACC number / BioPerl Object Creation
Jamie Hatfield (AGCoL)
jamie at genome.arizona.edu
Tue Mar 25 09:21:09 EST 2003
Maybe it's just me, but I've never been too pleased with BioPerl's
ability to handle large amounts of data like these UniGene clusters.
You might remember that I recently proposed an FPC module for reading
FPC data files. That is still in progress, but it is DOG slow, and the
only cause I can find is that object creation is a bear.
I would really like some input myself from the BioPerl experts about
what I can do to speed up the creation of, say . . . 100k objects? :-)
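One common way to make creating 100k objects cheaper is lazy parsing: the constructor only stores the raw text of a record, and fields are parsed the first time an accessor asks for them, then cached. The sketch below is a hypothetical illustration of that idea, not how BioPerl or the proposed FPC module actually works; the `LazyRecord` class, its `id`/`title` accessors, and the `KEY  value` line layout are all assumptions.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical lightweight record class (NOT part of BioPerl).
# The constructor only keeps the raw record text; nothing is parsed yet.
package LazyRecord;

sub new {
    my ($class, $raw) = @_;
    return bless { raw => $raw }, $class;
}

# Parse "KEY   value" lines on first use and cache the result,
# so later accessor calls skip the parsing step.
sub _fields {
    my ($self) = @_;
    unless ($self->{fields}) {
        my %f;
        for my $line (split /\n/, $self->{raw}) {
            push @{ $f{$1} }, $2 if $line =~ /^(\w+)\s+(.*)$/;
        }
        $self->{fields} = \%f;
    }
    return $self->{fields};
}

# Accessors return the first value seen for a key, or undef if absent.
sub id    { my $v = $_[0]->_fields->{ID};    return $v ? $v->[0] : undef }
sub title { my $v = $_[0]->_fields->{TITLE}; return $v ? $v->[0] : undef }

package main;

# Usage: building many records is cheap because none of them is parsed yet.
my @records = map { LazyRecord->new("ID          Hs.$_\nTITLE       cluster $_\n") } 1 .. 100_000;

# Only this one record gets parsed.
print 'title of record 107: ', $records[106]->title, "\n";
```

The trade-off is that parsing cost moves from construction to first access, so this only pays off when most records are never fully inspected.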
But, back to this question. Yes, it will take forever + 1 day. You
might consider this Perl script instead. It reads the file line by line
without building any objects, so it's pretty zippy.
==============================
#!/usr/local/bin/perl -w
use strict;

## default query if no accessions are given on the command line
my @query = qw(BG618921);
my $title = '';
my %lookup;

if (@ARGV) {
    ## if there are arguments on the command line, use them as input
    @query = @ARGV;
}

## initialize a lookup HASH so that all values in the query are
## key entries with value of 1
@lookup{@query} = (1) x @query;

## usage: perl this_script.pl BG618921 < Hs.data
while (<STDIN>) {
    $title = $1 if /^TITLE\s+(.*)/;    ## remember the title for later
    if (/^SEQUENCE.+ACC=(\w+);/) {
        ## print out the title if it matched
        print "$1\t$title\n" if $lookup{$1};
    }
}
============================
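A minimal extension of the script above, sketched under the assumption that each UniGene .data record has an `ID` line, a `TITLE` line, `SEQUENCE ... ACC=...;` lines, and ends with `//`. It also reports the cluster ID (which the original question asked for), resets the title at each new cluster, and stops reading once every requested accession has been found. The `lookup_titles` name is hypothetical, and tolerating an optional `.version` suffix on ACC is a guess about later file releases.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: scan a UniGene .data stream once and return
# ( ACC => [cluster_id, title], ... ) for every requested accession.
sub lookup_titles {
    my ($fh, @accessions) = @_;
    my %wanted = map { $_ => 1 } @accessions;
    my %found;
    my ($id, $title) = ('', '');

    while (my $line = <$fh>) {
        if ($line =~ /^ID\s+(\S+)/) {
            ($id, $title) = ($1, '');    # new cluster: reset state
        }
        elsif ($line =~ /^TITLE\s+(.*?)\s*$/) {
            $title = $1;
        }
        # Assumed format: "SEQUENCE ... ACC=XXXX;" with an optional ".N" version.
        elsif ($line =~ /^SEQUENCE.*?ACC=(\w+)(?:\.\d+)?;/ && $wanted{$1}) {
            $found{$1} = [$id, $title];
            delete $wanted{$1};
            last unless %wanted;         # every accession found: stop reading
        }
    }
    return %found;
}

# Accessions on the command line, the .data file on STDIN.
if (@ARGV) {
    my %found = lookup_titles(\*STDIN, @ARGV);
    for my $acc (@ARGV) {
        my ($id, $title) = @{ $found{$acc} || ['-', 'not found'] };
        print "$acc\t$id\t$title\n";
    }
}
```

Usage would be something like `perl unigene_title.pl BG618921 < Hs.data` (script name hypothetical). With thousands of accessions, reading them from a file instead of the command line would be a small change to the last block.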
----------------------------------------------------------------------
Jamie Hatfield                              Room 541H, Marley Building
Systems Programmer                          University of Arizona
Arizona Genomics Computational              Tucson, AZ 85721
Laboratory (AGCoL)                          (520) 626-9598
> -----Original Message-----
> From: bioperl-l-bounces at bioperl.org
> [mailto:bioperl-l-bounces at bioperl.org] On Behalf Of darson
> Sent: Tuesday, March 25, 2003 12:39 AM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Query Unigene title from input a ACC number
>
>
> Hello,
>
> I'm trying to write a script that grabs the UniGene title from the
> Hs.data file given an ACC number.
> The following script is a preliminary test:
>
> use Bio::Cluster::UniGene;
> use Bio::ClusterIO;
> use Bio::ClusterI;
>
> # location of human unigene file from NCBI FTP
> $stream = Bio::ClusterIO->new('-file'   => "/home/human_unigene/Hs.data",
>                               '-format' => "unigene");
> while (my $in = $stream->next_cluster()) {
>     while (my $sequence = $in->next_seq()) {
>         # BG618921 is an ACC member of Hs.107 fibrinogen-like 1
>         if ($sequence->accession_number() =~ /BG618921/) {
>             print $hitid = $in->unigene_id() . "\n";
>             print $hitti = $in->title() . "\n";
>         }
>     }
> }
>
> It reports the correct cluster, but the script takes over an hour
> to finish, which is far too slow, and I have thousands of
> accessions to look up. I would be very grateful for any
> suggestions or other methods to solve this problem. Thanks!
> Best regards,
> Darson Chung
> 2003/03/25
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>