[Bioperl-l] UniGene modules now in CVS
Heikki Lehvaslaiho
heikki@ebi.ac.uk
Wed, 01 May 2002 15:09:23 +0100
Andrew,
Great work!
I have a note about the module names, though.
Bio::Cluster::UniGene should be one implementation of the
Bio::Cluster::ClusterI interface.
To read in files, you now need to say:
$io = Bio::ClusterIO::UniGeneIO->new(-file=>'myfile',
                                     -format=>'unigene');
$clu = $io->next_unigene();
which is somewhat redundant. A more generic way would be:
$str = Bio::ClusterIO::IO->new(-file=>'myfile',
                               -format=>'unigene');
$clu = $str->next();
... and $clu would be a Bio::Cluster::UniGene object.
Then when a Stack parser is written, you need to say only:
$str = Bio::ClusterIO::IO->new(-file=>'myfile',
-format=>'stack');
$clu = $str->next();
to get another object implementing Bio::Cluster::ClusterI.
Bio::ClusterIO::IO could equally well be simply Bio::ClusterIO,
and cluster.pm could be one level higher.
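
To make the idea concrete, here is a minimal, self-contained sketch of that
kind of format dispatch. It is only an illustration under stated assumptions:
the My::* package names, the -records argument (an in-memory list standing in
for a file), and the id() accessor are all hypothetical and are not part of
the checked-in modules or of any existing Bioperl API.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for objects implementing Bio::Cluster::ClusterI.
package My::Cluster::UniGene;
sub new { my ($class, %args) = @_; return bless { id => $args{'-id'} }, $class }
sub id  { return $_[0]{id} }

package My::Cluster::Stack;
our @ISA = ('My::Cluster::UniGene');    # reuse new() and id() for brevity

# Hypothetical factory: one entry point, parser class chosen by -format.
package My::ClusterIO;
my %PARSER_FOR = (
    unigene => 'My::ClusterIO::unigene',
    stack   => 'My::ClusterIO::stack',
);

sub new {
    my ($class, %args) = @_;
    my $format = lc( $args{'-format'} // 'unigene' );
    my $parser = $PARSER_FOR{$format}
        or die "Unknown cluster format '$format'\n";
    # -records is an assumption: a list of IDs used instead of a real file.
    return bless { records => [ @{ $args{'-records'} || [] } ] }, $parser;
}

# Each format-specific class supplies its own next().
package My::ClusterIO::unigene;
sub next {
    my ($self) = @_;
    my $id = shift @{ $self->{records} };
    return defined $id ? My::Cluster::UniGene->new(-id => $id) : undef;
}

package My::ClusterIO::stack;
sub next {
    my ($self) = @_;
    my $id = shift @{ $self->{records} };
    return defined $id ? My::Cluster::Stack->new(-id => $id) : undef;
}

package main;

# The calling code is identical for every format; only -format changes.
my $str = My::ClusterIO->new(-format => 'unigene', -records => ['Hs.2', 'Hs.4']);
while ( my $clu = $str->next() ) {
    print ref($clu), ' ', $clu->id(), "\n";
}
```

With this layout, adding a Stack parser means adding one format class and one
entry in the lookup table, while callers keep using new() and next().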
Do not let this discourage you,
Yours,
-heikki
Andrew Macgregor wrote:
>
> Hi All,
>
> I've just checked in the UniGene modules I've been working on. I've based
> the modules on and borrowed heavily from SeqIO and Seq. As I've said
> already, I'm new to this so I'm sure the coding won't be 100 percent
> perfect. That said I think they work alright.
>
> The parser is not super fast, especially as it parses out every sequence
> line, so be patient if you're doing, say, the entire human unigene file
> (overnight job for me). Once I've seen how Parse::FastDescent looks I'll
> probably move to that.
>
> I've pasted the synopsis for the modules below.
>
> Things I'm not so sure of:
> - I've made a test and a test data file but I'm not certain they are OK.
> - I'm not certain about error handling, if the parser spits an error it goes
> to STDERR, I'm not too sure what else I should have.
> - At the moment the modules only work with the *.data unigene files from
> NCBI. I could add further format modules as need arises (i.e. for *.seq.uniq
> etc)
> - The whole interface thing I am not too sure of, there's only something
> very basic there at present (UniGeneI.pm).
> - other things that I've no doubt overlooked.
>
> Any feedback on these things is appreciated.
>
> Cheers, Andrew.
>
> NAME
> Bio::ClusterIO::UniGeneIO - Handler for UniGeneIO Formats
>
> SYNOPSIS
> use Bio::Cluster::UniGene;
> use Bio::ClusterIO::UniGeneIO;
>
> $stream = Bio::ClusterIO::UniGeneIO->new('-file' => "Hs.data",
> '-format' => "unigene");
> # note: we quote -format to keep older perls from complaining.
>
> while ( my $in = $stream->next_unigene() ) {
>
>     print $in->unigene_id() . "\n";
>
>     while ( my $sequence = $in->next_seq() ) {
>         print $sequence->accession_number() . "\n";
>     }
> }
>
> Parsing errors are printed to STDERR.
>
> DESCRIPTION
> The UniGeneIO module works with the unigene format module to read NCBI
> UniGene *.data files downloaded from
> ftp://ncbi.nlm.nih.gov/repository/UniGene/.
>
> CONSTRUCTORS
> Bio::ClusterIO::UniGeneIO->new()
>
> $unigeneIO = Bio::ClusterIO::UniGeneIO->new(-file => 'filename',
> -format=>$format);
>
> The new() class method constructs a new Bio::ClusterIO::UniGeneIO object. The
> returned object can be used to retrieve or print UniGene objects. new()
> accepts the following parameters:
>
> -file
> A file path to be opened for reading.
>
> -format
> Specify the format of the file. Supported formats include:
>
> *.data UniGene build files.
>
> If no format is specified and a filename is given, then the module
> will attempt to deduce it from the filename. If this is
> unsuccessful, the main UniGene build format is assumed.
>
> The format name is case insensitive. 'UNIGENE', 'UniGene' and
> 'unigene' are all supported.
>
> NAME
> Bio::Cluster::UniGene - UniGene object
>
> SYNOPSIS
> use Bio::Cluster::UniGene;
> use Bio::ClusterIO::UniGeneIO;
>
> $stream = Bio::ClusterIO::UniGeneIO->new('-file' => "Hs.data",
> '-format' => "unigene");
> # note: we quote -format to keep older perls from complaining.
>
> while ( my $in = $stream->next_unigene() ) {
>
>     print $in->unigene_id() . "\n";
>
>     while ( my $sequence = $in->next_seq() ) {
>         print $sequence->accession_number() . "\n";
>     }
> }
>
> DESCRIPTION
> This UniGene object is returned by UniGeneIO and contains all the data
> associated with one UniGene record.
>
> Available methods (see below for details):
>
> new() - standard new call
> unigene_id() - set/get unigene_id
> title() - set/get title (description)
> gene() - set/get gene
> cytoband() - set/get cytoband
> locuslink() - set/get locuslink
> gnm_terminus() - set/get gnm_terminus
> chromosome() - set/get chromosome
> scount() - set/get scount
> express() - set/get express, currently takes/returns a reference to an
> array of expressed tissues
> next_express() - returns the next tissue expression from the expressed
> tissue array
> sts() - set/get sts, currently takes/returns a reference to an array of
> sts lines
> next_sts() - returns the next sts line from the array of sts lines
> txmap() - set/get txmap, currently takes/returns a reference to an array
> of txmap lines
> next_txmap() - returns the next txmap line from the array of txmap lines
> protsim() - set/get protsim, currently takes/returns a reference to an
> array of protsim lines
> next_protsim() - returns the next protsim line from the array of protsim
> lines
> sequence() - set/get sequence, currently takes/returns a reference to an
> array of references to seq info
> next_seq() - returns a Seq object that currently only contains an
> accession number
>
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________