[Bioperl-l] UniGene modules now in CVS

Lincoln Stein lstein@cshl.org
Wed, 1 May 2002 23:21:03 -0400


Heikki Lehvaslaiho writes:
 > 
 > Andrew,
 > 
 > Great work!
 > 
 > I have a note about the module names, though. 
 > 
 > 
 > Bio::Cluster::Unigene should be one implementation of Bio::Cluster::ClusterI
 > interface. 
 > 
 > 
 > To read in files, you now need to say:
 > 
 > $io = Bio::ClusterIO::UniGeneIO->new(-file=>'myfile',
 >                                      -format=>'unigene');
 > $clu = $io->next_unigene();
 > 
 > 
 > which is somewhat redundant. A more generic way would be:
 > 
 > $io = Bio::ClusterIO::IO->new(-file=>'myfile',
 >                                -format=>'unigene');
 > $clu = $io->next();

Yes, but Bio::SeqIO uses next_seq().

 > ... and $clu would be a Bio::Cluster::Unigene object.
 > 
 > Then when a Stack parser is written, you need to say only:
 > 
 > $str = Bio::ClusterIO::IO->new(-file=>'myfile',
 >                                -format=>'stack');
 > 
 > $clu = $str->next();
 > 
 > to get another Bio::Cluster::ClusterI-implementing object.
 > 
 > 
 > Bio::ClusterIO::IO could equally well be simply Bio::ClusterIO,
 > and cluster.pm could move one level higher.
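A minimal sketch of the generic front end proposed above, written in plain Perl so it runs without BioPerl installed. The package names My::ClusterIO, My::ClusterIO::unigene and My::ClusterIO::stack, the driver table, the next_cluster() method name (chosen to parallel SeqIO's next_seq()) and the -records argument standing in for real file parsing are all hypothetical, not the checked-in code.

```perl
use strict;
use warnings;

# Hypothetical generic front end: new() picks a format-specific driver
# class from -format. Names and the driver table are illustrative only.
package My::ClusterIO;

my %DRIVER = (
    unigene => 'My::ClusterIO::unigene',
    stack   => 'My::ClusterIO::stack',
);

sub new {
    my ($class, %args) = @_;
    my $format = lc($args{-format} || 'unigene');   # case-insensitive name
    my $driver = $DRIVER{$format}
        or die "Unknown cluster format '$format'\n";
    return $driver->_init(%args);
}

# One shared iterator name for every driver, in the spirit of next_seq().
sub next_cluster { die ref(shift) . " must implement next_cluster\n" }

package My::ClusterIO::unigene;
our @ISA = ('My::ClusterIO');

# Hypothetical: -records holds pre-parsed clusters instead of reading -file.
sub _init {
    my ($class, %args) = @_;
    return bless { file => $args{-file}, records => $args{-records} || [] }, $class;
}

sub next_cluster {
    my $self = shift;
    return shift @{ $self->{records} };   # undef once exhausted
}

package My::ClusterIO::stack;
our @ISA = ('My::ClusterIO');

sub _init { my ($class, %args) = @_; return bless {%args}, $class }

package main;
```

With this layout, adding a Stack parser means one more driver package and one table entry; calling code keeps using new(-format => ...) and the same iterator.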
 > 
 > 
 > Don't let this discourage you,
 > 
 > Yours,
 > 	-heikki
 > 
 > 
 > 
 > Andrew Macgregor wrote:
 > > 
 > > Hi All,
 > > 
 > > I've just checked in the UniGene modules I've been working on. I've based
 > > the modules on, and borrowed heavily from, SeqIO and Seq. As I've said
 > > already, I'm new to this so I'm sure the coding won't be 100 percent
 > > perfect.  That said I think they work alright.
 > > 
 > > The parser is not super fast, especially as it parses out every sequence
 > > line, so be patient if you're doing, say, the entire human unigene file
 > > (overnight job for me).  Once I've seen how Parse::FastDescent looks I'll
 > > probably move to that.
 > > 
 > > I've pasted the synopsis for the modules below.
 > > 
 > > Things I'm not so sure of:
 > > - I've made a test and a test data file but I'm not certain they are OK.
 > > - I'm not certain about error handling: if the parser spits out an error, it
 > > goes to STDERR, and I'm not too sure what else I should have.
 > > - At the moment the modules only work with the *.data UniGene files from
 > > NCBI. I could add further format modules as the need arises (e.g. for
 > > *.seq.uniq).
 > > - The whole interface thing I am not too sure of; there's only something
 > > very basic there at present (UniGeneI.pm).
 > > - other things that I've no doubt overlooked.
 > > 
 > > Any feedback on these things is appreciated.
 > > 
 > > Cheers, Andrew.
 > > 
 > > NAME
 > >     Bio::ClusterIO::UniGeneIO - Handler for UniGeneIO Formats
 > > 
 > > SYNOPSIS
 > >         use Bio::Cluster::UniGene;
 > >         use Bio::ClusterIO::UniGeneIO;
 > > 
 > >         my $stream = Bio::ClusterIO::UniGeneIO->new('-file'   => "Hs.data",
 > >                                                     '-format' => "unigene");
 > >         # note: we quote -format to keep older perls from complaining.
 > > 
 > >         while ( my $in = $stream->next_unigene() ) {
 > >             print $in->unigene_id() . "\n";
 > >             while ( my $sequence = $in->next_seq() ) {
 > >                 print $sequence->accession_number() . "\n";
 > >             }
 > >         }
 > > 
 > >     Parsing errors are printed to STDERR.
 > > 
 > > DESCRIPTION
 > >     The UniGeneIO module works with the unigene format module to read NCBI
 > >     UniGene *.data files downloaded from
 > >     ftp://ncbi.nlm.nih.gov/repository/UniGene/.
 > > 
 > > CONSTRUCTORS
 > >   Bio::ClusterIO::UniGeneIO->new()
 > > 
 > >         $unigeneIO = Bio::ClusterIO::UniGeneIO->new(-file   => 'filename',
 > >                                                     -format => $format);
 > > 
 > >     The new() class method constructs a new Bio::ClusterIO::UniGeneIO object. The
 > >     returned object can be used to retrieve or print UniGene objects. new()
 > >     accepts the following parameters:
 > > 
 > >     -file
 > >         A file path to be opened for reading.
 > > 
 > >     -format
 > >         Specify the format of the file. Supported formats include:
 > > 
 > >            *.data      UniGene build files.
 > > 
 > >         If no format is specified and a filename is given, then the module
 > >         will attempt to deduce it from the filename. If this is
 > >         unsuccessful, the main UniGene build format is assumed.
 > > 
 > >         The format name is case insensitive. 'UNIGENE', 'UniGene' and
 > >         'unigene' are all supported.
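A minimal sketch of the format resolution described above: an explicit -format is matched case-insensitively, otherwise the format is guessed from the filename suffix, otherwise the main UniGene build format is assumed. The guess_format() helper and its suffix table are hypothetical; only *.data is documented as supported, so that is the only suffix mapped here.

```perl
use strict;
use warnings;

# Hypothetical suffix table; only *.data files are documented as supported.
my %SUFFIX_FORMAT = (
    data => 'unigene',
);

# Hypothetical helper mirroring the documented -format behaviour.
sub guess_format {
    my (%args) = @_;
    # Explicit format names are case-insensitive: UNIGENE, UniGene, unigene.
    return lc $args{-format} if defined $args{-format};
    # Otherwise try to deduce the format from the filename suffix.
    if (defined $args{-file} && $args{-file} =~ /\.(\w+)$/) {
        my $format = $SUFFIX_FORMAT{ lc $1 };
        return $format if defined $format;
    }
    # Fall back to the main UniGene build format.
    return 'unigene';
}
```

Note that this sketch does not validate an explicit format name; a real front end would still need to reject names it has no parser for.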
 > > 
 > > NAME
 > >     Bio::Cluster::UniGene - UniGene object
 > > 
 > > SYNOPSIS
 > >         use Bio::Cluster::UniGene;
 > >         use Bio::ClusterIO::UniGeneIO;
 > > 
 > >         my $stream = Bio::ClusterIO::UniGeneIO->new('-file'   => "Hs.data",
 > >                                                     '-format' => "unigene");
 > >         # note: we quote -format to keep older perls from complaining.
 > > 
 > >         while ( my $in = $stream->next_unigene() ) {
 > >             print $in->unigene_id() . "\n";
 > >             while ( my $sequence = $in->next_seq() ) {
 > >                 print $sequence->accession_number() . "\n";
 > >             }
 > >         }
 > > 
 > > DESCRIPTION
 > >     This UniGene object is returned by UniGeneIO and contains all the data
 > >     associated with one UniGene record.
 > > 
 > >     Available methods (see below for details):
 > > 
 > >     new() - standard new call
 > >     unigene_id() - set/get unigene_id
 > >     title() - set/get title (description)
 > >     gene() - set/get gene
 > >     cytoband() - set/get cytoband
 > >     locuslink() - set/get locuslink
 > >     gnm_terminus() - set/get gnm_terminus
 > >     chromosome() - set/get chromosome
 > >     scount() - set/get scount
 > >     express() - set/get express; currently takes/returns a reference to an
 > >         array of expressed tissues
 > >     next_express() - returns the next tissue expression from the expressed
 > >         tissue array
 > >     sts() - set/get sts; currently takes/returns a reference to an array of
 > >         sts lines
 > >     next_sts() - returns the next sts line from the array of sts lines
 > >     txmap() - set/get txmap; currently takes/returns a reference to an array
 > >         of txmap lines
 > >     next_txmap() - returns the next txmap line from the array of txmap lines
 > >     protsim() - set/get protsim; currently takes/returns a reference to an
 > >         array of protsim lines
 > >     next_protsim() - returns the next protsim line from the array of
 > >         protsim lines
 > >     sequence() - set/get sequence; currently takes/returns a reference to an
 > >         array of references to seq info
 > >     next_seq() - returns a Seq object that currently only contains an
 > >         accession number
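A minimal sketch of the list-valued accessor pattern listed above, using express()/next_express() as the example (sts, txmap and protsim follow the same shape). The package name My::UniGeneRecord and the internal cursor field are hypothetical; resetting the cursor whenever a new list is set is a choice made in this sketch, not documented behaviour of UniGene.pm.

```perl
use strict;
use warnings;

# Hypothetical record class illustrating set/get plus next_* iteration.
package My::UniGeneRecord;

sub new {
    my $class = shift;
    return bless { express => [], _express_pos => 0 }, $class;
}

# Set/get: takes and returns a reference to an array of expressed tissues.
sub express {
    my ($self, $tissues) = @_;
    if (defined $tissues) {
        $self->{express}      = $tissues;
        $self->{_express_pos} = 0;   # restart iteration on a new list
    }
    return $self->{express};
}

# Returns the next tissue, or undef once the list is exhausted.
sub next_express {
    my $self = shift;
    my $list = $self->{express};
    return if $self->{_express_pos} >= @$list;
    return $list->[ $self->{_express_pos}++ ];
}

package main;
```

Typical use is `while ( my $tissue = $record->next_express() ) { ... }`, matching the next_seq() loop in the synopsis.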
 > > 
 > > _______________________________________________
 > > Bioperl-l mailing list
 > > Bioperl-l@bioperl.org
 > > http://bioperl.org/mailman/listinfo/bioperl-l
 > 
 > -- 
 > ______ _/      _/_____________________________________________________
 >       _/      _/                      http://www.ebi.ac.uk/mutations/
 >      _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
 >     _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
 >    _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
 >   _/  _/  _/  Cambs. CB10 1SD, United Kingdom
 >      _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
 > ___ _/_/_/_/_/________________________________________________________

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org			                  Cold Spring Harbor, NY
Positions available at my lab: see http://stein.cshl.org/#hire
========================================================================