[Bioperl-l] UniGene modules now in CVS
Lincoln Stein
lstein@cshl.org
Wed, 1 May 2002 23:21:03 -0400
Heikki Lehvaslaiho writes:
>
> Andrew,
>
> Great work!
>
> I have a note about the module names, though.
>
>
> Bio::Cluster::UniGene should be one implementation of the
> Bio::Cluster::ClusterI interface.
>
>
> To read in files, you now need to say:
>
> $io = Bio::ClusterIO::UniGeneIO->new(-file=>'myfile',
>                                      -format=>'unigene');
> $clu = $io->next_unigene();
>
>
> which is somewhat redundant. A more generic way would be:
>
> $io = Bio::ClusterIO::IO->new(-file=>'myfile',
> -format=>'unigene');
> $clu = $io->next();
Yes, but Bio::SeqIO uses next_seq().
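To spell the naming point out, here is a minimal sketch of an iterator
named after what it returns, the way next_seq() is. Everything in it is
made up for illustration: My::ClusterStream, next_cluster() and the IDs
are assumptions, not the modules in CVS.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a cluster stream; NOT the real BioPerl module.
package My::ClusterStream;

sub new {
    my ($class, @clusters) = @_;
    # Keep a private copy of the items to hand out one at a time.
    return bless { queue => [@clusters] }, $class;
}

# Named after the thing it returns, by analogy with Bio::SeqIO's next_seq().
# Returns undef once the stream is exhausted.
sub next_cluster {
    my $self = shift;
    return shift @{ $self->{queue} };
}

package main;

my $stream = My::ClusterStream->new('Hs.1', 'Hs.2');
my @seen;
while ( defined( my $id = $stream->next_cluster() ) ) {
    push @seen, $id;
}
print join(',', @seen), "\n";
```

A caller reading this loop can tell what comes back without looking up
the class, which a bare next() does not give you.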
> ... and $clu would be a Bio::Cluster::UniGene object.
>
> Then when a Stack parser is written, you need to say only:
>
> $str = Bio::ClusterIO::IO->new(-file=>'myfile',
> -format=>'stack');
>
> $clu = $str->next();
>
> to get another object implementing Bio::Cluster::ClusterI.
>
>
> Bio::ClusterIO::IO could equally well be simply Bio::ClusterIO,
> and cluster.pm could be one level higher.
>
>
> Do not let this discourage you,
>
> Yours,
> -heikki
>
>
>
> Andrew Macgregor wrote:
> >
> > Hi All,
> >
> > I've just checked in the UniGene modules I've been working on. I've based
> > the modules on and borrowed heavily from SeqIO and Seq. As I've said
> > already, I'm new to this so I'm sure the coding won't be 100 percent
> > perfect. That said I think they work alright.
> >
> > The parser is not super fast, especially as it parses out every sequence
> > line, so be patient if you're doing, say, the entire human unigene file
> > (overnight job for me). Once I've seen how Parse::FastDescent looks I'll
> > probably move to that.
> >
> > I've pasted the synopsis for the modules below.
> >
> > Things I'm not so sure of:
> > - I've made a test and a test data file but I'm not certain they are OK.
> > - I'm not certain about error handling. If the parser spits out an error
> > it goes to STDERR; I'm not too sure what else I should have.
> > - At the moment the modules only work with the *.data unigene files from
> > NCBI. I could add further format modules as need arises (i.e. for *.seq.uniq
> > etc)
> > - The whole interface thing I am not too sure of, there's only something
> > very basic there at present (UniGeneI.pm).
> > - other things that I've no doubt overlooked.
> >
> > Any feedback on these things is appreciated.
> >
> > Cheers, Andrew.
> >
> > NAME
> > Bio::ClusterIO::UniGeneIO - Handler for UniGeneIO Formats
> >
> > SYNOPSIS
> > use Bio::Cluster::UniGene;
> > use Bio::ClusterIO::UniGeneIO;
> >
> > $stream = Bio::ClusterIO::UniGeneIO->new('-file' => "Hs.data",
> > '-format' => "unigene");
> > # note: we quote -format to keep older perls from complaining.
> >
> > while ( my $in = $stream->next_unigene() ) {
> >
> >     print $in->unigene_id() . "\n";
> >
> >     while ( my $sequence = $in->next_seq() ) {
> >         print $sequence->accession_number() . "\n";
> >     }
> > }
> >
> > Parsing errors are printed to STDERR.
> >
> > DESCRIPTION
> > The UniGeneIO module works with the unigene format module to read NCBI
> > UniGene *.data files downloaded from
> > ftp://ncbi.nlm.nih.gov/repository/UniGene/.
> >
> > CONSTRUCTORS
> > Bio::ClusterIO::UniGeneIO->new()
> >
> > $unigeneIO = Bio::ClusterIO::UniGeneIO->new(-file => 'filename',
> > -format=>$format);
> >
> > The new() class method constructs a new Bio::ClusterIO::UniGeneIO object. The
> > returned object can be used to retrieve or print UniGene objects. new()
> > accepts the following parameters:
> >
> > -file
> > A file path to be opened for reading.
> >
> > -format
> > Specify the format of the file. Supported formats include:
> >
> > *.data UniGene build files.
> >
> > If no format is specified and a filename is given, then the module
> > will attempt to deduce it from the filename. If this is
> > unsuccessful, the main UniGene build format is assumed.
> >
> > The format name is case insensitive. 'UNIGENE', 'UniGene' and
> > 'unigene' are all supported.
> >
> > NAME
> > Bio::Cluster::UniGene - UniGene object
> >
> > SYNOPSIS
> > use Bio::Cluster::UniGene;
> > use Bio::ClusterIO::UniGeneIO;
> >
> > $stream = Bio::ClusterIO::UniGeneIO->new('-file' => "Hs.data",
> > '-format' => "unigene");
> > # note: we quote -format to keep older perls from complaining.
> >
> > while ( my $in = $stream->next_unigene() ) {
> >
> >     print $in->unigene_id() . "\n";
> >
> >     while ( my $sequence = $in->next_seq() ) {
> >         print $sequence->accession_number() . "\n";
> >     }
> > }
> >
> > DESCRIPTION
> > This UniGene object is returned by UniGeneIO and contains all the data
> > associated with one UniGene record.
> >
> > Available methods (see below for details):
> >
> > new() - standard new call
> > unigene_id() - set/get unigene_id
> > title() - set/get title (description)
> > gene() - set/get gene
> > cytoband() - set/get cytoband
> > locuslink() - set/get locuslink
> > gnm_terminus() - set/get gnm_terminus
> > chromosome() - set/get chromosome
> > scount() - set/get scount
> > express() - set/get express, currently takes/returns a reference to an
> >     array of expressed tissues
> > next_express() - returns the next tissue expression from the expressed
> >     tissue array
> > sts() - set/get sts, currently takes/returns a reference to an array of
> >     sts lines
> > next_sts() - returns the next sts line from the array of sts lines
> > txmap() - set/get txmap, currently takes/returns a reference to an array
> >     of txmap lines
> > next_txmap() - returns the next txmap line from the array of txmap
> >     lines
> > protsim() - set/get protsim, currently takes/returns a reference
> >     to an array of protsim lines
> > next_protsim() - returns the next protsim line from the array of
> >     protsim lines
> > sequence() - set/get sequence, currently takes/returns a reference to an
> >     array of references to seq info
> > next_seq() - returns a Seq object that currently only contains an
> >     accession number
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
>
> --
> ______ _/ _/_____________________________________________________
> _/ _/ http://www.ebi.ac.uk/mutations/
> _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
> _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
> _/ _/ _/ Wellcome Trust Genome Campus, Hinxton
> _/ _/ _/ Cambs. CB10 1SD, United Kingdom
> _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
> ___ _/_/_/_/_/________________________________________________________
--
========================================================================
Lincoln D. Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
Positions available at my lab: see http://stein.cshl.org/#hire
========================================================================