[Bioperl-l] UniGene modules now in CVS

Heikki Lehvaslaiho heikki@ebi.ac.uk
Wed, 01 May 2002 15:09:23 +0100


Andrew,

Great work!

I have a note about the module names, though. 


Bio::Cluster::UniGene should be one implementation of the Bio::Cluster::ClusterI
interface. 


To read in files, you now need to say:

$io = Bio::ClusterIO::UniGeneIO->new(-file=>'myfile',
                                     -format=>'unigene');
$clu = $io->next_unigene();


which is somewhat redundant. A more generic way would be:

$io = Bio::ClusterIO::IO->new(-file=>'myfile',
                               -format=>'unigene');
$clu = $io->next();

... and $clu would be a Bio::Cluster::UniGene object.

Then, once a Stack parser is written, you would only need to say:

$str = Bio::ClusterIO::IO->new(-file=>'myfile',
                               -format=>'stack');

$clu = $str->next();

to get another object implementing Bio::Cluster::ClusterI.


Bio::ClusterIO::IO could equally well be simply Bio::ClusterIO,
and cluster.pm could move one level higher.
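To make the idea concrete, here is a minimal plain-Perl sketch of a constructor that dispatches on -format, so callers only ever name the generic class and call next(). The package names My::ClusterIO, My::ClusterIO::unigene and My::ClusterIO::stack are hypothetical stand-ins, not existing BioPerl modules, and the subclasses return stub records instead of parsing -file.

```perl
use strict;
use warnings;

# Hypothetical generic class: new() picks a format-specific subclass
# from -format and blesses the object into it.
package My::ClusterIO;

my %FORMAT_CLASS = (
    unigene => 'My::ClusterIO::unigene',
    stack   => 'My::ClusterIO::stack',
);

sub new {
    my ($class, %args) = @_;
    my $format = lc($args{-format} || '');   # format names are case insensitive
    my $impl   = $FORMAT_CLASS{$format}
        or die "Unknown cluster format '$format'\n";
    return bless { file => $args{-file} }, $impl;
}

# Every subclass must supply next(); the base class only states the contract.
sub next { die ref(shift) . " does not implement next()\n" }

package My::ClusterIO::unigene;
use parent -norequire, 'My::ClusterIO';
# Stub: a real parser would read the next UniGene record from $self->{file}.
sub next { return { type => 'unigene' } }

package My::ClusterIO::stack;
use parent -norequire, 'My::ClusterIO';
# Stub: a real parser would read the next Stack cluster from $self->{file}.
sub next { return { type => 'stack' } }

package main;
```

With this layout, supporting another cluster format means adding one subclass and one table entry; the calling code does not change.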


Don't let this discourage you,

Yours,
	-heikki



Andrew Macgregor wrote:
> 
> Hi All,
> 
> I've just checked in the UniGene modules I've been working on. I've based
> the modules on and borrowed heavily from SeqIO and Seq. As I've said
> already, I'm new to this so I'm sure the coding won't be 100 percent
> perfect.  That said I think they work alright.
> 
> The parser is not super fast, especially as it parses out every sequence
> line, so be patient if you're doing, say, the entire human unigene file
> (overnight job for me).  Once I've seen how Parse::FastDescent looks I'll
> probably move to that.
> 
> I've pasted the synopsis for the modules below.
> 
> Things I'm not so sure of:
> - I've made a test and a test data file but I'm not certain they are OK.
> - I'm not certain about error handling, if the parser spits an error it goes
> to STDERR, I'm not too sure what else I should have.
> - At the moment the modules only work with the *.data unigene files from
> NCBI. I could add further format modules as the need arises (e.g. for
> *.seq.uniq etc.)
> - The whole interface thing I am not too sure of, there's only something
> very basic there at present (UniGeneI.pm).
> - other things that I've no doubt overlooked.
> 
> Any feedback on these things is appreciated.
> 
> Cheers, Andrew.
> 
> NAME
>     Bio::ClusterIO::UniGeneIO - Handler for UniGeneIO Formats
> 
> SYNOPSIS
>         use Bio::Cluster::UniGene;
>         use Bio::ClusterIO::UniGeneIO;
> 
>         $stream = Bio::ClusterIO::UniGeneIO->new('-file'   => "Hs.data",
>                                                  '-format' => "unigene");
>         # note: we quote -format to keep older perls from complaining.
> 
>         while ( my $in = $stream->next_unigene() ) {
>             print $in->unigene_id() . "\n";
>             while ( my $sequence = $in->next_seq() ) {
>                 print $sequence->accession_number() . "\n";
>             }
>         }
> 
>     Parsing errors are printed to STDERR.
> 
> DESCRIPTION
>     The UniGeneIO module works with the unigene format module to read NCBI
>     UniGene *.data files downloaded from
>     ftp://ncbi.nlm.nih.gov/repository/UniGene/.
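A minimal sketch of reading such a file one record at a time, assuming the *.data layout is one "KEY value" pair per line with each cluster terminated by a "//" line. Which keys repeat (STS, TXMAP, PROTSIM, SEQUENCE) is inferred from the method list further down, not from an NCBI specification, and next_record() is a hypothetical name, not a UniGeneIO method.

```perl
use strict;
use warnings;

# Keys assumed to occur several times per record; collected into arrays.
my %REPEATING = (STS => 1, TXMAP => 1, PROTSIM => 1, SEQUENCE => 1);

# Reads lines from $fh until a "//" terminator and returns a hash ref,
# or undef when no further record is available.
sub next_record {
    my ($fh) = @_;
    my %rec;
    while (my $line = <$fh>) {
        chomp $line;
        return \%rec if $line eq '//';          # end of one cluster
        next unless $line =~ /^(\S+)\s+(.*)$/;  # skip blank or malformed lines
        my ($key, $value) = ($1, $2);
        if ($REPEATING{$key}) {
            push @{ $rec{$key} }, $value;       # e.g. one entry per SEQUENCE line
        }
        else {
            $rec{$key} = $value;                # single-valued keys such as ID, TITLE
        }
    }
    return %rec ? \%rec : undef;   # tolerate a final record without "//"
}
```

Splitting only on the key and keeping the rest of the line as a string also avoids parsing every field of every sequence line up front, which may help with the speed concern mentioned above.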
> 
> CONSTRUCTORS
>   Bio::ClusterIO::UniGeneIO->new()
> 
>        $unigeneIO = Bio::ClusterIO::UniGeneIO->new(-file   => 'filename',
>                                                    -format => $format);
> 
>     The new() class method constructs a new Bio::ClusterIO::UniGeneIO object. The
>     returned object can be used to retrieve or print UniGene objects. new()
>     accepts the following parameters:
> 
>     -file
>         A file path to be opened for reading.
> 
>     -format
>         Specify the format of the file. Supported formats include:
> 
>            *.data      UniGene build files.
> 
>         If no format is specified and a filename is given, then the module
>         will attempt to deduce it from the filename. If this is
>         unsuccessful, the main UniGene build format is assumed.
> 
>         The format name is case insensitive. 'UNIGENE', 'UniGene' and
>         'unigene' are all supported.
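The format-resolution rules above (explicit -format first, case-insensitively; otherwise guess from the file name; otherwise fall back to the UniGene build format) could be sketched as follows. guess_format() and the extension table are hypothetical, not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical table of file-name patterns and the format they imply.
my @EXTENSION_FORMATS = (
    [ qr/\.data$/i, 'unigene' ],   # NCBI UniGene build files
    # further formats (e.g. for *.seq.uniq) would be added here
);

sub guess_format {
    my (%args) = @_;
    # Explicit format wins; 'UNIGENE', 'UniGene' and 'unigene' are equivalent.
    return lc $args{-format} if defined $args{-format};
    my $file = defined $args{-file} ? $args{-file} : '';
    for my $pair (@EXTENSION_FORMATS) {
        my ($pattern, $format) = @$pair;
        return $format if $file =~ $pattern;
    }
    return 'unigene';   # fall back to the main UniGene build format
}
```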
> 
> NAME
>     Bio::Cluster::UniGene - UniGene object
> 
> SYNOPSIS
>         use Bio::Cluster::UniGene;
>         use Bio::ClusterIO::UniGeneIO;
> 
>         $stream = Bio::ClusterIO::UniGeneIO->new('-file'   => "Hs.data",
>                                                  '-format' => "unigene");
>         # note: we quote -format to keep older perls from complaining.
> 
>         while ( my $in = $stream->next_unigene() ) {
>             print $in->unigene_id() . "\n";
>             while ( my $sequence = $in->next_seq() ) {
>                 print $sequence->accession_number() . "\n";
>             }
>         }
> 
> DESCRIPTION
>     This UniGene object is returned by UniGeneIO and contains all the data
>     associated with one UniGene record.
> 
>     Available methods (see below for details):
> 
>     new() - standard new call
>     unigene_id() - set/get unigene_id
>     title() - set/get title (description)
>     gene() - set/get gene
>     cytoband() - set/get cytoband
>     locuslink() - set/get locuslink
>     gnm_terminus() - set/get gnm_terminus
>     chromosome() - set/get chromosome
>     scount() - set/get scount
>     express() - set/get express; currently takes/returns a reference to an
>       array of expressed tissues
>     next_express() - returns the next tissue expression from the expressed
>       tissue array
>     sts() - set/get sts; currently takes/returns a reference to an array of
>       sts lines
>     next_sts() - returns the next sts line from the array of sts lines
>     txmap() - set/get txmap; currently takes/returns a reference to an array
>       of txmap lines
>     next_txmap() - returns the next txmap line from the array of txmap lines
>     protsim() - set/get protsim; currently takes/returns a reference to an
>       array of protsim lines
>     next_protsim() - returns the next protsim line from the array of protsim
>       lines
>     sequence() - set/get sequence; currently takes/returns a reference to an
>       array of references to seq info
>     next_seq() - returns a Seq object that currently only contains an
>       accession number
> 
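The express()/next_express() pair (and likewise sts/next_sts, txmap/next_txmap and protsim/next_protsim) amounts to an accessor holding an array reference plus a cursor. A minimal sketch of that pattern follows; My::UniGeneSketch and its _express_idx field are hypothetical, not the code in Bio::Cluster::UniGene.

```perl
use strict;
use warnings;

package My::UniGeneSketch;   # hypothetical stand-in, not Bio::Cluster::UniGene

sub new { my ($class, %args) = @_; return bless { %args }, $class }

# set/get: takes or returns a reference to an array of tissue names
sub express {
    my ($self, $tissues) = @_;
    if (defined $tissues) {
        $self->{express}      = $tissues;
        $self->{_express_idx} = 0;   # restart iteration when the list changes
    }
    return $self->{express};
}

# Returns the next tissue, or undef once the list is exhausted.
sub next_express {
    my ($self) = @_;
    my $list = $self->{express} or return;
    my $i = $self->{_express_idx} || 0;
    return if $i > $#{$list};
    $self->{_express_idx} = $i + 1;
    return $list->[$i];
}

package main;
```

Keeping the cursor inside the object lets callers write `while ( my $t = $in->next_express() ) { ... }` exactly as the synopsis does for next_seq().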
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________