[Bioperl-l] Re: Fixing bioperl [was Re: Analysis features]

Hilmar Lapp hlapp at gnf.org
Fri Jul 29 20:20:19 EDT 2005


On Jul 29, 2005, at 8:17 AM, Scott Cain wrote:

>
> The main section of affected code in gmod is the GFF bulk loader, but
> after we make the changes to the bioperl API, it shouldn't be too hard
> to fix the loader.  In fact, some of those changes may have already
> started.  I remember that a few weeks before I released the gmod/chado
> package, Hilmar sent out an announcement that he had made some changes.

You mean around the time of ISMB? I fixed the ontology modules ... they
should actually work better now, not worse, unless you assumed the
presence of some bugs ;)

> While I should have paid attention then, I was busy getting my release
> together, and everything seemed to work, so I ignored it.
> Unfortunately, the reason things continued to work was that I forgot to
> update my bioperl-live, and as a result, the gmod release doesn't work
> with bioperl-live.

Scott, what would really help in such a situation is if you ran the
bioperl test suite and reported any failures, especially those that
appear potentially connected to your problem. Last time the gmod
ontology loader stopped working, the problem would have been readily
exposed by the ontology tests in bioperl. It just helps in zooming in on
the problem.

I'd be eager to help make bioperl work with gmod and vice versa and I'm  
sure many others are too, but it'll be difficult if we don't work  
towards this collaboratively. For this I really liked the spirit of  
Chris' proposal - that's the way to make this work.

> [...]
> The other section of code that could have been affected but won't be is
> the ontology loader.  The current ontology loader depends on
> Bio::Ontology, but I was already planning on migrating to go-perl for
> loading ontologies anyway, so that won't be a problem.

I'm closing in on the last bugs in the go-perl integration. It remains
to be seen how fast the result is, as Chris pointed out to me in Detroit,
but if it works, this will give you both worlds to choose from.

	-hilmar

>
> So, who wants to take the lead on this?
>
> Thanks,
> Scott
>
>
> On Thu, 2005-07-28 at 12:42 -0700, Chris Mungall wrote:
>> I think the answer may be even more complicated than this.
>>
>> Lurkers and contributors to the bioperl mailing list may have noticed that
>> there have been some major obstacles to progress lately, particularly in
>> getting a stable release of the code out. bp1.4 is fairly old, and 1.5 is a
>> developer release, though it is the one required by GMOD.
>>
>> My understanding is that this bottleneck can be traced back to changes in
>> the SeqFeature and Annotation model. These changes appear to be required
>> by Bio::SeqFeature::Annotated which is produced by Bio::FeatureIO::gff
>> (which in turn is used by the GMOD bulk loader, which is the main reason
>> GMOD requires 1.5, I believe?). Unfortunately, these changes also break
>> existing code and have a severe negative impact on memory usage.
>>
>> Before advising Cyril and others to switch to BFIO::gff I think it's
>> important to make sure there is a clear path forward with bioperl. My
>> impression is that there is something of a stalemate here. The bioperl
>> developers would like to retract the aforementioned changes, but they
>> believe they cannot do this without breaking GMOD code.  They are also
>> extremely uncomfortable about leaving these changes in. Everyone gives up
>> and starts coding around bioperl.
>>
>> Here is why the changes were introduced:
>>
>> BioPerl has a 'scruffy' typing model, whereby feature types (primary_tag
>> in bioperl) and featureprop types (tags in bioperl) are labels or strings.
>> In contrast, Chado forces all types to be some class or relation in an
>> ontology.
>>
>> Now obviously I'm rather partial to the Chado model, but that doesn't mean
>> I think it should be forced upon bioperl. I often use bioperl in scruffy
>> mode (on scruffy data), or in some combination whereby I map the scruffy
>> types to ontologies in some non-bioperl code. When using bioperl as a
>> middleware component over a nicely organised database, ontology-typed mode
>> is definitely best. However, the majority of bioperl users (including
>> myself) spend a large proportion of their time working with scruffy data,
>> in which case lightweight scruffy types are more appropriate.
>>
>> It seems that there is a perfectly simple way of reconciling both
>> approaches. We revert bioperl back to the simpler scruffy model. The
>> majority of users and developers breathe a sigh of relief. We then extend
>> SeqFeatureI (SFI) with something like SeqFeatureAnnotatedI (SFAI). This
>> forces types to be stored as OntologyTerms (and I haven't even touched on
>> some of the problems here, but at least we are insulating the standard
>> bioperl layer that 99% of users use from these issues). All classes
>> implementing SFAI will necessarily implement SFI, and the primary_tag and
>> tag_values methods will be supported (not deprecated) as simple
>> delegations to the OntologyTerm objects.
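A minimal sketch of the delegation described above, written in plain Perl so it runs without BioPerl installed. The package names Sketch::Term, Sketch::SeqFeature and Sketch::SeqFeatureAnnotated are hypothetical stand-ins for an ontology term, SeqFeatureI and the proposed SeqFeatureAnnotatedI; they are not the real BioPerl API. Only primary_tag is shown; tag_values could delegate the same way.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an ontology term such as a SO class.
package Sketch::Term;
sub new {
    my ($class, %args) = @_;
    return bless { identifier => $args{identifier}, name => $args{name} }, $class;
}
sub identifier { $_[0]{identifier} }
sub name       { $_[0]{name} }

# Plain "scruffy" feature: the type is just a string label.
package Sketch::SeqFeature;
sub new {
    my ($class, %args) = @_;
    return bless { primary_tag => $args{primary_tag} }, $class;
}
sub primary_tag { $_[0]{primary_tag} }

# Ontology-typed feature: is-a Sketch::SeqFeature, stores a term object,
# and answers primary_tag() by delegating to that term.
package Sketch::SeqFeatureAnnotated;
our @ISA = ('Sketch::SeqFeature');
sub new {
    my ($class, %args) = @_;
    return bless { type => $args{type} }, $class;
}
sub type        { $_[0]{type} }
sub primary_tag { $_[0]->type->name }   # delegation, not deprecation

package main;

my $plain = Sketch::SeqFeature->new(primary_tag => 'exon');
my $typed = Sketch::SeqFeatureAnnotated->new(
    type => Sketch::Term->new(identifier => 'SO:0000147', name => 'exon'));

# Code written against the plain interface works with either object.
for my $f ($plain, $typed) {
    print ref($f), ": ", $f->primary_tag, "\n";
}
```

Callers that only know about primary_tag never notice which kind of object they were handed, which is the point of keeping SFI methods supported rather than deprecated.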
>>
>> We can then modify BFIO::gff (which is an incredibly useful piece of code)
>> and get rid of all the dependencies on SO and Bio::Ontology*, and instead
>> allow the user of this module to plug in their own resolver/validator, so
>> they can choose whether they just want fast, scruffy, lightweight SFI
>> features or ontology-typed SFAI features. If the latter, they can choose
>> their own resolver strategy: a user-supplied hash, a copy of SO
>> auto-downloaded from sourceforge, a local chado db, the genbank->SO mapping
>> table, during parsing vs post-parsing, whatever. In fact there is already
>> Bio::SeqFeature::Tools::TypeMapper, but currently this is mostly concerned
>> with helping Bio::SeqFeature::Tools::Unflattener convert scruffy genbank
>> to something sensible.
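One way the pluggable resolver could look, as a sketch under stated assumptions: make_feature_type and the %so_map hash are hypothetical, not part of Bio::FeatureIO::gff, and the three SO identifiers are only a small illustrative subset. Passing no resolver keeps the scruffy string; passing a code reference yields an ontology-typed value.

```perl
use strict;
use warnings;

# Hypothetical parser hook: turn a GFF type string into whatever the
# caller wants. The default "scruffy" resolver keeps the plain string.
sub make_feature_type {
    my ($type, $resolver) = @_;
    $resolver ||= sub { $_[0] };            # scruffy: label as-is
    return $resolver->($type);
}

# One possible strategy: a user-supplied hash mapping labels to SO ids.
# This mapping is an illustrative subset, not a complete table.
my %so_map = (
    exon => 'SO:0000147',
    gene => 'SO:0000704',
    mRNA => 'SO:0000234',
);
my $hash_resolver = sub {
    my ($type) = @_;
    die "unknown type '$type'\n" unless exists $so_map{$type};
    return { name => $type, identifier => $so_map{$type} };
};

my $scruffy = make_feature_type('exon');
my $typed   = make_feature_type('exon', $hash_resolver);
print "$scruffy => $typed->{identifier}\n";
```

Other strategies from the list above (a local chado db, a downloaded copy of SO) would just be different code references with the same calling convention.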
>>
>> GMOD (and perhaps biosql) would use SFAI; everyone else would use the
>> simpler SFI. Someone could even get a stable 1.6 release out before all the
>> SFAI details, such as how the resolver would work, are finalised. I'd really
>> like to see 1.6 include a simpler BFIO::gff that can optionally produce
>> features that aren't SeqFeature::Annotateds, but that's negotiable.
>>
>> There are vast swathes of both GMOD and BioPerl code I'm not familiar with,
>> so it's possible my analysis above is flawed in some way. If it is, then
>> it's up to someone from either camp to speak up! If not, then there's no
>> excuse for the relevant people not to start sorting out this mess,
>> beginning with the solution outlined above.
>>
>> Cheers
>> Chris
>>
>>>
>>> Scott
>>>
>>>
>>> On Thu, 2005-07-28 at 18:37 +0200, Cyril Pommier wrote:
>>>> Hello,
>>>> We are going to store analysis results in chado, and we are of course
>>>> very interested in these future evolutions of GFF3/chado.
>>>> So we would like to make sure that the parsers and conversion programs
>>>> we are writing now will be compatible with the future GFF3.
>>>>
>>>> We are using Bio::SeqFeature::Generic objects that we write with
>>>> Bio::Tools::GFF.
>>>>
>>>> Do you think that Bio::Tools::GFF will be able to handle the new 'type'
>>>> column, or is it better to switch to Bio::FeatureIO::gff?
>>>>
>>>> Thanks in advance for any advice.
>>>>
>>>> Cyril
>>>>
>>>> Don Gilbert wrote:
>>>>
>>>>>
>>>>> Scott,
>>>>>
>>>>> Your notes in gmod_bulk_load_gff3.pl suggest it is headed in
>>>>> the same direction I suggest below. More about these todo points:
>>>>>
>>>>>> - address flybase's use of analysisfeature combined with feature to
>>>>>> give source-type information (in GFF terms). This will need to
>>>>>> be addressed in the GBrowse adaptor.
>>>>>> - modify the bulk loader to allow "mixed" GFF3 files (that is,
>>>>>> containing both analysis results and annotations). See perldoc
>>>>>> gmod_bulk_load_gff3.pl for more info
>>>>>
>>>>>
>>>>> Use of chado's analysisfeature table is something others who know
>>>>> it better can comment on. But after working with it for a while
>>>>> it makes sense to me to use it in this way:
>>>>>
>>>>> For a future GFF -> Chado loader, treat analysis features such as
>>>>> gene finding results, BLAST, sim4 as 'analysisfeature type' rather
>>>>> than feature CV term type (the ones that now end up with a generic
>>>>> 'match' cvterm). In these cases the Analysis table is populated with
>>>>> program:database_sourcename
>>>>> as the basis of this 'analysisfeature type', such as
>>>>> match:blastx:na_pe.dros
>>>>> match:sim4:DGC
>>>>> match:genie:dummy (or maybe exon:genie)
>>>>>
>>>>> The program:database fits neatly in GFF source field, as
>>>>> #ref source type start stop ...
>>>>> chr1 blastx:na_pe.dros match 1 100 ...
>>>>> chr1 sim4:DGC match 1 100 ...
>>>>>
>>>>> These can be treated in the database adaptor analogously to the CVterm
>>>>> table feature types. See the end of this message for a list of current
>>>>> GFF feature type:source pairs from the worm, rice, yeast, and fly MODs.
>>>>> Fly and rice use a syntax like the above, while worm GFF uses
>>>>> BLAT_EMBL_BEST instead of BLAT:EMBL_BEST.
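A rough sketch of splitting the source column into program and sourcename. split_source and split_worm_source are hypothetical helpers, and the @known_programs list is an assumption: since underscores also appear inside program and database names, the WormBase-style form can only be split safely when the program prefix is already known.

```perl
use strict;
use warnings;

# Split a GFF source column into (program, sourcename) following the
# program:database convention. A source without a colon yields an
# undefined sourcename.
sub split_source {
    my ($source) = @_;
    my ($program, $sourcename) = split /:/, $source, 2;
    return ($program, $sourcename);
}

# WormBase-style sources join the parts with '_' (BLAT_EMBL_BEST). Only
# split when the prefix is a program we already know (assumed list).
my @known_programs = qw(BLAT);
sub split_worm_source {
    my ($source) = @_;
    for my $p (@known_programs) {
        return ($p, $1) if $source =~ /^\Q$p\E_(.+)$/;
    }
    return split_source($source);
}

for my $s (qw(blastx:na_pe.dros sim4:DGC genscan BLAT_EMBL_BEST)) {
    my ($prog, $db) = split_worm_source($s);
    printf "%-20s program=%s sourcename=%s\n", $s, $prog, $db // '(none)';
}
```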
>>>>>
>>>>> From the POD of your bulk_load_gff3.pl:
>>>>>> Analysis
>>>>>> If you are loading analysis results (ie, BLAT results, gene
>>>>>> predictions), you should specify the -a flag. If no arguments are
>>>>>> supplied with the -a, then the loader will assume that the results
>>>>>> belong to an analysis set with a name that is the concatenation of
>>>>>> the source (column 2) and the method (column 3) with an underscore
>>>>>> in between.
>>>>>
>>>>> "... then the loader will assume that the results belong to an
>>>>> analysis table row with a program name and database source name
>>>>> taken from Source (column 2, colon separated program:sourcename),
>>>>> with a SOFA feature type taken from Method (column 3). If
>>>>> sourcename doesn't apply, e.g. genefinder, don't add or use  
>>>>> 'dummy'.
>>>>> Use the generic 'match' SOFA type if others don't apply."
>>>>> [see also http://song.sourceforge.net/gff3-jan04.shtml#ALIGNMENTS]
>>>>>
>>>>> Note that sourcename of database is a common attribute (all those
>>>>> blasts, blats, sim4, ... are run on several different databases).
>>>>>
>>>>> As for that underscore between method and source, where does that go
>>>>> in the database? Underscores already occur within program and database
>>>>> source names, so adding one where it isn't needed may be problematic.
>>>>>
>>>>> Oh, I see now from bulk_load_gff3.PLS that you are creating a 'Name'
>>>>> entry for the analysis table. This is probably less useful than using
>>>>> the Program and Sourcename fields as flybase does, which reflects the
>>>>> common usage where people run various programs against various database
>>>>> sources and want to plop the results into a database easily. These go
>>>>> into those two fields directly, with no need to create or parse a Name
>>>>> entry (which can be, and is, null in flybase data).
>>>>>
>>>>>> my $search_analysis
>>>>>> = $db->prepare("SELECT analysis_id FROM analysis WHERE name=?");
>>>>>
>>>>> I think it would be better as
>>>>> my $search_analysis
>>>>> = $db->prepare("SELECT analysis_id FROM analysis WHERE program=? AND sourcename=?");
>>>>>
>>>>>> Otherwise, the argument provided with -a will be taken
>>>>>> as the name of the analysis set. Either way, the analysis set must
>>>>>> already be in the analysis table. The easiest way to do this is to
>>>>>> insert it directly in the psql shell:
>>>>>>
>>>>>> INSERT INTO analysis (name, program, programversion)
>>>>>> VALUES ('genscan 2005-2-28','genscan','5.4');
>>>>>
>>>>> My choice would be to populate the analysis table from the GFF data,
>>>>> rather than expect preparation by the user (or offer that as another
>>>>> option).
>>>>>
>>>>> INSERT INTO analysis (program, sourcename)
>>>>> VALUES ('tblastx','na_baylorf1_scfchunk.dpse');
>>>>> INSERT INTO analysis (program, sourcename)
>>>>> VALUES ('sim4','na_gb.dmel');
>>>>> INSERT INTO analysis (program, sourcename, programversion)
>>>>> VALUES ('genie_masked','dummy', '1.0');
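Populating the analysis table from the GFF data starts with collecting the distinct program:sourcename pairs. The sketch below assumes tab-separated GFF3 lines and treats any source containing a colon as an analysis result; analysis_rows_from_gff is a hypothetical helper, and the actual INSERT (for example through DBI with placeholders) is left out so the example runs without a database.

```perl
use strict;
use warnings;

# Collect the distinct (program, sourcename) pairs found in the source
# column, in first-seen order. Sources without a colon are assumed not
# to be analysis results and are skipped.
sub analysis_rows_from_gff {
    my (@lines) = @_;
    my (%seen, @rows);
    for my $line (@lines) {
        next if $line =~ /^#/ or $line !~ /\S/;
        my (undef, $source) = split /\t/, $line;
        next unless defined $source and $source =~ /^([^:]+):(.+)$/;
        my ($program, $sourcename) = ($1, $2);
        push @rows, { program => $program, sourcename => $sourcename }
            unless $seen{"$program\t$sourcename"}++;
    }
    return @rows;
}

my @gff = (
    "##gff-version 3",
    join("\t", qw(chr1 tblastx:na_baylorf1_scfchunk.dpse match 1 100 . + . ID=m1)),
    join("\t", qw(chr1 sim4:na_gb.dmel match 1 100 . + . ID=m2)),
    join("\t", qw(chr1 sim4:na_gb.dmel match 200 300 . + . ID=m3)),
    join("\t", qw(chr1 . gene 1 500 . + . ID=g1)),
);

# Each row could then feed a parameterised statement such as
# INSERT INTO analysis (program, sourcename) VALUES (?, ?)
for my $row (analysis_rows_from_gff(@gff)) {
    print "$row->{program}\t$row->{sourcename}\n";
}
```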
>>>>>
>>>>>> There are other columns in the analysis table that are optional; see
>>>>>> the schema documentation and '\d analysis' in psql for more
>>>>>> information.
>>>>>>
>>>>> ....
>>>>>> A planned addition to the functionality of handling analysis results
>>>>>> is to allow "mixed" GFF files, where some lines are analysis results
>>>>>> and some are not.
>>>>>
>>>>> This is the case for drosophila GFF now (see others also below). If
>>>>> you make the default assumption that a line is an analysis result when
>>>>> ($method =~ /.*match/) and ($source =~ m/([^:]+):(.+)/), you should
>>>>> catch all or most of the analysisfeature types, and probably nothing else.
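The default heuristic above, written out as a small function. analysis_source is a hypothetical name. The examples show why it catches "all/most" rather than all analysis features: the rice line exon:FgenesH:Monocot is a gene prediction, but its type does not contain "match".

```perl
use strict;
use warnings;

# Treat a GFF line as an analysis result when the type (method) contains
# "match" and the source has the program:sourcename form.
# Returns (program, sourcename), or an empty list otherwise.
sub analysis_source {
    my ($method, $source) = @_;
    return unless $method =~ /match/;        # same as /.*match/
    return unless $source =~ /^([^:]+):(.+)$/;
    return ($1, $2);
}

my @examples = (
    [ match      => 'blastx:aa_SPTR.dmel' ],
    [ match_part => 'sim4:na_gb.dmel' ],
    [ EST_match  => 'Barley' ],           # no program:sourcename form
    [ exon       => 'FgenesH:Monocot' ],  # prediction, but type lacks "match"
    [ gene       => '.' ],
);
for my $ex (@examples) {
    my ($method, $source) = @$ex;
    my @hit = analysis_source($method, $source);
    printf "%-10s %-20s %s\n", $method, $source,
        @hit ? "analysis: program=$hit[0] sourcename=$hit[1]" : "annotation";
}
```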
>>>>>
>>>>>> Additionally, one will be able to supply lists of
>>>>>> types (optionally with sources) and their associated entry in the
>>>>>> analysis table. The format will probably be tag value pairs:
>>>>>>
>>>>>> --analysis match:Rice_est=rice_est_blast, \
>>>>>> match:Maize_cDNA=maize_cdna_blast, \
>>>>>> mRNA=genscan_prediction,exon=genscan_prediction
>>>>>
>>>>> My suggestion for this (as per GFF source,type columns) would be
>>>>> --analysis match:program:sourcename ...
>>>>> --analysis match:blast:Rice_est,match:blast:Maize_cDNA,\
>>>>> mRNA:genscan:dummy, exon:genscan:dummy
>>>>>
>>>>> I guess the 'dummy' data sourcename need not be added; flybase uses it
>>>>> to keep that field not-null, but it isn't required by the schema.
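A sketch of parsing the proposed --analysis value into a lookup keyed on GFF type plus source. parse_analysis_option is hypothetical and the option syntax is still only the proposal above; a missing sourcename is stored as undef rather than 'dummy'.

```perl
use strict;
use warnings;

# Parse a comma-separated list of type:program[:sourcename] entries into
# a hash keyed on "type:source", where source is the GFF source column
# (program or program:sourcename).
sub parse_analysis_option {
    my ($value) = @_;
    my %analysis;
    for my $item (split /\s*,\s*/, $value) {
        my ($type, $program, $sourcename) = split /:/, $item, 3;
        die "bad --analysis entry '$item'\n"
            unless defined $type and defined $program;
        my $source = defined $sourcename ? "$program:$sourcename" : $program;
        $analysis{"$type:$source"} = {
            program    => $program,
            sourcename => $sourcename,
        };
    }
    return \%analysis;
}

my $opt = 'match:blast:Rice_est,match:blast:Maize_cDNA,mRNA:genscan:dummy, exon:genscan';
my $map = parse_analysis_option($opt);
print "$_\n" for sort keys %$map;
```

Keying on the same "type:program:sourcename" string the loader can build from columns 2 and 3 keeps the lookup a single hash access per GFF line.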
>>>>>
>>>>> Here are some snippets from the ChadoFC adaptor I modified
>>>>> from yours (will get into cvs.sf.net 'real soon'), showing that
>>>>> it isn't much work to add this as an analog to how cvterm types
>>>>> are used.
>>>>>
>>>>> -- Don
>>>>>
>>>>> ## Bio::DB::Das::ChadoFC.pm, part of new() - load analysis types
>>>>> ## treat similar to CV table types
>>>>>
>>>>> sub getAnalysisFeatureHash
>>>>> {
>>>>>   my $self = shift;
>>>>>
>>>>>   my $dbh = $self->dbh();
>>>>>   my $sth = $dbh->prepare("select analysis_id,program,sourcename from analysis")
>>>>>     or $self->throw("unable to prepare select analysis");
>>>>>   $sth->execute or $self->throw("unable to select analysis");
>>>>>
>>>>>   my (%term2name, %name2term);   # start with two empty hashes
>>>>>
>>>>>   while (my $hashref = $sth->fetchrow_hashref) {
>>>>>     ## this is dgg syntax of analysis feature names for GFF
>>>>>     ## all have generic 'match' method and program:source as 'source'
>>>>>     ## a problem, want other main types: EST_match:xxx, mRNA:genie .. etc.
>>>>>     my $anfeat = "match:".$hashref->{program}.":".$hashref->{sourcename};
>>>>>
>>>>>     $term2name{ $hashref->{analysis_id} } = $anfeat;
>>>>>     $name2term{ $anfeat } = $hashref->{analysis_id};
>>>>>   }
>>>>>   $self->an_term2name(\%term2name);
>>>>>   $self->an_name2term(\%name2term);
>>>>> }
>>>>>
>>>>> ## Das::ChadoFC::Segment snippets
>>>>> sub features {
>>>>>   $self->{has_anatype} = 0;
>>>>>   my $sql_range = '';
>>>>>   my ($interbase_start,$rend,$srcfeature_id,$sql_types);
>>>>>   unless ($feature_id) {
>>>>>     $sql_range = $self->sql_range($rangetype);
>>>>>
>>>>>     $sql_types = $self->sql_types($types, -1); # dgg
>>>>>
>>>>>     $srcfeature_id = $self->{srcfeature_id};
>>>>>   }
>>>>>   ...
>>>>>   elsif ($self->{has_anatype}) {
>>>>>     $from_part .= "left join analysisfeature af using (feature_id) ";
>>>>>   }
>>>>>
>>>>>
>>>>> sub sql_types
>>>>>   ..
>>>>>   $valid_type = $factory->name2term($temp_type);
>>>>>   $is_anatype = 0;
>>>>>   unless ($valid_type) {
>>>>>     $valid_type = $factory->an_name2term($temp_type);
>>>>>     $self->{has_anatype} = $is_anatype = 1 if ($valid_type);
>>>>>   }
>>>>>   ..
>>>>>   ## leave out extra invalid types
>>>>>   if (!$valid_type) {
>>>>>     ### skip
>>>>>   } elsif ($temp_dbxref) {
>>>>>     $sql_types .= $orsql."(f.type_id = $valid_type and fd.dbxref_id = $temp_dbxref)";
>>>>>   } elsif ($is_anatype) {
>>>>>     $sql_types .= $orsql."(af.analysis_id = $valid_type)"; #<<<
>>>>>   } else {
>>>>>     $sql_types .= $orsql."(f.type_id = $valid_type)";
>>>>>   }
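For comparison, the same type-filter logic as a self-contained sketch that keeps ids out of the SQL text by using placeholders. The lookup hashes stand in for name2term and an_name2term and hold made-up ids; type_filter is a hypothetical helper, and the dbxref branch of the original snippet is omitted.

```perl
use strict;
use warnings;

# Hypothetical lookup tables: feature types map to cvterm ids, analysis
# "types" to analysis ids. The ids are invented for illustration.
my %name2term    = (gene => 101, mRNA => 102);
my %an_name2term = ('match:sim4:na_gb.dmel' => 7);

# Build an OR-ed type filter. Analysis types are matched through the
# analysisfeature alias (af), ordinary types through f.type_id. Ids go
# into @bind instead of being interpolated into the SQL string.
sub type_filter {
    my (@types) = @_;
    my (@clauses, @bind);
    my $needs_af_join = 0;
    for my $t (@types) {
        if (exists $name2term{$t}) {
            push @clauses, '(f.type_id = ?)';
            push @bind, $name2term{$t};
        }
        elsif (exists $an_name2term{$t}) {
            push @clauses, '(af.analysis_id = ?)';
            push @bind, $an_name2term{$t};
            $needs_af_join = 1;
        }
        # unknown types are skipped, as in the snippet above
    }
    my $sql = @clauses ? join(' OR ', @clauses) : '';
    return ($sql, \@bind, $needs_af_join);
}

my ($where, $bind, $join_af) =
    type_filter('gene', 'match:sim4:na_gb.dmel', 'bogus');
print "WHERE $where\n";
print "bind: @$bind; left join analysisfeature: ", ($join_af ? 'yes' : 'no'), "\n";
```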
>>>>>
>>>>>
>>>>> Lists of GFF feature type:source from some current MOD data
>>>>> where * are probably analysisfeature types (program:database)
>>>>>
>>>>> rice gff type:source
>>>>> ftp://ftp.gramene.org/pub/gramene/release17/data/sequence_annotation/gff3/
>>>>> --------------------
>>>>> CDS:known
>>>>> CDS:tigr
>>>>> EST:cmap
>>>>> EST_match:Barley (? might be EST_match:someprogram:Barley)
>>>>> EST_match:Maize
>>>>> EST_match:Millet
>>>>> EST_match:Rice
>>>>> EST_match:Sorghum
>>>>> EST_match:Wheat
>>>>> cDNA_match:Rice
>>>>> cross_genome_match:Maize
>>>>> cross_genome_match:Rice
>>>>> cross_genome_match:Sorghum
>>>>> * exon:FgenesH:Monocot
>>>>> exon:known
>>>>> exon:tigr
>>>>> five_prime_UTR:tigr
>>>>> gene:known
>>>>> gene:tigr
>>>>> * mRNA:FgenesH:Monocot
>>>>> mRNA:known
>>>>> mRNA:tigr
>>>>> microsatellite:cmap
>>>>> three_prime_UTR:known
>>>>> three_prime_UTR:tigr
>>>>> transposable_element_insertion_site:cmap
>>>>>
>>>>> worm gff type:source
>>>>> ftp://ftp.wormbase.org/pub/wormbase/species/elegans/genome_feature_tables/GFF3/
>>>>> ----------------------
>>>>> CDS:Coding_transcript
>>>>> * CDS:Genefinder
>>>>> CDS:Transposon_CDS
>>>>> CDS:history
>>>>> * CDS:twinscan
>>>>> * EST_match:BLAT_EST_BEST (~ EST_match:BLAT:EST_BEST)
>>>>> * EST_match:BLAT_EST_OTHER
>>>>> PCR_product:GenePair_STS
>>>>> PCR_product:Orfeome
>>>>> RNAi_reagent:RNAi_primary
>>>>> RNAi_reagent:RNAi_secondary
>>>>> SNP:Allele
>>>>> binding_site:binding_site
>>>>> * cDNA_match:BLAT_mRNA_BEST (~ cDNA_match:BLAT:mRNA_BEST )
>>>>> * cDNA_match:BLAT_mRNA_OTHER
>>>>> clone_end:.
>>>>> clone_start:.
>>>>> complex_substitution:Allele
>>>>> deletion:Allele
>>>>> exon:Coding_transcript
>>>>> * exon:Genefinder
>>>>> exon:Non_coding_transcript
>>>>> exon:Pseudogene
>>>>> exon:Transposon_CDS
>>>>> exon:history
>>>>> exon:miRNA
>>>>> exon:rRNA
>>>>> exon:scRNA
>>>>> exon:snRNA
>>>>> exon:snoRNA
>>>>> exon:tRNA
>>>>> * exon:tRNAscan-SE-1.23
>>>>> * exon:twinscan
>>>>> experimental_result_region:Expr_profile
>>>>> experimental_result_region:cDNA_for_RNAi
>>>>> * expressed_sequence_match:BLAT_OST_BEST (~ expressed_sequence_match:BLAT:OST_BEST)
>>>>> * expressed_sequence_match:BLAT_OST_OTHER
>>>>> five_prime_UTR:Coding_transcript
>>>>> gene:Coding_transcript
>>>>> gene:gene
>>>>> gene:history
>>>>> gene:landmark
>>>>> insertion:Allele
>>>>> inverted_repeat:inverted
>>>>> mRNA:Coding_transcript
>>>>> * mRNA:Genefinder
>>>>> mRNA:Transposon_CDS
>>>>> mRNA:history
>>>>> * mRNA:twinscan
>>>>> miRNA:miRNA
>>>>> nc_primary_transcript:Non_coding_transcript
>>>>> * nucleotide_match:BLAT_EMBL_BEST (~ nucleotide_match:BLAT:EMBL_BEST)
>>>>> * nucleotide_match:BLAT_EMBL_OTHER
>>>>> * nucleotide_match:BLAT_TC1_BEST
>>>>> * nucleotide_match:BLAT_TC1_OTHER
>>>>> * nucleotide_match:BLAT_ncRNA_BEST
>>>>> * nucleotide_match:BLAT_ncRNA_OTHER
>>>>> * nucleotide_match:TEC_RED
>>>>> * nucleotide_match:waba_coding
>>>>> * nucleotide_match:waba_strong
>>>>> * nucleotide_match:waba_weak
>>>>> oligo:.
>>>>> operon:operon
>>>>> polyA_signal_sequence:polyA_signal_sequence
>>>>> polyA_site:polyA_site
>>>>> processed_transcript:gene
>>>>> protein_coding_primary_transcript:Coding_transcript
>>>>> * protein_match:wublastx
>>>>> pseudogene:Pseudogene
>>>>> pseudogene:history
>>>>> rRNA:rRNA
>>>>> reagent:Oligo_set
>>>>> region:.
>>>>> region:Genbank
>>>>> region:Genomic_canonical
>>>>> region:Link
>>>>> * repeat_region:RepeatMasker
>>>>> scRNA:scRNA
>>>>> sequence_variant:.
>>>>> sequence_variant:Allele
>>>>> snRNA:snRNA
>>>>> snoRNA:snoRNA
>>>>> substitution:Allele
>>>>> tRNA:tRNA
>>>>> * tRNA:tRNAscan-SE-1.23
>>>>> tandem_repeat:tandem
>>>>> three_prime_UTR:Coding_transcript
>>>>> trans_splice_acceptor_site:SL1
>>>>> trans_splice_acceptor_site:SL2
>>>>> transcript:SAGE_transcript
>>>>> * translated_nucleotide_match:BLAT_NEMATODE (~ translated_nucleotide_match:BLAT:NEMATODE)
>>>>> transposable_element:Transposon
>>>>> transposable_element:Transposon_CDS
>>>>> transposable_element_insertion_site:Allele
>>>>> transposable_element_insertion_site:Mos_insertion_allele
>>>>>
>>>>>
>>>>> fly gff type:source
>>>>> ftp://ftp.flybase.net/genomes/dmel/current/gff/
>>>>> -----------------------
>>>>> BAC:.
>>>>> CDS:.
>>>>> aberration_junction:.
>>>>> chromosome:.
>>>>> chromosome_arm:.
>>>>> chromosome_band:.
>>>>> enhancer:.
>>>>> exon:.
>>>>> five_prime_UTR:.
>>>>> gene:.
>>>>> insertion_site:.
>>>>> intron:.
>>>>> mRNA:.
>>>>> * match:RNAiHDP
>>>>> * match:assembly:path
>>>>> * match:blastx:aa_SPTR.dmel
>>>>> * match:blastx:aa_SPTR.insect
>>>>> * match:blastx:aa_SPTR.othinv
>>>>> * match:blastx:aa_SPTR.othvert
>>>>> * match:blastx:aa_SPTR.plant
>>>>> * match:blastx:aa_SPTR.primate
>>>>> * match:blastx:aa_SPTR.rodent
>>>>> * match:blastx:aa_SPTR.worm
>>>>> * match:blastx:aa_SPTR.yeast
>>>>> * match:genscan
>>>>> * match:repeatmasker
>>>>> * match:sim4:na_ARGs.dros
>>>>> * match:sim4:na_ARGsCDS.dros
>>>>> * match:sim4:na_DGC_dros
>>>>> * match:sim4:na_dbEST.diff.dmel
>>>>> * match:sim4:na_dbEST.same.dmel
>>>>> * match:sim4:na_gadfly_dmel_r2
>>>>> * match:sim4:na_gb.dmel
>>>>> * match:sim4:na_gb.tpa.dmel
>>>>> * match:sim4:na_smallRNA.dros
>>>>> * match:sim4:na_transcript_dmel_r31
>>>>> * match:sim4:na_transcript_dmel_r32
>>>>> * match:tRNAscan-SE:.
>>>>> * match:tblastx:na_agambiae
>>>>> * match:tblastx:na_dbEST.insect
>>>>> * match:tblastx:na_dpse
>>>>> * match_part:RNAiHDP
>>>>> * match_part:assembly:path
>>>>> * match_part:blastx:aa_SPTR.dmel
>>>>> * match_part:blastx:aa_SPTR.insect
>>>>> * match_part:blastx:aa_SPTR.othinv
>>>>> * match_part:blastx:aa_SPTR.othvert
>>>>> * match_part:blastx:aa_SPTR.plant
>>>>> * match_part:blastx:aa_SPTR.primate
>>>>> * match_part:blastx:aa_SPTR.rodent
>>>>> * match_part:blastx:aa_SPTR.worm
>>>>> * match_part:blastx:aa_SPTR.yeast
>>>>> * match_part:genscan
>>>>> * match_part:repeatmasker
>>>>> * match_part:sim4:na_ARGs.dros
>>>>> * match_part:sim4:na_ARGsCDS.dros
>>>>> * match_part:sim4:na_DGC_dros
>>>>> * match_part:sim4:na_dbEST.diff.dmel
>>>>> * match_part:sim4:na_dbEST.same.dmel
>>>>> * match_part:sim4:na_gadfly_dmel_r2
>>>>> * match_part:sim4:na_gb.dmel
>>>>> * match_part:sim4:na_gb.tpa.dmel
>>>>> * match_part:sim4:na_smallRNA.dros
>>>>> * match_part:sim4:na_transcript_dmel_r31
>>>>> * match_part:sim4:na_transcript_dmel_r32
>>>>> * match_part:tRNAscan-SE:.
>>>>> * match_part:tblastx:na_agambiae
>>>>> * match_part:tblastx:na_dbEST.insect
>>>>> * match_part:tblastx:na_dpse
>>>>> mature_peptide:.
>>>>> ncRNA:.
>>>>> oligo:.
>>>>> point_mutation:.
>>>>> polyA_site:.
>>>>> protein_binding_site:.
>>>>> pseudogene:.
>>>>> region:.
>>>>> regulatory_region:.
>>>>> rescue_fragment:.
>>>>> scaffold:.
>>>>> sequence_variant:.
>>>>> snRNA:.
>>>>> snoRNA:.
>>>>> tRNA:.
>>>>> three_prime_UTR:.
>>>>> transcription_start_site:.
>>>>> transposable_element:.
>>>>> transposable_element_insertion_site:. 3116
>>>>>
>>>>>
>>>>> yeast gff type:source count
>>>>> ftp://genome-ftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/saccharomyces_cerevisiae.gff
>>>>> -------------------------
>>>>> ARS:SGD
>>>>> CDS:SGD
>>>>> binding_site:SGD
>>>>> centromere:SGD
>>>>> chromosome:SGD
>>>>> gene:SGD
>>>>> insertion:SGD
>>>>> intron:SGD
>>>>> ncRNA:SGD
>>>>> nc_primary_transcript:SGD
>>>>> nucleotide_match:SGD
>>>>> pseudogene:SGD
>>>>> rRNA:SGD
>>>>> region:SGD
>>>>> region:landmark
>>>>> repeat_family:SGD
>>>>> repeat_region:SGD
>>>>> snRNA:SGD
>>>>> snoRNA:SGD
>>>>> tRNA:SGD
>>>>> telomere:SGD
>>>>> transposable_element:SGD
>>>>> transposable_element_gene:SGD
>>>>>
>>>>> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
>>>>> -- gilbertd at indiana.edu -- http://marmot.bio.indiana.edu/
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Gmod-gbrowse mailing list
>>>>> Gmod-gbrowse at lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>>>>
>>>>
>>>>
>>> --
>>> ------------------------------------------------------------------------
>>> Scott Cain, Ph. D.                                          
>>> cain at cshl.edu
>>> GMOD Coordinator (http://www.gmod.org/)                      
>>> 216-392-3087
>>> Cold Spring Harbor Laboratory
>>>
>>>
>>>
>>> Gmod-devel mailing list
>>> Gmod-devel at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/gmod-devel
>>>
>>
>>
>>
>>
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                          
> cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                      
> 216-392-3087
> Cold Spring Harbor Laboratory
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



