[Open-bio-l] [EMBOSS] Common Sample Data Collection, was: SCF files (Staden)

Chris Fields cjfields at illinois.edu
Thu Dec 15 19:07:44 UTC 2011


Hamish,

The reason I ask is that the various Bio* and EMBOSS projects each have 
their share of old (and possibly duplicated) data examples. It might be 
nice to standardize on a common set of records, simply to reduce data 
duplication.

For example, we could have a git repo of purely data (or links to data) 
that we could 'git submodule' in for code distribution, release, and 
testing purposes, without bloating the code repository.

chris

On 12/15/2011 12:01 PM, Hamish McWilliam wrote:
> Hi Chris,
>
>> That might be the best source to pull from.  Does it archive old file examples (such as older SwissProt/GenBank/EMBL)?
> EDAM itself does not store entry data, and at the moment it does not
> describe the changes to formats over time, although I'm sure this
> could be added along with links to sample entries in the various data
> archives.
>
> If you only need a few sample entries, see the appropriate database archive:
>
> - EMBL-Bank Sequence Version Archive (EMBL-SVA):
> http://www.ebi.ac.uk/cgi-bin/sva/sva.pl.
> E.g. http://www.ebi.ac.uk/cgi-bin/sva/sva.pl/?query=V00077&search=Go
> - UniProtKB Sequence/Annotation Version Archive (UniSave):
> http://www.ebi.ac.uk/uniprot/unisave/
> E.g. http://www.ebi.ac.uk/uniprot/unisave/?query=P00002&search=Go
> - NCBI Entrez Revision History.
> E.g. http://www.ncbi.nlm.nih.gov/nuccore/V00077?report=girevhist
>
> If you need more entries...
>
> For Swiss-PROT and UniProtKB old versions of the data are available on
> the FTP sites, for example from EMBL-EBI:
> - ftp://ftp.ebi.ac.uk/pub/databases/uniprot/previous_releases/
> - ftp://ftp.ebi.ac.uk/pub/databases/swissprot/sw_old_releases/
>
> For GenBank, Don Gilbert collected various old releases a while back
> (http://www.bio.net/bionet/mm/genbankb/2006-October/000251.html);
> these are available via the BioMirrors (http://www.bio-mirror.net/).
> NCBI may also be able to provide old releases on request.
>
> For EMBL-Bank, old releases can be made available on request; contact
> ENA (http://www.ebi.ac.uk/ena/about/contact) for more information.
>
> All the best,
>
> Hamish
>
>> chris
>>
>> On Nov 30, 2011, at 8:49 AM, Peter Cock wrote:
>>
>>> I just checked with Jon and he was happy to forward this back to
>>> the list, and also added a couple of URLs that I'd asked about:
>>>
>>> http://bioportal.bioontology.org/ontologies/44600
>>> http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=EDAM
>>>
>>> Peter
>>>
>>> On Wed, Nov 30, 2011 at 11:14 AM, Jon Ison <jison at ebi.ac.uk> wrote:
>>>> Hi Peter (and Peter)
>>>>
>>>> Just a quick note to say that all (well, nearly all) common bioinformatics data formats are
>>>> catalogued in the EDAM ontology:
>>>>
>>>> http://sourceforge.net/projects/edamontology/files
>>>> http://edamontology.sourceforge.net/
>>>>
>>>> OK - there's bound to be some we've missed :)
>>>>
>>>> Anyhow, I thought it might help to structure any effort to document data formats (an effort which
>>>> I wholeheartedly approve of by the way).  One thing I'd like to add to the EDAM "format"
>>>> definitions is a link to the format specification, or failing that, an example.
>>>>
>>>> Cheers both
>>>>
>>>> Jon
> _______________________________________________
> Open-Bio-l mailing list
> Open-Bio-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/open-bio-l



