[BioPython] Bio.Geo for NCBI's GEO microarray SOFT files

Peter biopython at maubp.freeserve.co.uk
Thu Dec 15 08:33:37 EST 2005


Does anyone on the discussion list use GEO files?

Peter

-------- Original Message --------
Subject: [Biopython-dev] Bio.Geo for NCBI's GEO microarray SOFT files
Date: Sat, 10 Dec 2005 18:39:13 +0000
From: Peter <biopython-dev at maubp.freeserve.co.uk>
To: biopython-dev at biopython.org

I've just been looking at the Bio.Geo module by Katharine Lindner,
contributed back in 2002, which should parse NCBI's Gene Expression
Omnibus (GEO) microarray data files.

http://www.ncbi.nlm.nih.gov/geo/

Is anyone using Bio.Geo at the moment?

The NCBI seem to call these SOFT files (*.soft), and the format is
documented here:

http://www.ncbi.nlm.nih.gov/projects/geo/info/soft2.html#SOFTformat

Apparently in 2005 they began switching to a revised file format. The
new-format files are here:

ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_gz/

The old-format files are here:

ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_old/
ftp://ftp.ncbi.nih.gov/pub/geo/data/gds/soft_old_gz/

As far as I can tell, neither the "old" nor the "new" files work with
Bio.Geo, so there may have been another format change between 2002 and 2005.

In addition, the 2005 change introduces new lines before and after the
actual data table:

!dataset_table_begin
!dataset_table_end

These are definitely not supported in the current Martel grammar for GEO
files.
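For illustration, here is a minimal sketch in plain Python (not Bio.Geo and not a Martel grammar) of how a reader could pick out the rows between the two new marker lines. It assumes the table rows are tab-separated with a column-header row first, as the NCBI SOFT documentation describes; the helper names read_dataset_table and read_soft_gz are hypothetical, and the sample record below is invented toy data, not a real GDS file.

```python
import gzip
import io
from typing import IO, Iterable, List, Union


def read_dataset_table(lines: Iterable[str]) -> List[List[str]]:
    """Return the rows between !dataset_table_begin and !dataset_table_end.

    Assumption: rows are tab-separated and the first row is the column
    header. Reading stops at the first !dataset_table_end marker.
    """
    rows: List[List[str]] = []
    inside = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("!dataset_table_begin"):
            inside = True          # data table starts on the next line
        elif line.startswith("!dataset_table_end"):
            break                  # end of the (single) data table
        elif inside and line:
            rows.append(line.split("\t"))
    return rows


def read_soft_gz(path_or_file: Union[str, IO[bytes]]) -> List[List[str]]:
    # Hypothetical helper for the gzipped files (e.g. under soft_gz/):
    # gzip.open accepts a filename or a binary file object; "rt" gives text.
    with gzip.open(path_or_file, "rt") as handle:
        return read_dataset_table(handle)


# Invented toy record, only to show the marker handling.
example = """^DATASET = GDS0
!dataset_title = toy example
!dataset_table_begin
ID_REF\tIDENTIFIER\tGSM1
1\tgeneA\t12.5
!dataset_table_end
"""

rows = read_dataset_table(io.StringIO(example))
print(rows)  # [['ID_REF', 'IDENTIFIER', 'GSM1'], ['1', 'geneA', '12.5']]
```

Header lines such as ^DATASET and !dataset_title are simply skipped here; a real parser would also need to capture them, and would need to handle whatever differs in the "old" format.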

Peter

_______________________________________________
Biopython-dev mailing list
Biopython-dev at biopython.org
http://biopython.org/mailman/listinfo/biopython-dev