[Bioperl-l] GEO SOFT Parser?
Gong Wuming
gongwuming at hotmail.com
Sun May 30 22:47:26 EDT 2004
Hi Tex,
I asked the same question here a few days ago but got no response. That is
a bit surprising, because I thought it would be a relatively common problem.
At first I planned to write a module for parsing the SOFT format under
Bio::Expression::MicroarrayIO::, but I found that difficult because many
important base classes in Bioperl-Microarray are not implemented yet,
especially those for expression data. So I wrote a simple Perl script that
reads the information in a SOFT file into a data structure. The code is
below.
-----------------------------------
#!/usr/bin/perl
use strict;
use warnings;

my $hash = {};    # attributes of the entity currently being read
my $DATA = {};    # parsed result: domain name => list of entities
my ($last_domain, $this_domain, $last_mark, $this_mark);

# Read the file line by line.
while (<>) {
    chomp;
    $this_mark = substr($_, 0, 1);    # Line marker: '^', '!' or '#'
    if ($this_mark =~ /^[\^!]$/) {    # Entity ('^') or attribute ('!') line
        my @attr;
        # Extract the key-value pair ("key = value"); the limit of 2 keeps
        # any further " = " inside the value.
        my ($key, $value) = split(/\s+=\s+/, substr($_, 1), 2);
        ($this_domain, @attr) = split(/_/, $key);
        # Lower-case the domain so that '^DATABASE' and '!Database_name'
        # end up under the same key.
        $this_domain = lc $this_domain;
        my $attribute = join('_', @attr) || 'id';
        # A new '^' line closes the previous entity: store a copy of it.
        if ($this_mark eq '^' and $last_domain) {
            my %attribute = %$hash;
            push @{ $DATA->{$last_domain} }, \%attribute;
            $hash = {};
        }
        $hash->{$attribute} = $value;
    }
    elsif ($this_mark eq '#') {       # Column description of the data table
        my ($field, $desc) = /^#(.+?)\s+=\s+(.+)$/;
        my ($description, $src) = (split(/;*\s+.+?:\s+/, $desc))[1, 2];
        push @{ $DATA->{'data'} },
            { 'field' => $field, 'description' => $description,
              'src' => $src, 'value' => [] };
    }
    else {                            # Data row
        next if /^ID_REF/;            # Skip the table header row
        # Append each tab-separated cell to the column at the same index.
        my @cells = split /\t/;
        for my $i (0 .. $#cells) {
            push @{ $DATA->{'data'}->[$i]->{'value'} }, $cells[$i];
        }
    }
    $last_domain = $this_domain;
    $last_mark   = $this_mark;
}
-------------------------------------------------------------
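To make the key handling concrete, here is a minimal sketch of how a single
'^' or '!' line breaks down into marker, domain, attribute and value. The
helper name split_soft_line and the sample lines are hypothetical and made up
for illustration. Lower-casing the domain is an assumption, made so that
'^SUBSET' and '!subset_...' lines share one key.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the script: split one '^' or '!' line
# into (marker, domain, attribute, value). Returns an empty list for any
# other kind of line.
sub split_soft_line {
    my ($line) = @_;
    my $mark = substr($line, 0, 1);
    return unless $mark eq '^' or $mark eq '!';
    # "key = value"; the limit of 2 keeps any later " = " inside the value.
    my ($key, $value) = split /\s+=\s+/, substr($line, 1), 2;
    # The first '_'-separated piece is the domain, the rest the attribute.
    my ($domain, @attr) = split /_/, $key;
    my $attribute = join('_', @attr) || 'id';
    # Assumption: lower-case the domain so both line types use the same key.
    return ($mark, lc $domain, $attribute, $value);
}

# Invented sample lines in the "key = value" style the script expects.
for my $line ('^SUBSET = GDS0000_1', '!subset_sample_id = GSM0001,GSM0002') {
    my ($mark, $domain, $attribute, $value) = split_soft_line($line);
    print "$mark $domain $attribute $value\n";
}
```

A '^' line carries no attribute part, so it falls back to 'id', which is why
every entity ends up with an 'id' entry.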
The results are stored in a data structure like this:
$DATA = {
    'database' => {
        'name' =>
        'institute' =>
        'web_link' =>
        'email' =>
        'ref' =>
    }
    'dataset' => {
        'id' =>
        'completeness' =>
        'description' =>
        'experiment_type' =>
        'maximum_probes' =>
        'order' =>
        'organism' =>
        'platform' =>
        'reference_series' =>
        'title' =>
        'total_samples' =>
        'update_date' =>
        'value_type' =>
    }
    'subset' => [
        {
            'id' =>
            'description' =>
            'type' =>
            'sample' => []
        }
    ]
    'data' => [
        {
            'field' =>
            'description' =>
            'src' =>
            'value' => []
        }
    ]
}
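Because 'data' holds one entry per column, getting a table row back means
taking the same index from every column's 'value' list. Below is a minimal
sketch of that, assuming the layout shown above. The helper
rows_from_columns is hypothetical, and the sample IDs and values are
invented for illustration, not taken from a real GEO file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): turn the column-wise
# 'data' list into one hash per table row, keyed by field name.
sub rows_from_columns {
    my ($columns) = @_;
    my @rows;
    for my $col (@$columns) {
        my $values = $col->{value} || [];
        for my $i (0 .. $#$values) {
            $rows[$i]{ $col->{field} } = $values->[$i];
        }
    }
    return \@rows;
}

# Hand-made sample in the layout described above (invented values).
my $DATA = {
    subset => [
        { id => 'GDS0000_1', description => 'control', type => 'agent' },
    ],
    data => [
        { field => 'ID_REF',  description => undef, src => undef,
          value => [ 'probe_1', 'probe_2' ] },
        { field => 'GSM0001', description => 'Value for GSM0001', src => undef,
          value => [ 1.5, 2.5 ] },
    ],
};

# List the subsets, then print each data row as tab-separated fields.
print "subset $_->{id}: $_->{description}\n" for @{ $DATA->{subset} };
my @fields = map { $_->{field} } @{ $DATA->{data} };
for my $row (@{ rows_from_columns($DATA->{data}) }) {
    print join("\t", map { defined $row->{$_} ? $row->{$_} : '' } @fields), "\n";
}
```

Keeping the values column-wise is handy when you want all measurements of
one sample. The row view is easier when you look things up probe by probe.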
Wuming Gong
--
College of Life Science,
Wuhan University, China.