[Biopython] affy CEL and CDF reader
Vincent Davis
vincent at vincentdavis.net
Thu Apr 8 19:43:57 UTC 2010
No, I was not reading the binary files. That said, I am interested in pursuing
that if there is interest.
Do you have a link to the SDK?
On Thu, Apr 8, 2010 at 1:40 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> On Thu, Apr 8, 2010 at 3:03 PM, Vincent Davis <vincent at vincentdavis.net>
> wrote:
> > Parsing it myself, but based directly on the Affy documentation found
> > here:
> >
> > http://www.stat.lsa.umich.edu/~kshedden/Courses/Stat545/Notes/AffxFileFormats/
>
> So, are you covering both binary and text formats for .CEL files? I
> think that modern .CEL files (those produced by GCOS) are binary and
> represent the majority of .CEL files produced today. Some of the I/O
> issues that you discuss are almost definitely dealt with by using the
> binary .CEL files.
>
> I'm certainly not an expert on Affy, so take all these
> questions/comments with a grain of salt.
>
> Sean
>
>
> > On Thu, Apr 8, 2010 at 12:56 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> >
> >> On Thu, Apr 8, 2010 at 2:33 PM, Vincent Davis <vincent at vincentdavis.net> wrote:
> >> > I ended up writing my own modules for reading both Affy CEL and CDF
> >> > files. Long story as to why I did not just use what was available in
> >> > Biopython.
> >> > I plan on making what I have done available to Biopython and will upload
> >> > it as a fork. I will outline below how what I have differs.
> >> > My question is: are there any improvements (features) others would like
> >> > to see beyond what is available in the current CelFile.py?
> >> > I saw some posts a month or so ago about checking for consistency in CEL
> >> > files; I think it was something about making sure the stated number of
> >> > probes was consistent with the intensity measurements.
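
A rough sketch of that kind of consistency check, not the code in my
modules. It assumes the text (version 3) CEL layout, where the [INTENSITY]
section states NumberCells= and then lists one row per cell. The sample
data and the function name count_intensity_rows are made up for
illustration.

```python
import io

# Hypothetical, heavily trimmed sample in the text (version 3) CEL layout.
SAMPLE = """[CEL]
Version=3

[INTENSITY]
NumberCells=3
CellHeader=X\tY\tMEAN\tSTDV\tNPIXELS
  0\t  0\t120.0\t10.5\t 16
  1\t  0\t 98.3\t 8.1\t 16
  2\t  0\t143.7\t12.0\t 16

[MASKS]
NumberCells=0
CellHeader=X\tY
"""


def count_intensity_rows(handle):
    """Return (stated, found) cell counts for the [INTENSITY] section."""
    stated = None
    found = 0
    in_section = False
    for line in handle:
        line = line.strip()
        if line.startswith("["):
            # A new section starts; only [INTENSITY] is counted here.
            in_section = (line == "[INTENSITY]")
            continue
        if not in_section or not line:
            continue
        if line.startswith("NumberCells="):
            stated = int(line.split("=", 1)[1])
        elif line.startswith("CellHeader="):
            continue
        else:
            found += 1  # every remaining line is one data row
    return stated, found


stated, found = count_intensity_rows(io.StringIO(SAMPLE))
consistent = (stated == found)
```

A reader could raise an error or just warn when the two numbers differ;
which is better probably depends on how people use the files.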
> >> >
> >> > What is different:
> >> > when a file is read with Affycel.read('file'), many attributes are set.
> >> > For example:
> >> > a = affcel()
> >> > a.read('testfile')
> >> > a.filename,
> >> > a.version,
> >> > a.header.items() # a dictionary of all header items
> >> > a.num_intensity
> >> > a.intensity
> >> > a.num_masks
> >> > a.masks
> >> > a.num_outliers
> >> > a.outliers
> >> > a.numb_modified
> >> > a.modified
> >> >
> >> > I plan to add the ability to return intensity values with or without
> >> > outlier or masked values.
> >> > All data is currently stored in numpy structured arrays.
> >> > Currently a.intensity returns the structured array, but I plan on making
> >> > it an option to easily choose how this is returned.
> >> > I also want to add an optional normalized intensity array so that if the
> >> > data is normalized it can be stored with the affycel instance. My use
> >> > case was that I was opening about 80 CEL files and reading them in was
> >> > slow. This allowed me to read each file as an instance of affycel stored
> >> > in a list that I then pickled. It was then much faster to open them.
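
To make the idea concrete, here is a minimal sketch of the structured-array
and pickle approach. It is an illustration only: plain dicts stand in for
the affycel instances, and the field names, the unmasked helper and the
sample values are all assumptions, not what my modules actually use.

```python
import pickle

import numpy as np

# Assumed field layout for intensity and mask records (illustrative only).
cell_dtype = np.dtype([("x", "i4"), ("y", "i4"), ("mean", "f4"),
                       ("stdv", "f4"), ("npixels", "i4")])
mask_dtype = np.dtype([("x", "i4"), ("y", "i4")])

intensity = np.array([(0, 0, 120.0, 10.5, 16),
                      (1, 0, 98.3, 8.1, 16),
                      (2, 0, 143.7, 12.0, 16)], dtype=cell_dtype)
masks = np.array([(1, 0)], dtype=mask_dtype)


def unmasked(intensity, masks):
    """Return the intensity rows whose (x, y) position is not masked."""
    masked = set(zip(masks["x"].tolist(), masks["y"].tolist()))
    keep = np.array([(x, y) not in masked
                     for x, y in zip(intensity["x"].tolist(),
                                     intensity["y"].tolist())], dtype=bool)
    return intensity[keep]


# One dict per parsed file stands in for one parsed CEL object.
cels = [{"filename": "a.CEL", "intensity": intensity, "masks": masks}]

# Serialize the whole list once; later runs load it instead of re-parsing.
blob = pickle.dumps(cels, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)

clean = unmasked(restored[0]["intensity"], restored[0]["masks"])
```

In practice the list would go to a file with pickle.dump and come back with
pickle.load; dumps and loads are used here only to keep the sketch
self-contained.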
> >> >
> >> > Are improvements to CelFile.py of value to Biopython?
> >> >
> >> > I hope to have the code pushed up to my fork on GitHub late tonight.
> >> > Just thought I would ask if there were any suggestions before I did.
> >> >
> >> > I also have a CDF file reader, but have only done some basic testing. I
> >> > don't have a lot of use for this; do other Biopython users?
> >> >
> >> > I am kinda working in a vacuum and am trying to get more involved in
> >> > projects to improve my skills and knowledge. Any suggestions would be
> >> > appreciated.
> >>
> >> Just out of curiosity, is your work based on the affy sdk, or are you
> >> parsing stuff yourself?
> >>
> >> Sean
> >>