[GSoC] GSOC project: improve SegAnnDB interactive DNA copy number analysis

Toby Hocking tdhock5 at gmail.com
Mon Mar 3 22:13:35 UTC 2014


Hey Eric, thanks for your input -- these are really valuable comments!

It would be nice to integrate with Galaxy, but do you think the interactive
features would be possible? Can you recommend the relevant part of the
Galaxy manual to read?

I agree that it would be very nice to have a more robust genome viewer to
view other tracks such as RefSeq genes or SNP allele frequency data, but
again I wonder if the interactive annotation would be possible? By the way,
what does "paging GMOD" mean?

The reason I decided to draw the signal using PNG scatterplots is that it
is very fast -- at first I tried using JavaScript/D3 to draw it as SVG
<circle> elements, but that is really slow when there are a lot of points
on screen (as in the genome-wide plots).
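
To make the PNG approach concrete, here is a minimal sketch of rasterizing
points straight into a bitmap. It is not the SegAnnDB code (which uses PIL);
it uses only the Python standard library, and the names scatter_png and
png_chunk are made up for this example. The drawing cost is one array write
per point, and the browser only has to display a single image instead of
managing one element per point.

```python
import struct
import zlib


def png_chunk(tag, data):
    """Return one PNG chunk: length, tag, data, then CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def scatter_png(points, width, height, x_range, y_range):
    """Rasterize (x, y) points to an 8-bit grayscale PNG, returned as bytes.

    Assumes x_max > x_min and y_max > y_min (hypothetical helper, not
    part of SegAnnDB).
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    # White background, one byte per pixel, stored row by row.
    pixels = bytearray(b"\xff" * (width * height))
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # skip points outside the plotted window
        col = int((x - x_min) / (x_max - x_min) * (width - 1))
        # Image rows grow downward, so flip the y axis.
        row = int((y_max - y) / (y_max - y_min) * (height - 1))
        pixels[row * width + col] = 0  # one black pixel per point
    # Each PNG scanline starts with a filter-type byte (0 = no filter).
    raw = b"".join(
        b"\x00" + bytes(pixels[r * width:(r + 1) * width])
        for r in range(height)
    )
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # then compression, filter, and interlace methods (all 0).
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + png_chunk(b"IHDR", header)
        + png_chunk(b"IDAT", zlib.compress(raw))
        + png_chunk(b"IEND", b"")
    )
```

For example, scatter_png([(0, 0), (5, 1)], 200, 100, (0, 10), (-1, 1))
returns the bytes of a 200x100 image that can be written to a file or served
directly.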

There is some SegAnnDB code for reading bedGraph files from disk before
they are loaded into BerkeleyDB --- do you think that reader is suitable
for inclusion in BioPython?
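
For reference, a bedGraph reader can be quite small. The sketch below is
only an illustration of the format (chrom, start, end, value on each data
line, with optional track/browser header lines); the name read_bedgraph is
hypothetical, and this is neither the SegAnnDB reader nor an existing
BioPython function.

```python
def read_bedgraph(lines):
    """Yield (chrom, start, end, value) tuples from bedGraph text lines.

    `lines` can be an open file or any iterable of strings.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if fields[0] in ("track", "browser"):
            continue  # skip the optional header lines
        if len(fields) != 4:
            raise ValueError("expected 4 fields, got: %r" % line)
        chrom, start, end, value = fields
        # bedGraph coordinates are 0-based, half-open: [start, end).
        yield chrom, int(start), int(end), float(value)
```

Because it is a generator, it can be used as
"for chrom, start, end, value in read_bedgraph(open(path)):" without
holding the whole file in memory (path being whatever file you have).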

Finally, there are indeed some Python extension modules (PrunedDP and
SegAnnot) for efficiently calculating the displayed segmentation models, so
those could definitely be included in BioPython if you like. Their source
code is in SVN here:
https://r-forge.r-project.org/scm/viewvc.php/python/?root=segannot


On Mon, Mar 3, 2014 at 4:44 PM, Eric Talevich <eric.talevich at gmail.com> wrote:

> Hi Toby & all,
>
> Since this idea is related to some very common types of bioinformatic
> analysis, I would encourage looking at opportunities to integrate this
> project with other software components that have already been developed,
> potentially even replacing part of the SegAnnDB codebase with a newly
> developed component:
>
> - The proposed idea for GenomeDiagram would make it interactive, and
> therefore very suitable for use in SegAnnDB. The current SegAnnDB generates
> a static PNG and then draws on this bitmap, while a Bokeh implementation
> could be smoother and more aesthetically pleasing. More importantly,
> integration with GenomeDiagram or a more robust genome viewer (paging
> GMOD?) would make it easier to display multiple tracks together, which
> seems to be a very important feature for this sort of manual annotation
> (e.g. CNVs are often correlated with loss of heterozygosity in SNPs, so
> plotting SNP allele frequencies on another track would aid annotation).
>
> - I could also envision this software as a Galaxy component. Would that
> work?
>
> - On the Biopython side, our modules for microarray analysis could benefit
> from some attention. The raw probe copy number or copy ratio values need to
> be extracted from a data source somehow before they can be used in
> SegAnnDB, right? The initial segmentation also needs to be calculated; it
> would also be useful to be able to do this in Python independently of
> SegAnnDB. Can you see an opportunity to write reusable code that will
> perform these operations in Biopython?
>
> Cheers,
> Eric
>
>
> On Mon, Mar 3, 2014 at 9:08 AM, Toby Hocking <tdhock5 at gmail.com> wrote:
>
>> Thanks for the input and the links to related work, Peter and Raoul.
>>
>> About BioPython, I have used it in previous projects, but its current
>> features do not really help for this SegAnnDB project. As I understand,
>> BioPython is best for things like sequence analysis and downloading data
>> from GenBank, but for SegAnnDB I was doing something quite different:
>> interactive visualization and storing user-specific annotations using a
>> web server/database.
>>
>> About GenomeDiagram, it definitely could be used to plot DNA copy number
>> profiles, but it is currently neither interactive nor linked to a
>> database, and I really needed both of those features for SegAnnDB.
>> GenomeDiagram could have been used to make some of the static PNG plots
>> on SegAnnDB, but instead I used PIL directly since that is faster.
>>
>> About R, thanks for the link to the Java implementation of fastR, but I
>> haven't used R at all in SegAnnDB since I wanted to just depend on 1
>> language on the server side (Python).
>>
>> Finally about BioJS, I also wrote to them, but now I realize that SegAnnDB
>> is a better fit for the cross-language nature of OBF. BioJS is focused on
>> developing web client-side JavaScript visualizations, which SegAnnDB does
>> for DNA copy number profiles, so perhaps I could work with the BioJS guys
>> on porting my existing JS code for their uses. However, my SegAnnDB
>> project is also tightly integrated with a server-side Python component,
>> which I would like a student to develop in GSOC.
>>
>> So again thanks for the encouraging comments and I will go ahead and post
>> a more detailed project proposal on the wiki.
>>
>>
>> On Mon, Mar 3, 2014 at 10:47 AM, Peter Cock <p.j.a.cock at googlemail.com>
>> wrote:
>>
>> > On Mon, Mar 3, 2014 at 3:07 PM, Toby Hocking <tdhock5 at gmail.com> wrote:
>> > > Hey OBF developers, I am a bioinformatics researcher and long-time
>> > > developeR (admin and mentor for R's participation in GSOC). Using
>> > > JavaScript and Python, I have developed SegAnnDB, a web site for
>> > > visualization and interactive annotation of DNA copy number profiles
>> > > http://bioinformatics.oxfordjournals.org/content/early/2014/02/03/bioinformatics.btu072.short
>> > > and I want to get a GSOC student to implement some improvements. Would it
>> > > be possible for me to propose this as an OBF project and possibly be a
>> > > mentor for GSOC?
>> > >
>> > > I think SegAnnDB fits into the main theme of OBF: writing open-source
>> > > code for analysis and visualization of biological data. The student
>> > > would need to write Python code for the server side and JavaScript
>> > > code for the web client side so I think it would fit best into the
>> > > "cross-project ideas" section.
>> > >
>> > > Anyway, if it is OK with you guys, can I please post my project
>> > > proposal to the OBF GSOC ideas wiki page?
>> >
>> > Do you see any natural links to Biopython on the server side (an OBF
>> > project which would be good for the GSoC link) or BioJS on the client
>> > side (not an OBF project, but also participating under GSoC directly)?
>> >
>> > See also Leighton's outline Biopython proposal on interactive graphics:
>> >
>> > http://biopython.org/wiki/Google_Summer_of_Code#Interactive_GenomeDiagram_Module
>> >
>> > Peter
>> >


