[Biojava-dev] [Biojava-l] FASTA Header Parser

Scooter Willis HWillis at scripps.edu
Sat Feb 25 11:55:40 UTC 2012


Hannes

You can currently add arbitrary features to a sequence based on position,
which should allow you to store quality information. You could create a
quality feature that spans the sequence from start to end and keep an
array of the per-position quality scores in that feature.
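
As a rough illustration of the idea, here is a minimal, self-contained
sketch. It deliberately does not use the biojava3 feature classes: the
class QualityFeature and its methods are hypothetical names made up for
this example, and the decoding assumes Sanger-style FASTQ quality strings
(Phred score = ASCII code minus 33). Treat it as a sketch of "one feature
covering the whole read, holding an int array of scores", not as the
actual API.

```java
import java.util.Arrays;

public class QualityFeatureSketch {

    /**
     * Hypothetical feature: a 1-based, inclusive span over a sequence
     * plus one Phred quality score per covered position.
     */
    static final class QualityFeature {
        private final int start;
        private final int end;
        private final int[] scores;

        QualityFeature(int start, int end, int[] scores) {
            if (start < 1 || end - start + 1 != scores.length) {
                throw new IllegalArgumentException("span and score count do not match");
            }
            this.start = start;
            this.end = end;
            this.scores = scores.clone(); // defensive copy
        }

        /** Assumption: Sanger encoding, i.e. Phred score = char code - 33. */
        static QualityFeature fromSangerString(String quality) {
            int[] scores = new int[quality.length()];
            for (int i = 0; i < quality.length(); i++) {
                int q = quality.charAt(i) - 33;
                if (q < 0 || q > 93) {
                    throw new IllegalArgumentException("not a Sanger quality char at index " + i);
                }
                scores[i] = q;
            }
            // The feature runs from the first to the last base of the read.
            return new QualityFeature(1, quality.length(), scores);
        }

        int getStart() { return start; }

        int getEnd() { return end; }

        int[] getScores() { return scores.clone(); }

        /** Score at a 1-based sequence position inside the feature. */
        int scoreAt(int position) {
            if (position < start || position > end) {
                throw new IndexOutOfBoundsException("position " + position + " outside feature");
            }
            return scores[position - start];
        }
    }

    public static void main(String[] args) {
        QualityFeature f = QualityFeature.fromSangerString("II5!");
        System.out.println(f.getStart() + ".." + f.getEnd() + " " + Arrays.toString(f.getScores()));
    }
}
```

With something along these lines attached to the sequence, the concrete
score for any position is available through scoreAt(position), and the
original quality line can still be kept separately if you need to write
the record back unchanged.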

Scooter

On 2/25/12 12:49 AM, "Hannes Brandstätter-Müller" <biojava at hannes.oib.com>
wrote:

>Hi!
>
>I just looked over the code. Just to verify my understanding: You do
>not parse the quality scores in any way (except checking if all chars
>are in the correct range)? I would need direct access to the concrete
>score values of each position for my project, so I am thinking about
>
>1) enhancing the DNASequence (and the Compound classes, to be more
>exact) to be able to hold quality information
>2) enhancing the fastq reader and writer to be able to deal with
>(input/output) the 3.0 DNASequence classes
>3) implement a reader for FASTA/QUAL format too
>
>As a side effect, you could use the route via the DNASequence to
>translate from Illumina to Sanger to Solexa format (which, as far as I
>understand it, is not supported yet)
>
>Hannes
>
>On Fri, Jan 13, 2012 at 06:30, Michael Heuer <heuermh at gmail.com> wrote:
>> Thanks, Scooter.
>>
>> I committed to a new module biojava3-sequencing since I wasn't sure
>> where in biojava3-genome the new package should go.  I saw an io
>> package with feature readers, but fastq is more a sequencing format.
>> Feel free to move it around if you can think of a better place for it.
>>
>> I'll need a day or two to become more familiar with the biojava3 core
>> before I can add the static helper method.  I'm still a 1.x guy I
>> guess.
>>
>>   michael
>>
>>
>> On Thu, Jan 12, 2012 at 11:21 AM, Scooter Willis <HWillis at scripps.edu>
>>wrote:
>>> I think Git is a read only copy.
>>>
>>> Can you do a Static Helper method in the same way we have
>>> FastaReaderHelper in the core. This way we can hide the implementation
>>> details. You can also map the QC attributes to the sequence as meta
>>> data and set the original header. This way we can write back exact
>>> format as written. If you get the code into genomics I can also add
>>> some additional features.
>>>
>>>
>>>
>>> On 1/12/12 12:15 PM, "Michael Heuer" <heuermh at gmail.com> wrote:
>>>
>>>>Thanks, will do this evening, since I don't have ssh access at work
>>>>for svn+ssh.  Or is it possible to commit to the git repository?
>>>>
>>>>  michael
>>>>
>>>>
>>>>On Thu, Jan 12, 2012 at 11:11 AM, Scooter Willis <HWillis at scripps.edu>
>>>>wrote:
>>>>> Michael
>>>>>
>>>>> You can put the source in the genomics module. At some point we can
>>>>> probably use the same code in core for a sequence proxy loader
>>>>> option so that you can load huge fastq files with a lazy loading of
>>>>> the sequences. This way you don't burn through memory.
>>>>>
>>>>> Scooter
>>>>>
>>>>> On 1/12/12 12:05 PM, "Michael Heuer" <heuermh at gmail.com> wrote:
>>>>>
>>>>>>Hannes Brandstätter-Müller wrote:
>>>>>>> On Wed, Jan 11, 2012 at 22:34, Michael Heuer <heuermh at gmail.com>
>>>>>>>wrote:
>>>>>>>> Hannes Brandstätter-Müller wrote:
>>>>>>>>> On Wed, Jan 11, 2012 at 16:24, Scooter Willis
>>>>>>>>><HWillis at scripps.edu>
>>>>>>>>>wrote:
>>>>>>>>>> Is this a custom header or something output from a sequencing
>>>>>>>>>> instrument/software?
>>>>>>>>>
>>>>>>>>> It's the output of the Roche/454 Titanium FLX Sequencer
>>>>>>>>
>>>>>>>> If you would rather, biojava has support for the FASTQ file
>>>>>>>> format,
>>>>>>>
>>>>>>> we already had this discussion this or last month. FASTQ support is
>>>>>>> still not in 3.0, only in 1.8. I will work on porting that to 3.0
>>>>>>> and supporting FASTA/QUAL too, most likely, but I will have time for
>>>>>>> that starting in April, after this project that I'm working on
>>>>>>> currently is finished.
>>>>>>
>>>>>>There is no reason to port, the fastq package in biojava-legacy is
>>>>>>completely self-contained.  If having it in a separate jar is really
>>>>>>a problem, I could copy it over to the trunk for the next release of
>>>>>>biojava3.
>>>>>>
>>>>>>   michael
>>>>>>
>>>>>>_______________________________________________
>>>>>>biojava-dev mailing list
>>>>>>biojava-dev at lists.open-bio.org
>>>>>>http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>
>>>
>




