[Biojava-l] GenBank parsing

Paolo Pavan paolo.pavan at gmail.com
Wed Jun 3 10:39:15 UTC 2015


Hi Simon,
I took care of the latest updates to the GenBank parser (reader). Currently
there are two ways to read annotated GenBank files: via GenbankReader
and via GenbankProxySequenceReader.

The first one:

// inStream is an already opened InputStream on the GenBank file
GenbankReader<ProteinSequence, AminoAcidCompound> genbankProtein
        = new GenbankReader<ProteinSequence, AminoAcidCompound>(
                inStream,
                new GenericGenbankHeaderParser<ProteinSequence, AminoAcidCompound>(),
                new ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet())
        );
LinkedHashMap<String, ProteinSequence> proteinSequences = genbankProtein.process();
inStream.close();


The second one is:

GenbankProxySequenceReader<AminoAcidCompound> genbankProteinReader
        = new GenbankProxySequenceReader<AminoAcidCompound>(
                "/my_directory", "NP_000257",
                AminoAcidCompoundSet.getAminoAcidCompoundSet());
ProteinSequence proteinSequence = new ProteinSequence(genbankProteinReader);


Just remember to use NucleotideCompound and a
DNASequenceCreator(DNACompoundSet.getDNACompoundSet()) if you need to parse
GenBank nucleotide files.

You can access the stored annotation via the getFeatures() family of methods
on the sequence object you have read. Also note that features have qualifiers
(those starting with / in the GenBank file), which must be accessed from the
feature object with getQualifiers().
Features can also have complex locations (rare, but they occur); in
that case you will find nested locations in the retrieved feature.
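
To make the two terms concrete, here is a small plain-Java sketch of what a
qualifier and a compound location look like in the feature table of a GenBank
file. It does not use BioJava and is not how the BioJava parser is implemented;
the class GenbankFeatureSketch, the Span type and both method names are
hypothetical, made up for this illustration. It assumes one qualifier per line
and only handles plain "a..b" and "join(...)" locations (no complement(),
no < or > partial markers).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical helper, not part of BioJava.
final class GenbankFeatureSketch {

    /** One contiguous stretch of a location, 1-based and inclusive as in GenBank. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return start + ".." + end;
        }
    }

    /** Collects the /key="value" lines of a single feature; other lines are skipped. */
    static Map<String, String> parseQualifiers(List<String> featureLines) {
        Map<String, String> qualifiers = new LinkedHashMap<>();
        for (String raw : featureLines) {
            String line = raw.trim();
            if (!line.startsWith("/")) {
                continue; // feature key / location line, not a qualifier
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                qualifiers.put(line.substring(1), ""); // flag qualifier such as /pseudo
                continue;
            }
            String value = line.substring(eq + 1);
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1); // drop surrounding quotes
            }
            qualifiers.put(line.substring(1, eq), value);
        }
        return qualifiers;
    }

    /** Splits a simple or join(...) location into its nested spans. */
    static List<Span> parseLocation(String location) {
        String body = location.trim();
        if (body.startsWith("join(") && body.endsWith(")")) {
            body = body.substring("join(".length(), body.length() - 1);
        }
        List<Span> spans = new ArrayList<>();
        for (String part : body.split(",")) {
            String[] bounds = part.trim().split("\\.\\.");
            int start = Integer.parseInt(bounds[0]);
            int end = bounds.length > 1 ? Integer.parseInt(bounds[1]) : start; // single base
            spans.add(new Span(start, end));
        }
        return spans;
    }

    public static void main(String[] args) {
        List<String> cds = Arrays.asList(
                "     CDS             join(1..10,20..30)",
                "                     /gene=\"abc\"",
                "                     /pseudo");
        System.out.println(parseQualifiers(cds));
        System.out.println(parseLocation("join(1..10,20..30)"));
    }
}
```

In BioJava itself you would not parse these by hand: the qualifiers come from
getQualifiers() on the feature, and a join(...) shows up as the nested
locations mentioned above.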

Does this answer your question?
Bye bye,
Paolo






2015-06-03 10:27 GMT+02:00 Jose Manuel Duarte <jose.duarte at psi.ch>:

> I can't offer much help regarding GenBank parsing itself, but I would at
> least like to clarify the situation with the different (indeed confusing)
> versions:
>
> BJ4 is the current release, well maintained and under development. BJ3 has
> been completely superseded by BJ4. That means that BJ4 does everything that
> BJ3 did. In the cookbook and tutorials everything that refers to BJ3 should
> work in BJ4, with the only difference that the namespace of packages has
> changed from org.biojava.bio/org.biojava3 to org.biojava.nbio.
>
> BJ1 and BJX are both legacy projects, with some maintenance but not much
> active development. I believe that some of the features in them were not
> ported to BJ3+.
>
> Cheers
>
> Jose
>
>
>
> On 02.06.2015 11:40, Simon Rayner wrote:
>
>> Hi
>>
>> I'm coming back to BioJava (BJ) after a couple of years away and am
>> somewhat confused by the current collection of cookbooks, tutorials and
>> APIs. There appear to be a few examples for handling protein structure
>> data, but relatively little for more mainstream stuff such as parsing
>> Genbank files, which I first need to get the information I want to
>> investigate protein structure. But when I look at the relevant code samples
>> to do this, they refer back to BJ3, BJ1, or even BJX. Even the Wiki page
>> still refers to BJ3 despite the release of BJ4 back in Feb 2015.
>>
>> I have everything working for parsing GenBank data, but I'm still trying
>> to get the Annotation information out of the top of a GenBank file, and
>> can't find any way of doing this using BJ4 - the BJ4 API appears to refer
>> to the RichAnnotation type in BJX release. Can anyone clarify what you are
>> supposed to do here? Start mixing in some BJX (and is BJX still active?),
>> or should I still be using BJ3 until BJ4 stabilizes? I realise this is an
>> open source project, but some clarification on the current status of things
>> would be handy if the project is going to appeal to a larger community :)
>>
>> Thanks!
>>
>>
>>
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at mailman.open-bio.org
>> http://mailman.open-bio.org/mailman/listinfo/biojava-l
>>
>