<div dir="ltr"><div><div><div><div>Can't you find that information in the "source" feature? Check this list: </div>List l = sequence.getFeaturesByType("source");<br><br></div>This comes from the fact that in the new version of the GenBank file format, source is a compulsory feature, and much of the information was moved from the top-level "Features" tag into the "Source" tag qualifiers.<br><br></div>Let us know,<br></div>Paolo<br><div><div><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-06-03 14:29 GMT+02:00 simon rayner <span dir="ltr"><<a href="mailto:simon.rayner.cn@gmail.com" target="_blank">simon.rayner.cn@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif">Thanks to all for taking the time to answer. </div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">I had already got as far as parsing out the feature information using something like:</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><div class="gmail_default"><span style="white-space:pre-wrap">        </span>LinkedHashMap<String, DNASequence> dnaSequences = GenbankReaderHelper.readGenbankDNASequence( dnaFile );</div><div class="gmail_default"><span style="white-space:pre-wrap">        </span>for (DNASequence sequence : dnaSequences.values()) {</div><div><br></div></div><div class="gmail_default"><div class="gmail_default" style="font-family:tahoma,sans-serif"> List<FeatureInterface<AbstractSequence<NucleotideCompound>, NucleotideCompound>> fl = sequence.getFeatures();</div><div class="gmail_default" style="font-family:tahoma,sans-serif"> for (FeatureInterface fi : fl) {</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div><div><font 
face="tahoma, sans-serif"> HashMap <String, Qualifier> quals = fi.getQualifiers();</font></div><div><font face="tahoma, sans-serif"> for(Map.Entry<String, Qualifier> entry : quals.entrySet()){</font></div><div><font face="tahoma, sans-serif"> logger.info("--\t" + entry.getKey() + "\t|\t" + entry.getValue().getName() </font></div><div><font face="tahoma, sans-serif"> + " / " + entry.getValue().getValue() + "\\" + entry.getValue().toString()); </font></div><div><font face="tahoma, sans-serif"> }</font></div></div><div><div style="font-family:tahoma,sans-serif"> logger.info("SHORT\t" + fi.getShortDescription());</div><div style="font-family:tahoma,sans-serif"> logger.info("SOURCE\t" + fi.getSource());</div><div style="font-family:tahoma,sans-serif"> logger.info("TYPE\t" + fi.getType());</div><div style="font-family:tahoma,sans-serif"> logger.info("HASHCODE\t" + fi.hashCode());</div><div style="font-family:tahoma,sans-serif"> logger.info("-");</div><div style="font-family:tahoma,sans-serif"> }</div><div style="font-family:tahoma,sans-serif"><br></div><div><div><font face="tahoma, sans-serif"><span style="white-space:pre-wrap">        </span>}</font></div></div><div style="font-family:tahoma,sans-serif"><br></div><div style="font-family:tahoma,sans-serif">But I am still stumped as to how to access the annotation information at the top of a GenBank file. </div><div style="font-family:tahoma,sans-serif"><br></div><div style="font-family:tahoma,sans-serif">For example, getAccession gets me the accession number of the sequence, but what about all the other data that is there (e.g. 
the PubMed records)?</div><div style="font-family:tahoma,sans-serif"><br></div><div style="font-family:tahoma,sans-serif">In BJ3, there was a RichAnnotation class, but I don't see anything equivalent in BJ4.</div><div style="font-family:tahoma,sans-serif"><br></div><div style="font-family:tahoma,sans-serif">cheers</div><span class="HOEnZb"><font color="#888888"><div style="font-family:tahoma,sans-serif"><br></div><div style="font-family:tahoma,sans-serif">Simon</div><div style="font-family:tahoma,sans-serif"><br></div><div style="font-family:tahoma,sans-serif"><br></div></font></span></div></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 3, 2015 at 12:39 PM, Paolo Pavan <span dir="ltr"><<a href="mailto:paolo.pavan@gmail.com" target="_blank">paolo.pavan@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>Hi Simon, <br></div>I took care of the latest updates to the GenBank parser (reader). Currently there are two ways to read annotated GenBank files: <span lang="EN-US">via </span><span lang="EN-US"><span lang="EN-US">GenbankReader </span>and via </span><span lang="EN-US"><span lang="EN-US"><span lang="EN-US"></span>GenbankProxySequenceReader</span>. 
<br><br></span></div><span lang="EN-US">The first one:<br>GenbankReader<ProteinSequence, AminoAcidCompound> genbankProtein<br> = new GenbankReader<ProteinSequence, AminoAcidCompound>(<br> inStream,<br> new GenericGenbankHeaderParser<ProteinSequence, AminoAcidCompound>(),<br> new ProteinSequenceCreator(AminoAcidCompoundSet.getAminoAcidCompoundSet())<br> );<br>LinkedHashMap<String, ProteinSequence> proteinSequences = genbankProtein.process();<br> inStream.close();<br><br></span><br><span lang="EN-US"><span lang="EN-US">The second one is:<br><br></span>GenbankProxySequenceReader<AminoAcidCompound> genbankProteinReader<br> = new GenbankProxySequenceReader<AminoAcidCompound>("/my_directory", "NP_000257", AminoAcidCompoundSet.getAminoAcidCompoundSet());<br> ProteinSequence proteinSequence = new ProteinSequence(genbankProteinReader);<br><br><br></span></div><div><span lang="EN-US">Just remember to use NucleotideCompound and a DNASequenceCreator(DNACompoundSet.getDNACompoundSet()) if you need to parse GenBank nucleotide files.<br><br></span></div><div><span lang="EN-US">You can access the stored annotation via the getFeatures() family of methods on the sequence object you have read. Also note that features have qualifiers (those starting with / in the GenBank file), and they must be accessed from the feature object with getQualifiers(). 
<br>Finally, a feature can have a complex location (rare, but it happens); in that case you will find nested locations in the retrieved feature.<br><br></span></div><div><span lang="EN-US">Does this answer your question?<br></span></div><div><span lang="EN-US">Bye bye,<br>Paolo<br></span></div><div><span lang="EN-US"><br></span></div><div><span lang="EN-US"><br><br></span></div><span lang="EN-US"><br></span><div><div><span lang="EN-US"><br></span></div></div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">2015-06-03 10:27 GMT+02:00 Jose Manuel Duarte <span dir="ltr"><<a href="mailto:jose.duarte@psi.ch" target="_blank">jose.duarte@psi.ch</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I can't offer much help regarding GenBank parsing itself, but I would at least like to clarify the situation with the different (indeed confusing) versions:<br>
<br>
BJ4 is the current release, well maintained and under development. BJ3 has been completely superseded by BJ4, which means that BJ4 does everything that BJ3 did. In the cookbook and tutorials, everything that refers to BJ3 should work in BJ4; the only difference is that the package namespace has changed from org.biojava.bio/org.biojava3 to org.biojava.nbio.<br>
<br>
BJ1 and BJX are both legacy projects, with some maintenance but not much active development. I believe that some of the features in them were not ported to BJ3+.<br>
<br>
Cheers<span><font color="#888888"><br>
<br>
Jose</font></span><div><div><br>
<br>
<br>
On 02.06.2015 11:40, Simon Rayner wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi<br>
<br>
I'm coming back to BioJava (BJ) after a couple of years away and am somewhat confused by the current collection of cookbooks, tutorials and APIs. There appear to be a few examples for handling protein structure data, but relatively little for more mainstream stuff such as parsing GenBank files, which I need to do first to get the information I want for investigating protein structure. But when I look at the relevant code samples to do this, they refer back to BJ3, BJ1, or even BJX. Even the Wiki page still refers to BJ3 despite the release of BJ4 back in Feb 2015.<br>
<br>
I have everything working for parsing GenBank data, but I'm still trying to get the Annotation information out of the top of a GenBank file, and can't find any way of doing this using BJ4 - the BJ4 API appears to refer to the RichAnnotation type in the BJX release. Can anyone clarify what you are supposed to do here? Start mixing in some BJX (and is BJX still active?), or should I still be using BJ3 until BJ4 stabilizes? I realise this is an open source project, but some clarification on the current status of things would be handy if the project is going to appeal to a larger community :)<br>
<br>
Thanks!<br>
<br>
<br>
<br>
_______________________________________________<br>
Biojava-l mailing list - <a href="mailto:Biojava-l@mailman.open-bio.org" target="_blank">Biojava-l@mailman.open-bio.org</a><br>
<a href="http://mailman.open-bio.org/mailman/listinfo/biojava-l" target="_blank">http://mailman.open-bio.org/mailman/listinfo/biojava-l</a><br>
</blockquote>
<br>
</div></div></blockquote></div><br></div>
</div></div><br></blockquote></div><br></div>
</div></div></blockquote></div><br></div>