[Biojava-l] Tr: Retrieve Information from GenBank file

jc.lucky jc.lucky at laposte.net
Wed Oct 27 13:03:55 UTC 2010


I'm more interested in the features (protein ID, taxon, xref, product) and in retrieving information about articles (authors, title). I am not looking at the sequence data at all.
My purpose is to read the GenBank file and retrieve this information so that I can convert it to a semantic RDF file. I'm working on a specific gene at the moment, but it would be interesting to extend this to any GenBank file in the future.
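
To make the use case concrete, here is a minimal sketch in plain Java (no BioJava calls) of the kind of extraction I mean. It is not a real GenBank parser: the class name GenBankFields and its parse/extract methods are hypothetical, the column offsets (12 for header values, 21 for qualifiers) are assumed from the usual flat-file layout, and the sample record in main is made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collects header keywords (AUTHORS, TITLE, ...) and
// feature qualifiers (product, protein_id, db_xref, ...) from one record.
final class GenBankFields {
    private static final int HEADER_VALUE_COL = 12; // assumed header layout
    private static final int QUALIFIER_COL = 21;    // assumed feature layout

    static Map<String, List<String>> parse(String record) {
        try {
            return extract(new BufferedReader(new StringReader(record)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Map<String, List<String>> extract(BufferedReader in) throws IOException {
        Map<String, List<String>> out = new LinkedHashMap<>();
        StringBuilder value = new StringBuilder();
        String key = null;          // field currently being accumulated
        boolean inFeatures = false;
        String line;
        while ((line = in.readLine()) != null) {
            // Stop before the sequence data; it is not needed here.
            if (line.startsWith("ORIGIN") || line.startsWith("//")) break;
            String text = line.trim();
            if (text.isEmpty()) continue;
            int indent = line.indexOf(text);
            if (line.startsWith("FEATURES")) {
                store(out, key, value);
                key = null;
                inFeatures = true;
            } else if (inFeatures && indent >= QUALIFIER_COL && text.startsWith("/")) {
                // New qualifier such as /product="..."
                store(out, key, value);
                int eq = text.indexOf('=');
                key = eq < 0 ? text.substring(1) : text.substring(1, eq);
                if (eq >= 0) value.append(text.substring(eq + 1));
            } else if (indent < (inFeatures ? QUALIFIER_COL : HEADER_VALUE_COL)) {
                // New header keyword, or a new feature key line (location is skipped).
                store(out, key, value);
                key = null;
                if (!inFeatures) {
                    String[] parts = text.split("\\s+", 2);
                    key = parts[0];
                    if (parts.length > 1) value.append(parts[1]);
                }
            } else if (key != null) {
                // Continuation line of the current field.
                value.append(' ').append(text);
            }
        }
        store(out, key, value);
        return out;
    }

    private static void store(Map<String, List<String>> out, String key, StringBuilder value) {
        if (key != null) {
            String v = value.toString().trim();
            if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                v = v.substring(1, v.length() - 1); // drop surrounding quotes
            }
            out.computeIfAbsent(key, k -> new ArrayList<>()).add(v);
        }
        value.setLength(0);
    }

    public static void main(String[] args) {
        String record = // made-up record, for illustration only
              "DEFINITION  hypothetical example protein.\n"
            + "  AUTHORS   Doe,J. and Roe,R.\n"
            + "  TITLE     A made-up title that wraps\n"
            + "            onto a second line\n"
            + "FEATURES             Location/Qualifiers\n"
            + "     CDS             1..120\n"
            + "                     /product=\"example protein\"\n"
            + "                     /protein_id=\"XXX00000.1\"\n"
            + "ORIGIN\n";
        Map<String, List<String>> fields = parse(record);
        System.out.println(fields.get("TITLE"));
        System.out.println(fields.get("product"));
    }
}
```

Each key maps to a list because a record can hold several references and several features. Feature locations are skipped, and a wrapped qualifier value whose continuation line happens to start with "/" would be misread, so this only illustrates which fields I need.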

Thanks,

Jean-Charles



> Message of 27/10/10 12:41
> From: "Scooter Willis" 
> To: "jc.lucky" 
> Cc: "biojava-l lists open-bio org" 
> Subject: Re: [Biojava-l] Tr: Retrieve Information from GenBank file
>
> Jean-Charles
> 
> I have it on my list to do a GenBank parser but haven't had the time. I
> can't promise anything in the next couple weeks. Can you send some details
> about what a typical use case is for your purpose? Are you trying to get the
> sequence data or are you more interested in the features?
> 
> Thanks
> 
> Scooter
> 
> On Wed, Oct 27, 2010 at 4:11 AM, jc.lucky  wrote:
> 
> >
> > I tried once again with the new version of BioJava, but without success.
> > Any ideas or suggestions?
> >
> > Thanks in advance
> > Regards,
> >
> > Jean-Charles Ferrières
> >
> >
> > > > Message of 22/10/10 10:11
> > > > From: "jc.lucky"
> > > > To: biojava-l at lists.open-bio.org
> > > > Cc:
> > > > Subject: [Biojava-l] Retrieve Information from GenBank file
> > >
> > >
> > > Hi
> > >
> > > I'm trying to convert a GenBank file into an RDF file. The gene of
> > > interest can be found at: http://www.ncbi.nlm.nih.gov/protein/284794945
> > >
> > > With the code below I can read the GenBank file, retrieve some
> > > information and convert it to RDF format. However, I don't succeed in
> > > retrieving some information such as the title, protein or product.
> > > According to this page
> > > (http://www.biojava.org/wiki/BioJava:BioJavaXDocs#GenBan) it should be
> > > possible to do so.
> > > Please help me find what I am doing wrong or what should be done to
> > > achieve my goal.
> > >
> > > //read the GenBank file
> > > public static RichSequenceIterator readFile(String input,
> > >         RichSequenceBuilderFactory seqFactory,
> > >         Namespace ns)
> > >         throws IOException, NoSuchElementException, BioException
> > > {
> > >     ns = null;
> > >     InputStream stream = new FileInputStream(input);
> > >     BufferedReader rdfFile = new BufferedReader(new InputStreamReader(stream));
> > >     RichSequenceIterator seqs = RichSequence.IOTools.readGenbankDNA(rdfFile, ns);
> > >     return seqs;
> > > }
> > >
> > > //Retrieve information and convert it to RDF format
> > > public void writeToRDFFile(RichSequenceIterator rsi, String output)
> > >         throws IOException, NoSuchElementException, BioException {
> > >     //create model for the ontology
> > >     OntModel model = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM, null);
> > >     OntClass parents;
> > >     String URI = "http://pbr.wur.nl/#";
> > >
> > >     while(rsi.hasNext())
> > >     {
> > >         RichSequence seq = rsi.nextRichSequence();
> > >         String id = seq.getName();
> > >         parents = model.createClass(URI + id);
> > >         Set author = seq.getRankedDocRefs(); //code to clean up Set & convert toString
> > >         String definition = seq.getDescription(); //code to clean up String
> > >         //Add to model
> > >         parents.addProperty(DC.description, definition);
> > >         parents.addProperty(DC.publisher, authors);
> > >         parents.addComment(taxonomy, "EN");
> > >         parents.addProperty(DC.type, organism);
> > >         //print in RDF format
> > >         model.write(out, "RDF/XML");
> > >         out.close();
> > >     }
> > > }
> > >
> > >
> > > Thanks,
> > > Jean-Charles Ferrières
> > _____________________________________________
> > > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/biojava-l
