[Biojava-l] How do I read a FASTA file containing protein sequences in lowercase?

Carl Mäsak cmasak at gmail.com
Fri Nov 6 16:54:30 UTC 2009


Richard (>), Carl (>>):
>> I'm using RichSequenceIterator to read FASTA files containing
>> proteins. Somehow it doesn't work when the protein sequences are in
>> lowercase, which they sometimes are when downloaded from e.g. Uniprot.
>> My code fails to recognize the following file as containing a protein
>> sequence:
>>
>> >OPSD_FELCA
>> mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln
>> lavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgv
>> aftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaq
>> qqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrn
>> cmlttlccgknplgddeasttgsktetsqvapa
>>
>> What am I missing? Here's the code I'm using to read in sequences:
>>
>>   private List<ISequence> sequencesFromInputStream(InputStream stream) {
>>
>>       BufferedInputStream bufferedStream = new BufferedInputStream(stream);
>>       Namespace ns = RichObjectFactory.getDefaultNamespace();
>>       RichSequenceIterator seqit = null;
>>
>>       try {
>>           seqit = RichSequence.IOTools.readStream(bufferedStream, ns);
>>       } catch (IOException e) {
>>           logger.error("Couldn't read sequences from file", e);
>>           return Collections.emptyList();
>>       }
>>
>>       List<ISequence> sequences = new ArrayList<ISequence>();
>>       try {
>>           while ( seqit.hasNext() ) {
>>               RichSequence rseq = seqit.nextRichSequence(); // *error occurs here*
>>               if (rseq == null)
>>                   continue;
>>               String alphabet = rseq.getAlphabet().getName();
>>               sequences.add(
>>                     "DNA".equals(alphabet) ? new BiojavaDNA(rseq)
>>                   : "RNA".equals(alphabet) ? new BiojavaRNA(rseq)
>>                   :                          new BiojavaProtein(rseq) );
>>           }
>>       } catch (NoSuchElementException e) {
>>           logger.error("Read past last sequence", e);
>>       } catch (BioException e) {
>>           logger.error(e); // *ends up here*
>>       }
>>
>>       return sequences;
>>   }
>>
>> Grateful for any pointers you might have.
>
> Could you post the output from the exception stack that it generates?

org.biojava.bio.BioException: Could not read sequence
	at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
	at net.bioclipse.biojava.business.BiojavaManager.sequencesFromInputStream(BiojavaManager.java:314)
	at net.bioclipse.biojava.business.BiojavaManager.sequencesFromFile(BiojavaManager.java:291)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at net.bioclipse.managers.business.AbstractManagerMethodDispatcher.doInvoke(AbstractManagerMethodDispatcher.java:243)
	at net.bioclipse.managers.business.JavaManagerMethodDispatcher.doInvokeInSameThread(JavaManagerMethodDispatcher.java:248)
	at net.bioclipse.managers.business.AbstractManagerMethodDispatcher.invoke(AbstractManagerMethodDispatcher.java:130)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
	at net.bioclipse.recording.WrapInProxyAdvice.invoke(WrapInProxyAdvice.java:22)
	at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.springframework.osgi.service.importer.internal.aop.ServiceInvoker.doInvoke(ServiceInvoker.java:59)
	at org.springframework.osgi.service.importer.internal.aop.ServiceInvoker.invoke(ServiceInvoker.java:67)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
	at org.springframework.osgi.service.importer.internal.aop.ServiceTCCLInterceptor.invoke(ServiceTCCLInterceptor.java:34)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
	at org.springframework.osgi.service.importer.support.LocalBundleContextAdvice.invoke(LocalBundleContextAdvice.java:59)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
	at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:131)
	at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:119)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
	at $Proxy18.invoke(Unknown Source)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
	at org.springframework.aop.framework.adapter.AfterReturningAdviceInterceptor.invoke(AfterReturningAdviceInterceptor.java:50)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
	at $Proxy20.sequencesFromFile(Unknown Source)
	at net.bioclipse.biojava.ui.editors.Aligner.setInput(Aligner.java:152)
	at net.bioclipse.biojava.ui.editors.Aligner.init(Aligner.java:138)
	at org.eclipse.ui.part.MultiPageEditorPart.addPage(MultiPageEditorPart.java:238)
	at org.eclipse.ui.part.MultiPageEditorPart.addPage(MultiPageEditorPart.java:212)
	at net.bioclipse.biojava.ui.editors.SequenceEditor.createPages(SequenceEditor.java:47)
	at org.eclipse.ui.part.MultiPageEditorPart.createPartControl(MultiPageEditorPart.java:357)
	at org.eclipse.ui.internal.EditorReference.createPartHelper(EditorReference.java:662)
	at org.eclipse.ui.internal.EditorReference.createPart(EditorReference.java:462)
	at org.eclipse.ui.internal.WorkbenchPartReference.getPart(WorkbenchPartReference.java:595)
	at org.eclipse.ui.internal.PartPane.setVisible(PartPane.java:313)
	at org.eclipse.ui.internal.presentations.PresentablePart.setVisible(PresentablePart.java:180)
	at org.eclipse.ui.internal.presentations.util.PresentablePartFolder.select(PresentablePartFolder.java:270)
	at org.eclipse.ui.internal.presentations.util.LeftToRightTabOrder.select(LeftToRightTabOrder.java:65)
	at org.eclipse.ui.internal.presentations.util.TabbedStackPresentation.selectPart(TabbedStackPresentation.java:473)
	at org.eclipse.ui.internal.PartStack.refreshPresentationSelection(PartStack.java:1256)
	at org.eclipse.ui.internal.PartStack.setSelection(PartStack.java:1209)
	at org.eclipse.ui.internal.PartStack.showPart(PartStack.java:1608)
	at org.eclipse.ui.internal.PartStack.add(PartStack.java:499)
	at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:103)
	at org.eclipse.ui.internal.PartStack.add(PartStack.java:485)
	at org.eclipse.ui.internal.EditorStack.add(EditorStack.java:112)
	at org.eclipse.ui.internal.EditorSashContainer.addEditor(EditorSashContainer.java:63)
	at org.eclipse.ui.internal.EditorAreaHelper.addToLayout(EditorAreaHelper.java:225)
	at org.eclipse.ui.internal.EditorAreaHelper.addEditor(EditorAreaHelper.java:213)
	at org.eclipse.ui.internal.EditorManager.createEditorTab(EditorManager.java:778)
	at org.eclipse.ui.internal.EditorManager.openEditorFromDescriptor(EditorManager.java:677)
	at org.eclipse.ui.internal.EditorManager.openEditor(EditorManager.java:638)
	at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditorBatched(WorkbenchPage.java:2854)
	at org.eclipse.ui.internal.WorkbenchPage.busyOpenEditor(WorkbenchPage.java:2762)
	at org.eclipse.ui.internal.WorkbenchPage.access$11(WorkbenchPage.java:2754)
	at org.eclipse.ui.internal.WorkbenchPage$10.run(WorkbenchPage.java:2705)
	at org.eclipse.swt.custom.BusyIndicator.showWhile(BusyIndicator.java:70)
	at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2701)
	at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2685)
	at org.eclipse.ui.internal.WorkbenchPage.openEditor(WorkbenchPage.java:2676)
	at org.eclipse.ui.ide.IDE.openEditor(IDE.java:651)
	at org.eclipse.ui.ide.IDE.openEditor(IDE.java:610)
	at org.eclipse.ui.actions.OpenFileAction.openFile(OpenFileAction.java:99)
	at org.eclipse.ui.actions.OpenSystemEditorAction.run(OpenSystemEditorAction.java:99)
	at org.eclipse.ui.actions.RetargetAction.run(RetargetAction.java:221)
	at org.eclipse.ui.navigator.CommonNavigatorManager$3.open(CommonNavigatorManager.java:202)
	at org.eclipse.ui.OpenAndLinkWithEditorHelper$InternalListener.open(OpenAndLinkWithEditorHelper.java:48)
	at org.eclipse.jface.viewers.StructuredViewer$2.run(StructuredViewer.java:842)
	at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42)
	at org.eclipse.core.runtime.Platform.run(Platform.java:888)
	at org.eclipse.ui.internal.JFaceUtil$1.run(JFaceUtil.java:48)
	at org.eclipse.jface.util.SafeRunnable.run(SafeRunnable.java:175)
	at org.eclipse.jface.viewers.StructuredViewer.fireOpen(StructuredViewer.java:840)
	at org.eclipse.jface.viewers.StructuredViewer.handleOpen(StructuredViewer.java:1101)
	at org.eclipse.ui.navigator.CommonViewer.handleOpen(CommonViewer.java:467)
	at org.eclipse.jface.viewers.StructuredViewer$6.handleOpen(StructuredViewer.java:1205)
	at org.eclipse.jface.util.OpenStrategy.fireOpenEvent(OpenStrategy.java:264)
	at org.eclipse.jface.util.OpenStrategy.access$2(OpenStrategy.java:258)
	at org.eclipse.jface.util.OpenStrategy$1.handleEvent(OpenStrategy.java:298)
	at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
	at org.eclipse.swt.widgets.Display.sendEvent(Display.java:3543)
	at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1250)
	at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1273)
	at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1258)
	at org.eclipse.swt.widgets.Widget.notifyListeners(Widget.java:1079)
	at org.eclipse.swt.widgets.Display.runDeferredEvents(Display.java:3441)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3100)
	at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2405)
	at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2369)
	at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2221)
	at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:500)
	at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:332)
	at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:493)
	at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
	at net.bioclipse.ui.Application.start(Application.java:36)
	at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:194)
	at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
	at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:368)
	at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:559)
	at org.eclipse.equinox.launcher.Main.basicRun(Main.java:514)
	at org.eclipse.equinox.launcher.Main.run(Main.java:1311)
	at org.eclipse.equinox.launcher.Main.main(Main.java:1287)
Caused by: org.biojava.bio.seq.io.ParseException:

A Exception Has Occurred During Parsing.
Please submit the details that follow to biojava-l at biojava.org or post
a bug report to http://bugzilla.open-bio.org/

Format_object=org.biojavax.bio.seq.io.FastaFormat
Accession=OPSD_FELCA
Id=null
Comments=problem parsing symbols
Parse_block=mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyillnlavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgvaftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaqqqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrncmlttlccgknplgddeasttgsktetsqvapa
Stack trace follows ....


	at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:244)
	at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
	... 114 more
Caused by: org.biojava.bio.symbol.IllegalSymbolException: This tokenization doesn't contain character: 'e'
	at org.biojava.bio.seq.io.CharacterTokenization.parseTokenChar(CharacterTokenization.java:175)
	at org.biojava.bio.seq.io.CharacterTokenization$TPStreamParser.characters(CharacterTokenization.java:246)
	at org.biojava.bio.symbol.SimpleSymbolList.<init>(SimpleSymbolList.java:178)
	at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:237)
	... 115 more
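
A side observation on the innermost exception, offered as a guess rather than a diagnosis: 'e' is the fifth residue of the sequence, and the four residues before it (m, n, g, t) all happen to be valid IUPAC nucleotide codes. That would fit the sequence being tokenized with a nucleotide alphabet instead of the protein one. The standalone sketch below only checks that observation. The class and method names are made up for illustration, and nothing in it is BioJava API.

```java
public class NucleotideCheck {

    // IUPAC nucleotide codes, including the ambiguity codes, in lowercase.
    private static final String IUPAC_NUCLEOTIDES = "acgturyswkmbdhvn";

    /**
     * Returns the index of the first character that is not an IUPAC
     * nucleotide code (compared case-insensitively), or -1 if every
     * character is one.
     */
    public static int firstNonNucleotide(String residues) {
        for (int i = 0; i < residues.length(); i++) {
            char c = Character.toLowerCase(residues.charAt(i));
            if (IUPAC_NUCLEOTIDES.indexOf(c) < 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // First residues of the OPSD_FELCA record from the file above.
        String start = "mngtegpnfyvpfsnktgvv";
        int i = firstNonNucleotide(start);
        System.out.println(i + " " + start.charAt(i)); // prints "4 e"
    }
}
```

The first character rejected by this check is the same 'e' that the IllegalSymbolException names, which is why the alphabet looks like the place to dig.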

// Carl



