[Biojava-l] How do I read a FASTA file containing protein sequences in lowercase?
Richard Holland
holland at eaglegenomics.com
Fri Nov 6 16:35:24 UTC 2009
Could you post the output from the exception stack that it generates?
thanks,
Richard
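
For anyone following along, here is a minimal sketch of one way to capture the full stack output, including any nested "Caused by:" sections, as text that can be pasted into a reply. The class and method names (StackTraceDump, fullTrace) are made up for illustration and are not part of BioJava. It uses only the standard library.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class StackTraceDump {

    /** Renders the throwable, its stack frames and every "Caused by:" section as text. */
    public static String fullTrace(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        t.printStackTrace(out);   // writes the whole cause chain, not just the message
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Hypothetical exceptions, standing in for whatever the parser actually throws.
        Exception cause = new IllegalArgumentException("hypothetical root cause");
        Exception wrapper = new Exception("Could not read sequence", cause);
        System.out.println(fullTrace(wrapper));
    }
}
```

If the logger in the quoted code is a log4j-style logger (an assumption, the thread does not say), the one-argument call logger.error(e) probably records only the exception's string form, whereas the two-argument form logger.error("...", e) used earlier in the same method also records the stack.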
On 6 Nov 2009, at 16:25, Carl Mäsak wrote:
> I'm using RichSequenceIterator to read FASTA files containing
> proteins. Somehow it doesn't work when the protein sequences are in
> lowercase, which they sometimes are when downloaded from e.g. Uniprot.
> My code fails to recognize the following file as containing a protein
> sequence:
>
> >OPSD_FELCA
> mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln
> lavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgv
> aftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaq
> qqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrn
> cmlttlccgknplgddeasttgsktetsqvapa
>
> What am I missing? Here's the code I'm using to read in sequences:
>
>     private List<ISequence> sequencesFromInputStream(InputStream stream) {
>
>         BufferedInputStream bufferedStream = new BufferedInputStream(stream);
>         Namespace ns = RichObjectFactory.getDefaultNamespace();
>         RichSequenceIterator seqit = null;
>
>         try {
>             seqit = RichSequence.IOTools.readStream(bufferedStream, ns);
>         } catch (IOException e) {
>             logger.error("Couldn't read sequences from file", e);
>             return Collections.emptyList();
>         }
>
>         List<ISequence> sequences = new ArrayList<ISequence>();
>         try {
>             while ( seqit.hasNext() ) {
>                 RichSequence rseq;
>                 rseq = seqit.nextRichSequence(); // *error occurs here*
>                 if (rseq == null)
>                     continue;
>                 String alphabet = rseq.getAlphabet().getName();
>                 sequences.add(
>                     "DNA".equals(alphabet) ? new BiojavaDNA(rseq)
>                   : "RNA".equals(alphabet) ? new BiojavaRNA(rseq)
>                   : new BiojavaProtein(rseq) );
>             }
>         } catch (NoSuchElementException e) {
>             logger.error("Read past last sequence", e);
>         } catch (BioException e) {
>             logger.error(e); // *ends up here*
>         }
>
>         return sequences;
>     }
>
> Grateful for any pointers you might have.
>
> Regards,
> // Carl Mäsak
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
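
A possible stopgap while the cause is tracked down, sketched under assumptions: since the question reports that only lowercase files fail, the residue lines could be uppercased before the stream reaches the parser. Nothing in this thread confirms that this fixes the problem. The class and method names (FastaUppercaser, uppercaseResidues) are made up for illustration and are not part of BioJava. The sketch buffers the whole input in memory and assumes sequence lines are plain ASCII; header lines (those starting with ">") pass through unchanged.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class FastaUppercaser {

    /**
     * Returns a new stream with the same FASTA records, where every
     * non-header line has been converted to uppercase.
     */
    public static InputStream uppercaseResidues(InputStream in) throws IOException {
        StringBuilder out = new StringBuilder();
        // ISO-8859-1 maps each byte to one char, so header bytes round-trip unchanged.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = reader.readLine()) != null) {
                boolean isHeader = line.startsWith(">");
                out.append(isHeader ? line : line.toUpperCase(Locale.ROOT));
                out.append('\n');
            }
        }
        return new ByteArrayInputStream(
                out.toString().getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

In the quoted method, the result would be wrapped in the BufferedInputStream that is passed to RichSequence.IOTools.readStream, in place of the raw stream.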