[Biojava-l] How do I read a FASTA file containing protein sequences in lowercase?

Richard Holland holland at eaglegenomics.com
Fri Nov 6 16:35:24 UTC 2009


Could you post the output from the exception stack that it generates?
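
If the logger only prints the exception message, one way to get the whole trace (including any nested "Caused by:" sections) is to render it to a string first. The following is only a sketch using plain JDK classes; the class name StackTraceDump, the method fullTrace and the exception messages in main are made up for illustration and are not part of BioJava.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class StackTraceDump {

    /**
     * Renders the complete stack trace of t as a single string.
     * Throwable.printStackTrace also prints the chain of causes,
     * so wrapped exceptions show up as "Caused by:" sections.
     */
    public static String fullTrace(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        t.printStackTrace(out);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Stand-in exceptions (hypothetical messages), used only to show the output shape.
        Exception cause = new IllegalArgumentException("inner problem");
        Exception wrapper = new Exception("outer problem", cause);
        System.out.println(fullTrace(wrapper));
    }
}
```

In the quoted method, the string returned by fullTrace(e) could be passed to logger.error in the BioException catch block. Depending on the logging library, the two-argument form already used in the IOException catch block, logger.error(message, e), may print the trace as well.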

thanks,
Richard

On 6 Nov 2009, at 16:25, Carl Mäsak wrote:

> I'm using RichSequenceIterator to read FASTA files containing
> proteins. Somehow it doesn't work when the protein sequences are in
> lowercase, which they sometimes are when downloaded from e.g. Uniprot.
> My code fails to recognize the following file as containing a protein
> sequence:
>
> >OPSD_FELCA
> mngtegpnfyvpfsnktgvvrspfeypqyylaepwqfsmlaaymfllivlgfpinfltlyvtvqhkklrtplnyilln
> lavadlfmvfggftttlytslhgyfvfgptgcnlegffatlggeialwslvvlaieryvvvckpmsnfrfgenhaimgv
> aftwvmalacaapplvgwsryipegmqcscgidyytlkpevnnesfviymfvvhftipmiviffcygqlvftvkeaaaq
> qqesattqkaekevtrmviimviaflicwvpyasvafyifthqgsnfgpifmtlpaffaksssiynpviyimmnkqfrn
> cmlttlccgknplgddeasttgsktetsqvapa
>
> What am I missing? Here's the code I'm using to read in sequences:
>
>    private List<ISequence> sequencesFromInputStream(InputStream stream) {
>
>        BufferedInputStream bufferedStream = new BufferedInputStream(stream);
>        Namespace ns = RichObjectFactory.getDefaultNamespace();
>        RichSequenceIterator seqit = null;
>
>        try {
>            seqit = RichSequence.IOTools.readStream(bufferedStream, ns);
>        } catch (IOException e) {
>            logger.error("Couldn't read sequences from file", e);
>            return Collections.emptyList();
>        }
>
>        List<ISequence> sequences = new ArrayList<ISequence>();
>        try {
>            while ( seqit.hasNext() ) {
>                RichSequence rseq;
>                rseq = seqit.nextRichSequence(); // *error occurs here*
>                if (rseq == null)
>                    continue;
>                String alphabet = rseq.getAlphabet().getName();
>                sequences.add(
>                      "DNA".equals(alphabet) ? new BiojavaDNA(rseq)
>                    : "RNA".equals(alphabet) ? new BiojavaRNA(rseq)
>                    :                          new BiojavaProtein(rseq) );
>            }
>        } catch (NoSuchElementException e) {
>            logger.error("Read past last sequence", e);
>        } catch (BioException e) {
>            logger.error(e); // *ends up here*
>        }
>
>        return sequences;
>    }
>
> Grateful for any pointers you might have.
>
> Regards,
> // Carl Mäsak
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/



