[Biojava-dev] How to read a protein seq alignment file (FASTA)

Andreas Prlic andreas at sdsc.edu
Mon Jul 2 23:47:08 UTC 2012


Thanks, Don. This looks good to me, and I have committed the new constructor
to SVN. If you would also like to send the rest of the code for building a
profile from an aligned FASTA file, I'll be happy to patch that in too.

Andreas
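
For readers following the thread, here is a minimal sketch of the first step
discussed here: reading an already-aligned FASTA file into gapped rows and
applying the same two checks as the constructor quoted below (the alignment
must not be empty, and all rows must have the same length). It uses only the
Java standard library. AlignedFastaSketch, parse, and ungapped are
hypothetical names, not BioJava API, and treating '-' as the only gap symbol
is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of BioJava: parses aligned FASTA lines
// into header -> gapped row, preserving the order of the file.
final class AlignedFastaSketch {

    static Map<String, String> parse(List<String> lines) {
        Map<String, StringBuilder> rows = new LinkedHashMap<>();
        StringBuilder current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                String header = line.substring(1).trim();
                current = new StringBuilder();
                if (rows.put(header, current) != null) {
                    throw new IllegalArgumentException("Duplicate header: " + header);
                }
            } else if (current == null) {
                throw new IllegalArgumentException("Sequence data before first header");
            } else {
                current.append(line); // rows may span several lines
            }
        }
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("Alignment must not be empty");
        }
        // Same validation as the constructor in this thread: equal lengths.
        Map<String, String> result = new LinkedHashMap<>();
        int length = -1;
        for (Map.Entry<String, StringBuilder> e : rows.entrySet()) {
            String row = e.getValue().toString();
            if (length < 0) {
                length = row.length();
            } else if (row.length() != length) {
                throw new IllegalArgumentException(
                        "Aligned sequences differ in size: " + e.getKey());
            }
            result.put(e.getKey(), row);
        }
        return result;
    }

    // Recovers the original (unaligned) residues, assuming '-' marks a gap.
    static String ungapped(String row) {
        return row.replace("-", "");
    }

    public static void main(String[] args) throws IOException {
        // Reads the file named on the command line, else a tiny inline example.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)
                : Arrays.asList(">seq1", "MK-LV", ">seq2", "MKALV");
        for (Map.Entry<String, String> e : parse(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue()
                    + "\t" + ungapped(e.getValue()));
        }
    }
}
```

Each gapped row, together with its ungapped original, is the information
needed to build an AlignedSequence for the new SimpleProfile constructor;
that conversion is omitted here because it depends on the BioJava API.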



On Mon, Jul 2, 2012 at 4:22 PM, Don Naki <dnaki1 at cox.net> wrote:

> Hi Andreas, essentially, you are right. I *think* it's possible to create
> a profile containing many sequences as long as the biojava API is used to
> construct the profile. The issue is constructing a profile from previously
> aligned sequences, i.e. using a pre-existing alignment file.
>
> It would be really nice if there was a reader class that allowed one to
> read a protein Fasta alignment file and create a Profile directly from the
> already aligned sequences.
>
> There doesn't appear to be such a reader (unless I've not found it).
> However, there is a fasta reader that will read the aligned sequences in
> the fasta alignment file and create ProteinSequence objects. OK, so I
> figure now all I have to do is convert these ProteinSequence objects to
> AlignedSequence objects and use the AlignedSequences to populate a Profile.
> So I convert the ProteinSequence objects to String before manually creating
> AlignedSequence objects, (inelegant, but there doesn't seem to be another
> way unless I'm missing something). Now the problem is that there is no way
> to construct a Profile from these aligned sequences if you have more than
> two of them.
>
> Looking at the source code for SimpleProfile, there's no inherent
> limitation on the number of aligned sequence members; it's just that there
> are no constructors or mutators that accept a collection of
> AlignedSequences.
> I took a stab at such a constructor; it seems to work fine, but I haven't
> tested it with biojava classes that interact with SimpleProfile. Any chance
> someone could evaluate this and consider adding it to SimpleProfile?
> Perhaps then that reader class would be the next step ;-)
>
> Many thanks,
> Don
>
>         /**
>          * Creates a profile for the already aligned sequences.
>          * @param alignedSequences the already aligned sequences
>          * @throws IllegalArgumentException if the aligned sequences differ
>          * in length or the collection is empty.
>          */
>         public SimpleProfile(Collection<AlignedSequence<S,C>> alignedSequences) {
>             list = new ArrayList<AlignedSequence<S,C>>();
>             originals = new ArrayList<S>();
>
>             Iterator<AlignedSequence<S,C>> itr = alignedSequences.iterator();
>             if(!itr.hasNext()) {
>                 throw new IllegalArgumentException("alignedSequences must not be empty");
>             }
>
>             AlignedSequence<S, C> curAlignedSeq = itr.next();
>             length = curAlignedSeq.getLength();
>             list.add(curAlignedSeq);
>             originals.add((S) curAlignedSeq.getOriginalSequence());
>
>             while (itr.hasNext()) {
>                 curAlignedSeq = itr.next();
>                 if (curAlignedSeq.getLength() != length) {
>                     throw new IllegalArgumentException("Aligned sequences differ in size");
>                 }
>                 list.add(curAlignedSeq);
>                 originals.add((S) curAlignedSeq.getOriginalSequence());
>             }
>             list = Collections.unmodifiableList(list);
>             originals = Collections.unmodifiableList(originals);
>         }
>
> On Jul 2, 2012, at 1:17 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
>
> Is the problem that the SimpleProfile class makes it difficult to
> re-create an instance with custom data, because there are no
> set-methods?
>
> Andreas
>
>
> On Mon, Jul 2, 2012 at 9:28 AM, Spencer Bliven <sbliven at ucsd.edu> wrote:
>
> Don,
>
> I was trying to do this a while ago and got stuck in the same place. I
> assumed that someone intended to implement a multiple alignment Profile,
> but never got around to it. I didn't have the time to implement it properly
> so I ended up just working with lists of ProteinSequences. It's possible
> that this is implemented as a subclass of one of the multiple alignment
> algorithms or something. If not, this is definitely a hole in BioJava that
> should be filled.
>
> -Spencer
>
> On Fri, Jun 29, 2012 at 11:22 AM, <dnaki1 at cox.net> wrote:
>
> Hi,
>
> I would like to use biojava 3 to read a protein multiple sequence
> alignment file in FASTA format containing 5 sequences.
> Is this possible? It appears Profile<S,C> is the alignment interface, but
> I can't find an implementation that allows me to add more than 2 aligned
> sequences.
>
> Any help appreciated. Thanks
>
> Don Naki
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev