[Biojava-dev] How to read a protein seq alignment file (FASTA)

sula rajapakse sulalith at gmail.com
Thu Jul 5 14:33:26 UTC 2012


Are there methods or libraries in BioJava for reading files from the PDB
database and the PISA database
(http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html)?

thx

SR

On Mon, Jul 2, 2012 at 7:47 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
> Thanks, Don. This looks good to me, and I committed the new constructor to
> SVN. If you also want to send over the rest of the code that builds the
> profile from an aligned FASTA file, I'll be happy to patch that too...
>
> Andreas
>
>
>
> On Mon, Jul 2, 2012 at 4:22 PM, Don Naki <dnaki1 at cox.net> wrote:
>
>> Hi Andreas, essentially, you are right. I *think* it's possible to create
>> a profile containing many sequences as long as the biojava API is used to
>> construct the profile. The issue is constructing a profile from previously
>> aligned sequences, i.e. using a pre-existing alignment file.
>>
>> It would be really nice if there were a reader class that allowed one to
>> read a protein FASTA alignment file and create a Profile directly from the
>> already aligned sequences.
>>
>> There doesn't appear to be such a reader (unless I've not found it).
>> However, there is a FASTA reader that will read the aligned sequences in
>> the FASTA alignment file and create ProteinSequence objects. OK, so I
>> figure now all I have to do is convert these ProteinSequence objects to
>> AlignedSequence objects and use the AlignedSequences to populate a Profile.
>> So I convert the ProteinSequence objects to Strings and then manually create
>> AlignedSequence objects from them (inelegant, but there doesn't seem to be
>> another way unless I'm missing something). Now the problem is that there is
>> no way to construct a Profile from these aligned sequences if you have more
>> than two of them.
>>
>> Looking at the source code for SimpleProfile, there's no inherent
>> limitation on the number of aligned sequence members; it's just that there
>> are no constructors or mutators that accept a collection of
>> AlignedSequences.
>> I took a stab at such a constructor; it seems to work fine, but I haven't
>> tested it with biojava classes that interact with SimpleProfile. Any chance
>> someone could evaluate this and consider adding it to SimpleProfile?
>> Perhaps then that reader class would be the next step ;-)
>>
>> Many thanks,
>> Don
>>
>>         /**
>>          * Creates a profile for the already aligned sequences.
>>          * @param alignedSequences the already aligned sequences
>>          * @throws IllegalArgumentException if the aligned sequences differ in
>>          * length or the collection is empty
>>          */
>>         public SimpleProfile(Collection<AlignedSequence<S,C>> alignedSequences) {
>>             list = new ArrayList<AlignedSequence<S,C>>();
>>             originals = new ArrayList<S>();
>>
>>             Iterator<AlignedSequence<S,C>> itr = alignedSequences.iterator();
>>             if (!itr.hasNext()) {
>>                 throw new IllegalArgumentException("alignedSequences must not be empty");
>>             }
>>
>>             // The first sequence fixes the alignment length.
>>             AlignedSequence<S, C> curAlignedSeq = itr.next();
>>             length = curAlignedSeq.getLength();
>>             list.add(curAlignedSeq);
>>             originals.add((S) curAlignedSeq.getOriginalSequence());
>>
>>             // Every remaining sequence must match that length.
>>             while (itr.hasNext()) {
>>                 curAlignedSeq = itr.next();
>>                 if (curAlignedSeq.getLength() != length) {
>>                     throw new IllegalArgumentException("Aligned sequences differ in size");
>>                 }
>>                 list.add(curAlignedSeq);
>>                 originals.add((S) curAlignedSeq.getOriginalSequence());
>>             }
>>             // Expose read-only views of both lists.
>>             list = Collections.unmodifiableList(list);
>>             originals = Collections.unmodifiableList(originals);
>>         }
>>
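
A minimal sketch of the workflow described above, in plain Java with no
BioJava dependency: read pre-aligned FASTA text, keep the gap characters, and
apply the same two checks as the proposed constructor (non-empty input, equal
row lengths). The class and method names (AlignedFastaSketch, parse,
checkedRows) are hypothetical and are not part of BioJava. Turning the
validated rows into AlignedSequence objects is left out, since that step
depends on the BioJava version in use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a BioJava class) for pre-aligned FASTA text. */
final class AlignedFastaSketch {

    /** Parses FASTA text into header -> aligned row, gaps kept, in file order. */
    static Map<String, String> parse(String fasta) {
        Map<String, StringBuilder> rows = new LinkedHashMap<>();
        StringBuilder current = null;
        for (String raw : fasta.split("\r?\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines between records
            }
            if (line.startsWith(">")) {
                String header = line.substring(1).trim();
                if (rows.containsKey(header)) {
                    throw new IllegalArgumentException("duplicate header: " + header);
                }
                current = new StringBuilder();
                rows.put(header, current);
            } else if (current == null) {
                throw new IllegalArgumentException("sequence data before first header");
            } else {
                current.append(line); // a record may span several lines
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> e : rows.entrySet()) {
            result.put(e.getKey(), e.getValue().toString());
        }
        return result;
    }

    /** Same checks as the constructor above: non-empty input, equal lengths. */
    static List<String> checkedRows(Map<String, String> alignment) {
        if (alignment.isEmpty()) {
            throw new IllegalArgumentException("alignment must not be empty");
        }
        List<String> rows = new ArrayList<>(alignment.values());
        int length = rows.get(0).length();
        for (String row : rows) {
            if (row.length() != length) {
                throw new IllegalArgumentException("Aligned sequences differ in size");
            }
        }
        return Collections.unmodifiableList(rows);
    }

    public static void main(String[] args) {
        // Made-up three-sequence alignment, for illustration only.
        String fasta = ">seq1\nMKV-LA\n>seq2\nMK-ALA\n>seq3\nM-VALA\n";
        List<String> rows = checkedRows(parse(fasta));
        System.out.println(rows.size() + " rows of length " + rows.get(0).length());
    }
}
```

With a collection-accepting constructor like the one above, each validated
row would presumably become one AlignedSequence in the collection passed to
SimpleProfile.
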
>> On Jul 2, 2012, at 1:17 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
>>
>> Is the problem that the SimpleProfile class makes it difficult to
>> re-create an instance with custom data, because there are no
>> setter methods?
>>
>> Andreas
>>
>>
>> On Mon, Jul 2, 2012 at 9:28 AM, Spencer Bliven <sbliven at ucsd.edu> wrote:
>>
>> Don,
>>
>> I was trying to do this a while ago and got stuck in the same place. I
>> assumed that someone intended to implement a multiple alignment Profile,
>> but never got around to it. I didn't have the time to implement it properly
>> so I ended up just working with lists of ProteinSequences. It's possible
>> that this is implemented as a subclass of one of the multiple alignment
>> algorithms or something. If not, this is definitely a hole in BioJava that
>> should be filled.
>>
>> -Spencer
>>
>>
>> On Fri, Jun 29, 2012 at 11:22 AM, <dnaki1 at cox.net> wrote:
>>
>> Hi,
>> I would like to use biojava 3 to read a protein multiple sequence
>> alignment file in FASTA format containing 5 sequences.
>> Is this possible? It appears Profile<S,C> is the alignment interface, but
>> I can't find an implementation that allows me to add more than 2 aligned
>> sequences.
>> Any help appreciated. Thanks
>> Don Naki
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev



