[Biojava-dev] How to read a protein seq alignment file (FASTA)

sula rajapakse sulalith at gmail.com
Fri Jul 6 14:53:01 UTC 2012


Hi Andreas:

I think that is what I need. Maybe in the process I can contribute
something to BioJava. Can you point me to the BioJava libraries for a
3D protein structure surface area calculation function (something
similar to PISA)?

thx

Sula

On Thu, Jul 5, 2012 at 11:35 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
> Hi Sula,
>
>>
>> Are there methods/libraries in BioJava to read files from the PDB
>> database and the PISA DB
>> (http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html)?
>
>
>
> There are a lot of 3D protein structure related features in BioJava and PDB
> parsing has been around for quite some time.
>
> http://biojava.org/wiki/BioJava:CookBook#Protein_Structure
>
> Regarding PISA, it depends on what you need. If your goal is to re-create
> the biological assembly, BioJava can help. However, it does not use the
> original PISA files, but whatever is archived in the PDB/mmCIF files. In
> fact, I am currently working on support for biological assemblies, and it
> will be announced shortly. If you need lower-level access to PISA files,
> there is currently no parser for this; however, it would be interesting to
> add one, and we would accept patches for that.
>
> Andreas
>
>
>
>>
>>
>> thx
>>
>> SR
>>
>> On Mon, Jul 2, 2012 at 7:47 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
>> > Thanks, Don. This looks good to me and I committed this new constructor to
>> > SVN. If you also want to send over the rest of the code to build up the
>> > profile from an aligned FASTA file, I'll be happy to patch that too...
>> >
>> > Andreas
>> >
>> >
>> >
>> > On Mon, Jul 2, 2012 at 4:22 PM, Don Naki <dnaki1 at cox.net> wrote:
>> >
>> >> Hi Andreas, essentially, you are right. I *think* it's possible to create
>> >> a profile containing many sequences as long as the BioJava API is used to
>> >> construct the profile. The issue is constructing a profile from previously
>> >> aligned sequences, i.e. from a pre-existing alignment file.
>> >>
>> >> It would be really nice if there were a reader class that allowed one to
>> >> read a protein FASTA alignment file and create a Profile directly from the
>> >> already aligned sequences.
>> >>
>> >> There doesn't appear to be such a reader (unless I've missed it).
>> >> However, there is a FASTA reader that will read the aligned sequences in
>> >> the FASTA alignment file and create ProteinSequence objects. So I figured
>> >> all I had to do was convert these ProteinSequence objects to
>> >> AlignedSequence objects and use the AlignedSequences to populate a
>> >> Profile. I convert the ProteinSequence objects to Strings before manually
>> >> creating AlignedSequence objects (inelegant, but there doesn't seem to be
>> >> another way unless I'm missing something). Now the problem is that there
>> >> is no way to construct a Profile from these aligned sequences if you have
>> >> more than two of them.
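The missing piece described here is an aligned-FASTA reader. The sketch below deliberately avoids BioJava and shows, in plain Java, only the parsing step such a reader would need: collect each record's (possibly wrapped) lines, keep gap characters such as '-', and reject input whose rows differ in length. The class and method names (AlignedFastaSketch, parse) are hypothetical, not part of BioJava; turning the resulting strings into AlignedSequence objects is left to the BioJava side.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java sketch, not a BioJava class.
public class AlignedFastaSketch {

    /**
     * Parses aligned FASTA text into an insertion-ordered map from header
     * (without the leading '>') to aligned sequence, keeping gap characters.
     *
     * @throws IllegalArgumentException if there are no records, a header is
     *         repeated, data precedes the first header, or rows differ in length
     */
    public static Map<String, String> parse(String fasta) {
        Map<String, StringBuilder> builders = new LinkedHashMap<>();
        StringBuilder current = null;
        for (String raw : fasta.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // tolerate blank lines between records
            }
            if (line.startsWith(">")) {
                String name = line.substring(1).trim();
                if (builders.containsKey(name)) {
                    throw new IllegalArgumentException("duplicate header: " + name);
                }
                current = new StringBuilder();
                builders.put(name, current);
            } else if (current == null) {
                throw new IllegalArgumentException("sequence data before first header");
            } else {
                current.append(line); // wrapped records are concatenated
            }
        }
        if (builders.isEmpty()) {
            throw new IllegalArgumentException("alignment must not be empty");
        }
        Map<String, String> rows = new LinkedHashMap<>();
        int length = -1;
        for (Map.Entry<String, StringBuilder> e : builders.entrySet()) {
            String seq = e.getValue().toString();
            if (length < 0) {
                length = seq.length(); // first row fixes the alignment length
            } else if (seq.length() != length) {
                throw new IllegalArgumentException(
                        "Aligned sequences differ in size: " + e.getKey());
            }
            rows.put(e.getKey(), seq);
        }
        return rows;
    }

    public static void main(String[] args) {
        String fasta = ">seq1\nMK-LV\n>seq2\nMKALV\n>seq3\nM--LV\n";
        Map<String, String> rows = parse(fasta);
        System.out.println(rows.size());      // 3
        System.out.println(rows.get("seq3")); // M--LV
    }
}
```

A LinkedHashMap keeps the records in file order, which matters if the rows are later handed to a profile in the same order as the alignment file.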
>> >>
>> >> Looking at the source code for SimpleProfile, there's no inherent
>> >> limitation on the number of aligned sequence members; it's just that there
>> >> are no constructors or mutators that accept a collection of
>> >> AlignedSequences. I took a stab at such a constructor; it seems to work
>> >> fine, but I haven't tested it with BioJava classes that interact with
>> >> SimpleProfile. Any chance someone could evaluate this and consider adding
>> >> it to SimpleProfile? Perhaps then that reader class would be the next step ;-)
>> >>
>> >> Many thanks,
>> >> Don
>> >>
>> >>         /**
>> >>          * Creates a profile for already aligned sequences.
>> >>          * @param alignedSequences the already aligned sequences
>> >>          * @throws IllegalArgumentException if the aligned sequences differ
>> >>          * in length or the collection is empty
>> >>          */
>> >>         public SimpleProfile(Collection<AlignedSequence<S,C>> alignedSequences) {
>> >>             list = new ArrayList<AlignedSequence<S,C>>();
>> >>             originals = new ArrayList<S>();
>> >>
>> >>             Iterator<AlignedSequence<S,C>> itr = alignedSequences.iterator();
>> >>             if (!itr.hasNext()) {
>> >>                 throw new IllegalArgumentException("alignedSequences must not be empty");
>> >>             }
>> >>
>> >>             // the first sequence fixes the expected alignment length
>> >>             AlignedSequence<S,C> curAlignedSeq = itr.next();
>> >>             length = curAlignedSeq.getLength();
>> >>             list.add(curAlignedSeq);
>> >>             originals.add((S) curAlignedSeq.getOriginalSequence());
>> >>
>> >>             // every further sequence must have the same aligned length
>> >>             while (itr.hasNext()) {
>> >>                 curAlignedSeq = itr.next();
>> >>                 if (curAlignedSeq.getLength() != length) {
>> >>                     throw new IllegalArgumentException("Aligned sequences differ in size");
>> >>                 }
>> >>                 list.add(curAlignedSeq);
>> >>                 originals.add((S) curAlignedSeq.getOriginalSequence());
>> >>             }
>> >>
>> >>             // expose read-only views of the collected sequences
>> >>             list = Collections.unmodifiableList(list);
>> >>             originals = Collections.unmodifiableList(originals);
>> >>         }
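The constructor above enforces one invariant: at least one row, all rows of the same aligned length, stored read-only. As a BioJava-independent illustration of that invariant (not SimpleProfile itself), here is a minimal plain-Java sketch that holds gapped rows as Strings and reads out one alignment column. The class MultiAlignmentSketch and its methods are hypothetical, and columns use ordinary 0-based Java indexing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical plain-Java sketch, not BioJava's SimpleProfile.
public final class MultiAlignmentSketch {
    private final List<String> rows; // gapped rows, e.g. "MK-LV"
    private final int length;        // shared aligned length of every row

    public MultiAlignmentSketch(Collection<String> alignedRows) {
        if (alignedRows.isEmpty()) {
            throw new IllegalArgumentException("alignedRows must not be empty");
        }
        // Copy first so later changes to the caller's collection cannot leak in.
        List<String> copy = new ArrayList<>(alignedRows);
        int expected = copy.get(0).length();
        for (String row : copy) {
            if (row.length() != expected) {
                throw new IllegalArgumentException("Aligned sequences differ in size");
            }
        }
        this.rows = Collections.unmodifiableList(copy);
        this.length = expected;
    }

    /** Number of aligned rows (any number, not just two). */
    public int size() {
        return rows.size();
    }

    /** Aligned length shared by all rows, gaps included. */
    public int length() {
        return length;
    }

    /** Returns the character (residue or '-') of every row at a 0-based column. */
    public List<Character> column(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("column " + index + " outside 0.." + (length - 1));
        }
        List<Character> col = new ArrayList<>(rows.size());
        for (String row : rows) {
            col.add(row.charAt(index));
        }
        return col;
    }

    public static void main(String[] args) {
        MultiAlignmentSketch aln = new MultiAlignmentSketch(
                Arrays.asList("MK-LV", "MKALV", "M--LV"));
        System.out.println(aln.size());    // 3
        System.out.println(aln.column(2)); // [-, A, -]
    }
}
```

Validating everything in the constructor and storing an unmodifiable copy means no setter can later break the equal-length invariant, which is the same trade-off the thread discusses for SimpleProfile.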
>> >>
>> >> On Jul 2, 2012, at 1:17 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
>> >>
>> >> Is the problem that the SimpleProfile class makes it difficult to
>> >> re-create an instance with custom data, because there are no
>> >> set-methods?
>> >>
>> >> Andreas
>> >>
>> >>
>> >> On Mon, Jul 2, 2012 at 9:28 AM, Spencer Bliven <sbliven at ucsd.edu>
>> >> wrote:
>> >>
>> >> Don–
>> >>
>> >> I was trying to do this a while ago and got stuck in the same place. I
>> >> assumed that someone intended to implement a multiple alignment Profile
>> >> but never got around to it. I didn't have the time to implement it
>> >> properly, so I ended up just working with lists of ProteinSequences. It's
>> >> possible that this is implemented as a subclass of one of the multiple
>> >> alignment algorithms or something. If not, this is definitely a hole in
>> >> BioJava that should be filled.
>> >>
>> >> -Spencer
>> >>
>> >>
>> >> On Fri, Jun 29, 2012 at 11:22 AM, <dnaki1 at cox.net> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I would like to use BioJava 3 to read a protein multiple sequence
>> >> alignment file in FASTA format containing 5 sequences.
>> >>
>> >> Is this possible? It appears Profile<S,C> is the alignment interface, but
>> >> I can't find an implementation that allows me to add more than 2 aligned
>> >> sequences.
>> >>
>> >> Any help appreciated. Thanks
>> >>
>> >> Don Naki
>> >>
>> >> _______________________________________________
>> >> biojava-dev mailing list
>> >> biojava-dev at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/biojava-dev



