[Biojava-dev] How to read a protein seq alignment file (FASTA)

Andreas Prlic andreas at sdsc.edu
Sat Jul 7 16:48:15 UTC 2012


Hi Sula,

Surface area calculations are currently not supported in BioJava...

Andreas


On Fri, Jul 6, 2012 at 7:53 AM, sula rajapakse <sulalith at gmail.com> wrote:

> Hi Andreas:
>
> I think that is what I need. Maybe in the process I can contribute
> something to BioJava. Can you point me to the BioJava libraries for
> 3D protein structure surface area calculation (something
> similar to PISA)?
>
> thx
>
> Sula
>
> On Thu, Jul 5, 2012 at 11:35 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
> > Hi Sula,
> >
> >>
> >> Are there methods/libraries in BioJava to read files from the PDB
> >> database and the PISA DB
> >> (http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html)?
> >
> >
> >
> > There are a lot of 3D protein structure related features in BioJava and PDB
> > parsing has been around for quite some time.
> >
> > http://biojava.org/wiki/BioJava:CookBook#Protein_Structure
> >
> > Regarding PISA, it depends on what you need. If your goal is to re-create the
> > biological assembly, BioJava can help. However, it does not use the original
> > PISA files, but whatever is archived in the PDB/mmCIF files. In fact I am
> > just working on the support for biological assemblies and it will be
> > announced shortly. If you need more low-level access to PISA files, there is
> > currently no parser for this; however, it would be interesting to add and we
> > would accept patches for that.
> >
> > Andreas
> >
> >
> >
> >>
> >>
> >> thx
> >>
> >> SR
> >>
> >> On Mon, Jul 2, 2012 at 7:47 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
> >> > Thanks, Don. This looks good to me and I committed this new constructor
> >> > to SVN. If you want to also send over the rest of the code to build up the
> >> > profile from an aligned FASTA file, I'll be happy to patch that too...
> >> >
> >> > Andreas
> >> >
> >> >
> >> >
> >> > On Mon, Jul 2, 2012 at 4:22 PM, Don Naki <dnaki1 at cox.net> wrote:
> >> >
> >> >> Hi Andreas, essentially, you are right. I *think* it's possible to create
> >> >> a profile containing many sequences as long as the BioJava API is used to
> >> >> construct the profile. The issue is constructing a profile from previously
> >> >> aligned sequences, i.e. using a pre-existing alignment file.
> >> >>
> >> >> It would be really nice if there were a reader class that allowed one to
> >> >> read a protein FASTA alignment file and create a Profile directly from the
> >> >> already aligned sequences.
> >> >>
> >> >> There doesn't appear to be such a reader (unless I've not found it).
> >> >> However, there is a FASTA reader that will read the aligned sequences in
> >> >> the FASTA alignment file and create ProteinSequence objects. OK, so I
> >> >> figure now all I have to do is convert these ProteinSequence objects to
> >> >> AlignedSequence objects and use the AlignedSequences to populate a Profile.
> >> >> So I convert the ProteinSequence objects to String before manually creating
> >> >> AlignedSequence objects (inelegant, but there doesn't seem to be another
> >> >> way unless I'm missing something). Now the problem is that there is no way
> >> >> to construct a Profile from these aligned sequences if you have more than
> >> >> two of them.
> >> >>
> >> >> Looking at the source code for SimpleProfile, there's no inherent
> >> >> limitation on the number of aligned sequence members; it's just that there
> >> >> are no constructors or mutators that accept a collection of
> >> >> AlignedSequences.
> >> >> I took a stab at such a constructor; it seems to work fine, but I haven't
> >> >> tested it with BioJava classes that interact with SimpleProfile. Any chance
> >> >> someone could evaluate this and consider adding it to SimpleProfile?
> >> >> Perhaps then that reader class would be the next step ;-)
> >> >>
> >> >> Many thanks,
> >> >> Don
> >> >>
> >> >>         /**
> >> >>          * Creates a profile for the already aligned sequences.
> >> >>          * @param alignedSequences the already aligned sequences
> >> >>          * @throws IllegalArgumentException if aligned sequences differ in
> >> >>          * length or collection is empty.
> >> >>          */
> >> >>         public SimpleProfile(Collection<AlignedSequence<S,C>> alignedSequences) {
> >> >>             list = new ArrayList<AlignedSequence<S,C>>();
> >> >>             originals = new ArrayList<S>();
> >> >>
> >> >>             Iterator<AlignedSequence<S,C>> itr = alignedSequences.iterator();
> >> >>             if(!itr.hasNext()) {
> >> >>                 throw new IllegalArgumentException("alignedSequences must not be empty");
> >> >>             }
> >> >>
> >> >>             AlignedSequence<S, C> curAlignedSeq = itr.next();
> >> >>             length = curAlignedSeq.getLength();
> >> >>             list.add(curAlignedSeq);
> >> >>             originals.add((S) curAlignedSeq.getOriginalSequence());
> >> >>
> >> >>             while (itr.hasNext()) {
> >> >>                 curAlignedSeq = itr.next();
> >> >>                 if (curAlignedSeq.getLength() != length) {
> >> >>                     throw new IllegalArgumentException("Aligned sequences differ in size");
> >> >>                 }
> >> >>                 list.add(curAlignedSeq);
> >> >>                 originals.add((S) curAlignedSeq.getOriginalSequence());
> >> >>             }
> >> >>             list = Collections.unmodifiableList(list);
> >> >>             originals = Collections.unmodifiableList(originals);
> >> >>         }
> >> >>
> >> >> On Jul 2, 2012, at 1:17 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
> >> >>
> >> >> Is the problem that the SimpleProfile class makes it difficult to
> >> >> re-create an instance with custom data, because there are no set-methods?
> >> >>
> >> >> Andreas
> >> >>
> >> >>
> >> >> On Mon, Jul 2, 2012 at 9:28 AM, Spencer Bliven <sbliven at ucsd.edu> wrote:
> >> >>
> >> >> Don–
> >> >>
> >> >> I was trying to do this a while ago and got stuck in the same place. I
> >> >> assumed that someone intended to implement a multiple alignment Profile,
> >> >> but never got around to it. I didn't have the time to implement it properly
> >> >> so I ended up just working with lists of ProteinSequences. It's possible
> >> >> that this is implemented as a subclass of one of the multiple alignment
> >> >> algorithms or something. If not, this is definitely a hole in BioJava that
> >> >> should be filled.
> >> >>
> >> >> -Spencer
> >> >>
> >> >>
> >> >> On Fri, Jun 29, 2012 at 11:22 AM, <dnaki1 at cox.net> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> I would like to use BioJava 3 to read a protein multiple sequence
> >> >> alignment file in FASTA format containing 5 sequences.
> >> >>
> >> >> Is this possible? It appears Profile<S,C> is the alignment interface, but
> >> >> I can't find an implementation that allows me to add more than 2 aligned
> >> >> sequences.
> >> >>
> >> >> Any help appreciated. Thanks
> >> >>
> >> >> Don Naki
> >> >>
> >> >> _______________________________________________
> >> >>
> >> >> biojava-dev mailing list
> >> >>
> >> >> biojava-dev at lists.open-bio.org
> >> >>
> >> >> http://lists.open-bio.org/mailman/listinfo/biojava-dev
> >> >>
>
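
For the question that started this thread (reading an already aligned protein FASTA file with more than two sequences), here is a minimal plain-Java sketch. It deliberately does not use BioJava: the class AlignedFastaSketch and the method readAligned are hypothetical names made up for illustration, not part of the BioJava API. It applies the same two checks as Don's proposed constructor (the alignment must not be empty, and all rows must have the same length), and it assumes that the whole header line after ">" serves as the sequence id.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not BioJava API): collects the rows of an aligned FASTA file. */
final class AlignedFastaSketch {

    private AlignedFastaSketch() {}

    /**
     * Parses the lines of an aligned FASTA file into an ordered map of
     * id -> gapped sequence. Assumption: the text after '>' is the id.
     *
     * @throws IllegalArgumentException if there are no sequences, an id is
     *         repeated, data precedes the first header, or rows differ in length
     */
    static Map<String, String> readAligned(List<String> lines) {
        Map<String, StringBuilder> rows = new LinkedHashMap<>();
        StringBuilder current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                String id = line.substring(1).trim();
                if (rows.containsKey(id)) {
                    throw new IllegalArgumentException("duplicate id: " + id);
                }
                current = new StringBuilder();
                rows.put(id, current);
            } else {
                if (current == null) {
                    throw new IllegalArgumentException("sequence data before first header");
                }
                current.append(line); // a row may span several lines
            }
        }
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("alignment must not be empty");
        }
        // Same check as in the proposed constructor: every row has equal length.
        Map<String, String> result = new LinkedHashMap<>();
        int length = -1;
        for (Map.Entry<String, StringBuilder> e : rows.entrySet()) {
            String seq = e.getValue().toString();
            if (length < 0) {
                length = seq.length();
            } else if (seq.length() != length) {
                throw new IllegalArgumentException("Aligned sequences differ in size");
            }
            result.put(e.getKey(), seq);
        }
        return result;
    }
}
```

The lines could come from java.nio.file.Files.readAllLines on the alignment file. Each gapped string in the returned map would then still have to be converted to an AlignedSequence before it can be passed to the collection-based SimpleProfile constructor discussed above; that conversion is not shown here.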



