[Bioperl-l] BLAST to FeaturePair
hilmar.lapp@pharma.Novartis.com
Mon, 31 Jul 2000 18:19:20 +0100
>
> This could be done client side: Keep BPLite just "representing BLAST"
> without too much magic. But stripping '>' seems sane.
i can do that. what should be stored in $feature->seqname?
(but as there is no full sequence object, i can't store ids and accs,
right?)
Generally, to my understanding, the name of a sequence should be at
least somewhat unique, like an identifier. In a BLAST report, each
alignment section starts with '>', followed by the database name of
the hit (no spaces), whitespace, the accession (usually), more
whitespace, and the description. So I'd store as seqname only what
matches />(\S+)/. Of course, the caller can do this as well.
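A minimal sketch of the seqname rule, written in Python rather than Perl for brevity. It assumes the hit header has the layout described above ('>' + name, whitespace, accession, whitespace, description) and anchors the pattern to the start of the line, which the original />(\S+)/ does not. The function name `hit_seqname` is hypothetical and not part of BPLite.

```python
import re

# Anchored variant of />(\S+)/: the first run of non-whitespace right after '>'.
_HIT_NAME = re.compile(r"^>(\S+)")


def hit_seqname(header_line):
    """Return the token after '>' on a hit header line, or None if it is not a header."""
    match = _HIT_NAME.match(header_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(hit_seqname(">hit_1 ACC0001 some description"))  # prints hit_1
```

Accession and description are left on the line; a caller wanting them could split the remainder on whitespace.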
> > b) the
> > lengths of the sequences are not stored (would require additional parsing
> > code),
ok, i can parse the length, but how should i store it?
I'm working on this.
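Parsing the length, at least, is straightforward. A sketch in Python, assuming the hit length appears on its own line as "Length = N" (older NCBI text reports) or "Length=N" (newer ones); the exact layout should be checked against the reports actually being parsed. The function `hit_length` is hypothetical, and where the value ends up on the feature is deliberately left open.

```python
import re

# Assumed layout: "          Length = 245" or "Length=245", optionally with thousands separators.
_LENGTH_LINE = re.compile(r"^\s*Length\s*=\s*([\d,]+)\s*$")


def hit_length(section_lines):
    """Return the first 'Length = N' value in a hit section, or None if absent."""
    for line in section_lines:
        match = _LENGTH_LINE.match(line)
        if match:
            # Drop commas in case the report formats large numbers with them.
            return int(match.group(1).replace(",", ""))
    return None


if __name__ == "__main__":
    section = [
        ">hit_1 ACC0001 some description",
        "          Length = 245",
    ]
    print(hit_length(section))  # prints 245
```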
> > c) properties of the alignment are stored as 'new' tags, instead of
> > through the tag system. This prevents them from easy de/serialization
> > through the gff_string()/_from_gff_string() methods. (BTW does the string
> > returned by $bplite_hsp->homologySeq() make sense to anyone?)
>
> Talking to Lorenz - I'm not sure about this.
what properties do you mean? score, bits, P value, matching, positives
and such things? if so then hilmar is right, they are not stored through
the tag system.
Yes, that's what I mean. I'm working on a class that offers better
support for this, so you could inherit from that instead of
SeqFeature::Generic/FeaturePair.
why does the homologySeq make no sense?
(i just adopted it from the original BPlite...)
It only contains bars ('|') and spaces. I wouldn't know what to do
with that: the gaps are lost, so you cannot use it to locate
mismatched bases.
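To illustrate the point about lost gaps: in a string of '|' and ' ', a gap column and a mismatch column look identical, and column indices drift away from sequence coordinates as soon as a gap occurs. The Python sketch below works from the gapped query and hit strings instead. It assumes '-' as the gap character and 1-based query coordinates; `mismatch_positions` is a hypothetical helper, not a BPLite method.

```python
def mismatch_positions(query_aln, hit_aln, query_start=1):
    """Return 1-based query coordinates of columns where both residues are present but differ.

    Gap columns in the query do not advance the query coordinate, which is
    exactly the information a bars-and-spaces midline cannot provide.
    """
    if len(query_aln) != len(hit_aln):
        raise ValueError("aligned strings must have equal length")
    positions = []
    pos = query_start
    for q, h in zip(query_aln, hit_aln):
        if q == "-":
            continue  # residue only in the hit: no query coordinate consumed
        if h != "-" and q.upper() != h.upper():
            positions.append(pos)
        pos += 1
    return positions


if __name__ == "__main__":
    # Midline for this pair would be "||| | ": column 4 (gap) and
    # column 6 (mismatch) are both spaces, yet only the latter is a mismatch,
    # and it sits at query position 5, not 6.
    print(mismatch_positions("ACG-TA", "ACGGTC"))  # prints [5]
```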
and could someone please explain to me what's the purpose of those
gff_string methods?
GFF = Generic Feature Format (it used to be Gene Finding Format). I
don't know the URL by heart, but it's documented on the Sanger site
(www.sanger.ac.uk). It provides a simple ASCII exchange format, and
the methods for de/serializing feature objects to and from it are
mostly there. These already take care of de/serializing the tag
system; if you wanted to de/serialize anything else, you would have to
override them in each derived class (which is not a bad thing to do,
but if I can, I'd rather save the work).
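A sketch of why keeping alignment properties in the tag system pays off. A GFF line has the columns seqname, source, feature, start, end, score, strand, frame, and attributes. If every property lives in a generic tag table, one pair of functions can round-trip any feature through that last column. This is written in Python with a hypothetical `Feature` class and `to_gff`/`from_gff` functions. It is not BioPerl's gff_string() implementation, and the attribute encoding is a simplified GFF2-style assumption (no escaping of quotes, tabs, or semicolons).

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    seqname: str
    source: str
    primary: str
    start: int
    end: int
    score: str = "."
    strand: str = "."
    frame: str = "."
    tags: dict = field(default_factory=dict)  # score details, bits, P value, ...


def to_gff(feat):
    """Serialize a feature as one tab-separated, GFF2-style line; tags go in column 9."""
    attrs = "; ".join(f'{key} "{value}"' for key, value in feat.tags.items())
    cols = [feat.seqname, feat.source, feat.primary, str(feat.start), str(feat.end),
            feat.score, feat.strand, feat.frame, attrs]
    return "\t".join(cols)


def from_gff(line):
    """Parse a line produced by to_gff back into a Feature (simplified, no escaping)."""
    cols = line.rstrip("\n").split("\t")
    tags = {}
    if len(cols) > 8 and cols[8]:
        for pair in cols[8].split(";"):
            key, _, value = pair.strip().partition(" ")
            tags[key] = value.strip('"')
    return Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                   cols[5], cols[6], cols[7], tags)


if __name__ == "__main__":
    hsp = Feature("hit_1", "BLASTN", "similarity", 10, 250, "118", "+", ".",
                  {"bits": "50.1", "evalue": "2e-06"})
    line = to_gff(hsp)
    print(line)
    assert from_gff(line) == hsp  # nothing outside the tags needs special handling
```

Because every alignment property sits in `tags`, neither function needs a per-class override. A property stored as a separate attribute would have to be added to both by hand, which is the extra work described above.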
Hilmar