[BioPython] splice variants in GenBank/Entrez
Bruce Southey
bsouthey at gmail.com
Mon Jun 9 13:25:44 UTC 2008
Albert Krewinkel wrote:
> Hi Steve,
>
> On Sun, Jun 08, 2008 at 10:21:50PM -0700, C. G. wrote:
>
>> I've been using BioPython for a few projects the last
>> two months to process BLAST results but now I need to
>> take those results and determine which of them have
>> known splice variants. By "known" I mean those that
>> have annotations contained in a database that indicate
>> they have (or are) splice variants.
>>
>
> Depending on which organism you are looking at, you might want to use
> the Ensembl genome database. There is no biopython interface, but you
> can use the jython interface from their website (at least they once
> had one, I didn't check if that's still the case). Otherwise you
> might have to use perl or java packages for that.
>
> Another good resource for this is the Alternative Splicing Database:
> http://www.ebi.ac.uk/asd/
>
> Hope that helps,
>
> Albert
The 'ALTERNATIVE PRODUCTS' topic of the CC (comment) lines in a UniProt
(Swiss-Prot) record can contain alternative splicing information. See, for
example, section 3.12.5 of the user manual, "Syntax of the topic
'ALTERNATIVE PRODUCTS'":
http://ca.expasy.org/sprot/userman.html#CCAP
(The relevant part is given below for completeness.)
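For the original question, one way to flag BLAST hits that have known splice variants is to retrieve each hit's UniProt flat-file entry and look for this topic. Below is a minimal, dependency-free sketch, assuming you already have an entry's lines in memory (fetching them is not shown). `cc_topics` and `has_alternative_splicing` are hypothetical helper names, not part of Biopython; Biopython's `Bio.SwissProt` module can also parse these records if you prefer a full parser.

```python
import re


def cc_topics(entry_lines):
    """Collect the CC comment topics of one UniProt flat-file entry.

    Returns {topic name: text}, joining continuation lines with spaces.
    """
    pieces = {}
    current = None
    for line in entry_lines:
        text = line[2:].strip() if line.startswith("CC") else None
        if text is None or text.startswith("---"):
            # A non-CC line, or the dashed rule of the copyright block, ends a topic.
            current = None
        elif text.startswith("-!-"):
            topic, _, rest = text[3:].partition(":")
            current = topic.strip()
            pieces.setdefault(current, []).append(rest.strip())
        elif current is not None:
            pieces[current].append(text)
    return {topic: " ".join(p for p in parts if p) for topic, parts in pieces.items()}


def has_alternative_splicing(entry_lines):
    """True if any Event= field of ALTERNATIVE PRODUCTS lists 'Alternative splicing'."""
    text = cc_topics(entry_lines).get("ALTERNATIVE PRODUCTS", "")
    events = [e.strip() for field in re.findall(r"Event=([^;]*)", text) for e in field.split(",")]
    return "Alternative splicing" in events


# Lines copied from the example below.
entry = [
    "CC   -!- ALTERNATIVE PRODUCTS:",
    "CC       Event=Alternative splicing, Alternative initiation; Named isoforms=8;",
    "CC       Comment=Additional isoforms seem to exist;",
]
# Hypothetical entry whose only event is alternative initiation.
initiation_only = [
    "CC   -!- ALTERNATIVE PRODUCTS:",
    "CC       Event=Alternative initiation; Named isoforms=2;",
]
print(has_alternative_splicing(entry))            # True
print(has_alternative_splicing(initiation_only))  # False
```

The check reads the Event field rather than just testing for the topic, because ALTERNATIVE PRODUCTS also covers alternative initiation, which is not splicing.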
Bruce
Example of the CC lines and the corresponding FT lines for an entry with
alternative splicing:
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing, Alternative initiation; Named isoforms=8;
CC       Comment=Additional isoforms seem to exist;
CC       Name=1; Synonyms=Non-muscle isozyme;
CC       IsoId=Q15746-1; Sequence=Displayed;
CC       Name=2;
CC       IsoId=Q15746-2; Sequence=VSP_004791;
CC       Name=3A;
CC       IsoId=Q15746-3; Sequence=VSP_004792, VSP_004794;
CC       Name=3B;
CC       IsoId=Q15746-4; Sequence=VSP_004791, VSP_004792, VSP_004794;
CC       Name=4;
CC       IsoId=Q15746-5; Sequence=VSP_004792, VSP_004793;
CC       Name=Del-1790;
CC       IsoId=Q15746-6; Sequence=VSP_004795;
CC       Name=5; Synonyms=Smooth-muscle isozyme;
CC       IsoId=Q15746-7; Sequence=VSP_018845;
CC       Note=Produced by alternative initiation at Met-923 of isoform 1;
CC       Name=6; Synonyms=Telokin;
CC       IsoId=Q15746-8; Sequence=VSP_018846;
CC       Note=Produced by alternative initiation at Met-1761 of isoform
CC       1. Has no catalytic activity;
...
FT   VAR_SEQ       1   1760       Missing (in isoform 6).
FT                                /FTId=VSP_018846.
FT   VAR_SEQ       1    922       Missing (in isoform 5).
FT                                /FTId=VSP_018845.
FT   VAR_SEQ     437    506       VSGIPKPEVAWFLEGTPVRRQEGSIEVYEDAGSHYLCLLKA
FT                                RTRDSGTYSCTASNAQGQVSCSWTLQVER -> G (in
FT                                isoform 2 and isoform 3B).
FT                                /FTId=VSP_004791.
FT   VAR_SEQ    1433   1439       DEVEVSD -> MKWRCQT (in isoform 3A,
FT                                isoform 3B and isoform 4).
FT                                /FTId=VSP_004792.
FT   VAR_SEQ    1473   1545       Missing (in isoform 4).
FT                                /FTId=VSP_004793.
FT   VAR_SEQ    1655   1705       Missing (in isoform 3A and isoform 3B).
FT                                /FTId=VSP_004794.
FT   VAR_SEQ    1790   1790       Missing (in isoform Del-1790).
FT                                /FTId=VSP_004795.
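The isoform records in the CC block can be split mechanically: each one starts with `Name=`, and its fields are `;`-separated `key=value` pairs. The sketch below assumes the layout of the example above and that no field value itself contains a `;`; `parse_isoforms` and `isoforms_using` are hypothetical names. Cross-referencing the `Sequence=` IDs recovers the same assignment the VAR_SEQ descriptions state in prose, e.g. VSP_004791 is used by isoforms 2 and 3B.

```python
import re


def parse_isoforms(cc_lines):
    """Split one ALTERNATIVE PRODUCTS comment block into a dict per isoform."""
    # Drop the "CC" line code and join continuation lines with single spaces.
    text = " ".join(line[2:].strip() for line in cc_lines if line.startswith("CC"))
    isoforms = []
    # Each isoform record starts with "Name="; the text before the first one
    # (Event=, Named isoforms=, Comment=) describes the topic as a whole.
    for chunk in re.split(r"(?=\bName=)", text)[1:]:
        fields = {}
        for part in chunk.split(";"):
            key, sep, value = part.strip().partition("=")
            if sep:
                fields[key] = value.strip()
        isoforms.append({
            "name": fields.get("Name"),
            "isoid": fields.get("IsoId"),
            "synonyms": fields.get("Synonyms"),
            # "Displayed", "External", or a list of VAR_SEQ feature IDs (FTIds).
            "sequence": [s.strip() for s in fields.get("Sequence", "").split(",") if s.strip()],
            "note": fields.get("Note"),
        })
    return isoforms


def isoforms_using(isoforms, vsp_id):
    """Names of the isoforms whose Sequence field lists the given VAR_SEQ FTId."""
    return [iso["name"] for iso in isoforms if vsp_id in iso["sequence"]]


# The CC block of the example above.
cc_block = [
    "CC   -!- ALTERNATIVE PRODUCTS:",
    "CC       Event=Alternative splicing, Alternative initiation; Named isoforms=8;",
    "CC       Comment=Additional isoforms seem to exist;",
    "CC       Name=1; Synonyms=Non-muscle isozyme;",
    "CC       IsoId=Q15746-1; Sequence=Displayed;",
    "CC       Name=2;",
    "CC       IsoId=Q15746-2; Sequence=VSP_004791;",
    "CC       Name=3A;",
    "CC       IsoId=Q15746-3; Sequence=VSP_004792, VSP_004794;",
    "CC       Name=3B;",
    "CC       IsoId=Q15746-4; Sequence=VSP_004791, VSP_004792, VSP_004794;",
    "CC       Name=4;",
    "CC       IsoId=Q15746-5; Sequence=VSP_004792, VSP_004793;",
    "CC       Name=Del-1790;",
    "CC       IsoId=Q15746-6; Sequence=VSP_004795;",
    "CC       Name=5; Synonyms=Smooth-muscle isozyme;",
    "CC       IsoId=Q15746-7; Sequence=VSP_018845;",
    "CC       Note=Produced by alternative initiation at Met-923 of isoform 1;",
    "CC       Name=6; Synonyms=Telokin;",
    "CC       IsoId=Q15746-8; Sequence=VSP_018846;",
    "CC       Note=Produced by alternative initiation at Met-1761 of isoform",
    "CC       1. Has no catalytic activity;",
]

isoforms = parse_isoforms(cc_block)
for iso in isoforms:
    print(iso["name"], iso["isoid"], iso["sequence"])
print(isoforms_using(isoforms, "VSP_004791"))  # ['2', '3B']
```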

A second example, for an entry in which isoform 2 (p19ARF) is described
in a separate entry (Sequence=External):
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative splicing, Alternative initiation; Named isoforms=3;
CC       Comment=Isoform 1 and isoform 2 arise due to the use of two
CC       alternative first exons joined to a common exon 2 at the same
CC       acceptor site but in different reading frames, resulting in two
CC       completely different isoforms;
CC       Name=1; Synonyms=p16INK4a;
CC       IsoId=O77617-1; Sequence=Displayed;
CC       Name=3;
CC       IsoId=O77617-2; Sequence=VSP_018701;
CC       Note=Produced by alternative initiation at Met-35 of isoform 1.
CC       No experimental confirmation available;
CC       Name=2; Synonyms=p19ARF;
CC       IsoId=O77618-1; Sequence=External;
...
FT   VAR_SEQ       1     34       Missing (in isoform 3).
FT                                /FTId=VSP_004099.
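The VAR_SEQ lines describe each isoform as edits to the displayed (canonical) sequence: `Missing` deletes the range and `OLD -> NEW` replaces it, with 1-based, inclusive positions. A minimal sketch, assuming the fixed-column FT layout of the examples above and non-overlapping edits; `parse_var_seq`, `apply_var_seqs`, the `VSP_TOY*` features and the toy sequence are hypothetical, made up for illustration. Isoforms marked `Sequence=Displayed` need no edits, and `Sequence=External` ones have to be taken from the other entry.

```python
import re

# A feature's first line (after the "FT" code): key, from, to, description.
FEATURE_START = re.compile(r"^([A-Z_]+)\s+([<>?]?\d+|\?)\s+([<>?]?\d+|\?)\s*(.*)$")


def parse_var_seq(ft_lines):
    """Map FTId -> (start, end, description) for VAR_SEQ features."""
    features = []  # [key, start, end, [text pieces]]
    for line in ft_lines:
        if not line.startswith("FT"):
            continue
        text = line[2:].strip()
        match = FEATURE_START.match(text)
        if match:
            key, start, end, descr = match.groups()
            features.append([key, start, end, [descr]])
        elif features:
            features[-1][3].append(text)  # continuation line

    result = {}
    for key, start, end, pieces in features:
        # Skip other feature keys and fuzzy positions such as "<1" or "?".
        if key != "VAR_SEQ" or not (start.isdigit() and end.isdigit()):
            continue
        ftid = None
        words = []
        for piece in pieces:
            if piece.startswith("/FTId="):
                ftid = piece[len("/FTId="):].rstrip(".")
            elif piece:
                words.append(piece)
        descr = " ".join(words).rstrip(".")
        if " -> " in descr:
            # Long sequences wrap across lines; joining added spurious spaces.
            old, new = descr.split(" -> ", 1)
            new, paren, note = new.partition(" (")
            descr = f"{old.replace(' ', '')} -> {new.replace(' ', '')}{paren}{note}"
        if ftid is not None:
            result[ftid] = (int(start), int(end), descr)
    return result


def apply_var_seqs(sequence, var_seqs, vsp_ids):
    """Apply the listed VAR_SEQ edits (1-based, inclusive) to the canonical sequence."""
    # Edit from the C-terminus backwards so earlier coordinates stay valid.
    edits = sorted((var_seqs[v] for v in vsp_ids), key=lambda e: e[0], reverse=True)
    residues = sequence
    for start, end, descr in edits:
        if descr.startswith("Missing"):
            replacement = ""
        else:
            old, replacement = descr.split(" (")[0].split(" -> ")
            if residues[start - 1:end] != old:
                raise ValueError(f"expected {old} at {start}-{end}")
        residues = residues[:start - 1] + replacement + residues[end:]
    return residues


# VSP_004791 is copied from the first example; the VSP_TOY* features and the
# toy sequence are hypothetical, made up only to exercise apply_var_seqs.
ft_lines = [
    "FT   VAR_SEQ     437    506       VSGIPKPEVAWFLEGTPVRRQEGSIEVYEDAGSHYLCLLKA",
    "FT                                RTRDSGTYSCTASNAQGQVSCSWTLQVER -> G (in",
    "FT                                isoform 2 and isoform 3B).",
    "FT                                /FTId=VSP_004791.",
    "FT   VAR_SEQ       6     12       DEVEVSD -> MKWRCQT (in isoform 2).",
    "FT                                /FTId=VSP_TOY2.",
    "FT   VAR_SEQ      13     15       Missing (in isoform 2).",
    "FT                                /FTId=VSP_TOY1.",
]
toy_sequence = "MKTLLDEVEVSDPQR"

var_seqs = parse_var_seq(ft_lines)
print(var_seqs["VSP_004791"][2])
print(apply_var_seqs(toy_sequence, var_seqs, ["VSP_TOY1", "VSP_TOY2"]))  # MKTLLMKWRCQT
```

Applying the edits from the highest start position downwards means a deletion or a length-changing replacement never shifts the coordinates of an edit that is still to be applied. The length check on VSP_004791 is also a useful sanity test: the replaced segment is 70 residues, matching positions 437 to 506.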