[Biojava-l] BiojavaX EmblFormat
mark.schreiber at novartis.com
Thu Mar 9 01:02:09 UTC 2006
The BioJavaX parser uses regular expressions to parse these lines. I will
need to check what needs changing in these regexes to allow parsing of
these files.
Thanks for your testing!
- Mark
"Jolyon Holdstock" <jolyon.holdstock at ogt.co.uk>
Sent by: biojava-l-bounces at portal.open-bio.org
03/08/2006 06:47 PM
To: <biojava-l at biojava.org>
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] BiojavaX EmblFormat
Hi,
I am using the new format parsers in BioJavaX. GenbankFormat is great,
but I am having some trouble with the EMBLFormat class. I have
downloaded a sequence file (ID:U00096) from the EBI in EMBL format but I
don't believe it is parsing properly.
My code is as follows:
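// Imports needed by the snippet:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.biojava.bio.BioException;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

String fileName = "path to file";
try {
    // Read EMBL-format DNA records; the namespace argument is left null.
    RichSequenceIterator rsi = RichSequence.IOTools.readEMBLDNA(
            new BufferedReader(new FileReader(fileName)), null);
    while (rsi.hasNext()) {
        // Parsing happens here, so this is where the exceptions below are thrown.
        RichSequence seq = rsi.nextRichSequence();
        System.out.println(seq.getURN());
        System.out.println(seq.length());
        System.out.println(seq.getAccession());
    }
}
catch (IOException ioe) {
    System.out.println("BioJava IOException " + ioe);
}
catch (BioException bioe) {
    System.out.println("BioJavaX BioException " + bioe);
    bioe.printStackTrace();
}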
The older BioJava parser reads the same file without problems:
seq = SeqIOTools.readEmbl(new BufferedReader(new FileReader(fileName))).nextSequence(); //works
I checked the web CVS and the EMBLFormat class is 3 months old so I am
using the most recent version.
I have pasted below a snippet of the sequence file that still reproduces
the problems.
The errors are:
The ID line isn't parsed because of 'genomic' being there - deleting it
removes the problem
org.biojava.bio.BioException: Could not read sequence
Caused by: org.biojava.bio.seq.io.ParseException:
Bad ID line found: U00096 standard; circular genomic DNA; PRO; 4639675 BP.
ID U00096 standard; circular genomic DNA; PRO; 4639675 BP. //fails
ID U00096 standard; circular DNA; PRO; 4639675 BP. //works
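For illustration only, here is a minimal sketch of an ID-line regular expression that accepts a molecule type of more than one word, such as "genomic DNA". It is not the pattern used inside EMBLFormat, which is not shown in this thread. The class name EmblIdLineSketch and its parse method are hypothetical, and the field layout is inferred solely from the two ID lines above, with the leading ID tag already stripped as in the error message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of BioJavaX. */
final class EmblIdLineSketch {

    // Fields, in order: entry name, data class, optional "circular",
    // molecule type (one or more words), division, sequence length.
    private static final Pattern ID_LINE = Pattern.compile(
        "^(\\S+)\\s+(\\S+);\\s+(circular\\s+)?([^;]+?);\\s+(\\S+);\\s+(\\d+)\\s+BP\\.$");

    /**
     * Returns {name, dataClass, circular, molType, division, length},
     * or null when the line does not match the assumed layout.
     */
    static String[] parse(String idLine) {
        Matcher m = ID_LINE.matcher(idLine.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group(1),
            m.group(2),
            String.valueOf(m.group(3) != null), // "true" if circular
            m.group(4),
            m.group(5),
            m.group(6)
        };
    }

    public static void main(String[] args) {
        String[] fields = parse("U00096 standard; circular genomic DNA; PRO; 4639675 BP.");
        // prints: U00096 | standard | true | genomic DNA | PRO | 4639675
        System.out.println(String.join(" | ", fields));
    }
}
```

The molecule type is captured with `[^;]+?` rather than `\S+`, so both "DNA" and "genomic DNA" match; a pattern that expects a single word there would reject the first ID line in the same way the error message describes.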
There is a problem with the RX tag which fails with output:
org.biojava.bio.BioException: Could not read sequence
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:352)
Replacing
RX DOI; 10.1126/science.277.5331.1453.
with
XX RX DOI; 10.1126/science.277.5331.1453.
removes the error.
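An ArrayIndexOutOfBoundsException with index 1 is what Java throws when code reads the second element of a one-element array, for example after a split that found no separator. Whether that is what happens at EMBLFormat.java:352 is an assumption, since that source is not shown here. A minimal sketch of RX parsing that checks the split result before indexing follows; the class EmblRxLineSketch and its method are hypothetical, and the input is the RX line with the tag already removed.

```java
/** Hypothetical helper, not part of BioJavaX. */
final class EmblRxLineSketch {

    /**
     * Splits an RX line body such as "DOI; 10.1126/science.277.5331.1453."
     * into {database, identifier}. Returns null when either part is missing,
     * instead of indexing past the end of the split result.
     */
    static String[] parse(String rxBody) {
        // Limit of 2: only the first ';' separates database from identifier.
        String[] parts = rxBody.trim().split(";\\s*", 2);
        if (parts.length < 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            return null;
        }
        String id = parts[1];
        // Drop the full stop that terminates the EMBL line, if present.
        if (id.endsWith(".")) {
            id = id.substring(0, id.length() - 1);
        }
        return new String[] { parts[0], id };
    }

    public static void main(String[] args) {
        String[] doi = parse("DOI; 10.1126/science.277.5331.1453.");
        System.out.println(doi[0] + " -> " + doi[1]);
        // A line without a ';' yields null rather than an exception.
        System.out.println(parse("DOI") == null);
    }
}
```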
There is an error with parsing the authors
org.biojava.bio.BioException: Could not read sequence
Caused by: java.lang.IllegalArgumentException: Authors string cannot be null
at org.biojavax.DocRefAuthor$Tools.parseAuthorString(DocRefAuthor.java:75)
at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:395)
at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:100)
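The message indicates that parseAuthorString was handed a null string. One defensive pattern, sketched here under the assumption that the authors are gathered from the RA lines of each reference block, is to join those lines first and only call the author parser when something was actually collected. The class EmblAuthorsSketch and its method are hypothetical and do not reflect how EMBLFormat is written.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of BioJavaX. */
final class EmblAuthorsSketch {

    /**
     * Joins the RA lines of one reference block into a single author string,
     * dropping the terminating ';'. Returns null when the block has no RA
     * lines, so the caller can skip author parsing for that reference.
     */
    static String joinAuthors(List<String> referenceLines) {
        StringBuilder joined = new StringBuilder();
        for (String line : referenceLines) {
            if (line.startsWith("RA")) {
                if (joined.length() > 0) {
                    joined.append(' ');
                }
                // Strip the two-letter tag and surrounding whitespace.
                joined.append(line.substring(2).trim());
            }
        }
        if (joined.length() == 0) {
            return null;
        }
        String authors = joined.toString();
        return authors.endsWith(";")
                ? authors.substring(0, authors.length() - 1)
                : authors;
    }

    public static void main(String[] args) {
        List<String> ref7 = Arrays.asList("RN [7]", "RP 1-4639675", "RA Rudd K.E.;");
        String authors = joinAuthors(ref7);
        // Only hand the string to an author parser when it is non-null.
        System.out.println(authors != null ? authors : "(no authors)");
    }
}
```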
I am looking at the code trying to see where the problems are but
suspect that it may be beyond me.
So if anybody has some experience of this I would welcome their input.
Thanks,
Jolyon
This is a snippet of the sequence file that reproduces the errors in my hands.
ID U00096 standard; circular genomic DNA; PRO; 4639675 BP.
XX
AC U00096; AE000111-AE000510;
XX
SV U00096.2
XX
DT 23-FEB-2006 (Rel. 86, Created)
DT 06-MAR-2006 (Rel. 87, Last updated, Version 3)
XX
DE Escherichia coli K-12 MG1655, complete genome.
XX
KW .
XX
OS Escherichia coli K12
OC Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
OC Enterobacteriaceae; Escherichia.
XX
RN [1]
RP 1-4639675
RX DOI; 10.1126/science.277.5331.1453.
RX PUBMED; 9278503.
RA Blattner F.R., Plunkett G., Bloch C.A., Perna N.T., Burland V., Riley M.,
RA Collado-Vides J., Glasner J.D., Rode C.K., Mayhew G.F., Gregor J.,
RA Davis N.W., Kirkpatrick H.A., Goeden M.A., Rose D.J., Mau B., Shao Y.;
RT "The complete genome sequence of Escherichia coli K-12";
RL Science 277(5331):1453-1474(1997).
XX
RN [2]
RP 1-4639675
RX DOI; 10.1093/nar/gkj150.
RX PUBMED; 16397293.
RA Riley M., Abe T., Arnaud M.B., Berlyn M.K., Blattner F.R., Chaudhuri R.R.,
RA Glasner J.D., Horiuchi T., Keseler I.M., Kosuge T., Mori H., Perna N.T.,
RA Plunkett G. III, Rudd K.E., Serres M.H., Thomas G.H., Thomson N.R.,
RA Wishart D., Wanner B.L.;
RT "Escherichia coli K-12: a cooperatively developed annotation
RT snapshot--2005";
RL (er) Nucleic Acids Res. 34 (1), 1-9 (2006)
XX
RN [3]
RC Woods Hole, Mass., on 14-18 November 2003 (sequence corrections)
RP 1-4639675
RA Arnaud M., Berlyn M.K.B., Blattner F.R., Galperin M.Y., Glasner J.D.,
RA Horiuchi T., Kosuge T., Mori H., Perna N.T., Plunkett G. III, Riley M.,
RA Rudd K.E., Serres M.H., Thomas G.H., Wanner B.L.;
RT "Workshop on Annotation of Escherichia coli K-12";
RL Unpublished.
XX
RN [4]
RC ASAP download 10 June 2004 (annotation updates)
RP 1-4639675
RA Glasner J.D., Perna N.T., Plunkett G. III, Anderson B.D., Bockhorst J.,
RA Hu J.C., Riley M., Rudd K.E., Serres M.H.;
RT "ASAP: Escherichia coli K-12 strain MG1655 version m56";
RL Unpublished.
XX
RN [5]
RC GenBank accessions AG613214 to AG613378 (sequence corrections)
RP 1-4639675
RA Hayashi K., Morooka N., Mori H., Horiuchi T.;
RT "A more accurate sequence comparison between genomes of Escherichia coli
RT K12 W3110 and MG1655 strains";
RL Unpublished.
XX
RN [6]
RC GenBank accession AY605712 (sequence corrections)
RP 1-4639675
RA Perna N.T.;
RT "Escherichia coli K-12 MG1655 yqiK-rfaE intergenic region, genomic sequence
RT correction";
RL Unpublished.
XX
RN [7]
RP 1-4639675
RA Rudd K.E.;
RT "A manual approach to accurate translation start site annotation: an E.
RT coli K-12 case study";
RL Unpublished.
XX
RN [8]
RP 1-4639675
RA Blattner F.R., Plunkett G. III.;
RT ;
RL Submitted (16-JAN-1997) to the EMBL/GenBank/DDBJ databases.
RL Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,
RL WI 53706-1580, USA
XX
RN [9]
RP 1-4639675
RA Blattner F.R., Plunkett G. III.;
RT ;
RL Submitted (02-SEP-1997) to the EMBL/GenBank/DDBJ databases.
RL Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,
RL WI 53706-1580, USA
XX
RN [10]
RP 1-4639675
RA Plunkett G. III.;
RT ;
RL Submitted (13-OCT-1998) to the EMBL/GenBank/DDBJ databases.
RL Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,
RL WI 53706-1580, USA
XX
RN [11]
RC Sequence update by submitter
RP 1-4639675
RA Plunkett G. III.;
RT ;
RL Submitted (10-JUN-2004) to the EMBL/GenBank/DDBJ databases.
RL Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,
RL WI 53706-1580, USA
XX
RN [12]
RC Protein updates by submitter
RP 1-4639675
RA Plunkett G. III.;
RT ;
RL Submitted (07-FEB-2006) to the EMBL/GenBank/DDBJ databases.
RL Laboratory of Genetics, University of Wisconsin, 425G Henry Mall, Madison,
RL WI 53706-1580, USA
XX
DR EMBL-TPA; BR000242.
XX
FH Key Location/Qualifiers
FH
FT source 1..4639675
FT /organism="Escherichia coli K12"
FT /strain="K-12"
FT /sub_strain="MG1655"
FT /mol_type="genomic DNA"
FT /db_xref="taxon:83333"
FT gene 190..255
FT /gene="thrL"
FT /locus_tag="b0001"
FT /note="synonyms: ECK0001, JW4367"
FT CDS 190..255
FT /codon_start=1
FT /transl_table=11
FT /gene="thrL"
FT /locus_tag="b0001"
FT /product="thr operon leader peptide"
FT /function="1.5.1.8 metabolism; building block biosynthesis;
FT amino acids; threonine"
FT /function="leader; Amino acid biosynthesis: Threonine"
FT /note="go_process: threonine biosynthesis [goid 0009088]"
FT /protein_id="AAC73112.1"
FT /translation="MKRISTTITTTITITTGNGAG"
FT gene 337..2799
FT /gene="thrA"
FT /locus_tag="b0002"
FT /note="synonyms: Hs, thrD, ECK0002, JW0001"
FT CDS 337..2799
FT /codon_start=1
FT /transl_table=11
FT /gene="thrA"
FT /locus_tag="b0002"
FT /product="fused aspartokinase I and homoserine
FT dehydrogenase I"
FT /function="1.5.1.21 metabolism; building block
FT biosynthesis; amino acids; homoserine"
FT /function="1.5.1.8 metabolism; building block biosynthesis;
FT amino acids; threonine"
FT /function="7.1 location of gene products; cytoplasm"
FT /function="enzyme; Amino acid biosynthesis: Threonine"
FT /EC_number="1.1.1.3"
FT /EC_number="2.7.2.4"
FT /note="bifunctional: aspartokinase I (N-terminal);
FT homoserine dehydrogenase I (C-terminal); go_component:
FT cytoplasm [goid 0005737]; go_process: threonine
FT biosynthesis [goid 0009088]; go_process: homoserine
FT biosynthesis [goid 0009090]"
FT /protein_id="AAC73113.1"
FT /translation="MRVLKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITN
FT HLVAMIEKTISGQDALPNISDAERIFAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKHV
FT LHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEARGHNVTVIDPVEKLLAVGHYLES
FT TVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLGRNGSDYSAAVLAACLRADC
FT CEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCL
FT IKNTGNPQAPGTLIGASRDEDELPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMS
FT RARISVVLITQSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAII
FT SVVGDGMRTLRGISAKFFAALARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQM
FT LFNTDQVIEVFVIGVGGVGGALLEQLKRQQSWLKNKHIDLRVCGVANSKALLTNVHGLN
FT LENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYADFLREGFHVVT
FT PNKKANTSSMDYYHQLRYAAEKSRRKFLYDTNVGAGLPVIENLQNLLNAGDELMKFSGI
FT LSGSLSYIFGKLDEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGREL
FT ELADIEIEPVLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDG
FT VCRVKIAEVDGNDPLFKVKNGENALAFYSHYYQPLPLVLRGYGAGNDVTAAGVFADLLR
FT TLSWKLGV"
FT gene 2801..3733
FT /gene="thrB"
FT /locus_tag="b0003"
FT /note="synonyms: ECK0003, JW0002"
FT CDS 2801..3733
FT /codon_start=1
FT /transl_table=11
FT /gene="thrB"
FT /locus_tag="b0003"
FT /product="homoserine kinase"
FT /function="1.5.1.8 metabolism; building block biosynthesis;
FT amino acids; threonine"
FT /function="7.1 location of gene products; cytoplasm"
FT /function="enzyme; Amino acid biosynthesis: Threonine"
FT /EC_number="2.7.1.39"
FT /note="go_component: cytoplasm [goid 0005737]; go_process:
FT threonine biosynthesis [goid 0009088]"
FT /protein_id="AAC73114.1"
FT /translation="MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFS
FT LNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACSVV
FT AALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDIISQ
FT QVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGFIHACYSRQPELA
FT AKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVAD
FT WLGKNYLQNQEGFVHICRLDTAGARVLEN"
FT gene 3734..5020
FT /gene="thrC"
FT /locus_tag="b0004"
FT /note="synonyms: ECK0004, JW0003"
FT CDS 3734..5020
FT /codon_start=1
FT /transl_table=11
FT /gene="thrC"
FT /locus_tag="b0004"
FT /product="threonine synthase"
FT /function="1.5.1.8 metabolism; building block biosynthesis;
FT amino acids; threonine"
FT /function="7.1 location of gene products; cytoplasm"
FT /function="enzyme; Amino acid biosynthesis: Threonine"
FT /EC_number="4.2.3.1"
FT /note="go_component: cytoplasm [goid 0005737]; go_process:
FT threonine biosynthesis [goid 0009088]"
FT /protein_id="AAC73115.1"
FT /translation="MKLYNLKDHNEQVSFAQAVTQGLGKNQGLFFPHDLPEFSLTEIDE
FT MLKLDFVTRSAKILSAFIGDEIPQEILEERVRAAFAFPAPVANVESDVGCLELFHGPTL
FT AFKDFGGRFMAQMLTHIAGDKPVTILTATSGDTGAAVAHAFYGLPNVKVVILYPRGKIS
FT PLQEKLFCTLGGNIETVAIDGDFDACQALVKQAFDDEELKVALGLNSANSINISRLLAQ
FT ICYYFEAVAQLPQETRNQLVVSVPSGNFGDLTAGLLAKSLGLPVKRFIAATNVNDTVPR
FT FLHDGQWSPKATQATLSNAMDVSQPNNWPRVEELFRRKIWQLKELGYAAVDDETTQQTM
FT RELKELGYTSEPHAAVAYRALRDQLNPGEYGLFLGTAHPAKFKESVEAILGETLDLPKE
FT LAERADLPLLSHNLPADFAALRKLMMNHQ"
XX
SQ Sequence 4639675 BP; 1142228 A; 1179554 C; 1176923 G; 1140970 T; 0 other;
agcttttcat tctgactgca acgggcaata tgtctctgtg tggattaaaa aaagagtgtc
60
tgatagcagc ttctgaactg gttacctgcc gtgagtaaat taaaatttta ttgacttagg
120
tcactaaata ctttaaccaa tataggcata gcgcacagac agataaaaat tacagagtac
180
acaacatcca tgaaacgcat tagcaccacc attaccacca ccatcaccat taccacaggt
240
aacggtgcgg gctgacgcgt acaggaaaca cagaaaaaag cccgcacctg acagtgcggg
300
cttttttttt cgaccaaagg taacgaggta acaaccatgc gagtgttgaa gttcggcggt
360
acatcagtgg caaatgcaga acgttttctg cgtgttgccg atattctgga aagcaatgcc
420
aggcaggggc aggtggccac cgtcctctct gcccccgcca aaatcaccaa ccacctggtg
480
gcgatgattg aaaaaaccat tagcggccag gatgctttac ccaatatcag cgatgccgaa
540
cgtatttttg ccgaactttt gacgggactc gccgccgccc agccggggtt cccgctggcg
600
caattgaaaa ctttcgtcga tcaggaattt gcccaaataa aacatgtcct gcatggcatt
660
agtttgttgg ggcagtgccc ggatagcatc aacgctgcgc tgatttgccg tggcgagaaa
720
atgtcgatcg ccattatggc cggcgtatta gaagcgcgcg gtcacaacgt tactgttatc
780
gatccggtcg aaaaactgct ggcagtgggg cattacctcg aatctaccgt cgatattgct
840
gagtccaccc gccgtattgc ggcaagccgc attccggctg atcacatggt gctgatggca
900
ggtttcaccg ccggtaatga aaaaggcgaa ctggtggtgc ttggacgcaa cggttccgac
960
tactctgctg cggtgctggc tgcctgttta cgcgccgatt gttgcgagat ttggacggac
1020
gttgacgggg tctatacctg cgacccgcgt caggtgcccg atgcgaggtt gttgaagtcg
1080
atgtcctacc aggaagcgat ggagctttcc tacttcggcg ctaaagttct tcacccccgc
1140
accattaccc ccatcgccca gttccagatc ccttgcctga ttaaaaatac cggaaatcct
1200
caagcaccag gtacgctcat tggtgccagc cgtgatgaag acgaattacc ggtcaagggc
1260
atttccaatc tgaataacat ggcaatgttc agcgtttctg gtccggggat gaaagggatg
1320
gtcggcatgg cggcgcgcgt ctttgcagcg atgtcacgcg cccgtatttc cgtggtgctg
1380
attacgcaat catcttccga atacagcatc agtttctgcg ttccacaaag cgactgtgtg
1440
cgagctgaac gggcaatgca ggaagagttc tacctggaac tgaaagaagg cttactggag
1500
ccgctggcag tgacggaacg gctggccatt atctcggtgg taggtgatgg tatgcgcacc
1560
ttgcgtggga tctcggcgaa attctttgcc gcactggccc gcgccaatat caacattgtc
1620
gccattgctc agggatcttc tgaacgctca atctctgtcg tggtaaataa cgatgatgcg
1680
accactggcg tgcgcgttac tcatcagatg ctgttcaata ccgatcaggt tatcgaagtg
1740
tttgtgattg gcgtcggtgg cgttggcggt gcgctgctgg agcaactgaa gcgtcagcaa
1800
agctggctga agaataaaca tatcgactta cgtgtctgcg gtgttgccaa ctcgaaggct
1860
ctgctcacca atgtacatgg ccttaatctg gaaaactggc aggaagaact ggcgcaagcc
1920
aaagagccgt ttaatctcgg gcgcttaatt cgcctcgtga aagaatatca tctgctgaac
1980
ccggtcattg ttgactgcac ttccagccag gcagtggcgg atcaatatgc cgacttcctg
2040
cgcgaaggtt tccacgttgt cacgccgaac aaaaaggcca acacctcgtc gatggattac
2100
taccatcagt tgcgttatgc ggcggaaaaa tcgcggcgta aattcctcta tgacaccaac
2160
gttggggctg gattaccggt tattgagaac ctgcaaaatc tgctcaatgc aggtgatgaa
2220
ttgatgaagt tctccggcat tctttctggt tcgctttctt atatcttcgg caagttagac
2280
gaaggcatga gtttctccga ggcgaccacg ctggcgcggg aaatgggtta taccgaaccg
2340
gacccgcgag atgatctttc tggtatggat gtggcgcgta aactattgat tctcgctcgt
2400
gaaacgggac gtgaactgga gctggcggat attgaaattg aacctgtgct gcccgcagag
2460
tttaacgccg agggtgatgt tgccgctttt atggcgaatc tgtcacaact cgacgatctc
2520
tttgccgcgc gcgtggcgaa ggcccgtgat gaaggaaaag ttttgcgcta tgttggcaat
2580
attgatgaag atggcgtctg ccgcgtgaag attgccgaag tggatggtaa tgatccgctg
2640
ttcaaagtga aaaatggcga aaacgccctg gccttctata gccactatta tcagccgctg
2700
ccgttggtac tgcgcggata tggtgcgggc aatgacgtta cagctgccgg tgtctttgct
2760
gatctgctac gtaccctctc atggaagtta ggagtctgac atggttaaag tttatgcccc
2820
ggcttccagt gccaatatga gcgtcgggtt tgatgtgctc ggggcggcgg tgacacctgt
2880
tgatggtgca ttgctcggag atgtagtcac ggttgaggcg gcagagacat tcagtctcaa
2940
caacctcgga cgctttgccg ataagctgcc gtcagaacca cgggaaaata tcgtttatca
3000
gtgctgggag cgtttttgcc aggaactggg taagcaaatt ccagtggcga tgaccctgga
3060
aaagaatatg ccgatcggtt cgggcttagg ctccagtgcc tgttcggtgg tcgcggcgct
3120
gatggcgatg aatgaacact gcggcaagcc gcttaatgac actcgtttgc tggctttgat
3180
gggcgagctg gaaggccgta tctccggcag cattcattac gacaacgtgg caccgtgttt
3240
tctcggtggt atgcagttga tgatcgaaga aaacgacatc atcagccagc aagtgccagg
3300
gtttgatgag tggctgtggg tgctggcgta tccggggatt aaagtctcga cggcagaagc
3360
cagggctatt ttaccggcgc agtatcgccg ccaggattgc attgcgcacg ggcgacatct
3420
ggcaggcttc attcacgcct gctattcccg tcagcctgag cttgccgcga agctgatgaa
3480
agatgttatc gctgaaccct accgtgaacg gttactgcca ggcttccggc aggcgcggca
3540
ggcggtcgcg gaaatcggcg cggtagcgag cggtatctcc ggctccggcc cgaccttgtt
3600
cgctctgtgt gacaagccgg aaaccgccca gcgcgttgcc gactggttgg gtaagaacta
3660
cctgcaaaat caggaaggtt ttgttcatat ttgccggctg gatacggcgg gcgcacgagt
3720
actggaaaac taaatgaaac tctacaatct gaaagatcac aacgagcagg tcagctttgc
3780
gcaagccgta acccaggggt tgggcaaaaa tcaggggctg ttttttccgc acgacctgcc
3840
ggaattcagc ctgactgaaa ttgatgagat gctgaagctg gattttgtca cccgcagtgc
3900
gaagatcctc tcggcgttta ttggtgatga aatcccacag gaaatcctgg aagagcgcgt
3960
gcgcgcggcg tttgccttcc cggctccggt cgccaatgtt gaaagcgatg tcggttgtct
4020
ggaattgttc cacgggccaa cgctggcatt taaagatttc ggcggtcgct ttatggcaca
4080
aatgctgacc catattgcgg gtgataagcc agtgaccatt ctgaccgcga cctccggtga
4140
taccggagcg gcagtggctc atgctttcta cggtttaccg aatgtgaaag tggttatcct
4200
ctatccacga ggcaaaatca gtccactgca agaaaaactg ttctgtacat tgggcggcaa
4260
tatcgaaact gttgccatcg acggcgattt cgatgcctgt caggcgctgg tgaagcaggc
4320
gtttgatgat gaagaactga aagtggcgct agggttaaac tcggctaact cgattaacat
4380
cagccgtttg ctggcgcaga tttgctacta ctttgaagct gttgcgcagc tgccgcagga
4440
gacgcgcaac cagctggttg tctcggtgcc aagcggaaac ttcggcgatt tgacggcggg
4500
tctgctggcg aagtcactcg gtctgccggt gaaacgtttt attgctgcga ccaacgtgaa
4560
cgataccgtg ccacgtttcc tgcacgacgg tcagtggtca cccaaagcga ctcaggcgac
4620
gttatccaac gcgatggacg tgagtcagcc gaacaactgg ccgcgtgtgg aagagttgtt
4680
ccgccgcaaa atctggcaac tgaaagagct gggttatgca gccgtggatg atgaaaccac
4740
gcaacagaca atgcgtgagt taaaagaact gggctacact tcggagccgc acgctgccgt
4800
agcttatcgt gcgctgcgtg atcagttgaa tccaggcgaa tatggcttgt tcctcggcac
4860
cgcgcatccg gcgaaattta aagagagcgt ggaagcgatt ctcggtgaaa cgttggatct
4920
gccaaaagag ctggcagaac gtgctgattt acccttgctt tcacataatc tgcccgccga
4980
ttttgctgcg ttgcgtaaat tgatgatgaa tcatcagtaa aatctattca ttatctcaat
5040
caggccgggt ttgcttttat gcagcccggc ttttttatga agaaattatg gagaaaaatg
5100
acagggaaaa aggagaaatt ctcaataaat gcggtaactt agagattagg attgcggaga
5160
ataacaaccg ccgttctcat cgagtaatct ccggatatcg acccataacg ggcaatgata
5220
aaaggagtaa cctgtgaaaa agatgcaatc tatcgtactc gcactttccc tggttctggt
5280
cgctcccatg gcagcacagg ctgcggaaat tacgttagtc ccgtcagtaa aattacagat
5340
aggcgatcgt gataatcgtg gctattactg ggatggaggt cactggcgcg accacggctg
5400
gtggaaacaa cattatgaat ggcgaggcaa tcgctggcac ctacacggac cgccgccacc
5460
gccgcgccac cataagaaag ctcctcatga tcatcacggc ggtcatggtc caggcaaaca
5520
tcaccgctaa atgacaaatg ccgggtaaca atccggcatt cagcgcctga tgcgacgctg
5580
gcgcgtctta tcaggcctac gttaattctg caatatattg aatctgcatg cttttgtagg
5640
caggataagg cgttcacgcc gcatccggca ttgactgcaa acttaacgct gctcgtagcg
5700
tttaaacacc agttcgccat tgctggagga atcttcatca aagaagtaac cttcgctatt
5760
aaaaccagtc agttgctctg gtttggtcag ccgattttca ataatgaaac gactcatcag
5820
accgcgtgct ttcttagcgt agaagctgat gatcttaaat ttgccgttct tctcatcgag
5880
gaacaccggc ttgataatct cggcattcaa tttcttcggc ttcaccgatt taaaatactc
5940
atctgacgcc agattaatca ccacattatc gccttgtgct gcgagcgcct cgttcagctt
6000
gttggtgatg atatctcccc agaattgata cagatctttc cctcgggcat tctcaagacg
6060
gatccccatt tccagacgat aaggctgcat taaatcgagc gggcggagta cgccatacaa
6120
gccggaaagc attcgcaaat gctgttgggc aaaatcgaaa tcgtcttcgc tgaaggtttc
6180
ggcctgcaag ccggtgtaga catcaccttt aaacgccaga atcgcctggc gggcattcgc
6240
cggcgtgaaa tctggctgcc agtcatgaaa gcgagcggcg ttgatacccg ccagtttgtc
6300
gctgatgcgc atcagcgtgc taatctgcgg aggcgtcagt ttccgcgcct catggatcaa
6360
ctgctgggaa ttgtctaaca gctccggcag cgtatagcgc gtggtggtca acgggctttg
6420
gtaatcaagc gttttcgcag gtgaaataag aatcagcata tccagtcctt gcaggaaatt
6480
tatgccgact ttagcaaaaa atgagaatga gttgatcgat agttgtgatt actcctgcga
6540
aacatcatcc cacgcgtccg gagaaagctg gcgaccgata tccggataac gcaatggatc
6600
aaacaccggg cgcacgccga gtttacgctg gcgtagataa tcactggcaa tggtatgaac
6660
cacaggcgag agcagtaaaa tggcggtcaa attggtaata gccatgcagg ccattatgat
6720
atctgccagt tgccacatca gcggaaggct tagcaaggtg ccgccgatga ccgttgcgaa
6780
ggtgcagatc cgcaaacacc agatcgcttt agggttgttc aggcgtaaaa agaagagatt
6840
gttttcggca taaatgtagt tggcaacgat ggagctgaag gcaaacagaa taaccacaag
6900
ggtaacaaac tcagcacccc aggaacccat tagcacccgc atcgccttct ggataagctg
6960
aataccttcc agcggcatgt aggttgtgcc gttacccgcc agtaatatca gcatggcgct
7020
tgccgtacag atgaccaggg tgtcgataaa aatgccaatc atctggacaa tcccttgcgc
7080
tgccggatgc ggaggccagg acgccgctgc cgctgccgcg tttggcgtcg aacccattcc
7140
cgcctcattg gaaaacatac tgcgctgaaa accgttagta atcgcctggc ttaaggtata
7200
tcccgccgcg ccgcctgccg cttcctgcca gccaaaagca ctctcaaaaa tagaccaaat
7260
gacgtgggga agttgcccga tattcattac gcaaattacc aggctggtca gtacccagat
7320
tatcgccatc aacgggacaa agccctgcat gagccgggcg acgccatgaa gaccgcgagt
7380
gattgccagc agagtaaaga cagcgagaat aatgcctgtc accagcgggg gaaaatcaaa
7440
agaaaaactc agggcgcggg caacggcgtt cgcttgaact ccgctgaaaa ttatgccata
7500
ggcgatgagc aaaaagacgg cgaacagaac gcccatccag cgcatcccca gcccgcgcgc
7560
catataccat gccggtccgc cacgaaactg cccattgacg tcacgttctt tataaagttg
7620
tgccagagaa cattcggcaa acgaggtcgc catgccgata aacgcggcaa cccacatcca
7680
aaagacggct ccaggtccac cggcggtaat agccagcgca acgccggcca ggttgccgct
7740
acccacgcgc gccgcaagac tggtacacaa tgactgaaat gaggttaaac cgcctggctg
7800
tggatgaatg ctatttttaa gacttttgcc aaactggcgg atgtagcgaa actgcacaaa
7860
tccggtgcga aaagtgaacc aacaacctgc gccgaagagc aggtaaatca ttaccgatcc
7920
ccaaaggacg ctgttaatga aggagaaaaa atctggcatg catatccctc ttattgccgg
7980
tcgcgatgac tttcctgtgt aaacgttacc aattgtttaa gaagtatata cgctacgagg
8040
tacttgataa cttctgcgta gcatacatga ggttttgtat aaaaatggcg ggcgatatca
8100
acgcagtgtc agaaatccga aacagtctcg cctggcgata accgtcttgt cggcggttgc
8160
gctgacgttg cgtcgtgata tcatcagggc agaccggtta catcccccta acaagctgtt
8220
taaagagaaa tactatcatg acggacaaat tgacctccct tcgtcagtac accaccgtag
8280
tggccgacac tggggacatc gcggcaatga agctgtatca accgcaggat gccacaacca
8340
acccttctct cattcttaac gcagcgcaga ttccggaata ccgtaagttg attgatgatg
8400
ctgtcgcctg ggcgaaacag cagagcaacg atcgcgcgca gcagatcgtg gacgcgaccg
8460
acaaactggc agtaaatatt ggtctggaaa tcctgaaact ggttccgggc cgtatctcaa
8520
ctgaagttga tgcgcgtctt tcctatgaca ccgaagcgtc aattgcgaaa gcaaaacgcc
8580
tgatcaaact ctacaacgat gctggtatta gcaacgatcg tattctgatc aaactggctt
8640
ctacctggca gggtatccgt gctgcagaac agctggaaaa agaaggcatc aactgtaacc
8700
tgaccctgct gttctccttc gctcaggctc gtgcttgtgc ggaagcgggc gtgttcctga
8760
tctcgccgtt tgttggccgt attcttgact ggtacaaagc gaataccgat aagaaagagt
8820
acgctccggc agaagatccg ggcgtggttt ctgtatctga aatctaccag tactacaaag
8880
agcacggtta tgaaaccgtg gttatgggcg caagcttccg taacatcggc gaaattctgg
8940
aactggcagg ctgcgaccgt ctgaccatcg caccggcact gctgaaagag ctggcggaga
9000
//
Jolyon Holdstock Ph.D.
Senior Computational Biologist,
Oxford Gene Technology (Ops) Ltd.
Begbroke Business and Science Park
Sandy Lane, Yarnton
Oxford, OX5 1PF
Tel: 01865 309699
Fax: 01865 842116