[Bioperl-l] extract feature seq when split between 2 GenBank accessions
Jason Stajich
jason at cgt.duhs.duke.edu
Wed Aug 27 13:05:08 EDT 2003
If you are getting the seq via spliced_seq, you can pass a
Bio::DB::RandomAccessI db handle (either a [local] Bio::Index::Fasta or a
[remote] Bio::DB::GenBank, etc.) to the spliced_seq method.
I think there is currently a bug, though: spliced_seq sorts the locations
before processing them. This has been reported but not fixed (I am really
hoping for some more bugfixing developers out there, folks!), but it should
work through that system once the bug is fixed.
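To make the lookup concrete, here is a minimal sketch of what splicing
across accessions involves. It is written in Python rather than BioPerl so
that it runs without any installation, and it is not how spliced_seq is
implemented. The names parse_join and splice are made up for illustration,
the db argument is just a dict standing in for a db handle, and only simple
join(...) strings with start..end ranges are handled (no complement() or
fuzzy locations). Segments are taken in the order the join lists them, with
no sorting step.

```python
import re

# One segment of a join: an optional "ACCESSION.VERSION:" prefix, then start..end.
SEGMENT = re.compile(r"(?:([A-Za-z0-9_.]+):)?(\d+)\.\.(\d+)")


def parse_join(location):
    """Return [(accession_or_None, start, end), ...] in the order listed."""
    inner = location.strip()
    if inner.startswith("join(") and inner.endswith(")"):
        inner = inner[len("join("):-1]
    segments = []
    for part in inner.split(","):
        match = SEGMENT.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"unsupported location segment: {part!r}")
        accession, start, end = match.groups()
        segments.append((accession, int(start), int(end)))
    return segments


def splice(location, local_seq, db):
    """Concatenate the sub-sequences named by a join, fetching remote ones from db."""
    pieces = []
    for accession, start, end in parse_join(location):
        # Segments without an accession prefix refer to the record itself.
        seq = local_seq if accession is None else db[accession]
        # GenBank coordinates are 1-based and inclusive.
        pieces.append(seq[start - 1:end])
    return "".join(pieces)
```

With toy data, splice("join(1..4,9..12,X2.1:7..10)", "AAAACCCCGGGG",
{"X2.1": "TTTTTTAAGG"}) gives "AAAAGGGGAAGG": two local pieces followed by
one piece fetched by accession.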
Rather than reading in all the possible seqs and storing them in a hash, I
would just use a Bio::DB::Fasta/Bio::Index::Fasta with the accessions
indexed, which keeps the memory requirements down. You can also use
Bio::DB::Failover + Bio::DB::FileCache to cache local/remote calls if you
need to mix local and remote dbs.
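The failover-plus-cache idea can be sketched roughly as follows. This is
Python for illustration only, not the BioPerl modules themselves: the class
name CachedFailover is hypothetical, each source is assumed to be a plain
callable that returns a sequence string or None, and the cache here lives in
memory rather than in a file.

```python
class CachedFailover:
    """Try each source in order and remember every successful fetch."""

    def __init__(self, sources):
        # Order matters: put the cheap local index first, the remote db last.
        self.sources = list(sources)
        self.cache = {}

    def get(self, accession):
        # Serve repeat requests from the cache without touching any source.
        if accession in self.cache:
            return self.cache[accession]
        for source in self.sources:
            seq = source(accession)  # a sequence string, or None if not found
            if seq is not None:
                self.cache[accession] = seq
                return seq
        raise KeyError(accession)
```

For example, CachedFailover([local_index.get, fetch_remote]) would consult a
local dict first and only call the (hypothetical) fetch_remote function for
accessions the local index lacks, and at most once per accession.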
-jason
On Wed, 27 Aug 2003, Charles Hauser wrote:
> All,
>
> I'd like to extract the CDS from genbank records and have found that in
> some instances these are distributed among >1 genbank accession (see
> below).
>
> I have a script which does fine if the CDS is fully contained within 1
> accession. Other than storing all accession seqs in a hash, is there a
> good way to deal with these?
>
> Charles
>
>
> LOCUS AY095303S1 2375 bp DNA linear PLN 21-JAN-2003
> DEFINITION Chlamydomonas reinhardtii c-type cytochrome synthesis 1 (CCS1)
> gene, ccs1-ac206 allele, 5'UTR and exons 1 through 6.
> ACCESSION AY095303
> VERSION AY095303.1 GI:25986619
>
> CDS join(207..330,512..825,1045..1233,1418..1798,2000..2131,
> 2253..2345,AY095304.1:6..303,AY095304.1:495..677,
> AY095304.1:863..1098)
> /gene="CCS1"
>
>
>
>
> LOCUS AY095303S2 1505 bp DNA linear PLN 21-JAN-2003
> DEFINITION Chlamydomonas reinhardtii c-type cytochrome synthesis 1 (CCS1)
> gene, ccs1-ac206 allele, exons 7, 8 and 9, 3'UTR and complete cds.
> ACCESSION AY095304
> VERSION AY095304.1 GI:25986620
> CDS join(AY095303.1:207..330,AY095303.1:512..825,
> AY095303.1:1045..1233,AY095303.1:1418..1798,
> AY095303.1:2000..2131,AY095303.1:2253..2345,6..303,
> 495..677,863..1098)
> /gene="CCS1"
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu