[Bioperl-l] Parsing Genbank

Chris Fields cjfields at illinois.edu
Wed Dec 2 19:39:40 UTC 2009


That one's odd; the coordinates should relate back to the original sequence.  Any chance you could pass on the sequence file so we can confirm it?  (You can do this off-list if the information is sensitive, or you can create a faux sequence that has the same problem.)
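
For the faux-sequence route, something along these lines might do. It is only a sketch: faux_genbank, the record name, and the random bases are all made up for illustration, and the layout loosely mirrors the record quoted below rather than following the GenBank column rules exactly. It uses no BioPerl itself; the idea is to redirect the output to a file and feed that to Bio::SeqIO.

```perl
use strict;
use warnings;

# Build a minimal GenBank-style record with a single minus-strand CDS.
# faux_genbank is a hypothetical helper; every value here is a placeholder.
sub faux_genbank {
    my ($name, $len, $cds_start, $cds_end) = @_;

    # Random placeholder bases; the real sequence is not needed to
    # check how feature coordinates are reported.
    my @bases = qw(a c g t);
    my $seq = join '', map { $bases[ int rand 4 ] } 1 .. $len;

    my $rec = sprintf "LOCUS       %-18s %12d bp    DNA     linear   UNK 19-Nov-2009\n",
        $name, $len;
    $rec .= "ACCESSION   $name\n";
    $rec .= "FEATURES             Location/Qualifiers\n";
    $rec .= sprintf "     %-16s%s\n", 'source', "1..$len";
    $rec .= sprintf "     %-16s%s\n", 'CDS', "complement($cds_start..$cds_end)";
    $rec .= sprintf "%21s/product=\"hypothetical protein\"\n", '';
    $rec .= "ORIGIN\n";

    # Sequence lines: 60 bases per line in blocks of 10, each line
    # prefixed with the 1-based position of its first base.
    for (my $i = 0; $i < $len; $i += 60) {
        my $chunk  = substr $seq, $i, 60;
        my @blocks = $chunk =~ /(.{1,10})/g;
        $rec .= sprintf "%9d %s\n", $i + 1, join ' ', @blocks;
    }
    $rec .= "//\n";
    return $rec;
}

# Same length and CDS location as the record in this thread.
print faux_genbank('faux_contig', 974, 911, 974);
```

If the same script that shows the problem reports 1 and 64 for this record as well, the file can be posted to the list as is.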

chris

On Dec 2, 2009, at 1:29 PM, Brandi Cantarel wrote:

> Here is some of my code; the real code actually enters the data into a database.
> 
> 
> use Bio::SeqIO;
> 
> # $gbkfile holds the path to the GenBank file (set elsewhere in the real code)
> my $in = Bio::SeqIO->new(-file   => $gbkfile,
>                          -format => 'genbank');
> 
> W1: while (my $seq = $in->next_seq()) {
>     my @feats = $seq->get_all_SeqFeatures();
>     my $j = 0;
>     F1: foreach my $cds (@feats) {
>         # skip every feature that is not a CDS
>         next F1 unless ($cds->primary_tag() eq 'CDS');
>         # do something with the cds start and cds end
>     }
> }
> 	 
> 
> LOCUS       subjpool12_contig3          974 bp    DNA     linear   UNK 19-Nov-2009
> ACCESSION   subjpool12_contig3
> KEYWORDS    .
> SOURCE      human metagenome
>  ORGANISM  human metagenome
>            unclassified sequences; organismal metagenomes,metagenomes.
> FEATURES             Location/Qualifiers
>     source          1..974
>                     /mol_type="genomic DNA"
>                     /isolation_source="Homo sapiens"
>                     /organism="human metagenome"
>                     /collection_date="19-Nov-2009"
>     CDS             complement(911..974)
>                     /locus_tag="subjpool12_contig3|metagene|gene_2"
>                     /translation="IRIMTVELINPYIRHVEHST"
>                     /score="2.52804"
>                     /product="hypothetical protein"
>                     /note="score=2.52804"
>                     /note="score=2.52804"
>                     /note="frame=1"
> ORIGIN
> #some sequence….
> 
> 
> 
> 
> From this example, I would like to get the coordinates 911 and 974, rather than 1 and 64.
> 
> 
> 
> ~~~~~~~~~~~~~~~~~~~~
> Brandi Cantarel, PhD
> Bioinformatics Analyst
> Institute for Genome Sciences
> School of Medicine
> University of Maryland, Baltimore
> 
> On Dec 2, 2009, at 2:09 PM, Mark A. Jensen wrote:
> 
>> Hi Brandi-
>> If $cds is a Bio::SeqFeature::Generic, that's weird (I believe); if it's an ordinary Bio::Seq, that's normal.
>> Can you elaborate by posting your code?
>> cheers,
>> MAJ
>> ----- Original Message ----- From: "Brandi Cantarel" <bcantarel at som.umaryland.edu>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, December 02, 2009 1:36 PM
>> Subject: [Bioperl-l] Parsing Genbank
>> 
>> 
>>> Hi all,
>>> I am not sure if this is normal, but when I use Bio::SeqIO to parse GenBank files, it changes the coordinates of things on the minus strand.
>>> 
>>> 
>>> For example, I have a sequence with a CDS on the minus strand from 911 to 974.  The sequence is 974 nt.
>>> 
>>> x $cds->start
>>> 1
>>> x $cds->end
>>> 64
>>> 
>>> How can I get the original coordinates?  Is there a command for that or will I have to just do the math?
>>> 
>>> Feature or Bug?
>>> 
>>> 
>>> ~~~~~~~~~~~~~~~~~~~~
>>> Brandi Cantarel, PhD
>>> Bioinformatics Analyst
>>> Institute for Genome Sciences
>>> School of Medicine
>>> University of Maryland, Baltimore
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>> 
> 
> 

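On the quoted question of whether to "just do the math": if the reported values really are relative to the reverse complement, the conversion is a fixed mapping, sketched below. revcomp_to_forward is a hypothetical helper, not a BioPerl function. This is only a workaround, since the coordinates should already relate back to the original sequence.

```perl
use strict;
use warnings;

# Map a 1-based, inclusive interval given on the reverse complement of a
# sequence back onto the numbering of the original (forward) sequence.
# revcomp_to_forward is a hypothetical helper, not part of BioPerl.
sub revcomp_to_forward {
    my ($start, $end, $seq_len) = @_;
    # Position p on the reverse complement is position L - p + 1 on the
    # original, so the two ends of the interval swap roles.
    return ($seq_len - $end + 1, $seq_len - $start + 1);
}

# Values from this thread: start 1, end 64, sequence length 974.
my ($start, $end) = revcomp_to_forward(1, 64, 974);
print "$start..$end\n";    # prints 911..974
```

Applying the same mapping twice returns the input interval, which is a quick sanity check before using it on real data.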