[Bioperl-l] parsing coded_by subfeature

Jack Chen chenn at cshl.edu
Sun Jul 20 18:56:04 EDT 2003


Hi Jason,

I think I was confused by the fact that the protein sequence provided by
GenBank does not match the conceptually translated sequence. For example,
most of the nucleotide sequences (after joining the segments together) are
actually longer than the protein sequences.
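
One possible explanation, which this thread does not confirm, is the
genetic code used for the translation. The example CDS quoted further down
reproduces the GenBank protein exactly when it is translated with the
invertebrate mitochondrial code (NCBI translation table 5, where TGA is
Trp, ATA is Met, and AGA/AGG are Ser) and the initial codon is read as Met.
Whether the record really carries /transl_table=5 is an assumption here, so
it is worth checking the qualifiers of the CDS feature. A minimal Python
sketch of the comparison (the names translate, STANDARD and INVERT_MITO are
illustrative only and are not part of BioPerl):

```python
# Codons enumerated in NCBI order (T, C, A, G at each position).
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# Amino-acid strings for two NCBI genetic codes, aligned with CODONS.
# Table 1 (standard) and table 5 (invertebrate mitochondrial).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
INVERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"


def translate(cds, aminos, start_as_met=False):
    """Translate cds codon by codon using the given amino-acid string.

    Trailing bases that do not fill a codon are ignored. If start_as_met
    is true, the first residue is reported as M (assumption: the first
    codon is an initiator in the chosen code).
    """
    table = dict(zip(CODONS, aminos))
    usable = len(cds) - len(cds) % 3
    protein = [table[cds[i:i + 3]] for i in range(0, usable, 3)]
    if start_as_met and protein:
        protein[0] = "M"
    return "".join(protein)


# CDS copied from the test.pl output quoted in this thread.
CDS = (
    "ATCCCACAAATAGCACCAATTAGATGATTATTACTATTTATTATTTTTTCTATT"
    "ACATTTATTTTATTTTGTTCTATTAATTATTATTCTTATATGCCAAATTCACCT"
    "AAATCTAATGAATTAAAAAACATCAATTTAAATTCAATAAACTGAAAATGATAA"
)

print(translate(CDS, STANDARD))
print(translate(CDS, INVERT_MITO, start_as_met=True))
```

The first line printed should equal the conceptual translation shown in the
quoted output, and the second should equal the GenBank protein followed by
a single "*" for the terminal TAA. The three extra bases of that stop codon
are also one reason a CDS is longer than three times the protein length.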

Thanks,

Jack

++++++++++++++++++++++++++++++++++++++++++++
    o-o     Jack Chen, Stein Laboratory      
    o---o   Cold Spring Harbor Laboratory    
  o----o    #5 Williams, 1 Bungtown Road           
 O----O     Cold Spring Harbor, NY, 11724   
 0--o       Tel: 1 516 367 8394              
   O        e-mail: chenn at cshl.org  	    
  o-o       Website: http://www.wormbase.org
+++++++++++++++++++++++++++++++++++++++++++++
        


On Sun, 20 Jul 2003, Jason Stajich wrote:

> Bio::Factory::FTLocationFactory should parse these strings fine.
> 
> They are "fuzzy" locations - see GenBank release notes and the
> Feature Table definition:
> http://www.ncbi.nlm.nih.gov/projects/collab/FT/
> 
> 
> On Sun, 20 Jul 2003, Jack Chen wrote:
> 
> > Thanks Jason! I have looked at the script before, but it does not
> > actually handle the join cases.
> >
> > I am also curious how to handle the cases where the 'coded_by' subfeature
> > contains the ">" and "<" signs. I am not really sure what they mean. And I
> > noticed that wherever these signs appear, the protein sequences retrieved
> > are different from the conceptual translation from the nucleotide
> > sequences. For example:
> >
> > [nchen at whey blast_db_checked]$ ./test.pl "gi|8573628|gb|AAF77462.1|"
> > Protein obtained from GenBank:
> > MPQMAPISWLLLFIIFSITFILFCSINYYSYMPNSPKSNELKNINLNSMNWKW
> > CDS sequence is:
> > ATCCCACAAATAGCACCAATTAGATGATTATTACTATTTATTATTTTTTCTATTACATTTATTTTATTTTGTTCTATTAATTATTATTCTTATATGCCAAATTCACCTAAATCTAATGAATTAAAAAACATCAATTTAAATTCAATAAACTGAAAATGATAA
> > Conceptual translation is:
> > IPQIAPIR*LLLFIIFSITFILFCSINYYSYMPNSPKSNELKNINLNSIN*K**
> >
> > Jack
> >
> >
> >
> >
> > On Sun, 20 Jul 2003, Jason Stajich wrote:
> >
> > > See the FAQ, question #5.4:
> > > http://www.bioperl.org/Core/Latest/FAQ.html#Q5.4
> > >
> > > -jason
> > > On Sat, 19 Jul 2003, Jack Chen wrote:
> > >
> > > > Hi All,
> > > >
> > > > I'd like to retrieve the nucleotide sequence for a given protein
> > > > sequence. I know that I could do it through the coded_by subfeature,
> > > > but its value can be rather messy. For example, it can take any of
> > > > the following forms:
> > > >
> > > >          #AF264924.1:1749..>2110
> > > >          #AF264924.1:<254..1563
> > > >          #join(AY260053.1:497..545,AY260053.1:610..3342,
> > > >                AY260053.1:3409..3750,AY260053.1:3810..4511,
> > > >                AY260053.1:4569..4960)
> > > >
> > > > Is there a good, unified way to do it?
> > > >
> > > > Thanks
> > > >
> > > > Jack
> > > >
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at portal.open-bio.org
> > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > >
> > > --
> > > Jason Stajich
> > > Duke University
> > > jason at cgt.mc.duke.edu
> > >
> >
> >
> 
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
> 
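
For readers without BioPerl at hand, the three coded_by forms quoted in
this thread can be taken apart with a small regular expression. In the
feature table syntax, "<" before a start position and ">" before an end
position mark a partial feature whose true boundary lies beyond the given
base. The following Python sketch is only an illustration under narrow
assumptions: it covers plain ranges and join(...) of plain ranges with
versioned accessions, does not handle complement(...) or other operators,
and the names Segment and parse_coded_by are made up here. It is not how
Bio::Factory::FTLocationFactory is implemented.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    accession: str       # e.g. "AF264924.1"
    start: int           # 1-based first base
    end: int             # 1-based last base
    start_partial: bool  # True if written as "<start"
    end_partial: bool    # True if written as ">end"


# One "ACCESSION.VERSION:start..end" range with optional < and > markers.
_SEGMENT = re.compile(
    r"(?P<acc>[A-Za-z0-9_]+\.\d+):"
    r"(?P<lt><?)(?P<start>\d+)\.\.(?P<gt>>?)(?P<end>\d+)"
)


def parse_coded_by(value: str) -> List[Segment]:
    """Split a coded_by string into its segments, in the order given."""
    # Wrapped GenBank lines may leave spaces or newlines inside the value.
    text = re.sub(r"\s+", "", value)
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    segments = []
    for part in text.split(","):
        match = _SEGMENT.fullmatch(part)
        if match is None:
            raise ValueError("unsupported location: %r" % part)
        segments.append(
            Segment(
                accession=match.group("acc"),
                start=int(match.group("start")),
                end=int(match.group("end")),
                start_partial=match.group("lt") == "<",
                end_partial=match.group("gt") == ">",
            )
        )
    return segments


if __name__ == "__main__":
    for example in (
        "AF264924.1:1749..>2110",
        "AF264924.1:<254..1563",
        "join(AY260053.1:497..545,AY260053.1:610..3342,\n"
        "AY260053.1:3409..3750,AY260053.1:3810..4511, AY260053.1:4569..4960)",
    ):
        print(parse_coded_by(example))
```

Each Segment could then be used to slice the corresponding nucleotide
record (start - 1 to end in 0-based terms) before concatenating the pieces
in order. A partial flag is a hint that the CDS may not begin or end on a
complete codon, so the translation of such a record can legitimately differ
from the deposited protein at its ends.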


