[Bioperl-l] simple PrimarySeq question

Hilmar Lapp hlapp at gmx.net
Tue Jul 3 02:36:19 UTC 2007


Check out the POD of Bio::Seq::SeqBuilder; the synopsis has
examples for what you want to do:

      use Bio::SeqIO;
      # usually you won't instantiate this yourself - a SeqIO object -
      # you will have one already
      my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "genbank");
      my $builder = $seqin->sequence_builder();

      # if you need only sequence, id, and description (e.g. for
      # conversion to FASTA format):
      $builder->want_none();
      $builder->add_wanted_slot('display_id','desc','seq');

      # if you want everything except the sequence and features
      $builder->want_all(1); # this is the default if it's untouched
      $builder->add_unwanted_slot('seq','features');
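
To make that concrete for the FASTA case, here is a minimal sketch rather than a tested recipe. The helper name genbank_to_fasta_slim is made up for illustration and is not part of BioPerl; the calls inside it are the ones from the synopsis above plus the usual SeqIO next_seq/write_seq loop.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper (not part of BioPerl): copy GenBank entries from
# $in_fh to $out_fh as FASTA, asking the builder for only the slots
# that FASTA output needs.
sub genbank_to_fasta_slim {
    my ($in_fh, $out_fh) = @_;

    my $seqin  = Bio::SeqIO->new(-fh => $in_fh,  -format => "genbank");
    my $seqout = Bio::SeqIO->new(-fh => $out_fh, -format => "fasta");

    # restrict what the parser builds: id, description and sequence only
    my $builder = $seqin->sequence_builder();
    $builder->want_none();
    $builder->add_wanted_slot('display_id', 'desc', 'seq');

    # count the entries written so the caller can check the result
    my $count = 0;
    while (my $seq = $seqin->next_seq()) {
        $seqout->write_seq($seq);
        $count++;
    }
    return $count;
}
```

You would call it as genbank_to_fasta_slim($in, \*STDOUT) with $in opened for reading on your "1.gb" file. As far as I know next_seq() still hands back a Bio::Seq-type object in this setup, just a sparsely populated one, so if you need a Bio::PrimarySeq you can ask that object for $seq->primary_seq.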

Let us know if that doesn't answer your question.

Note that this is currently only implemented for GenBank format.

	-hilmar

On Jul 2, 2007, at 8:41 PM, niels at genomics.dk wrote:

> Kevin,
>
> Thanks, but I didn't put the question very clearly, sorry. Yes, SeqIO
> gets entries from a file, and from those large parsed entries I can get a
> simplified primary_seq object. But the object from SeqIO includes feature
> and annotation objects etc. that take time to make, and I wish to know
> if there is a way to get a primary_seq object without this overhead. I
> apologize if I overlooked it in the docs.
>
> Niels
>
>
>
>
>> Start by having a look at the following link:
>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>
>> SeqIO is how one reads or writes sequences to/from files.
>> Bio::PrimarySeq is just an object that holds information about a
>> sequence obtained from a file.
>>
>> As for how to parse a Genbank file into a list of features:
>>
>> my $format = 'genbank';
>> my @sorted_features;
>> my $file = Bio::SeqIO->new(-format => $format, -file => "id.gb");
>> while (my $seq = $file->next_seq())
>> {
>> 	my @features = $seq->all_SeqFeatures;
>> 	# keep only the features whose primary tag is 'CDS'
>> 	for my $f (@features)
>> 	{
>> 		my $tag = $f->primary_tag;
>> 		if ($tag eq 'CDS')
>> 		{
>> 			# @sorted_features collects the CDS feature objects
>> 			# obtained from the GenBank file
>> 			push @sorted_features, $f;
>> 		}
>> 	}
>> }
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Niels Larsen
>>> Sent: Monday, July 02, 2007 1:45 PM
>>> Cc: bioperl-l List
>>> Subject: [Bioperl-l] simple PrimarySeq question
>>>
>>> I write hoping someone could show me how to create a
>>> PrimarySeq object without parsing features and all first. The
>>> lines below return
>>>
>>> "Can't locate object method "next_seq" via package
>>> "Bio::PrimarySeq" at ./tst2 line 16."
>>>
>>> whereas calling Bio::SeqIO-> gives no error, but a too big object.
>>> The GenBank record after the __END__ is the "1.gb" file. I
>>> could not find out how from the tutorial or the
>>> Bio::PrimarySeq description.
>>>
>>> Niels L
>>>
>>>
>>> #!/usr/bin/env perl
>>>
>>> use strict;
>>> use warnings FATAL => qw ( all );
>>>
>>> use Data::Dumper;
>>>
>>> use Bio::Seq;
>>> use Bio::SeqIO;
>>>
>>> my ( $seq_h, $seq );
>>>
>>> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 'genbank' );
>>> # $seq_h = Bio::SeqIO->new( -file => "1.gb", -format => 'genbank' );
>>>
>>> $seq = $seq_h->next_seq();
>>>
>>> # print Dumper( $seq );
>>>
>>> __END__
>>>
>>> LOCUS       X60065                     9 bp    mRNA    linear   MAM 14-NOV-2006
>>> DEFINITION  B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
>>> ACCESSION   X60065 REGION: 1..9
>>> VERSION     X60065.1  GI:5
>>> KEYWORDS    beta-2 glycoprotein I.
>>> SOURCE      Bos taurus (cattle)
>>>    ORGANISM  Bos taurus
>>>              Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>>>              Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
>>>              Pecora; Bovidae; Bovinae; Bos.
>>> REFERENCE   1
>>>    AUTHORS   Bendixen,E., Halkier,T., Magnusson,S., Sottrup-Jensen,L. and
>>>              Kristensen,T.
>>>    TITLE     Complete primary structure of bovine beta 2-glycoprotein I:
>>>              localization of the disulfide bridges
>>>    JOURNAL   Biochemistry 31 (14), 3611-3617 (1992)
>>>     PUBMED   1567819
>>> REFERENCE   2  (bases 1 to 9)
>>>    AUTHORS   Kristensen,T.
>>>    TITLE     Direct Submission
>>>    JOURNAL   Submitted (11-JUN-1991) T. Kristensen, Dept of Mol Biology,
>>>              University of Aarhus, C F Mollers Alle 130, DK-8000 Aarhus C,
>>>              DENMARK
>>> FEATURES             Location/Qualifiers
>>>       source          1..9
>>>                       /organism="Bos taurus"
>>>                       /mol_type="mRNA"
>>>                       /db_xref="taxon:9913"
>>>                       /clone="pBB2I"
>>>                       /tissue_type="liver"
>>>       gene            <1..>9
>>>                       /gene="beta-2-gpI"
>>>       CDS             <1..>9
>>>                       /gene="beta-2-gpI"
>>>                       /codon_start=1
>>>                       /product="beta-2-glycoprotein I"
>>>                       /protein_id="CAA42669.1"
>>>                       /db_xref="GI:6"
>>>                       /db_xref="GOA:P17690"
>>>                       /db_xref="UniProtKB/Swiss-Prot:P17690"
>>>                       /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
>>>                       VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
>>>                       ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
>>>                       SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
>>>                       PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
>>>                       VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
>>>                       DASDVKPC"
>>>       sig_peptide     <1..>9
>>>                       /gene="beta-2-gpI"
>>> ORIGIN
>>>          1 ccagcgctc
>>> //
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================
