[Bioperl-l] simple PrimarySeq question
Hilmar Lapp
hlapp at gmx.net
Tue Jul 3 02:36:19 UTC 2007
Check out the POD of Bio::Seq::SeqBuilder; its synopsis should have
examples of what you want to do:
use Bio::SeqIO;
# usually you won't instantiate the builder yourself; the SeqIO
# object you already have provides one
my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "genbank");
my $builder = $seqin->sequence_builder();
# if you need only sequence, id, and description (e.g. for
# conversion to FASTA format):
$builder->want_none();
$builder->add_wanted_slot('display_id','desc','seq');
# if you want everything except the sequence and features
$builder->want_all(1); # this is the default if it's untouched
$builder->add_unwanted_slot('seq','features');
Let us know if that doesn't answer your question.
Note that this is currently only implemented for the GenBank format.
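Putting the pieces together, a minimal sketch of a GenBank-to-FASTA filter might look like the following. The subroutine name lean_genbank_to_fasta and the use of plain filehandles are assumptions made for illustration only; the sequence_builder(), want_none() and add_wanted_slot() calls are the ones from the synopsis above. If you need the bare Bio::PrimarySeq object rather than FASTA output, $seq->primary_seq on the returned object should give you that.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper (the name is an assumption, not part of BioPerl):
# reads GenBank records from $in_fh, builds only the id, description and
# sequence slots, and writes each record as FASTA to $out_fh.
# Returns the number of records written.
sub lean_genbank_to_fasta {
    my ($in_fh, $out_fh) = @_;

    my $seqin  = Bio::SeqIO->new(-fh => $in_fh,  -format => 'genbank');
    my $seqout = Bio::SeqIO->new(-fh => $out_fh, -format => 'fasta');

    # Ask the parser to skip everything except what FASTA output needs.
    my $builder = $seqin->sequence_builder();
    $builder->want_none();
    $builder->add_wanted_slot('display_id', 'desc', 'seq');

    my $count = 0;
    while (my $seq = $seqin->next_seq()) {
        $seqout->write_seq($seq);
        $count++;
    }
    return $count;
}

# When run as a script, act as a filter from STDIN to STDOUT.
lean_genbank_to_fasta(\*STDIN, \*STDOUT) unless caller;
```

Run as, for example, "perl script.pl < 1.gb", where script.pl is whatever file you saved the sketch to. Features and annotations are never built, so the per-record overhead should drop accordingly.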
-hilmar
On Jul 2, 2007, at 8:41 PM, niels at genomics.dk wrote:
> Kevin,
>
> Thanks, but I didn't put the question very clearly, sorry. Yes, SeqIO
> gets entries from a file, and from those large parsed entries I can get a
> simplified primary_seq object. But the SeqIO object includes feature
> and annotation objects etc. that take time to make, and I wish to know
> if there is a way to get a primary_seq object without this overhead. I
> apologize if I overlooked it in the docs.
>
> Niels
>
>
>
>
>> Start by having a look at the following link:
>> http://bioperl.org/cgi-bin/deob_interface.cgi
>>
>> SeqIO is how one reads or writes sequences to/from files.
>> Bio::PrimarySeq is just an object that holds information about a
>> sequence obtained from a file.
>>
>> As for how to parse a Genbank file into a list of features:
>>
>> use Bio::SeqIO;
>>
>> my $file = Bio::SeqIO->new(-format => 'genbank', -file => "id.gb");
>> my @sorted_features;
>> while (my $seq = $file->next_seq())
>> {
>>     my @features = $seq->all_SeqFeatures;
>>     # keep only the features whose primary tag is CDS
>>     for my $f (@features)
>>     {
>>         my $tag = $f->primary_tag;
>>         if ($tag eq 'CDS')
>>         {
>>             # @sorted_features holds all the CDS features
>>             # (Bio::SeqFeatureI objects) obtained from the GenBank file
>>             push @sorted_features, $f;
>>         }
>>     }
>> }
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Niels Larsen
>>> Sent: Monday, July 02, 2007 1:45 PM
>>> Cc: bioperl-l List
>>> Subject: [Bioperl-l] simple PrimarySeq question
>>>
>>> I write hoping someone could show me how to create a
>>> PrimarySeq object without parsing features and all first. The
>>> lines below return
>>>
>>> "Can't locate object method "next_seq" via package
>>> "Bio::PrimarySeq" at ./tst2 line 16."
>>>
>>> whereas calling Bio::SeqIO-> gives no error, but a too big object.
>>> The GenBank record after the __END__ is the "1.gb" file. I
>>> could not find out how from the tutorial or the
>>> Bio::PrimarySeq description.
>>>
>>> Niels L
>>>
>>>
>>> #!/usr/bin/env perl
>>>
>>> use strict;
>>> use warnings FATAL => qw ( all );
>>>
>>> use Data::Dumper;
>>>
>>> use Bio::Seq;
>>> use Bio::SeqIO;
>>>
>>> my ( $seq_h, $seq );
>>>
>>> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 'genbank' );
>>> # $seq_h = Bio::SeqIO->new( -file => "1.gb", -format => 'genbank' );
>>>
>>> $seq = $seq_h->next_seq();
>>>
>>> # print Dumper( $seq );
>>>
>>> __END__
>>>
>>> LOCUS X60065 9 bp mRNA linear MAM 14-NOV-2006
>>> DEFINITION B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
>>> ACCESSION X60065 REGION: 1..9
>>> VERSION X60065.1 GI:5
>>> KEYWORDS beta-2 glycoprotein I.
>>> SOURCE Bos taurus (cattle)
>>> ORGANISM Bos taurus
>>> Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>>> Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
>>> Pecora; Bovidae; Bovinae; Bos.
>>> REFERENCE 1
>>> AUTHORS Bendixen,E., Halkier,T., Magnusson,S., Sottrup-Jensen,L. and
>>> Kristensen,T.
>>> TITLE Complete primary structure of bovine beta 2-glycoprotein I:
>>> localization of the disulfide bridges
>>> JOURNAL Biochemistry 31 (14), 3611-3617 (1992)
>>> PUBMED 1567819
>>> REFERENCE 2 (bases 1 to 9)
>>> AUTHORS Kristensen,T.
>>> TITLE Direct Submission
>>> JOURNAL Submitted (11-JUN-1991) T. Kristensen, Dept of Mol Biology,
>>> University of Aarhus, C F Mollers Alle 130, DK-8000 Aarhus C,
>>> DENMARK
>>> FEATURES Location/Qualifiers
>>> source 1..9
>>> /organism="Bos taurus"
>>> /mol_type="mRNA"
>>> /db_xref="taxon:9913"
>>> /clone="pBB2I"
>>> /tissue_type="liver"
>>> gene <1..>9
>>> /gene="beta-2-gpI"
>>> CDS <1..>9
>>> /gene="beta-2-gpI"
>>> /codon_start=1
>>> /product="beta-2-glycoprotein I"
>>> /protein_id="CAA42669.1"
>>> /db_xref="GI:6"
>>> /db_xref="GOA:P17690"
>>> /db_xref="UniProtKB/Swiss-Prot:P17690"
>>> /translation="PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQI
>>> VFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNT
>>> ISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCPPPPIPKFASLSVYKPLAGNN
>>> SFYGSKAVFKCLPHHAMFGNDTVTCTEHGNWTQLPECREVRCPFPSRPDNGFVNHPAN
>>> PVLYYKDTATFGCHETYSLDGPEEVECSKFGNWSAQPSCKASCKLSIKRATVIYEGER
>>> VAIQNKFKNGMLHGQKVSFFCKHKEKKCSYTEDAQCIDGTIEIPKCFKEHSSLAFWKT
>>> DASDVKPC"
>>> sig_peptide <1..>9
>>> /gene="beta-2-gpI"
>>> ORIGIN
>>> 1 ccagcgctc
>>> //
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================