[Bioperl-l] BioSQL load_seqdatabase.pl -pipeline option
Hilmar Lapp
hlapp at gmx.net
Fri Nov 3 22:54:08 UTC 2006
Close. It's not the --pipeline option you want to use for this
purpose but the --seqfilter option.
For example, to retain only sequences with taxon ID 9606, you would say
--seqfilter 'sub {my $s=shift->{"-species"}; return 1 unless $s;
return 1 unless $s->ncbi_taxid; return 1 if $s->ncbi_taxid == 9606;
return 0;}'
Note that when formulating the conditions upon which to accept or
reject the object, you need to take into account that the closure may
be called multiple times for one object, at various stages of
completion of the properties hash. So the logic above says: accept
the object if there is no species attached (yet), or if the species
doesn't have a taxon ID (yet; in GenBank format, the taxon ID is
actually in the feature table, and hence will only be populated
later, after parsing the organism lines), or if the taxon ID is 9606.
Otherwise (i.e., there is a species object, it has a taxon ID
defined, and the taxon ID is not 9606) reject the object.
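To make the accept/reject cases concrete, here is a minimal sketch of
the same filter written out step by step and exercised outside the
loader. MockSpecies and verdict() are hypothetical helpers invented
for this illustration; they only mimic the ncbi_taxid method that the
filter calls and are not part of BioPerl or load_seqdatabase.pl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a species object; it mimics only the
# ncbi_taxid accessor that the filter relies on.
package MockSpecies;
sub new        { my ($class, $taxid) = @_; return bless { taxid => $taxid }, $class; }
sub ncbi_taxid { return $_[0]->{taxid}; }

package main;

# Same logic as the one-line --seqfilter argument, spelled out.
my $human_only = sub {
    my $props   = shift;                  # properties hash accumulated so far
    my $species = $props->{'-species'};
    return 1 unless $species;             # no species attached yet: keep parsing
    my $taxid = $species->ncbi_taxid;
    return 1 unless $taxid;               # taxon ID not populated yet: keep parsing
    return $taxid == 9606 ? 1 : 0;        # decide only once the taxon ID is known
};

# Hypothetical helper that turns the return value into a readable label.
sub verdict { return $human_only->(shift) ? 'accept' : 'reject'; }

print verdict({}), "\n";                                          # accept
print verdict({ '-species' => MockSpecies->new(undef) }), "\n";   # accept
print verdict({ '-species' => MockSpecies->new(9606) }), "\n";    # accept
print verdict({ '-species' => MockSpecies->new(9598) }), "\n";    # reject
```

The first two calls correspond to the early, incomplete stages of the
properties hash; only the last two represent a final decision.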
(Note that --seqfilter will read and parse a file if the argument
refers to an existing and readable file. So if you are going to use
this construct often, you may want to put it into a file.)
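As a rough sketch of what such a file might look like: the file name
human_filter.pl is made up, and this assumes the script expects the
file to evaluate to the closure itself (that is, the anonymous sub is
the last expression in the file), which is worth checking against the
script's own documentation.

```perl
# human_filter.pl (hypothetical file name)
# Assumption: the file's last expression is the filter closure.
sub {
    my $s = shift->{"-species"};
    return 1 unless $s;                    # no species attached yet
    return 1 unless $s->ncbi_taxid;        # taxon ID not populated yet
    return 1 if $s->ncbi_taxid == 9606;    # human: accept
    return 0;                              # any other known taxon: reject
};
```

You would then pass the file name in place of the quoted code, i.e.
--seqfilter human_filter.pl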
-hilmar
On Nov 3, 2006, at 11:47 AM, Seth Johnson wrote:
> Hello guys,
>
> I'm populating biosql database using "load_seqdatabase.pl" from
> genbank release files for primates. However, I only need sequences
> that belong to humans (taxon id: 9606). I assume that best way to
> filter the necessary sequences is to use '-pipeline' option of the
> script. The documentation seems a little vague to me on how to create
> my own processor to accomplish the task. Can anyone clarify the
> steps???
>
> --
> Best Regards,
>
>
> Seth Johnson
> Senior Bioinformatics Associate
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================