[Bioperl-l] Bio::DB::GenPept - get_Seq_by_id
Jason Stajich
jason.stajich at duke.edu
Sat Apr 16 12:12:38 EDT 2005
If you specify -verbose => 1 when initializing an object you will often
see debugging statements.
$sp = Bio::DB::GenPept->new(-verbose => 1);
You will see that the URLs generated to fetch the sequence are correct
(we're not truncating the id or anything). After playing around, it
looks like we have to put the ID in quotes if it starts with a number;
otherwise the server assumes it is a GI number. I think this might be
an NCBI shortcut?
In the short term, you can work around this by quoting your id strings
when they start with a number:
$id = "\"$id\"" if $id =~ /^\d/;
We'll add code in the modules to detect and fix these automatically --
quoting GI numbers doesn't seem to cause problems so maybe we should
quote every id?
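As a rough sketch of that workaround wrapped in a reusable form: the
helper name quote_leading_digit_id below is made up for illustration and
is not part of BioPerl. It only applies the same regex test as the
one-liner above and leaves already-quoted ids alone. Note that a plain GI
number would be quoted too, which, as far as I can tell, is harmless.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): wrap an id in double
# quotes when it starts with a digit, so the server does not treat the
# leading digits as a GI number.
sub quote_leading_digit_id {
    my ($id) = @_;
    return $id if $id =~ /^"/;              # already quoted, leave as-is
    return $id =~ /^\d/ ? qq{"$id"} : $id;  # quote only digit-leading ids
}

# Show what each of the ids from the original report turns into.
for my $id (qw(AAP1_YEAST ROA1_HUMAN 2AAA_YEAST)) {
    printf "%-12s => %s\n", $id, quote_leading_digit_id($id);
}
```

The returned string is what you would then pass to get_Seq_by_id.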
If you are only querying swissprot data you might find
Bio::DB::SwissProt useful as well.
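For the SwissProt route, a minimal sketch might look like the following.
The wrapper name fetch_swissprot_entry is hypothetical; it assumes
BioPerl is installed and that the remote server is reachable. The module
is loaded lazily and the fetch only happens when an entry name is given
on the command line, so nothing touches the network otherwise.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not a BioPerl function): fetch one entry by its
# Swiss-Prot entry name and return the resulting sequence object.
sub fetch_swissprot_entry {
    my ($entry_name) = @_;
    require Bio::DB::SwissProt;     # loaded only when a fetch is requested
    my $db = Bio::DB::SwissProt->new;
    return $db->get_Seq_by_id($entry_name);
}

# Only query the server when an entry name is passed as an argument,
# e.g.  perl fetch_sp.pl 2AAA_YEAST   (the script name is illustrative)
if (@ARGV) {
    my $seq = fetch_swissprot_entry($ARGV[0]);
    print $seq->desc, "\n" if $seq;
}
```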
Bio::DB::NCBIHelper was updated in CVS to quote the ids before making
the URL for the query.
I put some fixes into CVS which better parse swissprot fields from
DBSOURCE (in Bio/SeqIO/genbank) as well, although it is always better to
get this from the original swissprot records as there is some munging
in the transfer process.
-jason
On Apr 15, 2005, at 8:24 PM, Jamie Sherman wrote:
> I'm getting really odd behavior when I use get_Seq_by_id to retrieve
> from GenPept. I'm trying to retrieve by name where name is like
> 'ROA1_HUMAN'. When I have a name that starts with a letter it works
> great, but for names that start with a number it returns junk. Is there
> a workaround for this or am I doing something wrong? Can I create a
> Bio::DB::GenPept->new( arg to specify search type )?
> Thanks,
> --Jamie
>
>
> Program:
> #!/usr/bin/perl -w
>
> use Bio::DB::GenPept;
> $sp = Bio::DB::GenPept->new;
>
> # worked $query = 'AAP1_YEAST';
> # worked $query = "ROA1_HUMAN";
> $query = "2AAA_YEAST"; #doesn't work?
>
> $seq = $sp->get_Seq_by_id($query);
> print $seq->desc . "\n";
> print $seq->primary_id . "\n";
>
>
> Output:
> [AAP1_YEAST]
> Alanine/arginine aminopeptidase.
> 728771
>
> [ROA1_HUMAN]
> Heterogeneous nuclear ribonucleoprotein A1 (Helix-destabilizing
> protein) (Single-strand binding protein) (hnRNP core protein A1).
> 133254
>
> [2AAA_YEAST]
> B.taurus DNA sequence 1 from patent application EP0238993.
> 2
>
> It is using 2 as the ID number. How do I escape this?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l