[Bioperl-l] Reply to Hilmar Lapp's solution -- parsing SwissProt Records

Jason Stajich jason at cgt.duhs.duke.edu
Wed Oct 27 15:24:39 EDT 2004


The protein ID is stored in $dblink->optional_id.

This is the code in Bio::SeqIO::swiss that does the parsing work to
make a DBLink xref:
elsif (/^DR\s+(\S+)\;\s+(\S+)\;\s+([^;]+)[\;\.](.*)$/) {
    # $1 = database, $2 = primary_id, $3 = optional_id, $4 = trailing comment
    my $dblinkobj = Bio::Annotation::DBLink->new();
    $dblinkobj->database($1);
    $dblinkobj->primary_id($2);
    $dblinkobj->optional_id($3);
    my $comment = $4;
    if (length($comment) > 0) {
        # edit comment to get rid of leading space and trailing dot
        if ( $comment =~ /^\s*(\S+)\./ ) {
            $dblinkobj->comment($1);
        } else {
            $dblinkobj->comment($comment);
        }
    }
    $annotation->add_Annotation('dblink', $dblinkobj);
}
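As a minimal standalone sketch (core Perl only, no BioPerl needed), the same regex can be applied to the EMBL line from the question to show which capture ends up in which DBLink field. The helper name parse_dr_line is hypothetical; it only mirrors the branch above and is not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): mirrors the DR branch of
# Bio::SeqIO::swiss shown above and returns the fields as a hash ref.
sub parse_dr_line {
    my ($line) = @_;
    return unless $line =~ /^DR\s+(\S+)\;\s+(\S+)\;\s+([^;]+)[\;\.](.*)$/;
    my ($database, $primary_id, $optional_id, $comment) = ($1, $2, $3, $4);
    my %xref = (
        database    => $database,     # stored via $dblinkobj->database($1)
        primary_id  => $primary_id,   # EMBL: nucleotide accession
        optional_id => $optional_id,  # EMBL: protein ID
    );
    if (length($comment) > 0) {
        # Same cleanup as the parser: strip leading space and trailing dot
        $xref{comment} = $comment =~ /^\s*(\S+)\./ ? $1 : $comment;
    }
    return \%xref;
}

my $xref = parse_dr_line('DR   EMBL; X57346; CAA40621.1; -.');
# prints: EMBL: nucleotide X57346, protein CAA40621.1, comment -
printf "%s: nucleotide %s, protein %s, comment %s\n",
    @{$xref}{qw(database primary_id optional_id comment)};
```

In the script quoted further down, printing $dblink->optional_id next to $dblink->primary_id for EMBL links should then give the protein ID (CAA40621.1 for this line).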

-jason
On Oct 27, 2004, at 11:47 AM, Anand Venkatraman wrote:

> Hi,
>
> Thanks a lot for the response.
>
> Some clarifications from my side:
>
> [1] Yes, by the EMBL tag, I actually meant the DbXref to EMBL for the
> specific SwissProt accession number. Sorry for the confusion. Let's
> say we have this line from a SwissProt record:
>
> DR   EMBL; X57346; CAA40621.1; -.
>
> Using the method outlined in my code, I can pull out only the EMBL
> nucleotide accession number (X57346), but I cannot get to the
> protein accession number (CAA40621.1).
>
> [2] Problems with GO cross-references:
>
> I can send you a small portion of the SwissProt file. Do you want me
> to send it as an attachment or within the text of the message? Can we
> send file attachments to the mailing list?
>
>
> Thanks a lot.
>
> Anand
>
> Hilmar Lapp <hlapp at gmx.net> wrote:
>
> On Tuesday, October 26, 2004, at 09:44 PM, Anand Venkatraman wrote:
>
>> Hi,
>>
>> I am using Bioperl to parse SwissProt Records.
>>
>> The bioperl version is 1.4.
>>
>> I am having two problems:
>>
>> Problem 1: I am unable to get all the accession
>> numbers from the line starting with AC on the
>> SwissProt Record.
>
> Other accessions than the first are available via
> $seq->get_secondary_accessions().
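For illustration only, a minimal core-Perl sketch of the convention behind this, assuming the usual SwissProt AC layout: accessions separated by semicolons, possibly over several AC lines, with the first one primary. It assumes the first accession is what accession_number() reports and the rest what get_secondary_accessions() reports; the helper split_ac_lines and the accession values are hypothetical placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's code): collects the accessions from
# one or more SwissProt AC lines. Assumed mapping: the first one is the
# primary accession (cf. accession_number()), the rest are secondary
# (cf. get_secondary_accessions()).
sub split_ac_lines {
    my @acc;
    for my $line (@_) {
        next unless $line =~ /^AC\s+(.*)$/;
        my $list = $1;    # e.g. "ACC0001; ACC0002;"
        # Accessions are separated by ';' plus optional whitespace
        push @acc, grep { length } split /;\s*/, $list;
    }
    my $primary = shift @acc;
    return ($primary, @acc);
}

# Placeholder accessions, not real entries
my ($primary, @secondary) =
    split_ac_lines('AC   ACC0001; ACC0002;', 'AC   ACC0003;');
print "primary: $primary\n";       # primary: ACC0001
print "secondary: @secondary\n";   # secondary: ACC0002 ACC0003
```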
>
>>
>> Problem 2: I am also trying to get the associated
>> EMBL and GO cross-references for a given SwissProt
>> entry. The problems I am having are that
>> [a]: I am only getting the nucleotide ID and not the
>> protein ID from the EMBL tag, and
>
> What do you mean by EMBL tag? Dbxrefs to EMBL?
>
>> [b]: In some cases, I am unable to get the GO ids.
>
> This should not happen. Can you send the accession numbers for those
> sequences, or better yet, the swissprot-formatted file with those (or a
> selection thereof) that fail?
>
> -hilmar
>
>
>> For
>> example, with the code below, I get the GO
>> ID for some records but miss it for others. Also, if
>> a particular record has 3 or 4 GO lines, the code
>> captures only the first occurrence of the GO ID (if and
>> when it does so).
>>
>>
>>
>> This is the code
>> -------------------------------------------------------
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::SeqIO;
>>
>> # SwissProt-format flat file given on the command line
>> my $sp_file = shift @ARGV or die "Usage: $0 swissprot_file\n";
>> my $seqio_object = Bio::SeqIO->new(-file   => $sp_file,
>>                                    -format => "swiss");
>>
>> while (my $seq_object = $seqio_object->next_seq) {
>>     # Guard against records without species information
>>     my $species = $seq_object->species;
>>     next unless $species && $species->binomial =~ m/Homo sapiens/;
>>
>>     print "Accession: ", $seq_object->accession_number(), "\t";
>>     my $annotation = $seq_object->annotation();
>>
>>     # Every DR line is stored as a separate 'dblink' annotation
>>     foreach my $dblink ( $annotation->get_all_Annotations('dblink') ) {
>>         if ( $dblink->database eq "EMBL" || $dblink->database eq "GO" ) {
>>             print "\t", $dblink->database, ":", $dblink->primary_id, "\t";
>>         }
>>     }
>>     print "\n";    # end the output line for this human record
>> }
>>
>> -------------------------------------------------------
>>
>> Any suggestions?
>>
>> Thanks in advance for the help.
>>
>> Anand
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> -- 
> -------------------------------------------------------------
> Hilmar Lapp email: lapp at gnf.org
> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu


