[Bioperl-l] Having problems with parsing SwissProt Records
Anand Venkatraman
bioperlanand at yahoo.com
Wed Oct 27 00:44:47 EDT 2004
Hi,
I am using BioPerl 1.4 to parse SwissProt records,
and I am running into two problems:
Problem 1: I am unable to get all the accession
numbers from the AC line(s) of a SwissProt record.
Some records list several accession numbers, while
others have only one. My code (see below) returns
only the first accession number it encounters.
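To make the expected result concrete, here is a minimal sketch in
plain Perl (no BioPerl) that collects every accession from the AC
lines of one record. The helper name all_accessions and the sample
accessions are made up for illustration, and the sketch assumes the
flat-file layout in which an AC line holds accessions each followed
by ';' and may be repeated within a record.

```perl
use strict;
use warnings;

# Hypothetical helper: return every accession found on the AC lines of one
# SwissProt flat-file record (the record is passed as a single string).
sub all_accessions {
    my ($record) = @_;
    my @accessions;
    foreach my $line (split /\n/, $record) {
        next unless $line =~ /^AC\s+(.*)$/;
        # Each accession is followed by ';', so split on it and drop empties.
        push @accessions, grep { length } split /\s*;\s*/, $1;
    }
    return @accessions;
}

# Made-up record fragment with two AC lines.
my $record = "ID   TEST_HUMAN\n"
           . "AC   P00001; Q00002; Q00003;\n"
           . "AC   Q00004;\n";
print join(",", all_accessions($record)), "\n";
```

This is the output I would like to see per record: the primary
accession first, then all the secondary ones.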
Problem 2: I am also trying to get the EMBL and GO
cross-references for a given SwissProt entry. The
problems I am having are:
[a]: From the EMBL cross-reference I get only the
nucleotide ID, not the protein ID.
[b]: In some cases I am unable to get the GO IDs.
With the code below, I get the GO ID for some records
and miss it for others. Also, if a record has 3 or 4
GO lines, the code captures only the first occurrence
of the GO ID (if and when it captures one at all).
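As an illustration of the fields involved, the following plain-Perl
sketch (again without BioPerl) pulls the EMBL and GO cross-references
out of the DR lines of one record. The helper name cross_refs and the
sample identifiers are assumptions made for this example; the sketch
assumes DR fields are separated by ';' and the line ends with '.'.

```perl
use strict;
use warnings;

# Hypothetical helper: return [database, primary id, secondary id] triples
# for the requested databases, taken from the DR lines of one record.
sub cross_refs {
    my ($record, @wanted) = @_;
    my %wanted = map { $_ => 1 } @wanted;
    my @refs;
    foreach my $line (split /\n/, $record) {
        # Capture everything after "DR", without the trailing '.'.
        next unless $line =~ /^DR\s+(.*?)\.?\s*$/;
        # Split on ';' rather than on whitespace: GO term names contain spaces.
        my ($db, $primary, $secondary) = split /\s*;\s*/, $1;
        next unless defined $db && $wanted{$db};
        $secondary = '' unless defined $secondary;
        push @refs, [ $db, $primary, $secondary ];
    }
    return @refs;
}

# Made-up DR lines: one EMBL link, two GO links, one unrelated database.
my $dr_lines = "DR   EMBL; M00001; AAA00001.1; -.\n"
             . "DR   GO; GO:0005524; F:ATP binding; IEA.\n"
             . "DR   GO; GO:0006468; P:protein phosphorylation; IEA.\n"
             . "DR   PIR; A00001; A00001.\n";
print join("\t", @$_), "\n" for cross_refs($dr_lines, 'EMBL', 'GO');
```

For an EMBL line, the second identifier is the protein ID I am
missing; for GO lines, every line should yield its own GO ID.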
This is the code
-------------------------------------------------------
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

# SwissProt flat file to parse, given as the first command-line argument.
my $sp_file = shift @ARGV or die "Usage: $0 <swissprot_file>\n";
my $seqio_object = Bio::SeqIO->new(-file   => $sp_file,
                                   -format => "swiss");

while (my $seq_object = $seqio_object->next_seq) {
    # Report human entries only.
    if ($seq_object->species->binomial =~ m/Homo sapiens/) {
        # accession_number() returns a single accession.
        print "Accession: ", $seq_object->accession_number(), "\t";
        my $annotation = $seq_object->annotation();
        foreach my $dblink ( $annotation->get_all_Annotations('dblink') ) {
            if (   $dblink->database eq "EMBL"
                || $dblink->database eq "GO" ) {
                # Print the database name and its primary identifier.
                print "\t", $dblink->database, ":",
                      $dblink->primary_id, "\t";
            }
        }
    }
    print "\n";
}
-------------------------------------------------------
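A possible direction, sketched under assumptions rather than
confirmed against version 1.4: Bio::Seq::RichSeq documents a
get_secondary_accessions method and Bio::Annotation::DBLink documents
an optional_id method. Whether the 1.4 swiss parser fills both for
every record is not verified here. The helper names seq_accessions
and seq_cross_refs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: primary accession followed by any secondary ones.
sub seq_accessions {
    my ($seq) = @_;
    my @acc = ($seq->accession_number);
    # get_secondary_accessions is a RichSeq method; skip it on other objects.
    push @acc, $seq->get_secondary_accessions
        if $seq->can('get_secondary_accessions');
    return @acc;
}

# Hypothetical helper: "DB:primary_id/optional_id" strings for chosen databases.
sub seq_cross_refs {
    my ($seq, @wanted) = @_;
    my %wanted = map { $_ => 1 } @wanted;
    my @out;
    foreach my $link ($seq->annotation->get_Annotations('dblink')) {
        next unless $wanted{ $link->database };
        my $entry    = $link->database . ":" . $link->primary_id;
        my $optional = $link->optional_id;
        # Append the second identifier only when the parser stored one.
        $entry .= "/" . $optional if defined $optional && length $optional;
        push @out, $entry;
    }
    return @out;
}

# BioPerl is loaded only when a file name is given on the command line.
if (@ARGV) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'swiss');
    while (my $seq = $in->next_seq) {
        print join("\t", join(",", seq_accessions($seq)),
                         seq_cross_refs($seq, 'EMBL', 'GO')), "\n";
    }
}
```

If optional_id comes back empty for EMBL links, or GO links are
absent from the dblink annotations altogether, the gap would be in
how the parser reads the DR lines rather than in this loop.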
Any suggestions?
Thanks in advance for the help.
Anand