[Bioperl-l] [How to add features in genbank flat file]
Jason Stajich
jason.stajich at duke.edu
Thu Mar 24 20:51:28 EST 2005
You seem annoyed that no one solved the problem for you - I hope that
you realize that if you want a specific feature you can also modify the
module yourself and provide a patch to the project.
As for the specifics of your problem, if you can point out what the
Entrez key-value pairs need to be set to in order to get the SNP data, we
can add it to Bio::DB::Query::GenBank as an option.
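As a starting point, here is a rough sketch that lists the key-value pairs from the wget URL in the quoted message and rebuilds the same kind of URL from them. The names @viewer_params and build_viewer_url are hypothetical and not part of BioPerl, the pairs are copied from that URL without knowing what each one does, and no URL escaping is done because the values are plain alphanumerics.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Key-value pairs copied from the wget URL in the quoted message.
# Stored as an ordered list so the query string keeps a stable order.
my @viewer_params = (
    db               => 'nucleotide',
    qty              => 1,
    c_start          => 1,
    dopt             => 'gbwithparts',
    send             => 'Send',
    sendto           => 't',
    from             => 'begin',
    to               => 'end',
    extrafeatpresent => 1,
    ef_SNP           => 1,
    ef_CDD           => 8,
    ef_MGC           => 16,
    ef_HPRD          => 32,
);

# Hypothetical helper (not a BioPerl function): join the accession and
# the key-value pairs into a viewer.fcgi URL.
sub build_viewer_url {
    my ($acc, @params) = @_;
    my @pairs;
    while (my ($key, $value) = splice(@params, 0, 2)) {
        push @pairs, "$key=$value";
    }
    return 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?'
         . join('&', "val=$acc", @pairs);
}

print build_viewer_url('NM_178432', @viewer_params), "\n";
```

Knowing which of these pairs are actually required for the SNP features would tell us what an option on the query object needs to send.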
Removing the blank lines is part of the SeqIO parsing, but I suppose a
state variable could be added in genbank.pm so that they are not skipped
while in the 'COMMENT' state, if this is a critical feature for you.
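To illustrate the idea, here is a minimal sketch of such a state variable. It is not the actual genbank.pm code: filter_lines is a hypothetical helper that works on a plain list of lines, and it assumes a COMMENT block ends at the next line that starts in column 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not code from Bio::SeqIO::genbank: drop blank
# lines everywhere except inside the COMMENT block.
sub filter_lines {
    my @lines = @_;
    my $in_comment = 0;    # the state variable
    my @kept;
    for my $line (@lines) {
        if ($line =~ /^COMMENT/) {
            $in_comment = 1;
        }
        elsif ($line =~ /^\S/) {
            # Assumption: any other keyword in column 1 ends the COMMENT.
            $in_comment = 0;
        }
        # Skip blank lines only when we are outside the COMMENT state.
        next if $line =~ /^\s*$/ && !$in_comment;
        push @kept, $line;
    }
    return @kept;
}

my @record = (
    'LOCUS       EXAMPLE',
    '',
    'COMMENT     first paragraph',
    '',
    '            second paragraph',
    'FEATURES             Location/Qualifiers',
    '',
    'ORIGIN',
);
print "$_\n" for filter_lines(@record);
```

In the real parser the same flag would have to be set and cleared wherever the COMMENT section is read, which is the part that needs a patch.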
If you are just downloading genbank files it looks like you have a good
solution so I'm glad you were able to figure it out.
-jason
> Hello,
> No one seems to have a solution to this problem I posted a month ago.
>
> So I changed my mind and now use 'wget' to get the GenBank sequences.
> I get the full GenBank entry, with most of the features.
> I also avoid another bug: COMMENT lines are not well formatted by
> the BioPerl script I used (not as COMMENT lines appear at NCBI), and
> blank lines are removed.
>
>
> #!/usr/bin/perl -w
>
> use strict;
> use diagnostics;
> use File::Cat;
>
> my $acc=$ARGV[0] or die "\n\tThe accession number you seek for is missing.\n\tTry something like: $0 NM_178432\n\n";
>
> # Download the full GenBank entry (the URL must stay on a single line)
> `wget -O output_file.tmp "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1&c_start=1&val=$acc&dopt=gbwithparts&send=Send&sendto=t&from=begin&to=end&extrafeatpresent=1&ef_SNP=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32" 2>/dev/null`;
>
> cat ("output_file.tmp", \*STDOUT);
> unlink("output_file.tmp");
>
> # Equivalent command line:
> # wget -O output_file 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1&c_start=1&val=NM_178432&dopt=gbwithparts&send=Send&sendto=t&from=begin&to=end&extrafeatpresent=1&ef_SNP=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32'
>
> exit;
>
>
> Sorry, I don't use BioPerl to query GenBank (only for other
> applications), but BioPerl 1.5 has not fixed the COMMENT bug or
> the missing features.
>
>> Hello,
>> I saw that the GenBank web site has changed:
>> features like SNPs are no longer included in the EST flat files.
>> On the NCBI web site, we must click on 'features: SNP' to add them to
>> our flat file.
>> With BioPerl 1.4 or 1.5 it's the same: the variation features are
>> no longer included in the EST flat files that I download.
>> Here is the script I use:
>> #!/usr/bin/perl -w
>>
>> use strict;
>> use Bio::DB::GenBank;
>> use Bio::DB::Query::GenBank;
>> use Bio::SeqIO;
>> my $acc=$ARGV[0] or die "\n\tThe accession number you seek for is missing.\n\tTry something like: $0 NM_178432\n\n";
>>
>> $acc=$acc."[Accession]";
>>
>> my $query_string = "$acc";
>> my $query = Bio::DB::Query::GenBank->new(-db=>'nucleotide',
>>                                          -query=>$query_string);
>>
>> my $gb = Bio::DB::GenBank->new;
>> my $stream = $gb->get_Stream_by_query($query);
>>
>> my $out=Bio::SeqIO->new(-format=>'genbank');
>> my $seq = $stream->next_seq();
>>
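>> # write_seq() prints the record itself and returns a status value
>> # (the "1" the original version stripped with s/^1.*$//), so there is
>> # no need to capture and print its return value.
>> $out->write_seq($seq);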
>>
>> exit;
>> How can I add most of the features to my nucleotide flat files?
>> Thanks
>
> --
> Sébastien Moretti
> http://igs.cnrs-mrs.fr/
> CNRS - IGS
> 31 chemin Joseph Aiguier
> 13402 Marseille cedex
>