[Bioperl-l] Get variation included in genbank file

Dave Messina David.Messina at sbc.su.se
Thu Jun 10 20:11:27 UTC 2010


Nice, Chris!

I've added it to the EUtils cookbook.

Dave



On Jun 10, 2010, at 2:06 AM, Chris Fields wrote:

> It's much easier to work with the GI than the accession. NCBI unfortunately just recently 'broke' accession-to-GI conversion via efetch; you now have to use rettype='seqid' and munge the ASN.1 to get everything (though that is nice in a way for ID mapping).
> 
> After the initial step of grabbing the GI for NG_011506, though, you can use elink to grab the SNP IDs, then use efetch to get the actual SNP files, or esummary for the summary info.
> 
> #!/usr/bin/perl -w
> 
> use Modern::Perl;
> use Bio::DB::EUtilities;
> 
> my $id = '224809339';
> 
> my $eutil = Bio::DB::EUtilities->new(-eutil => 'elink',
>                                     -id    => $id,
>                                     -email  => 'setyourown at foo.bar',
>                                     -verbose   => 1,
>                                     -dbfrom => 'nuccore',
>                                     -db  => 'snp',
>                                     -cmd   => 'neighbor_history',
> );
> 
> my $hist = $eutil->next_History || die "No history data returned";
> 
> $eutil->set_parameters(-eutil => 'efetch',
>                       -history   => $hist,
>                       -retmode => 'text',
>                       # dbSNP rettype options: 'chr', 'flt', 'brief', 'rsr', 'docset'
>                       -rettype => 'chr'  
> );
> 
> $eutil->get_Response(-file => 'snps.txt');
> 
> # or ...
> 
> $eutil->set_parameters(-eutil => 'esummary',
>                       -history   => $hist,
> );
> 
> $eutil->print_all;
> 
> # chris
> 
> On Jun 9, 2010, at 1:37 PM, Jessica Sun wrote:
> 
>> Thanks Dave.
>> Regarding "the variation information is not present in the version of
>> NG_011506 I found at Genbank": yes, but if you click "Customize view" on the
>> right side of the record page, there is a checkbox for "Features added by
>> NCBI" (209 SNPs). Checking it adds all the variations to the GenBank-format
>> record. It would be a neat feature if BioPerl could load these automatically
>> with an option turned on.
>> 
>> 
>> 
>> On Wed, Jun 9, 2010 at 1:51 PM, Dave Messina <David.Messina at sbc.su.se> wrote:
>> 
>>> Hi Jessica,
>>> 
>>> Please keep the BioPerl list on the Cc line so everyone can follow along.
>>> 
>>> 
>>>> Following your approach, it did not seem to me that you can have a Variation
>>>> tag included which lists the known dbSNP location, ID, and allele changes?
>>> 
>>> Ah okay, I assumed the file you attached was obtained directly from Genbank
>>> and that the variation info therein was already included. (It appears that's
>>> not the case: the variation information is not present in the version of
>>> NG_011506 I found at Genbank.)
>>> 
>>> If you want to include your own custom information in a genbank file,
>>> you'll have to pull it out of dbSNP (or wherever the variation info is).
>>> There are a couple of scripts that might be able to help with that (search
>>> for snp):
>>> 
>>>      http://www.bioperl.org/wiki/Bioperl_scripts
>>> 
>>> 
>>> You can then insert them into a RichSeq object as features and output in
>>> genbank format. For that part, see the HOWTO:
>>> 
>>>      http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
>>> 
>>> 
>>> Dave
>>> 
>>> 
>> 
>> 
>> -- 
>> Jessica Jingping Sun
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
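For the last step Dave mentions (adding SNPs to a RichSeq object as features and writing GenBank), here is a minimal sketch, assuming BioPerl's Bio::Seq::RichSeq, Bio::SeqFeature::Generic and Bio::SeqIO are installed. The sequence, SNP position and dbSNP ID are hypothetical placeholders, not NG_011506 data, and the /db_xref plus one-/replace-per-allele layout is an assumption loosely modeled on NCBI's variation features; compare it against a record downloaded with the NCBI SNP features turned on. It writes the record to a temporary file and reads it back to confirm the feature survives.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq::RichSeq;
use Bio::SeqFeature::Generic;
use Bio::SeqIO;
use File::Temp qw(tempfile);

# Hypothetical SNP records; in practice these would be parsed from the
# dbSNP output fetched with EUtilities. Positions are 1-based.
my @snps = (
    { pos => 120, id => 'dbSNP:0000001', alleles => [qw(a g)] },  # placeholder ID
);

# Toy 200 bp sequence standing in for the real GenBank record.
my $seq = Bio::Seq::RichSeq->new(
    -seq              => 'acgt' x 50,
    -display_id       => 'TEST0001',
    -accession_number => 'TEST0001',
    -alphabet         => 'dna',
);

# One single-base 'variation' feature per SNP.
for my $snp (@snps) {
    my $feat = Bio::SeqFeature::Generic->new(
        -start       => $snp->{pos},
        -end         => $snp->{pos},
        -strand      => 1,
        -primary_tag => 'variation',
        -tag         => {
            db_xref => $snp->{id},
            replace => $snp->{alleles},  # arrayref: one /replace per allele
        },
    );
    $seq->add_SeqFeature($feat);
}

# Write GenBank format to a temporary file (deleted at exit).
my (undef, $gb_file) = tempfile(SUFFIX => '.gb', UNLINK => 1);
my $out = Bio::SeqIO->new(-file => ">$gb_file", -format => 'genbank');
$out->write_seq($seq);
$out->close;

# Read the file back and collect the variation features.
my $in     = Bio::SeqIO->new(-file => $gb_file, -format => 'genbank');
my $reread = $in->next_seq;
my @variations = grep { $_->primary_tag eq 'variation' } $reread->get_SeqFeatures;
printf "%s: %d variation feature(s)\n", $reread->display_id, scalar @variations;
```

With a real record you would instead read the downloaded GenBank file with Bio::SeqIO (which returns a RichSeq), add the features the same way, and write it out again.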
