[Bioperl-l] Problem retrieving sequences from NCBI

Diego Riano diriano at rz.uni-potsdam.de
Tue May 3 03:55:22 EDT 2005


Hello,
I have a small problem.
I have a script that retrieves sequences from NCBI. If coordinates are
specified, the script retrieves only the corresponding region of the
sequence.
The input of the script is a file (IN) with a list of accession numbers,
each optionally followed by a tab and a pair of coordinates (start-end).
The user can also specify the output format (the default is fasta).
When a specific region is requested, the output contains a secondary
accession number of the form:
REGION: start..end
The problem is that for some sequences the secondary accession in the
output reads:
REGION: ?
Any idea why this could happen?
###############################################
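    use LWP::Simple qw(get);    # provides get()

    # Each line of IN: an accession, optionally followed by a tab and "start-end".
    while (my $line = <IN>) {
        chomp $line;
        my ($id, $coords) = split(/\t/, $line);
        my $fetch = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=$id&retmode=text";
        # Use the requested output format, or fall back to fasta.
        if (defined($format) && $format ne "") {
            $fetch .= "&rettype=$format";
        }
        else {
            $fetch .= "&rettype=fasta";
        }
        # Restrict the request to a region only when coordinates were given.
        if (defined($coords) && $coords ne "") {
            my ($start, $end) = split(/-/, $coords);
            $fetch .= "&seq_start=$start&seq_stop=$end";
        }
        my $result = get($fetch);    # undef if the request fails
    }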
#################################################
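
One thing that may be worth ruling out is malformed or reversed coordinates in the input file. Whether that is what produces "REGION: ?" is only a guess, not something confirmed here. The sketch below builds the same efetch URL as the loop above but checks the coordinates first. The helper name build_fetch_url, the accession AB000001 and the format gb are placeholders chosen for illustration; the URL parameters are the ones already used in the script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): turn one input line,
# "ACCESSION" or "ACCESSION<TAB>START-END", into an efetch URL. It dies if the
# coordinates are not two integers with 1 <= start <= end.
sub build_fetch_url {
    my ($line, $format) = @_;
    $line =~ s/\r?\n\z//;                        # strip a Unix or DOS line ending
    my ($id, $coords) = split /\t/, $line;
    return unless defined $id && length $id;     # skip empty lines

    # Same default as the original loop: fasta when no format is given.
    my $rettype = (defined $format && $format ne '') ? $format : 'fasta';
    my $url = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
            . "?db=nucleotide&id=$id&retmode=text&rettype=$rettype";

    if (defined $coords && $coords ne '') {
        # The list assignment is false (0 captures) when the pattern fails.
        my ($start, $end) = $coords =~ /^\s*(\d+)\s*-\s*(\d+)\s*$/
            or die "Bad coordinates '$coords' for $id\n";
        die "Start must be >= 1 and <= end for $id\n"
            unless $start >= 1 && $start <= $end;
        $url .= "&seq_start=$start&seq_stop=$end";
    }
    return $url;
}

# Placeholder input line; no request is sent, the URL is only printed.
print build_fetch_url("AB000001\t10-200", 'gb'), "\n";
```

Wrapping the call in eval { ... } inside the while loop would let the script report the offending accession and continue with the next line, which makes it easier to see whether the affected records share a pattern in their coordinates.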

Thanks

diego
-- 
_______________________________________
Diego Mauricio Riano Pachon
Biologist
Institute of Biology and Biochemistry
Potsdam University
Karl-Liebknecht-Str. 24-25
Haus 20
14476 Golm
Germany
Tel:+49 331 977 2809
http://www.geocities.com/dmrp.geo/