[Bioperl-l] Problem retrieving sequences from NCBI
Diego Riano
diriano at rz.uni-potsdam.de
Tue May 3 03:55:22 EDT 2005
Hello,
I have a small problem.
I have a script that retrieves sequences from NCBI. If coordinates are
specified, the script retrieves only the corresponding region of the
sequence.
The input of the script is a file (IN) with a list of accession numbers,
each with an optional pair of coordinates (start-end), and the user can
specify the output format (default is fasta).
When a specific region is requested, the output contains a secondary
accession number:
REGION: start..end
The problem is that for some sequences the secondary accession in the
output reads:
REGION: ?
Any idea why this could happen?
###############################################
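use LWP::Simple;   # assumed source of get(); imports are not shown in this excerpt

# IN (input filehandle) and $format (output format) are set up earlier in the script.
while (my $line = <IN>) {
    chomp $line;
    # Each line: accession, optionally followed by a tab and "start-end"
    my ($id, $coords) = split(/\t/, $line);
    my $fetch = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=$id&retmode=text";
    # Use the requested format, falling back to fasta
    if (defined($format) && $format ne "") {
        $fetch .= "&rettype=$format";
    }
    else {
        $fetch .= "&rettype=fasta";
    }
    # Restrict the request to a subsequence when coordinates were given
    if (defined($coords)) {
        my ($start, $end) = split(/-/, $coords);
        $fetch .= "&seq_start=$start&seq_stop=$end";
    }
    my $result = get($fetch);   # undef if the request fails
}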
#################################################
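As a minimal sketch, assuming the input lines are tab-separated as in the
split above, the coordinates could be checked before they go into the URL.
The helper name parse_line is hypothetical and not part of the script; the
rules it enforces (positive integers, start <= end, stray whitespace and
CR/LF stripped) are assumptions about what a valid range looks like.

```perl
use strict;
use warnings;

# parse_line is a hypothetical helper, not part of the original script.
# It splits one input line into an accession and optional coordinates.
# Returns ($id) when no coordinates are given, ($id, $start, $end) otherwise.
# Dies if coordinates are present but are not "start-end" with
# positive integers and start <= end.
sub parse_line {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;        # strip LF, CRLF or a stray CR
    my ($id, $coords) = split /\t/, $line;
    die "No accession on line\n" unless defined $id && length $id;
    return ($id) unless defined $coords && length $coords;

    $coords =~ s/^\s+|\s+$//g;     # tolerate spaces around the range
    my ($start, $end) = $coords =~ /^(\d+)-(\d+)$/
        or die "Bad coordinates '$coords' for $id\n";
    die "Coordinates must be >= 1 for $id\n" if $start < 1;
    die "start > end for $id\n" if $start > $end;
    return ($id, $start, $end);
}
```

Inside the loop this would be called as
my ($id, $start, $end) = parse_line($line);
with seq_start and seq_stop appended only when $start is defined. This does
not by itself explain the "REGION: ?" output; it only rules out malformed
coordinates in the input file as one possible cause.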
Thanks
diego
--
_______________________________________
Diego Mauricio Riano Pachon
Biologist
Institute of Biology and Biochemistry
Potsdam University
Karl-Liebknecht-Str. 24-25
Haus 20
14476 Golm
Germany
Tel:+49 331 977 2809
http://www.geocities.com/dmrp.geo/