[Bioperl-l] Uniprot/Swiss accessions?

bill at genenformics.com bill at genenformics.com
Tue May 19 00:11:51 UTC 2009


The problem is that makeblastdb does not recognize the first field of the
defline (the sequence identifier).

I changed the defline from:
>UniRef50_P0C9F1 Protein MGF 100-1R n=5 Tax=African swine fever virus RepID=1001R_ASFM2

to
>sp|P0C9F1 Protein MGF 100-1R n=5 Tax=African swine fever virus RepID=1001R_ASFM2

and it works!

It seems that prefixing your protein id with 'sp|' right after '>' will work.
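
For a whole file, the same edit can be scripted. The following is only a
minimal sketch in Python, assuming every defline starts with an identifier of
the form UniRefNN_ACCESSION as in the example above. The function names and
the default 'sp' prefix are illustrative choices, not part of BLAST or
UniProt. Entries whose identifier is not a UniProt accession (for example
UniParc UPI ids) would be relabelled as 'sp|' too, which may not be what you
want.

```python
import re

# Matches a UniRef-style identifier at the start of a defline,
# e.g. ">UniRef50_P0C9F1 Protein ...", and captures the part after "_".
UNIREF_ID = re.compile(r"^>UniRef\d+_(\S+)")


def rewrite_defline(line, prefix="sp"):
    """Replace a leading UniRef identifier with 'prefix|accession'.

    Lines that do not start with a UniRef identifier (sequence lines,
    other kinds of header) are returned unchanged.
    """
    return UNIREF_ID.sub(lambda m: f">{prefix}|{m.group(1)}", line, count=1)


def rewrite_fasta(lines, prefix="sp"):
    """Yield FASTA lines, rewriting only the defline (">") lines."""
    for line in lines:
        yield rewrite_defline(line, prefix) if line.startswith(">") else line
```

To use it, open the input and output files (the names are up to you) and
write the result of rewrite_fasta(infile) with outfile.writelines(...), then
run makeblastdb on the new file.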

Good luck!

Bill at genenformics

> As far as I can see, none of the fasta at
> ftp://ftp.uniprot.org/pub/databases/uniprot_datafiles_by_format/fasta/
> will correctly formatdb with the "-o T" option. This is with the latest
> version of blast (2.2.20 [Feb-08-2009])
> If you formatdb uniprot_sprot.fasta or uniprot_trembl.fasta from the above
> link, they successfully create the required files but the blast result
> descriptions are truncated.
> NCBI say it's not their fault and EBI don't answer their email.
>
> A quick hack of prepending fake GI numbers to each accession gets the
> files formatted correctly and allows sequence retrieval but it's not an
> ideal solution.
>
>
> --Russell
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of granjeau at tagc.univ-mrs.fr
>> Sent: Tuesday, 19 May 2009 9:39 a.m.
>> To: "Cook, Malcolm" <mec at stowers.org>
>> Cc: 'BioPerl List'
>> Subject: Re: [Bioperl-l] Uniprot/Swiss accessions?
>>
>> Maybe you could try the PICR service at EBI
>> http://www.ebi.ac.uk/Tools/picr/
>> or some other ID converter (as for example some Gene Ontology tools) or
>> even SRS.
>>
>> I think there could be more than one GI per SP entry (it's not clear to me
>> if you are looking at SwissProt or UniProtKB, i.e. SP+TrEMBL).
>>
>> Let us know your solution.
>>
>> Regards,
>> Samuel
>>
>> > If you need to retain mapping between acc => gi it gets a little more
>> > complicated; most procedures to NCBI return a 'bag' of gi's w/o any
>> > relation to their original accession.  You can grab them via esummary,
>> > though, but you'll have to iterate through them.
>> >
>> > The other option is LiveLists (has both nuc and protein acc => gi).
>> > I'm assuming this would have the swissprot accessions included (famous
>> > last words):
>> >
>> > ftp://ftp.ncbi.nih.gov/genbank/livelists/README.genbank.livelists
>> >
>> > chris
>> >
>> >
>> >
>> > On May 18, 2009, at 9:34 AM, Cook, Malcolm wrote:
>> >
>> >> you could:
>> >>
>> >> 1) Use eutils search with -database protein and
>> >>  -term "srcdb swiss prot"[Properties]
>> >>  If you use a retmax of 100000 it should only take a few seconds to
>> >> download the 458,445 GI numbers.
>> >>  I just did it.
>> >>
>> >> 2) use fastacmd to extract the fasta from nr for these gis, and
>> >> parse the defline.
>> >>  (assuming you have a copy of nr)
>> >>
>> >>
>> >> Does this work for you?
>> >>
>> >>
>> >> Malcolm Cook
>> >> Stowers Institute for Medical Research - Kansas City, Missouri
>> >>
>> >>
>> >>> -----Original Message-----
>> >>> From: bioperl-l-bounces at lists.open-bio.org
>> >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> >>> Smithies, Russell
>> >>> Sent: Sunday, May 17, 2009 11:53 PM
>> >>> To: 'BioPerl List'
>> >>> Subject: [Bioperl-l] Uniprot/Swiss accessions?
>> >>>
>> >>> Does anyone know of a way to get GI numbers for
>> >>> Uniprot/Swissprot accessions?
>> >>>
>> >>> Fasta from Uniprot's FTP site doesn't formatdb correctly
>> >>> (with the -o T option) as it's missing the gi number in the
>> >>> fasta header.
>> >>> NCBI won't let you use SwissProt ids in batch-entrez and I
>> >>> don't want to have to look up all 466,739 of them.
>> >>> I could use Bio::DB::Eutilities and query each id but even at
>> >>> 10 queries/second (the limit changed recently) it would take too
>> >>> long.
>> >>>
>> >>> Any ideas?
>> >>> Is there a swissprot2gi list somewhere?
>> >>>
>> >>> Thanx,
>> >>>
>> >>>
>> >>> Russell Smithies
>> >>>
>> >>> Bioinformatics Applications Developer
>> >>> T +64 3 489 9085
>> >>> E  russell.smithies at agresearch.co.nz
>> >>>
>> >>> Invermay  Research Centre
>> >>> Puddle Alley,
>> >>> Mosgiel,
>> >>> New Zealand
>> >>> T  +64 3 489 3809
>> >>> F  +64 3 489 9174
>> >>> www.agresearch.co.nz
>> >>>
>> >>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l




