[Bioperl-l] parsing protein accession numbers and types from>fasta headers
Chris Fields
cjfields at uiuc.edu
Wed Sep 13 14:36:33 UTC 2006
I agree that the non-BioPerl way is probably best, though you can look at
the Flat Database HOWTO for a fast Bioperl-ish way to index a FASTA file,
get the IDs, set primary and secondary accessions, retrieve sequences, etc.
http://www.bioperl.org/wiki/HOWTO:Flat_databases
Bio::DB::Fasta is also a flat-db interface for accessing large FASTA
databases, which users seem to like. It can now handle files larger
than 4 GB.
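
For the plain-Perl route, here is a rough sketch of a header parser that
generalizes the regex approach quoted below. It is only an illustration,
not a BioPerl API: parse_header is a hypothetical helper, the two sample
headers are just examples, and it assumes NCBI-style headers in which
"gi" is followed by one field and every other database tag by two. The
choice of which field to keep per database follows Bernd's rule (both
fields joined for pdb, the second field for sp and pir, otherwise the
first).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl. Assumes NCBI-style headers
# such as ">gi|NUMBER|DB|FIELD1|FIELD2 description".
# Returns a hash ref of database tag => identifier, or nothing if the
# line is not a FASTA header.
sub parse_header {
    my ($line) = @_;
    my ($token) = $line =~ /^>(\S+)/ or return;
    my @f = split /\|/, $token, -1;    # -1 keeps trailing empty fields
    my %ids;
    while (@f) {
        my $db = shift @f;
        last if $db eq '';
        if ($db eq 'gi') {             # assumed: gi carries a single field
            $ids{gi} = shift @f;
            next;
        }
        # Assumed: every other tag is followed by two fields
        my ($first, $second) = splice @f, 0, 2;
        $first  = '' unless defined $first;
        $second = '' unless defined $second;
        $ids{$db} = $db eq 'pdb'                  ? $first . $second
                  : ($db eq 'sp' || $db eq 'pir') ? $second
                  :                                 $first;
    }
    return \%ids;
}

# Example headers, for illustration only
for my $header ('>gi|129295|sp|P01013|OVAX_CHICK Gene X protein',
                '>gi|443893|pdb|1TUP|A Chain A') {
    my $ids = parse_header($header) or next;
    print join(', ', map { "$_=$ids->{$_}" } sort keys %$ids), "\n";
}
```

Headers from other sources (IPI, Ensembl, plain UniProt) use different
layouts, so each would need its own rule; the hash returned here is just
a convenient shape for loading into a local lookup table.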
Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Bernd Web
> Sent: Wednesday, September 13, 2006 8:18 AM
> To: Antonio Ramos Fernández
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] parsing protein accession numbers and types
> from>fasta headers
>
> Hi
>
> I tried to handle this variability by parsing out the database type.
> First I capture the DB type in $1, then I extract the ID I need for my
> purposes. Not *Bio*Perl, of course, but it worked for me ;-)
>
> if ( m/>gi\|\d+\|(\w+)\|([^\|\s]*)\|(\S*)\s/ ) {
>     # $1 = database type; $2 and $3 = the two fields that follow it
>     my $name;
>     # One-line equivalent of the SWITCH block:
>     # if ($1 eq 'pdb') { $name = $2.$3 }
>     # elsif ($1 eq 'sp' || $1 eq 'pir') { $name = $3 } else { $name = $2 }
>     SWITCH: {
>         if ($1 eq 'pdb') { $name = $2.$3; last SWITCH; }
>         if ($1 eq 'sp' ) { $name = $3; last SWITCH; }
>         if ($1 eq 'pir') { $name = $3; last SWITCH; }
>         $name = $2;    # every other database type
>     }
> }
>
> bernd
>
>
> On 9/13/06, Antonio Ramos Fernández <tniram at hotmail.com> wrote:
> >
> > I'd like to write a script to parse the headers of FASTA-formatted
> > protein databases and extract protein accession numbers and
> > identifiers (UniProt, IPI, gi, RefSeq, Ensembl...). The idea is to
> > build a simple local database that relates the accession number of a
> > protein sequence to all of its valid identifiers and to the FASTA
> > files on my system from which they were obtained, or to check, for
> > instance, whether a UniProt accession exists for a given gi.
> > However, the structure of the FASTA header varies considerably
> > depending on the source. Any suggestions?
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>