[Biopython] matching sequences from fasta files
Vincent Davis
vincent at vincentdavis.net
Wed Mar 10 18:10:20 UTC 2010
@Leighton
"If I never needed to do this again, I would probably run BLAST or FASTA (or
my favourite search algorithm, running ungapped) using one set of sequences
as a query, and the other as the target database, using the program
parameters to report only one match each time. I'd then use Python to
parse the results, throwing away all those matches where"
I don't have a favorite, I have only tried BLAST :)
Is there an example of how to interface between Python and BLAST? I have no
idea where to start; I have never done anything similar.
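
One possible starting point, as a minimal sketch rather than a recommended
recipe: run BLAST as an external program and parse its tab-separated output
with plain Python. The sketch assumes the NCBI BLAST+ "blastn" program is
installed and on the PATH; the flags shown (-query, -subject, -ungapped,
-outfmt) should be checked against "blastn -help" for the installed version.
The helper names (build_blastn_command, parse_tabular, run_blastn) are made up
for illustration. Options for limiting the report to one match per query are
not shown. Biopython also ships a parser for BLAST XML output
(Bio.Blast.NCBIXML), which is not used here.

```python
import subprocess

# Columns requested from BLAST+ tabular output (-outfmt 6 with custom fields).
FIELDS = ["qseqid", "sseqid", "qlen", "length", "nident",
          "qstart", "qend", "qseq", "sseq"]
INT_FIELDS = {"qlen", "length", "nident", "qstart", "qend"}


def build_blastn_command(query_fasta, subject_fasta):
    """Build an argument list for an ungapped blastn run (assumed BLAST+ flags)."""
    return [
        "blastn",
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-ungapped",
        "-outfmt", "6 " + " ".join(FIELDS),
    ]


def parse_tabular(text):
    """Turn tab-separated BLAST output into a list of dicts keyed by FIELDS."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        values = line.split("\t")
        if len(values) != len(FIELDS):
            raise ValueError("unexpected number of columns: %r" % line)
        hit = dict(zip(FIELDS, values))
        for name in INT_FIELDS:
            hit[name] = int(hit[name])
        hits.append(hit)
    return hits


def run_blastn(query_fasta, subject_fasta):
    """Run blastn and return the parsed hits (needs BLAST+ installed)."""
    result = subprocess.run(build_blastn_command(query_fasta, subject_fasta),
                            capture_output=True, text=True, check=True)
    return parse_tabular(result.stdout)
```

Calling run_blastn("queries.fasta", "targets.fasta") (hypothetical file names)
would then give one dict per reported match, with the counts already converted
to integers.
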
@ Leighton
I think I will take your approach. Thanks for the input.
*Vincent Davis
720-301-3003 *
vincent at vincentdavis.net
my blog <http://vincentdavis.net> |
LinkedIn<http://www.linkedin.com/in/vincentdavis>
On Wed, Mar 10, 2010 at 8:53 AM, Leighton Pritchard <lpritc at scri.ac.uk> wrote:
> Hi,
>
> On 10/03/2010 Wednesday, March 10, 03:46, "Vincent Davis"
> <vincent at vincentdavis.net> wrote:
>
> > I need to check if any/all of the sequences from one fasta file are in
> > another.
> > Looking through the docs I think I could do this.
>
> As others have pointed out, a simple string comparison will do this.
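
A minimal sketch of the string-comparison idea, assuming each file has already
been read into (id, sequence) pairs, for example from the .id and .seq of the
records that Bio.SeqIO.parse(handle, "fasta") yields. The function name
exact_matches and the toy data are made up for illustration, and ignoring case
is an assumption, not something stated in the thread.

```python
def exact_matches(queries, targets):
    """Return the ids of query sequences that occur verbatim among the targets.

    Both arguments are iterables of (id, sequence) pairs. Comparison is
    case-insensitive and on the full sequence only (no substrings).
    """
    target_seqs = {str(seq).upper() for _, seq in targets}
    return [qid for qid, seq in queries if str(seq).upper() in target_seqs]


queries = [("q1", "ACGTAC"), ("q2", "ACGTAA"), ("q3", "ttgaca")]
targets = [("t1", "ACGTAC"), ("t2", "TTGACA")]
print(exact_matches(queries, targets))  # prints ['q1', 'q3']
```

Putting the target sequences in a set keeps each lookup cheap, so the cost
grows with the total number of sequences rather than with every
query/target pair.
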
>
> > I then want to find "close matches" and for me this means they differ by 1
> > SNP and I need to know the location of this differing SNP. How would I do
> > this?
>
> There are many ways in which this *could* be done. You probably want one
> that is quite quick, though <grin>
>
> If I never needed to do this again, I would probably run BLAST or FASTA (or
> my favourite search algorithm, running ungapped) using one set of sequences
> as a query, and the other as the target database, using the program
> parameters to report only one match each time. I'd then use Python to
> parse the results, throwing away all those matches where
>
> i) if the number of aligned bases is the same as the number of bases in the
> query: the number of match identities differs from the number of aligned
> bases by more than one
> ii) if the number of aligned bases differs from the number of bases in the
> query by exactly one: the number of match identities differs from the
> number of aligned bases
> iii) the number of aligned bases differs from the number of bases in the
> query by two or more
>
> The remainder should be your set of (almost) full-length 1/0 SNP matches,
> and there should be enough data in your search program output to identify
> the location of the SNP.
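
The three discard rules above can be sketched as a small filter. This is an
illustration under stated assumptions, not a tested pipeline: the alignment is
ungapped (so the aligned query and subject strings have equal length and the
alignment cannot be longer than the query), positions are 1-based query
coordinates, and when the alignment is one base short the unaligned end base is
presumed to be the SNP, which should be confirmed against the subject sequence.
The name classify_hit and its arguments are hypothetical.

```python
def classify_hit(qlen, qstart, qseq, sseq):
    """Apply rules (i)-(iii) to one ungapped hit.

    qlen   -- full length of the query sequence
    qstart -- 1-based query position where the alignment starts
    qseq   -- aligned part of the query
    sseq   -- aligned part of the subject (same length as qseq)

    Returns None if the hit should be thrown away, otherwise a tuple
    (number_of_snps, snp_position), with snp_position None for a perfect match.
    """
    aligned = len(qseq)
    # 0-based offsets within the alignment where the two sequences differ
    mismatches = [i for i, (a, b) in enumerate(zip(qseq, sseq)) if a != b]
    shortfall = qlen - aligned

    if shortfall == 0:
        if len(mismatches) > 1:
            return None                      # rule (i)
        if not mismatches:
            return (0, None)                 # perfect full-length match
        return (1, qstart + mismatches[0])   # single SNP inside the alignment
    if shortfall == 1:
        if mismatches:
            return None                      # rule (ii)
        # One end base was left unaligned: position 1 if the alignment
        # starts at 2, otherwise the last base of the query.
        return (1, 1 if qstart == 2 else qlen)
    return None                              # rule (iii)


print(classify_hit(6, 1, "ACGTAC", "ACCTAC"))  # prints (1, 3)
```
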
>
> I think it would be faster to get going with something off-the-shelf like
> BLAST and parse the output than to write something to do the search. It will
> probably run quicker, too.
>
> Lots of ways to do this repeatably, including writing a generator function.
>
> I hope this is useful,
>
> L.
>
> --
> Dr Leighton Pritchard MRSC
> D131, Plant Pathology Programme, SCRI
> Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
> e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
> gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
>