[Bioperl-l] Perl script for sequence matching
Suraj Peri
suraj_peri@yahoo.com
Fri, 14 Jul 2000 23:22:04 -0700 (PDT)
Hi all,
Thank you for the solutions. I am now stuck on finding the hits that two files have in common.
I have two files:
1. The first contains the hits from BLAST [BLAST output].
2. The second contains hits with sequence accession numbers, z-scores, etc., for example:
DROSOPHILA:CG3727-FBAN0003727 + 7.41 128.40 548 ! last_updated:000321
With these two files in hand, I would like to pick out the shared hits [sequences that are reported in both file 1 and file 2].
I tried this using gawk, but I got all the hits from both files, and I am sure there are unique hits reported only in file 1 or only in file 2.
It would be a great help if anyone could suggest a script that reads both files and reports the hits found in both file 1 and file 2.
Thank you...
Peri
Biotechnology centre
m.s.univ. of baroda
baroda india.
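
The request above (report only the hits present in both files) amounts to a set intersection on the hit identifiers. A minimal sketch follows, written in Python rather than Perl. It assumes that each file holds one hit per line and that the identifier is the first whitespace-delimited field, as in the sample line above; raw BLAST output may first need to be reduced to that shape. The names read_ids and common_hits are hypothetical, chosen for illustration only.

```python
import sys


def read_ids(lines):
    """Collect the first whitespace-delimited field of each non-empty line.

    Assumption: that first field is the hit identifier.
    Lines starting with '#' are treated as comments and skipped.
    """
    ids = set()
    for line in lines:
        fields = line.split()
        if fields and not fields[0].startswith("#"):
            ids.add(fields[0])
    return ids


def common_hits(lines1, lines2):
    """Return identifiers present in both inputs, sorted for stable output."""
    return sorted(read_ids(lines1) & read_ids(lines2))


if __name__ == "__main__":
    # Runs only when exactly two file names are given on the command line.
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
            for hit in common_hits(f1, f2):
                print(hit)
```

Saved under a name of your choosing and given the two files as arguments, it prints one shared identifier per line. Replacing the & operator with ^ would instead list the hits reported in only one of the files.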
Paul Gordon <gordonp@niji.imb.nrc.ca> wrote:
Another one-liner, or at least close to it :-)
perl -ne 'BEGIN{$/=">";$"=";"}($d,$_)=/(.*?)\n(.+?)>?$/s;push @{$h{lc()}},$d if $_;END{for(keys%h){print">@{$h{$_}}$_"}}' filename1 filename2 ...
________________________________________________________________________
Paul Gordon Paul.Gordon@nrc.ca
Genomic Technologies http://maggie.cbr.nrc.ca
Institute for Marine Biosciences
National Research Council Canada
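
For readers who find the one-liner above hard to follow, here is a longer sketch of the same idea in Python rather than Perl: split the input on ">", key each record by its lowercased sequence, and emit each distinct sequence once with the headers of all records that share it joined by ";". It is one reading of the approach, not a line-for-line translation. Joining wrapped sequence lines into a single string is a choice made in this sketch, and the function names are hypothetical.

```python
from collections import OrderedDict


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Like the one-liner, this splits on '>' and so assumes that
    '>' never appears inside a header line.
    """
    for record in text.split(">"):
        lines = record.strip().splitlines()
        if len(lines) < 2:
            continue  # skip empty chunks and headers with no sequence
        header = lines[0].strip()
        # Join wrapped sequence lines and lowercase for case-insensitive matching.
        sequence = "".join(part.strip() for part in lines[1:]).lower()
        if sequence:
            yield header, sequence


def merge_by_sequence(texts):
    """Map each distinct sequence to the list of headers that carry it."""
    groups = OrderedDict()
    for text in texts:
        for header, sequence in parse_fasta(text):
            groups.setdefault(sequence, []).append(header)
    return groups


def format_fasta(groups):
    """Render one record per distinct sequence, headers joined with ';'."""
    return "".join(
        ">{}\n{}\n".format(";".join(headers), sequence)
        for sequence, headers in groups.items()
    )
```

Passing the contents of each file as one string to merge_by_sequence and printing the result of format_fasta gives a non-redundant set of sequences. A sequence found in several records appears once, under a combined header.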
On Fri, 14 Jul 2000, Suraj Peri wrote:
>
> Hi all,
> I am interested in a script to report the
> unique sequences from two files. It should not report
> the repetitions.
> I am in fact trying to write this, but my script is not
> working.
> As I need it fast, can anyone please help? Is there something
> for this in BioPerl?
>
> thanks.
> Peri.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>