<div dir="ltr"><div class="" itemprop="text">
<p>I'm new to Python, so I'm really struggling to write a script.</p>
<p>What I need is to compare two files. One file contains all the
proteins of a database; the other contains only some of the proteins
present in the first file, because it belongs to a single organism.
I need to know which proteins of the database are present in my
organism. For that, I want to build an output like a matrix, with 0
and 1 indicating whether each protein in the database is absent from
or present in my organism.</p>
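<p>To make the desired output concrete, here is a minimal Python sketch of the 0/1 labelling. The function name <code>presence_vector</code> and the names A to D are placeholders chosen for illustration, and it assumes each protein name is an exact string match between the two lists.</p>

```python
def presence_vector(database_names, organism_names):
    """Return (name, flag) pairs: flag is 1 if the name is in the organism, else 0."""
    organism = set(organism_names)  # a set makes membership tests independent of order
    return [(name, 1 if name in organism else 0) for name in database_names]


if __name__ == "__main__":
    database = ["A", "B", "C", "D"]  # placeholder names, not real proteins
    organism = ["A", "D"]
    for name, flag in presence_vector(database, organism):
        print(flag, name)
```

<p>Because the lookup goes through a set, the position of a name in either list should not matter; only the order of the database list decides the order of the output rows.</p>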
<p>Does anybody have any idea how I could do that?
I'm trying to use something like this:</p>
<pre><code>$ cat sorted.a
A
B
C
D
$ cat sorted.b
A
D
$ join sorted.a sorted.b | sed 's/^/1 /' &amp;&amp; join -v 1 sorted.a sorted.b | sed 's/^/0 /'
1 A
1 D
0 B
0 C</code></pre>
<p>But I can't get it to work, because sometimes a protein is present in both files but not on the same line.
Here is an example:</p><p>
1-cysPrx_C<br>14-3-3<br>2-Hacid_dh<br>2-Hacid_dh_C<br>2-oxoacid_dh<br>2H-phosphodiest<br>2OG-FeII_Oxy<br>2OG-FeII_Oxy_3<br>2OG-FeII_Oxy_4<br>2OG-FeII_Oxy_5<br>2OG-Fe_Oxy_2<br>2TM<br>2_5_RNA_ligase2</p>
<p>comparing with</p>
<p>1-cysPrx_C<br>120_Rick_ant<br>14-03-2003<br>2-Hacid_dh<br>2-Hacid_dh_C<br>2-oxoacid_dh<br>2-ph_phosp<br>2CSK_N<br>2C_adapt<br>2Fe-2S_Ferredox<br>2H-phosphodiest<br>2HCT<br>2OG-FeII_Oxy<br></p>
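<p>For lists like these, where shared names sit on different lines, one possible approach is to read each file into memory and test membership with a set. The sketch below assumes one protein name per line and exact, case-sensitive matches; the function names and the tab-separated output format are my own choices for illustration.</p>

```python
from pathlib import Path


def read_names(path):
    """Read one protein name per line, skipping blank lines and stray whitespace."""
    text = Path(path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def write_presence_table(database_path, organism_path, output_path):
    """Write 'flag<TAB>name' for every database entry; flag is 1 if the organism has it."""
    organism = set(read_names(organism_path))  # line position no longer matters
    rows = [
        f"{1 if name in organism else 0}\t{name}"
        for name in read_names(database_path)
    ]
    Path(output_path).write_text("\n".join(rows) + "\n")
    return rows
```

<p>A call such as <code>write_presence_table("database.txt", "organism.txt", "presence.tsv")</code> (hypothetical file names) would write one row per database entry. Names that differ only in spelling, such as 14-3-3 and 14-03-2003 above, would be treated as different proteins.</p>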
<p>Does anyone have an idea of how I could do that?
Thanks in advance.</p>
</div></div>