Matlab: Removing duplicate interactions [duplicate]

Posted by 旧巷老猫 on 2020-01-05 23:17:06

Question


I have protein-protein interaction data for Homo sapiens. The matrix is 4850628x3: the first two columns are proteins and the third is the confidence score of their interaction. The problem is that half of the rows are duplicate pairs.

For example, if protein A interacts with B, C, and D, the data contains:

  • A B 0.8
  • A C 0.5
  • A D 0.6
  • B A 0.8
  • C A 0.5
  • D A 0.6

Note that the confidence score of A interacting with B is the same as that of B interacting with A (0.8).

So in a 4850628x3 matrix, half of the rows are duplicate pairs. If I choose Unique(1,:) I might lose some data.

What I want is a 2425314x3 matrix, i.e. without the duplicate pairs. How can I do this efficiently?

Thanks, Naresh


Answer 1:


Suppose each protein is stored in your matrix as a unique numeric id
(e.g. A=1, B=2, C=3, ...). Your example matrix is then:

M =

    1.0000    2.0000    0.8000
    1.0000    3.0000    0.5000
    1.0000    4.0000    0.6000
    2.0000    1.0000    0.8000
    3.0000    1.0000    0.5000
    4.0000    1.0000    0.6000
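If the proteins are stored as names rather than numbers, one possible way to build such a matrix is to use the third output of unique as the id. This is only a sketch: the variables namesA, namesB and score are hypothetical and stand in for however the data is actually loaded.

```matlab
% Hypothetical input (not from the original post): protein names as
% cell arrays of strings, plus one score per interaction.
namesA = {'A'; 'A'; 'A'; 'B'; 'C'; 'D'};
namesB = {'B'; 'C'; 'D'; 'A'; 'A'; 'A'};
score  = [0.8; 0.5; 0.6; 0.8; 0.5; 0.6];

% unique returns the sorted distinct names and, as its third output,
% the position of every input name in that list. Those positions
% serve as numeric ids (here A=1, B=2, C=3, D=4).
[proteins, ~, id] = unique([namesA; namesB]);
id = id(:);  % make sure id is a column vector

% Split the ids back into the two columns and attach the scores.
n = numel(namesA);
M = [id(1:n), id(n+1:end), score];
```

The cell array proteins can later be used to translate ids back to names, e.g. proteins(M(:,1)).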

First sort the first two columns row-wise, so that each protein pair always appears in the same order:

M2 = sort(M(:,1:2),2)

M2 =

     1     2
     1     3
     1     4
     1     2
     1     3
     1     4

Then call unique with the 'rows' option and keep the indices of the unique pairs:

[~, idx] = unique(M2, 'rows')

idx =

     1
     2
     3

Finally, index into your initial matrix to keep only the unique pairs:

R = M(idx,:)

R =

    1.0000    2.0000    0.8000
    1.0000    3.0000    0.5000
    1.0000    4.0000    0.6000

Et voilà!
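The three steps can be combined into one short script. The sketch below is a variation, not the exact code from the answer: it returns the pairs in sorted order (smaller id first) and adds a check, using accumarray, that both copies of a pair really carry the same score. The variable names pairs, uniquePairs, grp and spread, and the 1e-12 tolerance, are assumptions made for illustration.

```matlab
% Example matrix from the answer: [id1, id2, score].
M = [1 2 0.8; 1 3 0.5; 1 4 0.6; 2 1 0.8; 3 1 0.5; 4 1 0.6];

% Canonical order: smaller id first in every row.
pairs = sort(M(:, 1:2), 2);

% idx picks one row per distinct pair; grp labels every row of M
% with the index of its pair in uniquePairs.
[uniquePairs, idx, grp] = unique(pairs, 'rows');

% Assumption check: all rows belonging to one pair share one score.
spread = accumarray(grp, M(:, 3), [], @(s) max(s) - min(s));
assert(all(spread < 1e-12), 'Some duplicate pairs have differing scores.');

% One row per pair, with the score taken from the selected row.
R = [uniquePairs, M(idx, 3)];
```

For the example matrix, R contains the three rows 1 2 0.8, 1 3 0.5 and 1 4 0.6. If the check fails on real data, the two directions of an interaction have different scores and a rule for choosing between them (for instance the maximum) would be needed first.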



Source: https://stackoverflow.com/questions/34811327/matlab-removing-duplicate-interactions
