Create matrix using Python

Deadly submitted on 2019-12-25 04:28:09

Question


I have 6 text files (each corresponds to a specific sample) and each file looks like this:

Gene_ID Gene_Name   Strand  Start   End Length  Coverage    FPKM    TPM
ENSMUSG00000102735  Gm7369  +   4610471 4611406 936 0   0   0
ENSMUSG00000025900  Rp1 -   4290846 4409241 10926   0   0   0
ENSMUSG00000104123  Gm37483 -   4363346 4364829 1484    0   0   0
ENSMUSG00000102175  Gm6119  -   4692219 4693424 1206    0.328358    0.015815    0.008621

I want to collect every gene from columns 1 and 2 (Gene_ID and Gene_Name) across all the files, together with the corresponding TPM value (9th column) for each sample, in a new file. Wherever a gene has no TPM value for a sample, the entry should be 0.

My output file should look like this:

gene_id gene_name sample1_tpm sample2_tpm sample3_tpm ......sample6_tpm

Answer 1:


One way to do this is to keep a single dictionary that stores the sample values for each gene_id.

Initialize dictionary = {}

Iterate through each of the 6 files and do:

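files = [f1, f2, f3, f4, f5, f6]  # the six open sample files, in sample order
for i, file in enumerate(files):
    next(file, None)  # skip the header line
    for line in file:
        labels = line.split()  # split on any run of whitespace
        if len(labels) < 9:
            continue  # skip blank or incomplete lines
        if labels[0] not in dictionary:
            # default every sample to '0' so a gene missing from a file stays 0
            dictionary[labels[0]] = {'name': labels[1], 'sample': ['0'] * len(files)}
        dictionary[labels[0]]['sample'][i] = labels[8]  # TPM is the 9th column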

This stores each gene_id as a key; its value holds the gene name under 'name' and a 'sample' list with one entry per sample.
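To make the missing-gene case concrete, here is a minimal self-contained sketch of the collection step. The function name `collect_tpm` and the two tiny in-memory samples are made up for illustration (the second sample's values are invented). It assumes every file starts with the header row shown in the question and that columns are whitespace-separated, and it looks up the TPM column by name instead of by position.

```python
def collect_tpm(sources):
    """Build {gene_id: {'name': ..., 'sample': [one TPM string per source]}}.

    `sources` is a list of iterables of lines (open files work too).
    A gene absent from a source keeps the default "0" for that source.
    """
    table = {}
    for i, lines in enumerate(sources):
        rows = iter(lines)
        header = next(rows).split()
        tpm_col = header.index("TPM")  # locate the TPM column by its header name
        for line in rows:
            fields = line.split()
            if len(fields) <= tpm_col:
                continue  # skip blank or truncated lines
            gene_id, gene_name = fields[0], fields[1]
            entry = table.setdefault(
                gene_id, {"name": gene_name, "sample": ["0"] * len(sources)}
            )
            entry["sample"][i] = fields[tpm_col]
    return table


HEADER = "Gene_ID Gene_Name Strand Start End Length Coverage FPKM TPM"
sample1 = [
    HEADER,
    "ENSMUSG00000102735 Gm7369 + 4610471 4611406 936 0 0 0",
    "ENSMUSG00000102175 Gm6119 - 4692219 4693424 1206 0.328358 0.015815 0.008621",
]
# Hypothetical second sample: it lacks Gm7369, and its numbers are invented.
sample2 = [
    HEADER,
    "ENSMUSG00000102175 Gm6119 - 4692219 4693424 1206 0.5 0.02 0.011",
]

table = collect_tpm([sample1, sample2])
print(table["ENSMUSG00000102735"]["sample"])  # ['0', '0']
print(table["ENSMUSG00000102175"]["sample"])  # ['0.008621', '0.011']
```

Pre-filling the list with "0" for every sample is what keeps the columns aligned: appending values as they are seen would shift later samples left whenever a gene is missing from an earlier file.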

You can now write the output file by iterating through the keys and values.

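# the with-block closes the file automatically
with open("output.txt", "w") as f:
    f.write("gene_id,gene_name,sample1_tpm,sample2_tpm,sample3_tpm,sample4_tpm,sample5_tpm,sample6_tpm\n")
    for key, entry in dictionary.items():
        # str() lets join() work whether the values are numbers or strings
        samples = ",".join(str(v) for v in entry['sample'])
        f.write(key + "," + entry['name'] + "," + samples + "\n")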
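If a gene name could ever contain a comma, joining fields by hand would produce a broken row. A minimal sketch of the same writing step with the standard-library `csv` module, which quotes such fields when needed; the helper name `write_matrix` and the one-gene example dictionary are hypothetical, and the layout assumes the {'name': ..., 'sample': [...]} structure described above.

```python
import csv
import io


def write_matrix(table, out, n_samples):
    """Write one row per gene: gene_id, gene_name, then one TPM per sample."""
    writer = csv.writer(out, lineterminator="\n")
    header = ["gene_id", "gene_name"]
    header += ["sample%d_tpm" % (i + 1) for i in range(n_samples)]
    writer.writerow(header)
    for gene_id, entry in table.items():
        writer.writerow([gene_id, entry["name"]] + list(entry["sample"]))


# Hypothetical two-sample dictionary in the shape described above.
table = {
    "ENSMUSG00000102175": {"name": "Gm6119", "sample": ["0.008621", "0"]},
}
buffer = io.StringIO()  # stands in for open("output.txt", "w", newline="")
write_matrix(table, buffer, n_samples=2)
print(buffer.getvalue())
```

For a real run, pass a file opened with `newline=""` in place of the `StringIO` buffer, and set `n_samples` to 6.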


Source: https://stackoverflow.com/questions/40690081/create-matrix-using-python
