How can I compute a Count Morgan fingerprint as numpy.array?

Submitted by 蓝咒 on 2019-12-11 09:30:00

Question


I would like to use rdkit to generate count-based Morgan fingerprints and feed them to a scikit-learn model (in Python). However, I don't know how to generate the fingerprint as a numpy array. When I use

from rdkit import Chem
from rdkit.Chem import AllChem
m = Chem.MolFromSmiles('c1cccnc1C')
fp = AllChem.GetMorganFingerprint(m, 2, useCounts=True)

I get a UIntSparseIntVect that I would need to convert. The only thing I found was cDataStructs (see: http://rdkit.org/docs/source/rdkit.DataStructs.cDataStructs.html), but this does not currently support UIntSparseIntVect.


Answer 1:


This may be a little late, but these methods work for me.

If you want the bits (0 and 1):

import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('c1cccnc1C')
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)
# ConvertToNumpyArray resizes the target array to the fingerprint length
array = np.zeros((0, ), dtype=np.int8)
DataStructs.ConvertToNumpyArray(fp, array)

And back to a fingerprint:

bitstring = "".join(array.astype(str))
fp2 = DataStructs.cDataStructs.CreateFromBitString(bitstring)
assert list(fp.GetOnBits()) == list(fp2.GetOnBits())
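
The string conversion in this round trip uses only numpy, so it can be checked without rdkit. The following is a minimal sketch on a made-up 8-bit array (the values are illustrative and not derived from any molecule); it shows what the bitstring looks like and that the on-bit positions survive the conversion:

```python
import numpy as np

# Made-up stand-in for the array filled by ConvertToNumpyArray
bits = np.array([0, 1, 0, 0, 1, 1, 0, 1], dtype=np.int8)

# Same expression as above: one character per bit
bitstring = "".join(bits.astype(str))

# Parse the string back into an array and list the positions that are set
restored = np.array([int(c) for c in bitstring], dtype=np.int8)
on_bits = np.flatnonzero(restored).tolist()

print(bitstring)
print(on_bits)
```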

If you want the counts:

fp3 = AllChem.GetHashedMorganFingerprint(mol, 2, nBits=1024)
array = np.zeros((0,), dtype=np.int8)
DataStructs.ConvertToNumpyArray(fp3, array)
print(array.nonzero())

Output:

(array([ 19,  33,  64, 131, 175, 179, 356, 378, 428, 448, 698, 707, 726,
   842, 849, 889]),)

And back to a fingerprint (I'm not sure this is the best way to do it):

def numpy_2_fp(array):
    fp = DataStructs.cDataStructs.UIntSparseIntVect(len(array))
    for ix, value in enumerate(array):
        fp[ix] = int(value)
    return fp

fp4 = numpy_2_fp(array)
assert fp3.GetNonzeroElements() == fp4.GetNonzeroElements()



Answer 2:


import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

m = Chem.MolFromSmiles('c1cccnc1C')
fp = AllChem.GetHashedMorganFingerprint(m, 2, nBits=1024)
# GetNonzeroElements returns a dict mapping bit index -> count
fp_dict = fp.GetNonzeroElements()
arr = np.zeros((1024,))
for key, val in fp_dict.items():
    arr[key] = val

It seems there is no direct way to get a numpy array, so I build it from the dictionary.
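
Since the goal in the question is to feed the fingerprints to scikit-learn, the dictionary approach can be wrapped in a small helper and applied to several molecules to build a feature matrix. The sketch below is only an illustration: `counts_to_dense` is a hypothetical helper name, and the two dictionaries are made-up stand-ins for what `fp.GetNonzeroElements()` would return, so that the example runs without rdkit.

```python
import numpy as np

def counts_to_dense(nonzero, n_bits=1024, dtype=np.int32):
    """Expand a {bit index: count} mapping into a dense 1-D count vector."""
    arr = np.zeros(n_bits, dtype=dtype)
    for idx, count in nonzero.items():
        arr[idx] = count
    return arr

# Made-up stand-ins for fp.GetNonzeroElements() on two molecules
mol_a = {19: 1, 33: 2, 356: 3}
mol_b = {33: 1, 842: 4}

# One row per molecule, one column per hashed bit
X = np.vstack([counts_to_dense(d) for d in (mol_a, mol_b)])
print(X.shape)
```

The resulting `X` has the (n_samples, n_features) layout that scikit-learn estimators expect. `n_bits` must match the `nBits` value used when hashing the fingerprint.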



Source: https://stackoverflow.com/questions/54809506/how-can-i-compute-a-count-morgan-fingerprint-as-numpy-array
