I have a DataFrame that originates from a df.groupby().size()
operation, and looks like this:
Localization RNA level
Here is a complete example based on the pandas groupby and sum functions.
The basic idea is to group the data by 'Localization'
and apply a function to each group.
import pandas as pd
from io import StringIO
# For Python 2, replace the previous line with: from StringIO import StringIO
data = \
"""Localization,RNA level,Size
cytoplasm ,1 Non-expressed, 7
cytoplasm ,2 Very low ,13
cytoplasm ,3 Low , 8
cytoplasm ,4 Medium , 6
cytoplasm ,5 Moderate , 8
cytoplasm ,6 High , 2
cytoplasm ,7 Very high , 6
cytoplasm & nucleus ,1 Non-expressed, 5
cytoplasm & nucleus ,2 Very low , 8
cytoplasm & nucleus ,3 Low , 2
cytoplasm & nucleus ,4 Medium ,10
cytoplasm & nucleus ,5 Moderate ,16
cytoplasm & nucleus ,6 High , 6
cytoplasm & nucleus ,7 Very high , 5
cytoplasm & nucleus & plasma membrane,1 Non-expressed, 6
cytoplasm & nucleus & plasma membrane,2 Very low , 3
cytoplasm & nucleus & plasma membrane,3 Low , 3
cytoplasm & nucleus & plasma membrane,4 Medium , 7
cytoplasm & nucleus & plasma membrane,5 Moderate , 8
cytoplasm & nucleus & plasma membrane,6 High , 4
cytoplasm & nucleus & plasma membrane,7 Very high , 1"""
# Create the dataframe
df = pd.read_csv(StringIO(data))
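# str.strip() and astype() return new Series, so assign the results back
df['Localization'] = df['Localization'].str.strip()
df['RNA level'] = df['RNA level'].str.strip()
df['Size'] = df['Size'].astype(int)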
# Share of each row within its Localization group (a fraction between 0 and 1)
df['Percent'] = df.groupby('Localization')['Size'].transform(lambda x: x / x.sum())
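If the lambda feels opaque, the same per-group share can be computed in two steps. This is a minimal sketch on a small made-up frame (the rows and the 'nucleus' group below are hypothetical, not the data from the question); it assumes the same column names as above and adds a quick check that the shares in each group sum to 1.

```python
import pandas as pd

# Hypothetical miniature of the frame above, used only for illustration
df = pd.DataFrame({
    "Localization": ["cytoplasm", "cytoplasm", "nucleus", "nucleus"],
    "RNA level": ["1 Non-expressed", "2 Very low", "1 Non-expressed", "2 Very low"],
    "Size": [7, 13, 5, 15],
})

# transform("sum") broadcasts each group's total back onto its rows
group_totals = df.groupby("Localization")["Size"].transform("sum")

# Divide each row's Size by the total of its own group
df["Percent"] = df["Size"] / group_totals

# Sanity check: the shares within each group should add up to 1
check = df.groupby("Localization")["Percent"].sum()

print(df)
print(check)
```

Multiply the column by 100 if you want actual percentages rather than fractions.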