I am attempting to call a Python script from a master script. I need the DataFrame to be generated only once, within the master script, and then passed on to the subprocess.
subprocess launches a separate application. Processes communicate with each other very differently from the way functions communicate within a single Python program: the DataFrame has to pass through a non-Python channel, so you need to serialize it into bytes (or text) on one end and deserialize it on the other. For example, you can use the pickle module, calling sp.communicate(pickle.dumps(test_dataframe)) on one end and pickle.loads(sys.stdin.buffer.read()) on the other. In Python 3 the read has to go through sys.stdin.buffer, because pickled data is bytes and sys.stdin.read() returns text. Alternatively, you can write the DataFrame as CSV and parse it again, or use any other format.
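The communicate() route described above could look roughly like the following single-file sketch. It makes several assumptions that are not in the original answer: the child program is inlined through `-c` so the example runs without a second file, the dict `table` is a hypothetical stand-in for the DataFrame (a DataFrame would be passed to pickle.dumps in the same way), and `send_to_child` is just an illustrative name.

```python
import pickle
import subprocess
import sys

# Child program, inlined so the sketch fits in one file (an assumption; in
# practice this would be a separate script). It reads the pickled payload
# from binary stdin and writes a pickled reply to binary stdout.
CHILD_CODE = """
import pickle, sys
table = pickle.loads(sys.stdin.buffer.read())
row_count = len(next(iter(table.values())))
sys.stdout.buffer.write(pickle.dumps({"columns": list(table), "rows": row_count}))
"""


def send_to_child(payload):
    """Pickle payload, pipe it to a child interpreter, and unpickle the reply."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() writes the bytes to the child's stdin, closes it,
    # and waits for the child to exit.
    out, err = proc.communicate(pickle.dumps(payload))
    if proc.returncode != 0:
        raise RuntimeError(err.decode(errors="replace"))
    return pickle.loads(out)


# Hypothetical column-oriented data standing in for the DataFrame.
table = {"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]}
summary = send_to_child(table)
```

Using sys.executable instead of a bare 'python' keeps the child on the same interpreter as the parent, which matters when several Python installations are present.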
Here is a complete example for Python 3.6 of two-way communication between the master script and a subprocess.
master.py
import pandas as pd
import pickle
import subprocess
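# Build the DataFrame once, in the master script.
df = pd.read_excel(r'C:\test_location\file.xlsx', sheet_name='Table')
# Send the pickled DataFrame to the child on stdin and capture its output.
result = subprocess.run(['python', 'subroutine.py'], input=pickle.dumps(df), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
returned_df = pickle.loads(result.stdout)
# df == returned_df is element-wise and cannot be asserted directly; use equals().
assert df.equals(returned_df)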
If there is a problem, you can check result.stderr.
subroutine.py
import pickle
import sys
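# Read the pickled DataFrame as bytes from the binary stdin stream.
data = pickle.loads(sys.stdin.buffer.read())
# Send it back to the master script as bytes on the binary stdout stream.
sys.stdout.buffer.write(pickle.dumps(data))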
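Following up on the advice to check result.stderr, here is one possible way to turn a failing child into a readable error on the master side. It is only a sketch under stated assumptions: `run_child` is a hypothetical helper, and the two inline `-c` programs (`ECHO` and `BROKEN`) are stand-ins for subroutine.py so the example runs as a single file.

```python
import pickle
import subprocess
import sys


def run_child(args, payload):
    """Send a pickled payload to a child process and return its unpickled reply.

    Raises RuntimeError carrying the child's stderr if it exits with a
    non-zero status.
    """
    result = subprocess.run(
        args,
        input=pickle.dumps(payload),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if result.returncode != 0:
        raise RuntimeError(
            "child exited with status {}:\n{}".format(
                result.returncode, result.stderr.decode(errors="replace")
            )
        )
    return pickle.loads(result.stdout)


# Hypothetical inline children standing in for subroutine.py.
ECHO = (
    "import pickle, sys; "
    "sys.stdout.buffer.write(pickle.dumps(pickle.loads(sys.stdin.buffer.read())))"
)
BROKEN = "import sys; sys.stdin.buffer.read(); raise ValueError('model failed')"

# A healthy child returns the payload unchanged.
echoed = run_child([sys.executable, "-c", ECHO], [1, 2, 3])

# A failing child surfaces its traceback through the raised error.
try:
    run_child([sys.executable, "-c", BROKEN], [1, 2, 3])
except RuntimeError as exc:
    error_text = str(exc)
```

Checking the return code before calling pickle.loads avoids a confusing unpickling error when the child crashed and wrote nothing to stdout.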