Is there a way to retain the SQLAlchemy attribute names when you query the data into a pandas DataFrame?
Here's a simple mapping of my database. For the school table,
I am not a SQLAlchemy expert by any means, but I have come up with a more generalized solution (or at least a start).
Caveats are noted at the end. The idea is to map each column name pandas returns back to <tablename/model name>.<mapper column name>. It involves four key steps:
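1. Qualify the query statement with labels, so that pandas returns column names of the form <table name>_<column name>:
df = pd.read_sql(query.statement.with_labels(), query.session.bind)
2. Separate the table name from the actual column name (this assumes the table names contain no underscore):
table_name, col = col_name.split('_', 1)
3. Retrieve the mapped model class from its table name (this uses the declarative class registry, a private attribute in SQLAlchemy 1.3):
for c in Base._decl_class_registry.values():
    if hasattr(c, '__tablename__') and c.__tablename__ == table_name:
        return c
4. Retrieve the mapped attribute name for that column from the class (sa_class) found in step 3:
for k, v in sa_class.__mapper__.columns.items():
    if v.name == col:
        return k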
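Steps 2 to 4 boil down to string splitting plus a dictionary lookup. The sketch below reproduces that logic in plain Python. It assumes a hand-written, hypothetical COLUMN_MAP in place of the mapper introspection, and it swaps split('_', 1) for a longest-known-prefix match so that table names which themselves contain underscores still split correctly.

```python
# Hypothetical stand-in for what mapper introspection would give you:
# table name -> {database column name -> mapped attribute name}
COLUMN_MAP = {
    "DimSchool": {"SchoolKey": "id", "SchoolName": "name", "SchoolDistrict": "district"},
    "FactStudentScore": {
        "SchoolKey": "SchoolKey",
        "PointsPossible": "PointsPossible",
        "PointsReceived": "PointsReceived",
    },
}


def split_label(label, table_names):
    """Split a '<table>_<column>' label using the longest matching table name.

    Unlike label.split('_', 1), this still works when the table name itself
    contains underscores (e.g. 'fact_score_points' with a 'fact_score' table).
    """
    candidates = [t for t in table_names if label.startswith(t + "_")]
    if not candidates:
        raise KeyError(f"no known table prefix in {label!r}")
    table = max(candidates, key=len)
    return table, label[len(table) + 1:]


def mapped_attr(label, column_map=COLUMN_MAP):
    """Translate a labeled result column into its mapped attribute name."""
    table, column = split_label(label, column_map.keys())
    return column_map[table][column]


labels = ["FactStudentScore_SchoolKey", "DimSchool_SchoolKey", "DimSchool_SchoolName"]
print([mapped_attr(lbl) for lbl in labels])  # ['SchoolKey', 'id', 'name']
```

Preferring the longest match is a design choice of this sketch rather than anything SQLAlchemy guarantees. A label can still be ambiguous when one table name plus an underscore is a prefix of another table's labels.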
Bringing it all together, this is the solution I have come up with. The main caveat is that it will produce duplicate column names in your DataFrame if the same mapped name appears in more than one class, which is likely.
import pandas as pd
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

# Written against the SQLAlchemy 1.3 API (_decl_class_registry, with_labels)
Base = declarative_base()


class School(Base):
    __tablename__ = 'DimSchool'

    id = Column('SchoolKey', Integer, primary_key=True)
    name = Column('SchoolName', String)
    district = Column('SchoolDistrict', String)


class StudentScore(Base):
    __tablename__ = 'FactStudentScore'

    SchoolKey = Column('SchoolKey', Integer, ForeignKey('DimSchool.SchoolKey'), primary_key=True)
    PointsPossible = Column('PointsPossible', Integer)
    PointsReceived = Column('PointsReceived', Integer)

    school = relationship("School", backref='studentscore')


def mapped_col_name(col_name):
    ''' Translates a labeled column name (<table name>_<column name>,
    as returned by pandas.read_sql) into the mapped attribute name.
    '''
    def find_model(table_name):
        # Retrieve the mapped model class based on the actual table name
        for c in Base._decl_class_registry.values():
            if hasattr(c, '__tablename__') and c.__tablename__ == table_name:
                return c

    table_name, col = col_name.split('_', 1)
    sa_class = find_model(table_name)
    for k, v in sa_class.__mapper__.columns.items():
        if v.name == col:
            return k


# `session` is your existing Session bound to the database
query = session.query(StudentScore, School).join(School)
df = pd.read_sql(query.statement.with_labels(), query.session.bind)
df.columns = list(map(mapped_col_name, df.columns))
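One way around the duplicate-name caveat is to qualify only the names that collide. The following plain-Python sketch assumes that mapped_col_name were changed to return a (model name, attribute name) pair, for example by also returning sa_class.__name__, instead of the bare attribute. The disambiguate helper and the example pairs are hypothetical.

```python
from collections import Counter


def disambiguate(pairs):
    """Return one column name per (model_name, attribute_name) pair.

    A bare attribute name is kept when it is unique across all pairs;
    otherwise it is qualified as 'Model.attribute'.
    """
    counts = Counter(attr for _, attr in pairs)
    return [attr if counts[attr] == 1 else f"{model}.{attr}" for model, attr in pairs]


# Hypothetical columns from two models that both map an "id" attribute
pairs = [("Student", "id"), ("School", "id"), ("School", "name")]
print(disambiguate(pairs))  # ['Student.id', 'School.id', 'name']
```

The qualified names can still collide if the same model appears twice in a query (e.g. through an alias), so that case would still need an extra suffix.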
This is the kind of solution I would bitterly complain about if I had to maintain the code afterwards. But your question has so many constraints that I cannot find anything better.
First, build dictionaries that map each schema column name to its class attribute name (and each table name to its class name) using introspection, like this (I'm using the first example you posted):
from sqlalchemy import Column


def add_to_dict(c_map, t_map, table):
    name = table.__tablename__
    # table name -> class name
    t_map[name] = table.__name__
    # table name -> {database column name -> mapped attribute name}
    c_map[name] = {}
    for column in dir(table):
        c_schema_name = table.__mapper__.columns.get(column)
        # Only attributes that are really mapped columns yield a Column object
        if isinstance(c_schema_name, Column):
            c_map[name][c_schema_name.name] = column


c_map = {}
t_map = {}
add_to_dict(c_map, t_map, School)
add_to_dict(c_map, t_map, StudentScore)
print(c_map['DimSchool']['SchoolKey'])         # id
print(c_map['FactStudentScore']['SchoolKey'])  # SchoolKey
print(t_map['DimSchool'])                      # School
[EDIT: clarifications on how the dictionaries are built with introspection. For each attribute of the class, the SQLAlchemy mapper (table.__mapper__.columns.get(...)) returns a Column object only if the attribute really is a column. For those Column objects, add an entry to the column-names dictionary: the database name is obtained with .name and the other is just the attribute name. Run this just once, after creating all the objects in the database, calling it once per table class.]
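The two dictionaries are simply a per-table inversion of attribute → column. The following sketch assumes a hypothetical plain-dict description of the two classes (MODELS) instead of real SQLAlchemy introspection. It builds the same c_map and t_map shapes and reproduces the three lookups printed above.

```python
# Hypothetical description of the mapped classes:
# class name -> (table name, {attribute name -> database column name}).
# With SQLAlchemy these pieces come from __tablename__ and __mapper__.columns.
MODELS = {
    "School": ("DimSchool", {"id": "SchoolKey", "name": "SchoolName", "district": "SchoolDistrict"}),
    "StudentScore": ("FactStudentScore", {
        "SchoolKey": "SchoolKey",
        "PointsPossible": "PointsPossible",
        "PointsReceived": "PointsReceived",
    }),
}


def build_maps(models):
    """Return (c_map, t_map) in the same shape as add_to_dict produces."""
    c_map, t_map = {}, {}
    for class_name, (table_name, attr_to_col) in models.items():
        t_map[table_name] = class_name
        # Invert attribute -> column into column -> attribute for this table
        c_map[table_name] = {col: attr for attr, col in attr_to_col.items()}
    return c_map, t_map


c_map, t_map = build_maps(MODELS)
print(c_map["DimSchool"]["SchoolKey"])         # id
print(c_map["FactStudentScore"]["SchoolKey"])  # SchoolKey
print(t_map["DimSchool"])                      # School
```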
Then take your SQL statement and build a list with the translated name of each column you are going to get:
df_columns = []
select_clause = str(query.statement).split('FROM')[0].split('SELECT')[1]
for column in select_clause.split(','):
    # Each item looks like "TableName"."ColumnName"
    table = column.split('.')[0].replace('"', '').strip()
    c_schema = column.split('.')[1].replace('"', '').strip()
    df_columns.append(t_map[table] + '.' + c_map[table][c_schema])
print(df_columns)
# ['StudentScore.SchoolKey', 'StudentScore.PointsPossible', 'StudentScore.PointsReceived', 'School.id', 'School.name', 'School.district']
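Splitting the rendered SQL on commas and dots breaks as soon as a selected item carries an alias. The following sketch is slightly sturdier and pulls the quoted "table"."column" pairs out of the SELECT clause with a regular expression, ignoring everything else. It assumes the statement renders mixed-case names as quoted pairs. The sample string is hand-written rather than captured from SQLAlchemy, and C_MAP/T_MAP are hypothetical copies of the dictionaries built above. Unquoted names would need a different pattern.

```python
import re

# "table"."column" pairs, as rendered for quoted (mixed-case) identifiers
QUALIFIED = re.compile(r'"([^"]+)"\."([^"]+)"')

# Hypothetical lookup tables with the same shape as c_map / t_map
C_MAP = {
    "DimSchool": {"SchoolKey": "id", "SchoolName": "name", "SchoolDistrict": "district"},
    "FactStudentScore": {
        "SchoolKey": "SchoolKey",
        "PointsPossible": "PointsPossible",
        "PointsReceived": "PointsReceived",
    },
}
T_MAP = {"DimSchool": "School", "FactStudentScore": "StudentScore"}


def translate_select(sql, c_map, t_map):
    """Return '<Class>.<attribute>' for each qualified column in the SELECT clause."""
    # Keep only the text before the first standalone FROM keyword
    select_clause = re.split(r"\bFROM\b", sql, maxsplit=1)[0]
    return [f"{t_map[t]}.{c_map[t][c]}" for t, c in QUALIFIED.findall(select_clause)]


# Hand-written sample; the AS alias is ignored because it contains no "."
sql = (
    'SELECT "FactStudentScore"."SchoolKey", "FactStudentScore"."PointsPossible", '
    '"FactStudentScore"."PointsReceived", "DimSchool"."SchoolKey" AS "SchoolKey_1", '
    '"DimSchool"."SchoolName", "DimSchool"."SchoolDistrict" \n'
    'FROM "FactStudentScore" JOIN "DimSchool" '
    'ON "DimSchool"."SchoolKey" = "FactStudentScore"."SchoolKey"'
)
print(translate_select(sql, C_MAP, T_MAP))
```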
Finally, you read the dataframe as in your question and change the names of the columns:
df.columns = df_columns
print(df)
StudentScore.SchoolKey StudentScore.PointsPossible StudentScore.PointsReceived School.id School.name School.district
0 1 1 None 1 School1 None
(The data is just a dummy record I created.)
Hope it helps!