Renaming columns when querying with SQLAlchemy into Pandas DataFrame

一个人的身影 2021-02-15 18:04

Is there a way to retain the SQLAlchemy attribute names when you query the data into a pandas DataFrame?

Here's a simple mapping of my database. For the school table,

2 Answers
  •  一生所求
    2021-02-15 18:46

    I am not a SQLAlchemy expert by any means, but I have come up with a more generalized solution (or at least a start).

    Caveats

    • Will not handle mapped columns with the same name across different Models. You should deal with this by adding a suffix, or you could modify my answer below to create pandas columns as <table name>.<mapped column name>.

    It involves four key steps:

    1. Qualify your query statement with labels, which results in pandas column names of the form <table name>_<column name>:
      df = pd.read_sql(query.with_labels().statement, query.session.bind)

    2. Separate the table name from the (actual) column name:
      table_name, col = col_name.split('_', 1)

    3. Get the Model based on the table name (from this question's answers):
      for c in Base._decl_class_registry.values():
          if hasattr(c, '__tablename__') and c.__tablename__ == table_name:
              return c

    4. Find the correct mapped name:
      for k, v in sa_class.__mapper__.columns.items():
          if v.name == col:
              return k
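
One thing to watch in the splitting step: splitting on the first underscore assumes the table name itself contains no underscore. The sketch below is not part of the original answer; it shows one possible alternative that matches each label against a list of known table names. The helper `split_label` and the table name `fact_score` are made up for illustration.

```python
def split_label(label, table_names):
    """Split a '<table>_<column>' label using a list of known table names."""
    # Try longer table names first so that 'fact_score' wins over 'fact'.
    for table in sorted(table_names, key=len, reverse=True):
        prefix = table + "_"
        if label.startswith(prefix):
            return table, label[len(prefix):]
    raise ValueError(f"no known table prefix in {label!r}")


# 'fact_score' is a hypothetical table name that contains an underscore.
tables = ["DimSchool", "FactStudentScore", "fact_score"]

print(split_label("DimSchool_SchoolKey", tables))   # ('DimSchool', 'SchoolKey')
print(split_label("fact_score_points", tables))     # ('fact_score', 'points')
print("fact_score_points".split("_", 1))            # ['fact', 'score_points']
```

For the two tables in this question (DimSchool and FactStudentScore) the plain `split('_', 1)` is sufficient, since neither name contains an underscore.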
      

    Bringing it all together, this is the solution I have come up with. The main caveat is that it will produce duplicate column names in your dataframe if, as is likely, the same mapped name appears in more than one class.

      import pandas as pd
      from sqlalchemy import Column, ForeignKey, Integer, String
      from sqlalchemy.ext.declarative import declarative_base
      from sqlalchemy.orm import relationship

      Base = declarative_base()

      class School(Base):
          __tablename__ = 'DimSchool'

          id = Column('SchoolKey', Integer, primary_key=True)
          name = Column('SchoolName', String)
          district = Column('SchoolDistrict', String)


      class StudentScore(Base):
          __tablename__ = 'FactStudentScore'

          SchoolKey = Column('SchoolKey', Integer, ForeignKey('DimSchool.SchoolKey'), primary_key=True)
          PointsPossible = Column('PointsPossible', Integer)
          PointsReceived = Column('PointsReceived', Integer)

          school = relationship("School", backref='studentscore')


      def mapped_col_name(col_name):
          ''' Returns the mapped attribute name for a labelled column name
          of the form <table name>_<column name> (as given in pandas.read_sql)
          '''

          def find_model(table_name):
              # _decl_class_registry is private API of SQLAlchemy releases before 1.4
              for c in Base._decl_class_registry.values():
                  if hasattr(c, '__tablename__') and c.__tablename__ == table_name:
                      return c

          table_name, col = col_name.split('_', 1)
          sa_class = find_model(table_name)

          # Match the database column name against each mapped attribute
          for k, v in sa_class.__mapper__.columns.items():
              if v.name == col:
                  return k

      # `session` is assumed to be an existing SQLAlchemy Session
      query = session.query(StudentScore, School).join(School)
      # with_labels() is a Query method, so call it before taking .statement
      df = pd.read_sql(query.with_labels().statement, query.session.bind)
      df.columns = [mapped_col_name(c) for c in df.columns]
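
To sidestep the duplicate-name caveat, the renamed columns can be qualified with the table name. The following is only a rough sketch of that idea, not the original answer's code. The dictionary `MAPPED_NAMES` is a hand-written stand-in for the lookup that would normally come from the SQLAlchemy mappers, and `qualified_col_name` is a hypothetical helper name.

```python
# Hypothetical lookup: (table name, database column) -> mapped attribute name.
# Hand-written here from the School / StudentScore models; in practice it
# would be built from the SQLAlchemy mappers.
MAPPED_NAMES = {
    ("DimSchool", "SchoolKey"): "id",
    ("DimSchool", "SchoolName"): "name",
    ("DimSchool", "SchoolDistrict"): "district",
    ("FactStudentScore", "SchoolKey"): "SchoolKey",
    ("FactStudentScore", "PointsPossible"): "PointsPossible",
    ("FactStudentScore", "PointsReceived"): "PointsReceived",
}


def qualified_col_name(label):
    """Turn a label such as 'DimSchool_SchoolName' into 'DimSchool.name'."""
    table_name, col = label.split("_", 1)
    return f"{table_name}.{MAPPED_NAMES[(table_name, col)]}"


labels = ["FactStudentScore_SchoolKey", "DimSchool_SchoolKey", "DimSchool_SchoolName"]
renamed = [qualified_col_name(label) for label in labels]
print(renamed)  # ['FactStudentScore.SchoolKey', 'DimSchool.id', 'DimSchool.name']
```

Because every name carries its table prefix, two models that map an attribute under the same name still yield distinct DataFrame columns.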
      
