I am having hard time optimizing my SQLAlchemy queries. My SQL knowledge is very basic, and I just can't get the stuff I need from the SQLAlchemy docs.
Suppose the following very basic one-to-many relationship:
class Parent(Base):
__tablename__ = "parents"
id = Column(Integer, primary_key = True)
children = relationship("Child", backref = "parent")
class Child(Base):
__tablename__ = "children"
id = Column(Integer, primary_key = True)
parent_id = Column(Integer, ForeignKey("parents.id"))
naughty = Column(Boolean)
How could I:
- Query tuples of
(Parent, count_of_naughty_children, count_of_all_children)
for each parent?
After decent time spent googling, I found how to query those values separately:
# The following returns tuples of (Parent, count_of_all_children):
session.query(Parent, func.count(Child.id)).outerjoin(Child, Parent.children).\
group_by(Parent.id)
# The following returns tuples of (Parent, count_of_naughty_children):
al = aliased(Children, session.query(Children).filter_by(naughty = True).\
subquery())
session.query(Parent, func.count(al.id)).outerjoin(al, Parent.children).\
group_by(Parent.id)
I tried to combine them in different ways, but didn't manage to get what I want.
- Query all parents which have more than 80% naughty children? Edit: naughty could be NULL.
I guess this query is going to be based on the previous one, filtering by naughty/all ratio.
Any help is appreciated.
EDIT : Thanks to Antti Haapala's help, I found solution to the second question:
avg = func.avg(func.coalesce(Child.naughty, 0)) # coalesce() treats NULLs as 0
# avg = func.avg(Child.naughty) - if you want to ignore NULLs
session.query(Parent).join(Child, Parent.children).group_by(Parent).\
having(avg > 0.8)
It finds average if children's naughty
variable, treating False and NULLs as 0, and True as 1. Tested with MySQL backend, but should work on others, too.
the count()
sql aggretate function is pretty simple; it gives you the total number of non-null values in each group. With that in mind, we can adjust your query to give you the proper result.
print (Query([
Parent,
func.count(Child.id),
func.count(case(
[((Child.naughty == True), Child.id)], else_=literal_column("NULL"))).label("naughty")])
.join(Parent.children).group_by(Parent)
)
Which produces the following sql:
SELECT
parents.id AS parents_id,
count(children.id) AS count_1,
count(CASE WHEN (children.naughty = 1)
THEN children.id
ELSE NULL END) AS naughty
FROM parents
JOIN children ON parents.id = children.parent_id
GROUP BY parents.id
If your query is only to get the parents who have > 80 % children naughty, you can on most databases cast the naughty
to integer, then take average of it; then having
this average greater than 0.8
.
Thus you get something like
from sqlalchemy.sql.expression import cast
naughtyp = func.avg(cast(Child.naughty, Integer))
session.query(Parent, func.count(Child.id), naughtyp).join(Child)\
.group_by(Parent.id).having(naughtyp > 0.8).all()
来源:https://stackoverflow.com/questions/24916532/sqlalchemy-several-counts-in-one-query