fastest way to create JSON to reflect a tree structure in Python / Django using mptt

后端未结

关注

 4  1499

别跟我提以往 2021-01-30 14:54

What\'s the fastest way in Python (Django) to create a JSON based upon a Django queryset. Note that parsing it in the template as proposed here is not an option.

The ba

4条回答

醉话见心 (楼主)

2021-01-30 15:06
Your updated version looks like there would be very little overhead. I think it would be slightly more efficient (and more readable, too!) to use a list comprehension:
```
def serializable_object(node):
    "Recurse into tree to build a serializable object"
    obj = {
        'name': node.name,
        'children': [serializable_object(ch) for ch in node.get_children()]
    }
    return obj
```
Besides that, all you can do is profile it to find the bottlenecks. Write some standalone code that loads and serializes your 300 nodes and then run it with
```
python -m profile serialize_benchmark.py
```
(or -m cProfile if that works better).

The can see 3 different potential bottlenecks:
- DB access (.get_children() and .name) -- I'm not sure exactly what's going on under the hood, but I've had code like this that does a DB query for each node adding a tremendous overhead. If that's your problem, you can probably configure this to do an "eager load" using select_related or something similar.
- function call overhead (e.g. serializable_object itself) -- Just make sure ncalls for serializable_object looks like a reasonable number. If I understand your description, it should be in the neighborhood of 300.
- serializing at the end (json.dumps(nodeInstance)) -- Not a likely culprit since you said it's only 300 nodes, but if you do see this taking up a lot of execution time, make sure you have the compiled speedups for JSON working properly.
If you can't tell much from profiling it, make a stripped-down version that, say, recursively calls node.name and node.get_children() but doesn't store the results in a data structure, and see how that compares.

Update: There are 2192 calls to execute_sql in solution 3 and 2192 in solution 5, so I think that excessive DB queries is a problem and that select_related did nothing the way it's used above. Looking at django-mptt issue #88: Allow select_related in model methods suggests that you're using it more-or-less right, but I have my doubt, and get_children vs. get_descendants might make a huge difference.

There's also a ton of time being taken up by copy.deepcopy, which is puzzling because you're not directly calling it, and I don't see it called from the MPTT code. What's tree.py?

If you're doing a lot of work with profiling, I'd highly recommend the really slick tool RunSnakeRun, which lets you see your profile data in a really convenient grid form and make sense of the data more quickly.

Anyway, here's one more attempt at streamlining the DB side of things:
```
import weakref
obj_cache = weakref.WeakValueDictionary()

def serializable_object(node):
    root_obj = {'name': node.get_wbs_code(), 'wbsCode': node.get_wbs_code(),
            'id': node.pk, 'level': node.level, 'position': node.position,
            'children': []}
    obj_cache[node.pk] = root_obj
    # don't know if the following .select_related() does anything...
    for descendant in node.get_descendants().select_related():
        # get_descendants supposedly traverses in "tree order", which I think
        # means the parent obj will always be created already
        parent_obj = obj_cache[descendant.parent.pk]    # hope parent is cached
        descendant_obj = {'name': descendant.get_wbs_code(),
            'wbsCode': descendant.get_wbs_code(), 'id': descendant.pk,
            'level': descendant.level, 'position': descendant.position,
            'children': []}
        parent_obj['children'].append(descendant_obj)
        obj_cache[descendant.pk] = descendant_obj
    return root_obj
```
Note this is no longer recursive. It proceeds iteratively through nodes, theoretically after their parents have been visited, and it's all using one big call to MPTTModel.get_descendants(), so hopefully that's well-optimized and caches .parent, etc. (or maybe there's a more direct way to get at the parent key?). It creates each obj with no children initially, then "grafts" all the values to their parents afterwards.
0 讨论(0)

查看其它4个回答
发布评论:

提交评论
- 加载中...