fastest way to create JSON to reflect a tree structure in Python / Django using mptt

别跟我提以往 2021-01-30 14:54

What's the fastest way in Python (Django) to create JSON based upon a Django queryset? Note that parsing it in the template as proposed here is not an option.

The ba

4 answers
  •  醉话见心
    2021-01-30 15:06

    Your updated version looks like there would be very little overhead. I think it would be slightly more efficient (and more readable, too!) to use a list comprehension:

    def serializable_object(node):
        "Recurse into tree to build a serializable object"
        obj = {
            'name': node.name,
            'children': [serializable_object(ch) for ch in node.get_children()]
        }
        return obj
    

    Besides that, all you can do is profile it to find the bottlenecks. Write some standalone code that loads and serializes your 300 nodes and then run it with

    python -m profile serialize_benchmark.py
    

    (or -m cProfile if that works better).
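    As a rough idea of what such a standalone script might look like, here is a minimal sketch. The Node class and build_tree helper are hypothetical in-memory stand-ins for the real model (so no DB access is measured here); in the real benchmark you would load your actual nodes instead. It profiles from inside the script with cProfile and pstats rather than from the command line.

```python
# serialize_benchmark.py -- sketch only; Node and build_tree are hypothetical
# stand-ins for the real MPTT model, so no database is involved.
import cProfile
import io
import json
import pstats


class Node:
    """In-memory stand-in exposing .name and .get_children()."""

    def __init__(self, name):
        self.name = name
        self._children = []

    def get_children(self):
        return self._children


def build_tree(n_nodes=300, fanout=3):
    """Build n_nodes nodes where node i hangs under node (i - 1) // fanout."""
    nodes = [Node("node-%d" % i) for i in range(n_nodes)]
    for i in range(1, n_nodes):
        nodes[(i - 1) // fanout]._children.append(nodes[i])
    return nodes[0]


def serializable_object(node):
    "Recurse into tree to build a serializable object"
    return {
        'name': node.name,
        'children': [serializable_object(ch) for ch in node.get_children()],
    }


def run_benchmark():
    """Profile tree serialization and return (json_text, profile_report)."""
    root = build_tree()
    profiler = cProfile.Profile()
    profiler.enable()
    payload = json.dumps(serializable_object(root))
    profiler.disable()
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(10)  # ten most expensive entries
    return payload, report.getvalue()


if __name__ == "__main__":
    payload, report = run_benchmark()
    print(report)
```

    The report has an ncalls column per function, which is the number to look at for serializable_object.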

    I can see 3 different potential bottlenecks:

    • DB access (.get_children() and .name) -- I'm not sure exactly what's going on under the hood, but I've had code like this that does a DB query for each node, which adds a tremendous overhead. If that's your problem, you can probably configure this to do an "eager load" using select_related or something similar.
    • function call overhead (e.g. serializable_object itself) -- Just make sure ncalls for serializable_object looks like a reasonable number. If I understand your description, it should be in the neighborhood of 300.
    • serializing at the end (json.dumps(nodeInstance)) -- Not a likely culprit since you said it's only 300 nodes, but if you do see this taking up a lot of execution time, make sure you have the compiled speedups for JSON working properly.

    If you can't tell much from profiling it, make a stripped-down version that, say, recursively calls node.name and node.get_children() but doesn't store the results in a data structure, and see how that compares.
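    A stripped-down comparison along those lines could be sketched as follows. Node and build_tree are again hypothetical in-memory stand-ins, and best_time is just a small helper around timeit; swap in your real root node to get meaningful numbers.

```python
import timeit


class Node:
    """Hypothetical in-memory stand-in for the real tree model."""

    def __init__(self, name):
        self.name = name
        self._children = []

    def get_children(self):
        return self._children


def build_tree(n_nodes=300, fanout=3):
    """Build n_nodes nodes where node i hangs under node (i - 1) // fanout."""
    nodes = [Node("node-%d" % i) for i in range(n_nodes)]
    for i in range(1, n_nodes):
        nodes[(i - 1) // fanout]._children.append(nodes[i])
    return nodes[0]


def touch_only(node):
    """Read .name and walk .get_children(), keeping only a visit count."""
    _ = node.name
    visited = 1
    for child in node.get_children():
        visited += touch_only(child)
    return visited


def serializable_object(node):
    "Recurse into tree to build a serializable object"
    return {
        'name': node.name,
        'children': [serializable_object(ch) for ch in node.get_children()],
    }


def best_time(func, node, repeat=20):
    """Best wall-clock time in seconds for a single call of func(node)."""
    return min(timeit.repeat(lambda: func(node), number=1, repeat=repeat))


if __name__ == "__main__":
    root = build_tree()
    print("nodes visited:", touch_only(root))
    print("traverse only: %.6f s" % best_time(touch_only, root))
    print("build dicts:   %.6f s" % best_time(serializable_object, root))
```

    If the traverse-only version takes about as long as the full one against the real model, the time is going into attribute and child access (most likely the DB) rather than into building the dicts.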


    Update: There are 2192 calls to execute_sql in solution 3 and 2192 in solution 5, so I think that excessive DB queries are a problem and that select_related did nothing the way it's used above. django-mptt issue #88 ("Allow select_related in model methods") suggests that you're using it more or less right, but I have my doubts, and get_children vs. get_descendants might make a huge difference.

    There's also a ton of time being taken up by copy.deepcopy, which is puzzling because you're not directly calling it, and I don't see it called from the MPTT code. What's tree.py?

    If you're doing a lot of work with profiling, I'd highly recommend the really slick tool RunSnakeRun, which lets you see your profile data in a really convenient grid form and make sense of the data more quickly.

    Anyway, here's one more attempt at streamlining the DB side of things:

        def node_to_dict(node):
            "Build the dict for a single node, with no children attached yet"
            wbs_code = node.get_wbs_code()
            return {'name': wbs_code, 'wbsCode': wbs_code, 'id': node.pk,
                    'level': node.level, 'position': node.position,
                    'children': []}

        def serializable_object(node):
            # plain dict keyed by pk: the cached values are dicts, and dicts
            # cannot be weakly referenced, so a WeakValueDictionary won't work
            obj_cache = {}
            root_obj = node_to_dict(node)
            obj_cache[node.pk] = root_obj
            # don't know if the following .select_related() does anything...
            for descendant in node.get_descendants().select_related():
                # get_descendants supposedly traverses in "tree order", which I think
                # means the parent obj will always be created already.
                # parent_id is the raw foreign key value stored on the row, so
                # reading it does not load the parent object
                parent_obj = obj_cache[descendant.parent_id]    # hope parent is cached
                descendant_obj = node_to_dict(descendant)
                parent_obj['children'].append(descendant_obj)
                obj_cache[descendant.pk] = descendant_obj
            return root_obj
    

    Note this is no longer recursive. It proceeds iteratively through the nodes, each of which should come up only after its parent has been visited, and it all runs off one big call to MPTTModel.get_descendants(), so hopefully that's well-optimized. The parent key comes straight from the row when read as parent_id (the attribute Django adds for the foreign key), whereas going through .parent may load the parent object. It creates each obj with no children initially, then "grafts" it onto its parent's children list as the loop goes along.
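    To see the grafting idea on its own, without Django, here is a small sketch. The rows list is a made-up stand-in for what the queryset would yield, as (pk, parent_pk, name) tuples, and it assumes they arrive parents-first; graft_tree is a hypothetical helper name.

```python
import json


def graft_tree(rows):
    """Build a nested dict from (pk, parent_pk, name) rows given parents-first."""
    by_pk = {}
    root = None
    for pk, parent_pk, name in rows:
        obj = {'id': pk, 'name': name, 'children': []}
        by_pk[pk] = obj
        if parent_pk is None:
            root = obj  # the row without a parent is the root
        else:
            # raises KeyError if a child shows up before its parent
            by_pk[parent_pk]['children'].append(obj)
    return root


# Made-up sample data in tree order (each parent precedes its children).
rows = [
    (1, None, 'root'),
    (2, 1, 'a'),
    (3, 2, 'a1'),
    (4, 1, 'b'),
]

tree = graft_tree(rows)
print(json.dumps(tree, indent=2))
```

    Each row is touched exactly once, and the only lookups are dict accesses by pk, so the cost is linear in the number of nodes once the rows are in memory.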
