Django (?) really slow with large datasets after doing some Python profiling

小蘑菇 2021-02-10 02:48

I was comparing an old PHP script of mine with the newer, fancier Django version, and the PHP one, full HTML output and all, was running faster. MUCH faster to

4 answers
  • 2021-02-10 02:59

    There are a lot of things to assume about your problem, as you haven't provided any code sample.

    Here are my assumptions: you are using Django's built-in ORM tools and models (i.e. sales_data = ModelObj.objects.all()), and on the PHP side you are dealing with direct SQL queries and working with the raw result set.

    Django does a lot of type conversion and casting when it turns the rows from a database query into ORM/model objects through the associated manager (objects by default).

    In PHP you control the conversions and know exactly how to cast from one data type to another, so you save some execution time on that point alone.

    I would recommend trying to move some of that fancy number work into the database, especially if you are doing record-set based processing; databases eat that kind of processing for breakfast. In Django you can send raw SQL over to the database: http://docs.djangoproject.com/en/dev/topics/db/sql/#topics-db-sql
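
    As a rough illustration of moving the number work into the database, here is a minimal sketch that uses Python's built-in sqlite3 module in place of a Django project. The sales table, its columns, and the function names are assumptions made up for this example; they are not from the question.

```python
import sqlite3

# Hypothetical in-memory table standing in for the asker's sales data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 10.0), ("north", 5.5), ("south", 7.25), ("south", 2.75)],
)


def totals_in_python(connection):
    """Fetch every row and add the amounts up in Python."""
    totals = {}
    for region, amount in connection.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_database(connection):
    """Let the database group and sum, returning one row per region."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(connection.execute(query))


python_totals = totals_in_python(conn)
database_totals = totals_in_database(conn)
```

    Both functions produce the same totals, but the second one transfers one row per region instead of one row per sale. In Django, a cursor obtained from django.db.connection follows the same execute/fetch pattern.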

    I hope this at least can get you pointed in the right direction...

  • 2021-02-10 02:59

    When dealing with large sets of data, you can also save a lot of CPU and memory by using the ValuesQuerySet, which returns each row of the query results as a plain dictionary instead of creating a model object instance for each row.

    Its usage looks a bit like this:

    Blog.objects.order_by('id').values()
    
  • 2021-02-10 03:08

    "tokenize.py comes out on top, which can make some sense as I am doing a lot of number formatting. "

    Makes no sense at all.

    See http://docs.python.org/library/tokenize.html.

    The tokenize module provides a lexical scanner for Python source code, implemented in Python

    Tokenize coming out on top means that you have dynamic code parsing going on.

    AFAIK (doing a search on the Django repository) Django does not use tokenize. So that leaves your program doing some kind of dynamic code instantiation. Or, you're only profiling the first time your program is loaded, parsed and run, leading to false assumptions about where the time is going.
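
    To rule out the "only profiling the first run" case, one option is to call the code once unprofiled and then profile a later call. The sketch below uses the standard cProfile and pstats modules; format_numbers is a hypothetical stand-in for the asker's number formatting, and profile_steady_state is a helper name invented here.

```python
import cProfile
import io
import pstats


def format_numbers(values):
    """Hypothetical stand-in for the number formatting the asker mentions."""
    return ["{:,.2f}".format(v) for v in values]


def profile_steady_state(func, *args, warmup=1):
    """Call func unprofiled `warmup` times, then profile one more call."""
    for _ in range(warmup):
        func(*args)  # first-run costs (imports, caches) are paid here
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


result, report = profile_steady_state(format_numbers, [1234.5, 1000000])
```

    If tokenize.py still dominates the report after a warm-up call, something really is parsing Python source on every request.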

    You should never do calculations in template tags: it's slow. It involves a complex meta-evaluation of the template tag. You should do all calculations in the view in simple, low-overhead Python. Use the templates for presentation only.

    Also, if you're constantly doing queries, filters, sums, and what-not, you have a data warehouse. Get a book on data warehouse design, and follow the data warehouse design patterns.

    You must have a central fact table, surrounded by dimension tables. This is very, very efficient.

    Sums, group bys, etc., can be done as defaultdict operations in Python. Bulk fetch all the rows, building the dictionary with the desired results. If this is too slow, then you have to use data warehousing techniques of saving persistent sums and groups separate from your fine-grained facts. Often this involves stepping outside the Django ORM and using RDBMS features like views or tables of derived data.
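
    A minimal sketch of the defaultdict approach, assuming the bulk fetch yields (region, product, amount) tuples. That row layout and the sample data are invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows, shaped like the tuples a bulk fetch might return:
# (region, product, amount)
rows = [
    ("north", "widget", 10),
    ("north", "gadget", 5),
    ("south", "widget", 7),
    ("north", "widget", 3),
]


def group_sums(rows):
    """One pass over the rows, summing by region and by (region, product)."""
    by_region = defaultdict(int)
    by_region_product = defaultdict(int)
    for region, product, amount in rows:
        by_region[region] += amount
        by_region_product[region, product] += amount
    return dict(by_region), dict(by_region_product)


by_region, by_region_product = group_sums(rows)
```

    Every grouping is built in the same single pass, so adding another breakdown costs one more dictionary rather than another query.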

  • 2021-02-10 03:22

    In such a scenario the database is often the bottleneck. Also, using an ORM might result in sub-optimal SQL queries.

    As some have pointed out, it's not possible to tell what the problem really is from the information you provided.

    I can only give you some general advice:

    • If your view is working with related model objects, consider using select_related(). This simple method might speed up the queries generated by the ORM considerably.
    • Use the Debug Footer Middleware to see what SQL queries are generated by your views and what time they took to execute.

    PS: Just FYI, I once had a fairly simple view which was very slow. After installing the Debug Footer Middleware I saw that around 500 (!) SQL queries were executed in that single view. Just using select_related() brought that down to 5 queries and the view performed as expected.
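
    The query explosion described above can be reproduced without Django. select_related() follows foreign keys with a SQL JOIN, and the sketch below compares one lookup per row against a single JOIN using the built-in sqlite3 module. The customer and sale tables and the CountingConnection wrapper are assumptions made up for this example; this is an analogue of what the ORM does, not Django's own code.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many statements are executed."""

    def __init__(self, connection):
        self.connection = connection
        self.count = 0

    def execute(self, sql, params=()):
        self.count += 1
        return self.connection.execute(sql, params)


def make_connection():
    """Build a small hypothetical schema: each sale belongs to a customer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sale (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
        INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO sale VALUES (1, 1, 10), (2, 2, 20), (3, 1, 30);
    """)
    return CountingConnection(conn)


def one_query_per_row(db):
    """Fetch the sales, then look up each customer separately (N+1 queries)."""
    sales = db.execute("SELECT customer_id, amount FROM sale ORDER BY id").fetchall()
    result = []
    for customer_id, amount in sales:
        (name,) = db.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        result.append((name, amount))
    return result


def single_join(db):
    """Fetch sales and customer names together in one JOIN."""
    query = (
        "SELECT customer.name, sale.amount FROM sale "
        "JOIN customer ON customer.id = sale.customer_id ORDER BY sale.id"
    )
    return db.execute(query).fetchall()


lazy_db, joined_db = make_connection(), make_connection()
lazy_rows = one_query_per_row(lazy_db)
joined_rows = single_join(joined_db)
```

    With three sales the first version runs four queries and the second runs one; the gap grows with every extra row, which is how a simple view ends up at hundreds of queries.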
