Why is SELECT * considered harmful?

后端 未结 15 1525
野的像风
野的像风 2020-11-21 04:12

Why is SELECT * bad practice? Wouldn\'t it mean less code to change if you added a new column you wanted?

I understand that SELECT COUNT(*)

15条回答
  •  误落风尘
    2020-11-21 04:53

    Understand your requirements prior to designing the schema (if possible).

    Learn about the data, 1)indexing 2)type of storage used, 3)vendor engine or features; ie...caching, in-memory capabilities 4)datatypes 5)size of table 6)frequency of query 7)related workloads if the resource is shared 8)Test

    A) Requirements will vary. If the hardware can not support the expected workload, you should re-evaluate how to provide the requirements in the workload. Regarding the addition column to the table. If the database supports views, you can create an indexed(?) view of the specific data with the specific named columns (vs. select '*'). Periodically review your data and schema to ensure you never run into the "Garbage-in" -> "Garbage-out" syndrome.

    Assuming there is no other solution; you can take the following into account. There are always multiple solutions to a problem.

    1) Indexing: The select * will execute a tablescan. Depending on various factors, this may involve a disk seek and/or contention with other queries. If the table is multi-purpose, ensure all queries are performant and execute below you're target times. If there is a large amount of data, and your network or other resource isn't tuned; you need to take this into account. The database is a shared environment.

    2) type of storage. Ie: if you're using SSD's, disk, or memory. I/O times and the load on the system/cpu will vary.

    3) Can the DBA tune the database/tables for higher performance? Assumming for whatever reason, the teams have decided the select '*' is the best solution to the problem; can the DB or table be loaded into memory. (Or other method...maybe the response was designed to respond with a 2-3 second delay? --- while an advertisement plays to earn the company revenue...)

    4) Start at the baseline. Understand your data types, and how results will be presented. Smaller datatypes, number of fields reduces the amount of data returned in the result set. This leaves resources available for other system needs. The system resources are usually have a limit; 'always' work below these limits to ensure stability, and predictable behaviour.

    5) size of table/data. select '*' is common with tiny tables. They typically fit in memory, and response times are quick. Again....review your requirements. Plan for feature creep; always plan for the current and possible future needs.

    6) Frequency of query / queries. Be aware of other workloads on the system. If this query fires off every second, and the table is tiny. The result set can be designed to stay in cache/memory. However, if the query is a frequent batch process with Gigabytes/Terabytes of data...you may be better off to dedicate additional resources to ensure other workloads aren't affected.

    7) Related workloads. Understand how the resources are used. Is the network/system/database/table/application dedicated, or shared? Who are the stakeholders? Is this for production, development, or QA? Is this a temporary "quick fix". Have you tested the scenario? You'll be surprised how many problems can exist on current hardware today. (Yes, performance is fast...but the design/performance is still degraded.) Does the system need to performance 10K queries per second vs. 5-10 queries per second. Is the database server dedicated, or do other applications, monitoring execute on the shared resource. Some applications/languages; O/S's will consume 100% of the memory causing various symptoms/problems.

    8) Test: Test out your theories, and understand as much as you can about. Your select '*' issue may be a big deal, or it may be something you don't even need to worry about.

提交回复
热议问题