Why is SELECT * considered harmful?

后端 未结 15 1444
野的像风
野的像风 2020-11-21 04:12

Why is SELECT * bad practice? Wouldn\'t it mean less code to change if you added a new column you wanted?

I understand that SELECT COUNT(*)

相关标签:
15条回答
  • 2020-11-21 04:53

    Understand your requirements prior to designing the schema (if possible).

    Learn about the data, 1)indexing 2)type of storage used, 3)vendor engine or features; ie...caching, in-memory capabilities 4)datatypes 5)size of table 6)frequency of query 7)related workloads if the resource is shared 8)Test

    A) Requirements will vary. If the hardware can not support the expected workload, you should re-evaluate how to provide the requirements in the workload. Regarding the addition column to the table. If the database supports views, you can create an indexed(?) view of the specific data with the specific named columns (vs. select '*'). Periodically review your data and schema to ensure you never run into the "Garbage-in" -> "Garbage-out" syndrome.

    Assuming there is no other solution; you can take the following into account. There are always multiple solutions to a problem.

    1) Indexing: The select * will execute a tablescan. Depending on various factors, this may involve a disk seek and/or contention with other queries. If the table is multi-purpose, ensure all queries are performant and execute below you're target times. If there is a large amount of data, and your network or other resource isn't tuned; you need to take this into account. The database is a shared environment.

    2) type of storage. Ie: if you're using SSD's, disk, or memory. I/O times and the load on the system/cpu will vary.

    3) Can the DBA tune the database/tables for higher performance? Assumming for whatever reason, the teams have decided the select '*' is the best solution to the problem; can the DB or table be loaded into memory. (Or other method...maybe the response was designed to respond with a 2-3 second delay? --- while an advertisement plays to earn the company revenue...)

    4) Start at the baseline. Understand your data types, and how results will be presented. Smaller datatypes, number of fields reduces the amount of data returned in the result set. This leaves resources available for other system needs. The system resources are usually have a limit; 'always' work below these limits to ensure stability, and predictable behaviour.

    5) size of table/data. select '*' is common with tiny tables. They typically fit in memory, and response times are quick. Again....review your requirements. Plan for feature creep; always plan for the current and possible future needs.

    6) Frequency of query / queries. Be aware of other workloads on the system. If this query fires off every second, and the table is tiny. The result set can be designed to stay in cache/memory. However, if the query is a frequent batch process with Gigabytes/Terabytes of data...you may be better off to dedicate additional resources to ensure other workloads aren't affected.

    7) Related workloads. Understand how the resources are used. Is the network/system/database/table/application dedicated, or shared? Who are the stakeholders? Is this for production, development, or QA? Is this a temporary "quick fix". Have you tested the scenario? You'll be surprised how many problems can exist on current hardware today. (Yes, performance is fast...but the design/performance is still degraded.) Does the system need to performance 10K queries per second vs. 5-10 queries per second. Is the database server dedicated, or do other applications, monitoring execute on the shared resource. Some applications/languages; O/S's will consume 100% of the memory causing various symptoms/problems.

    8) Test: Test out your theories, and understand as much as you can about. Your select '*' issue may be a big deal, or it may be something you don't even need to worry about.

    0 讨论(0)
  • 2020-11-21 04:54

    Generally you have to fit the results of your SELECT * ... into data structures of various types. Without specifying which order the results are arriving in, it can be tricky to line everything up properly (and more obscure fields are much easier to miss).

    This way you can add fields to your tables (even in the middle of them) for various reasons without breaking sql access code all over the application.

    0 讨论(0)
  • 2020-11-21 04:59

    Reference taken from this article.

    Never go with "SELECT *",

    I have found only one reason to use "SELECT *"

    If you have special requirements and created dynamic environment when add or delete column automatically handle by application code. In this special case you don’t require to change application and database code and this will automatically affect on production environment. In this case you can use “SELECT *”.

    0 讨论(0)
  • 2020-11-21 05:00

    If you name the columns in a SELECT statement, they will be returned in the order specified, and may thus safely be referenced by numerical index. If you use "SELECT *", you may end up receiving the columns in arbitrary sequence, and thus can only safely use the columns by name. Unless you know in advance what you'll be wanting to do with any new column that gets added to the database, the most probable correct action is to ignore it. If you're going to be ignoring any new columns that get added to the database, there is no benefit whatsoever to retrieving them.

    0 讨论(0)
  • 2020-11-21 05:00

    Selecting with column name raises the probability that database engine can access the data from indexes rather than querying the table data.

    SELECT * exposes your system to unexpected performance and functionality changes in the case when your database schema changes because you are going to get any new columns added to the table, even though, your code is not prepared to use or present that new data.

    0 讨论(0)
  • 2020-11-21 05:04

    I don't think that there can really be a blanket rule for this. In many cases, I have avoided SELECT *, but I have also worked with data frameworks where SELECT * was very beneficial.

    As with all things, there are benefits and costs. I think that part of the benefit vs. cost equation is just how much control you have over the datastructures. In cases where the SELECT * worked well, the data structures were tightly controlled (it was retail software), so there wasn't much risk that someone was going to sneek a huge BLOB field into a table.

    0 讨论(0)
提交回复
热议问题