I\'ve seen a number of people claim that you should specifically name each column you want in your select query.
Assuming I\'m going to use all of the columns anyway
In a roundabout way you are breaking the modularity rule about using strict typing wherever possible. Explicit is almost universally better.
Even if you now need every column in the table, more could be added later which will be pulled down every time you run the query and could hurt performance. It hurts performance because
When you explicitly NEED every column in the table, as opposed to needing every column in the table THAT EXISTED AT THE TIME YOU WROTE THE QUERY. For example, if were writing an DB management app that needed to display the entire contents of the table (whatever they happened to be) you might use that approach.
The essence of the quote of not prematurely optimizing is to go for simple and straightforward code and then use a profiler to point out the hot spots, which you can then optimize to be efficient.
When you use select * you're make it impossible to profile, therefore you're not writing clear & straightforward code and you are going against the spirit of the quote. select *
is an anti-pattern.
So selecting columns is not a premature optimization. A few things off the top of my head ....
The corollary is that using select *
...
Writing correct SQL is just as easy as writing Select *
. So the real lazy person writes proper SQL because they don't want to revisit the code and try to remember what they were doing when they did it. They don't want to explain to the DBA's about every bit of code. They don't want to explain to their clients why the application runs like a dog.
I understand where you're going regarding premature optimization, but that really only goes to a point. The intent is to avoid unnecessary optimization in the beginning. Are your tables unindexed? Would you use nvarchar(4000) to store a zip code?
As others have pointed out, there are other positives to specifying each column you intend to use in the query (such as maintainability).
I actually noticed a strange behaviour when I used select *
in views in SQL Server 2005.
Run the following query and you will see what I mean.
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[starTest]') AND type in (N'U'))
DROP TABLE [dbo].[starTest]
CREATE TABLE [dbo].[starTest](
[id] [int] IDENTITY(1,1) NOT NULL,
[A] [varchar](50) NULL,
[B] [varchar](50) NULL,
[C] [varchar](50) NULL
) ON [PRIMARY]
GO
insert into dbo.starTest
select 'a1','b1','c1'
union all select 'a2','b2','c2'
union all select 'a3','b3','c3'
go
IF EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vStartest]'))
DROP VIEW [dbo].[vStartest]
go
create view dbo.vStartest as
select * from dbo.starTest
go
go
IF EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'[dbo].[vExplicittest]'))
DROP VIEW [dbo].[vExplicittest]
go
create view dbo.[vExplicittest] as
select a,b,c from dbo.starTest
go
select a,b,c from dbo.vStartest
select a,b,c from dbo.vExplicitTest
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[starTest]') AND type in (N'U'))
DROP TABLE [dbo].[starTest]
CREATE TABLE [dbo].[starTest](
[id] [int] IDENTITY(1,1) NOT NULL,
[A] [varchar](50) NULL,
[B] [varchar](50) NULL,
[D] [varchar](50) NULL,
[C] [varchar](50) NULL
) ON [PRIMARY]
GO
insert into dbo.starTest
select 'a1','b1','d1','c1'
union all select 'a2','b2','d2','c2'
union all select 'a3','b3','d3','c3'
select a,b,c from dbo.vStartest
select a,b,c from dbo.vExplicittest
Compare the results of last 2 select statements. I believe what you will see is a result of Select * referencing columns by index instead of name.
If you rebuild the view it will work fine again.
EDIT
I have added a separate question, *“select * from table” vs “select colA, colB, etc. from table” interesting behaviour in SQL Server 2005* to look into that behaviour in more details.
You might join two tables and use column A from the second table. If you later add column A to the first table (with same name but possibly different meaning) you'll most likely get the values from the first table and not the second one as earlier. That won't happen if you explicitly specify the columns you want to select.
Of course specifying the columns also sometimes causes bugs if you forget to add the new columns to every select clause. If the new column is not needed every time the query is executed, it may take some time before the bug gets noticed.
When you're specifying columns, you're also tying yourself into a specific set of columns and making yourself less flexible, making Feuerstein roll over in, well, whereever he is. Just a thought.