Should I COUNT(*) or not?

前端未结


I know it's generally a bad idea to do queries like this:

SELECT * FROM `group_relations`

But when I just want the count, should I go for COUNT(*) or for COUNT(group_id)?

14 answers

遇见更好的自我

2021-01-30 16:24

The asterisk in COUNT(*) has nothing to do with the asterisk used to select all fields of a table. It's pure rubbish to say that COUNT(*) is slower than COUNT(field).

I intuit that SELECT COUNT(*) is faster than SELECT COUNT(field). If the RDBMS detects that you specified "*" in COUNT instead of a field, it doesn't need to evaluate anything to increment the count. Whereas if you specify a field in COUNT, the RDBMS has to check whether that field is null before counting the row.

But if your field is nullable, specify the field in COUNT.

孤街浪徒

2021-01-30 16:24

If you try SELECT COUNT(1) FROM group_relations, it will be a bit faster because it will not try to retrieve information from your columns.

COUNT(1) used to be faster than COUNT(*), but that's not true anymore, since modern DBMSs are smart enough to know that you don't care about the columns.

遥遥无期

2021-01-30 16:30
If the column in question is NOT NULL, both of your queries are equivalent. When group_id contains null values,
```
select count(*)
```
will count all rows, whereas
```
select count(group_id)
```
will only count the rows where group_id is not null.

Also, some database systems, like MySQL, employ an optimization when you ask for count(*), which makes such queries a bit faster than the column-specific one.

Personally, when just counting, I'm doing count(*) to be on the safe side with the nulls.
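
The NULL behaviour described above can be checked with a minimal sketch like the one below. It uses Python's built-in sqlite3 module with an in-memory database, which is an assumption made for convenience; only the names group_relations and group_id come from the question, and the sample rows are made up. It shows the behaviour in SQLite only, although counting non-null values is what COUNT(column) is generally expected to do.

```python
import sqlite3

def count_rows_and_non_null():
    """Return (COUNT(*), COUNT(group_id)) for a small table containing NULLs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE group_relations (id INTEGER PRIMARY KEY, group_id INTEGER)"
    )
    # Hypothetical sample data: two of the five rows have a NULL group_id.
    conn.executemany(
        "INSERT INTO group_relations (group_id) VALUES (?)",
        [(1,), (None,), (2,), (None,), (2,)],
    )
    all_rows = conn.execute("SELECT COUNT(*) FROM group_relations").fetchone()[0]
    non_null = conn.execute("SELECT COUNT(group_id) FROM group_relations").fetchone()[0]
    conn.close()
    return all_rows, non_null

print(count_rows_and_non_null())  # (5, 3)
```

If group_id were declared NOT NULL, both numbers would always match, which is the equivalence mentioned at the start of this answer.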
盖世英雄少女心

2021-01-30 16:33

If you try SELECT COUNT(1) FROM group_relations, it will be a bit faster because it will not try to retrieve information from your columns.

Edit: I just did some research and found out that this only happens in some databases. In SQL Server it's the same to use 1 or *, but on Oracle it's faster to use 1.

http://social.msdn.microsoft.com/forums/en-US/transactsql/thread/9367c580-087a-4fc1-bf88-91a51a4ee018/

Apparently there is no difference between them in MySQL; as in SQL Server, the parser appears to change the query to select(1). Sorry if I misled you in some way.
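
Whatever the speed difference on a given engine, COUNT(1) and COUNT(*) should at least return the same number, because the constant 1 is never NULL. Here is a small sketch of that check, assuming Python's built-in sqlite3 module and an in-memory table (the helper name compare_count_forms and the sample values are hypothetical). It says nothing about performance on SQL Server, Oracle, or MySQL.

```python
import sqlite3

def compare_count_forms(values):
    """Return (COUNT(*), COUNT(1)) over a table holding the given group_id values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE group_relations (group_id INTEGER)")
    conn.executemany(
        "INSERT INTO group_relations (group_id) VALUES (?)",
        [(v,) for v in values],
    )
    star = conn.execute("SELECT COUNT(*) FROM group_relations").fetchone()[0]
    one = conn.execute("SELECT COUNT(1) FROM group_relations").fetchone()[0]
    conn.close()
    return star, one

# The row with a NULL group_id is still counted by both forms.
print(compare_count_forms([10, None, 20]))  # (3, 3)
```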

不思量自难忘°

2021-01-30 16:34
I was curious about this myself. It's all fine to read documentation and theoretical answers, but I like to balance those with empirical evidence.

I have a MySQL table (InnoDB) that has 5,607,997 records in it. The table is in my own private sandbox, so I know the contents are static and nobody else is using the server. I think this effectively removes all outside effects on performance. The table has an auto_increment primary key field (Id) that I know will never be null, which I will use for my WHERE clause test (WHERE Id IS NOT NULL).

The only other possible glitch I see in running tests is the cache. The first time a query is run will always be slower than subsequent queries that use the same indexes. I'll refer to that below as the cache Seeding call. Just to mix it up a little I ran it with a where clause I know will always evaluate to true regardless of any data (TRUE = TRUE).

That said here are my results:

```
QueryType |  w/o WHERE          | where id is not null |  where true=true
----------+---------------------+----------------------+--------------------
COUNT(*)  |  9 min 30.13 sec ++ | 6 min 16.68 sec ++   | 2 min 21.80 sec ++
          |  6 min 13.34 sec    | 1 min 36.02 sec      | 2 min 0.11 sec
          |  6 min 10.06 sec    | 1 min 33.47 sec      | 1 min 50.54 sec
COUNT(Id) |  5 min 59.87 sec    | 1 min 34.47 sec      | 2 min 3.96 sec
          |  5 min 44.95 sec    | 1 min 13.09 sec      | 2 min 6.48 sec
COUNT(1)  |  6 min 49.64 sec    | 2 min 0.80 sec       | 2 min 11.64 sec
          |  6 min 31.64 sec    | 1 min 41.19 sec      | 1 min 43.51 sec
```
++ This is considered the cache seeding call. It is expected to be slower than the rest.

I'd say the results speak for themselves. COUNT(Id) usually edges out the others. Adding a WHERE clause dramatically decreases the access time, even if it's a clause you know will evaluate to true. The sweet spot appears to be COUNT(Id)... WHERE Id IS NOT NULL.

I would love to see other people's results, perhaps with smaller tables or with WHERE clauses against different fields than the field you're counting. I'm sure there are other variations I haven't taken into account.
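
For anyone who wants to repeat this kind of measurement on a smaller scale, here is a rough timing harness. It is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory table rather than MySQL/InnoDB, and the table name t, the payload column, and the row count are all made up. Timings from it will not transfer to the numbers above; it only illustrates the method of running each variant several times and keeping every run, including the first (cache seeding) one.

```python
import sqlite3
import time

# The three variants compared in the table above, against a hypothetical table t.
QUERIES = {
    "COUNT(*)": "SELECT COUNT(*) FROM t",
    "COUNT(Id)": "SELECT COUNT(Id) FROM t",
    "COUNT(1)": "SELECT COUNT(1) FROM t",
}

def time_count_queries(n_rows=100_000, repeats=3):
    """Run each COUNT variant `repeats` times; return {label: (count, [seconds, ...])}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (Id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO t (payload) VALUES (?)",
        (("x",) for _ in range(n_rows)),
    )
    conn.commit()
    results = {}
    for label, sql in QUERIES.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            count = conn.execute(sql).fetchone()[0]
            timings.append(time.perf_counter() - start)
        results[label] = (count, timings)
    conn.close()
    return results

for label, (count, timings) in time_count_queries().items():
    print(label, count, ["%.4f s" % t for t in timings])
```

Pointing the same loop at a real MySQL server would need a MySQL driver in place of sqlite3, which is outside what this sketch covers.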
-上瘾入骨i

2021-01-30 16:34

MySQL ISAM tables should have an optimisation for COUNT(*) that skips the full table scan.
