Is this a good or bad way of generating random numbers for each record?

风流意气都作罢, submitted on 2019-12-25 07:26:34

Question


A colleague of mine discovered a behaviour in SQL Server which I was unaware of.

CREATE VIEW dbo.vRandNumber AS
SELECT RAND() as RandNumber
GO

CREATE FUNCTION dbo.RandNumber() RETURNS float AS
RETURN (SELECT RandNumber FROM vRandNumber)
GO

DECLARE @mytable TABLE (id INT)
INSERT INTO @mytable SELECT 1
INSERT INTO @mytable SELECT 2
INSERT INTO @mytable SELECT 3

SELECT *, dbo.RandNumber() FROM @mytable

This seems to be the quickest way of generating a 'random' value for each record in a data set. But I'm not completely sure whether it's a result of documented behaviour or of taking advantage of a bizarre convergence of coincidences.

Would you use something like this?


EDIT

This isn't a question about the merits of the RAND() function itself, but the use of the UDF/VIEW combination to force it to recalculate on every row. (Using just RAND() in the final query, instead of dbo.RandNumber(), would give the same value for every record.)

Also, the point is for the value to be different every time you look at it, which enables random selection of records, for example.

EDIT

For SQL Server 2000+.


Answer 1:


I would not do this in software that I wanted to keep working on future versions of SQL Server. I found a way to return different values from RAND() for each row in a select statement. That discovery was 1) a bit of a hack and 2) made on SQL Server 2005. It no longer works on SQL Server 2008. That experience makes me extra leery of relying on trickery to get rand() to return a random value per row.

Also, I believe SQL Server is allowed to optimize away repeated calls to a UDF, though that might be changing now that some non-deterministic functions are allowed.

Here is a way to force rand() to execute per row in a select statement. It works on SQL Server 2005 only: it does not work on SQL Server 2008, and I have not tested it on any version prior to 2005.

create table #t (i int)
insert into #t values (1)
insert into #t values (2)
insert into #t values (3)

select i, case when i = 1 then rand() else rand() end as r
from #t

1   0.84923391682467
2   0.0482397143838935
3   0.939738172108974

Also, I know you said you were not asking about the randomness of rand(), but a good reference on that is http://msdn.microsoft.com/en-us/library/aa175776(SQL.80).aspx. It compares rand() to newid() and rand(FunctionOf(PK, current datetime)).
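A related idiom, in the spirit of the rand(FunctionOf(...)) approach from that reference but not taken from it, is to seed RAND() with a per-row value derived from NEWID(). Because the seed differs on every row, RAND() is evaluated once per row without needing a view or UDF. This is a minimal sketch; the table variable @t is a hypothetical stand-in for #t above.

```sql
-- Hypothetical sample data; any table with one row per record works.
DECLARE @t TABLE (i INT)
INSERT INTO @t VALUES (1)
INSERT INTO @t VALUES (2)
INSERT INTO @t VALUES (3)

-- NEWID() is evaluated once per row, so each row passes a different
-- integer seed (via CHECKSUM) to RAND(); r is a float in [0, 1).
SELECT i, RAND(CHECKSUM(NEWID())) AS r
FROM @t
```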




Answer 2:


It depends on what you need the random value for. It also depends on the format you need the value in: INTEGER, VARCHAR, etc.

If I need to sort rows randomly, I do something like:

SELECT *
FROM [MyTable]
ORDER BY newID()
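Since the question mentions random selection of records, here is a minimal sketch of combining ORDER BY NEWID() with TOP to pick a random sample. The table variable @MyTable is a hypothetical stand-in for [MyTable] so the example runs on its own. Keep in mind that this generates a GUID for, and sorts, every row, so it can be slow on large tables.

```sql
-- Hypothetical stand-in for [MyTable].
DECLARE @MyTable TABLE (id INT)
INSERT INTO @MyTable (id)
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5

-- NEWID() yields a new GUID for every row, so sorting on it shuffles the
-- rows; TOP 2 keeps a random pair. Re-running returns a different pair.
-- (TOP with a variable, e.g. TOP (@n), needs SQL Server 2005 or later.)
SELECT TOP 2 id
FROM @MyTable
ORDER BY NEWID()
```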

Likewise, you could generate a table of ints using the identity "feature" of SQL Server and run a similar query against it to get a random number.

My colleague needed a random integer per row, so he added a computed column to our table that generates one random integer per row returned by a query. I'm not sure I recommend this: it caused issues in certain tools, but it did give a random integer for each row. We could then combine my newid() solution with that table to get a set of random numbers when needed.

So I come back to: it depends. Can you elaborate on what you need it for?

Update: Here is the table definition snippet my colleague used to have a computed column return a different random number per row, each time the table is queried:

CREATE TABLE [dbo].[Table](
    -- ...
    [OrderID] [smallint] NOT NULL,  --Not sure what happens if this is null
    -- ...
    [RandomizeID]  AS (convert(int,(1000 * rand(([OrderID] * 100 * datepart(millisecond,getdate())))))),
    -- ...
)
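One caveat with seeding RAND() the way this computed column does: RAND(seed) is deterministic, so the same seed always returns the same value. Two rows with the same [OrderID] evaluated in the same millisecond would therefore get the same [RandomizeID]. A minimal sketch, where @seed is a hypothetical stand-in for the seed expression:

```sql
-- RAND(seed) is deterministic: the same seed always yields the same float.
DECLARE @seed INT, @first FLOAT, @second FLOAT
SET @seed = 12345  -- hypothetical stand-in for [OrderID] * 100 * millisecond

SET @first = RAND(@seed)
SET @second = RAND(@seed)

SELECT @first AS first_call, @second AS second_call  -- the two values match
```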



Answer 3:


If I had to select a random number for each row in SQL, and you could prove to me that RAND() is generating true random numbers...

Yes. I would probably use something like that.




Answer 4:


I wouldn't use this. As far as I know, RAND() uses the system time as its seed and produces the same value when executed several times in quick succession. For example, try this:

SELECT    *, 
          RAND()
FROM      SomeTable

RAND() will give you the same value for each row.




Answer 5:


The view-and-UDF approach seems clumsy to me: it adds extra trivial objects just to use a flawed function.

I'd use CHECKSUM(NEWID()) to generate a random number (rather than RAND() * xxx), or the CRYPT_GEN_RANDOM function that is new in SQL Server 2008.
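A minimal sketch of both suggestions against the question's @mytable. The range 0..99 is an arbitrary choice for illustration. The sign bit is masked off rather than wrapping the value in ABS(), because ABS(-2147483648) raises an arithmetic overflow error. The CRYPT_GEN_RANDOM column requires SQL Server 2008 or later.

```sql
DECLARE @mytable TABLE (id INT)
INSERT INTO @mytable SELECT 1
INSERT INTO @mytable SELECT 2
INSERT INTO @mytable SELECT 3

SELECT id,
       -- CHECKSUM(NEWID()) gives a possibly negative INT per row;
       -- & 2147483647 clears the sign bit, % 100 maps it to 0..99.
       (CHECKSUM(NEWID()) & 2147483647) % 100 AS r_int,
       -- SQL Server 2008+: 4 random bytes converted to INT (may be negative).
       CAST(CRYPT_GEN_RANDOM(4) AS INT) AS r_crypt
FROM @mytable
```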



Source: https://stackoverflow.com/questions/1433915/is-this-a-good-or-bad-way-of-generating-random-numbers-for-each-record
