Question
I know similar questions have been asked in the past, but they still haven't given me a proper solution for my case.
I have a (third-party) database table that has a varchar column holding datetime values.
It contains dates in the following formats.
11181980
8 18 1960
10/01/1960
04-12-1953
041371
7/29/44
Empty String
NULL
When I select this column, I want to return the date in a standard format (say mm/dd/yyyy) when one is available, or NULL otherwise.
I can only think of a function to do this, but I don't want to write a UDF because I need to make sure it does not error out while trying to convert, and there is no TRY/CATCH in a UDF. I could write a CLR function to make use of more powerful .NET features, though I would like to avoid it.
Is there a better way to handle this conversion in SQL Server? And if it is possible in SQL, how should I go about it?
Answer 1:
For the set of potential formats you've described:
DECLARE @x TABLE(y VARCHAR(32))
INSERT @x VALUES
('11181980'),
('8 18 1960'),
('10/01/1960'),
('04-12-1953'),
('041371'),
('7/29/44'),
(''),
(NULL);
SET DATEFORMAT MDY;
SELECT CONVERT(DATETIME, CASE
         WHEN y LIKE '%/%' THEN y
         WHEN LEN(RTRIM(y)) = 0 THEN NULL
         WHEN LEN(RTRIM(y)) IN (6,8) AND ISNUMERIC(y) = 1 THEN
           STUFF(STUFF(y,3,0,'/'),6,0,'/') END)
FROM (SELECT y = REPLACE(REPLACE(y, ' ', '/'), '-', '/') FROM @x) AS x;
This will interpret 7/29/44 as 2044, not 1944, based on server settings (the two digit year cutoff option, which defaults to 2049). To make sure all dates are in the past, you could do:
SELECT y = DATEADD(YEAR, CASE WHEN y > GETDATE() THEN -100 ELSE 0 END, y)
FROM
(
  SELECT y = CONVERT(DATETIME, CASE
             WHEN y LIKE '%/%' THEN y
             WHEN LEN(RTRIM(y)) = 0 THEN NULL
             ELSE STUFF(STUFF(y, 3, 0, '/'), 6, 0, '/') END)
  FROM (SELECT y = REPLACE(REPLACE(y, ' ', '/'), '-', '/') FROM @x) AS x
) AS z;
This also depends on there being no garbage data that can't be massaged into a date. What kind of system enters this kind of inconsistent nonsense anyway?
In SQL Server 2012 you will be able to use TRY_PARSE or TRY_CONVERT but with that mess of formats you're still going to have to do some massaging to get meaningful results.
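As a rough sketch of what that might look like, the query below reuses the same massaging and swaps CONVERT for TRY_CONVERT, so that values which still cannot be parsed come back as NULL instead of raising an error. It assumes SQL Server 2012 or later; the last two sample rows are made-up garbage values added for illustration.

```sql
-- Sketch only: assumes SQL Server 2012 or later (TRY_CONVERT does not exist before that).
-- 'not a date' and '13/45/1980' are invented garbage rows, not part of the original sample.
DECLARE @x TABLE (y VARCHAR(32));
INSERT @x VALUES
  ('11181980'), ('8 18 1960'), ('10/01/1960'), ('04-12-1953'),
  ('041371'), ('7/29/44'), (''), (NULL),
  ('not a date'), ('13/45/1980');

SET DATEFORMAT MDY;

SELECT raw_value = x.y,
       converted = TRY_CONVERT(DATETIME, CASE
                     -- already delimited (or delimited after the REPLACE calls below)
                     WHEN x.n LIKE '%/%' THEN x.n
                     -- mmddyy or mmddyyyy: insert the two slashes
                     WHEN LEN(x.n) IN (6, 8) THEN STUFF(STUFF(x.n, 3, 0, '/'), 6, 0, '/')
                     -- anything else (including the empty string) falls through to NULL
                   END)
FROM (SELECT y, n = REPLACE(REPLACE(RTRIM(y), ' ', '/'), '-', '/') FROM @x) AS x;
```

Two-digit years are still resolved by the server's cutoff setting, so the DATEADD correction shown earlier would still be needed to push future dates back a century.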
Answer 2:
If you own the database but cannot change it, I would run a stored procedure that sanitizes all values to ONE common format and make sure that only entries in that format can be inserted/updated. If you can't control the CRUD operations, I would just fetch the "dates" as is and perform the conversion to a DateTime in your business logic layer.
Perhaps not an answer to your question but personally I like all queries as simple as possible by keeping conversion and other logic outside the database.
Answer 3:
I'd suggest you do the following:
- Find whoever designed that table and shoot them
- Write a CLR function to parse the value to a date, probably using regex pattern matching
- Create a view that returns all of the same columns, but your function result instead of the varchar field
To be honest, that data looks like rubbish, and I doubt you could rely on it at all. It's possible there are values such as:
- 11190
- 1111990
Should these be 1990-11-01 or 1990-01-11? I think the CLR function will get you the most data in the most stable manner.
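Whichever parser you end up with, it may help to first see which shapes actually occur in the column. The sketch below replaces every digit with 9 and counts the resulting patterns, so ambiguous shapes such as 99999 or 9999999 stand out. It assumes SQL Server 2017 or later (for TRANSLATE) and uses a table variable with the sample values from this thread.

```sql
-- Sketch only: TRANSLATE requires SQL Server 2017 or later.
-- @x stands in for the real third-party table.
DECLARE @x TABLE (y VARCHAR(32));
INSERT @x VALUES
  ('11181980'), ('8 18 1960'), ('10/01/1960'), ('04-12-1953'),
  ('041371'), ('7/29/44'), ('11190'), ('1111990');

-- Map every digit to '9' so that values with the same layout collapse into one shape.
SELECT shape       = TRANSLATE(y, '0123456789', '9999999999'),
       occurrences = COUNT(*),
       example     = MIN(y)
FROM @x
GROUP BY TRANSLATE(y, '0123456789', '9999999999')
ORDER BY COUNT(*) DESC;
```

Shapes with an odd number of digits and no delimiter cannot be split unambiguously, so those rows probably need a manual decision rather than a conversion rule.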
Answer 4:
Here is my solution to this 3-year-old question. I did not have any spaces in my data, but you can use this as a basis and use the REPLACE function to strip them out when you evaluate. Here you go, Internet. Thanks for all the help the last 10 years. This is rather specific to SQL data import/export but will hopefully help someone who is otherwise stuck in manual ETL mode.
CASE WHEN DOB LIKE '__/__/____' THEN [DOB] -- already MM/DD/YYYY
     WHEN DOB LIKE '_/__/____' THEN '0'+ [DOB] -- pad the month with a zero
     WHEN DOB LIKE '__/_/____' THEN LEFT(DOB,3)+'0'+RIGHT(DOB,6) -- pad the day with a zero
     WHEN DOB LIKE '_/_/____' THEN '0'+LEFT(DOB,2)+'0'+RIGHT(DOB,6) -- pad both the month and the day
     -- YYYYMMDD: rearrange to MM/DD/YYYY
     WHEN LEN(DOB)=8 AND DOB BETWEEN '1900' AND '2016' THEN LEFT(RIGHT(DOB,4),2) + '/' + RIGHT(DOB,2) +'/'+ LEFT(DOB,4)
     -- MMDDYYYY: insert the slashes
     WHEN LEN(DOB)=8 AND DOB BETWEEN '01011900' AND '12312016' AND DOB NOT LIKE '%/%' THEN LEFT(DOB,2) + '/' + RIGHT(LEFT(DOB,4),2) +'/'+ RIGHT(DOB,4)
     WHEN DOB LIKE '__/__/__' -- MM/DD/YY: add the century prefix
       THEN CASE WHEN RIGHT(REPLACE(DOB,'/',''),2) > RIGHT(YEAR(GETDATE()),2) -- two-digit year is later than the current one: use 19
                   THEN LEFT(DOB,2)+'/'+LEFT(RIGHT(REPLACE(DOB,'/',''),4),2)+'/19'+RIGHT(REPLACE(DOB,'/',''),2)
                 WHEN RIGHT(DOB,2) <= RIGHT(YEAR(GETDATE()),2) -- two-digit year is not later than the current one: use 20
                   THEN LEFT(DOB,2)+'/'+LEFT(RIGHT(REPLACE(DOB,'/',''),4),2)+'/20'+RIGHT(REPLACE(DOB,'/',''),2)
                 ELSE NULL END
     ELSE NULL END AS [DOB_CONVERTER]
As Max Vernon points out, you must account for every pattern you want to match and fix. Error handling would be great for getting this automated. Until then, review the data once it is cleaned: load it into a temp table and look for baddies using similar LIKE patterns, for example WHERE [DOB_CONVERTER] IS NULL to find values that matched no pattern, or WHERE RIGHT(LEFT(REPLACE([DOB],'/',''),4),2) > 31 to find impossible days.
Microsoft's page on pattern searches was a helpful reference: https://technet.microsoft.com/en-us/library/ms187489(v=sql.105).aspx
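The review step described above could be sketched roughly as follows. The table variable @cleaned, its columns, and its sample rows are hypothetical stand-ins for wherever the output of the CASE expression was loaded.

```sql
-- Sketch only: @cleaned and its contents are hypothetical.
DECLARE @cleaned TABLE (DOB VARCHAR(32), DOB_CONVERTER VARCHAR(10));
INSERT @cleaned VALUES
  ('7/29/1944',  '07/29/1944'),   -- cleaned correctly
  ('13/45/1980', '13/45/1980'),   -- matched a pattern but is not a real date
  ('hello',      NULL),           -- matched no pattern
  ('',           NULL);           -- legitimately empty

SELECT DOB, DOB_CONVERTER
FROM @cleaned
WHERE (DOB_CONVERTER IS NULL AND LEN(DOB) > 0)                               -- non-empty input that matched nothing
   OR DOB_CONVERTER NOT LIKE '[01][0-9]/[0-3][0-9]/[12][0-9][0-9][0-9]'      -- wrong shape after cleaning
   OR LEFT(DOB_CONVERTER, 2) NOT BETWEEN '01' AND '12'                       -- impossible month
   OR SUBSTRING(DOB_CONVERTER, 4, 2) NOT BETWEEN '01' AND '31';              -- impossible day
```

With these sample rows, the query should flag the '13/45/1980' and 'hello' rows and leave the other two alone. It does not catch dates such as 02/30, so a final ISDATE or TRY_CONVERT check may still be worthwhile.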
Source: https://stackoverflow.com/questions/12200367/standardizing-differently-formatted-varchar-field-to-date-in-sql-server