Standardizing differently formatted varchar field to date in SQL Server

Submitted by 半世苍凉 on 2021-02-10 06:14:41

Question


I know similar questions have been asked in the past, but they still haven't given me a proper solution for my case.

I have a database table (third party) that has a varchar column for a datetime value.

It contains dates in the following formats.

  11181980 
  8 18 1960 
  10/01/1960 
  04-12-1953 
  041371 
  7/29/44
  Empty String 
  NULL

When I select this column, I want to return the date in a standard format (say mm/dd/yyyy) when one is available, or NULL otherwise.

The only approach I can think of is a function, but I don't want to write a UDF because I need to be sure it does not raise an error while trying to convert, and a UDF cannot use TRY/CATCH. I could write a CLR function to make use of the more powerful .NET features, though I would like to avoid that.

Is there a better way to handle this conversion in SQL Server? And if it can be done in SQL, how should I go about it?


Answer 1:


For the set of potential formats you've described:

DECLARE @x TABLE(y VARCHAR(32))

INSERT @x VALUES
('11181980'),
('8 18 1960'),
('10/01/1960'),
('04-12-1953'), 
('041371'),
('7/29/44'),
(''), 
(NULL);

SET DATEFORMAT MDY;

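SELECT CONVERT(DATETIME, CASE
    WHEN y LIKE '%/%' THEN y                                    -- already delimited
    WHEN LEN(RTRIM(y)) = 0 THEN NULL                            -- empty string
    WHEN LEN(RTRIM(y)) IN (6,8) AND y NOT LIKE '%[^0-9]%' THEN  -- mmddyy or mmddyyyy, digits only
        STUFF(STUFF(y,3,0,'/'),6,0,'/') END)
-- the derived table first turns spaces and dashes into slashes
FROM (SELECT y = REPLACE(REPLACE(y, ' ', '/'), '-', '/') FROM @x) AS x;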

This will interpret 7/29/44 as 2044, not 1944, based on server settings (the two digit year cutoff option, which defaults to 2049). To make sure all dates are in the past, you could do:

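-- Any date that lands in the future is shifted back 100 years (e.g. 2044 becomes 1944).
SELECT y = DATEADD(YEAR, CASE WHEN y > GETDATE() THEN -100 ELSE 0 END, y) 
FROM
(
  -- No length or digit check here: every non-empty value without a slash gets slashes inserted.
  SELECT y = CONVERT(DATETIME, CASE WHEN y LIKE '%/%' THEN y
   WHEN LEN(RTRIM(y)) = 0 THEN NULL ELSE
   STUFF(STUFF(y, 3, 0, '/'),6, 0, '/') END)
  FROM (SELECT y = REPLACE(REPLACE(y, ' ', '/'), '-', '/') FROM @x) AS x
) AS z;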

This also depends on there being no garbage data that can't be massaged into a date. What kind of system enters this kind of inconsistent nonsense anyway?

In SQL Server 2012 you will be able to use TRY_PARSE or TRY_CONVERT but with that mess of formats you're still going to have to do some massaging to get meaningful results.
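
As a rough sketch of what that could look like, the same massaging can be wrapped in TRY_CONVERT so that values which still cannot be parsed come back as NULL instead of raising an error. This assumes SQL Server 2012 or later; the two extra "garbage" rows and the column aliases are made up for illustration.

```sql
-- Assumes SQL Server 2012 or later (TRY_CONVERT is not available before that).
DECLARE @x TABLE (y VARCHAR(32));

INSERT @x VALUES
('11181980'), ('8 18 1960'), ('10/01/1960'), ('04-12-1953'),
('041371'), ('7/29/44'), (''), (NULL),
('13/45/1960'),  -- hypothetical garbage: looks delimited, but is not a real date
('1234e5');      -- hypothetical garbage: six characters, but not all digits

SET DATEFORMAT MDY;

SELECT x.y AS raw_value,
       TRY_CONVERT(DATETIME,
           CASE
               WHEN s.v LIKE '%/%' THEN s.v
               WHEN LEN(s.v) IN (6, 8) AND s.v NOT LIKE '%[^0-9]%'
                   THEN STUFF(STUFF(s.v, 3, 0, '/'), 6, 0, '/')
               -- anything else, including '' and NULL, falls through to NULL
           END) AS parsed_value
FROM @x AS x
-- normalise spaces and dashes to slashes before inspecting the value
CROSS APPLY (SELECT REPLACE(REPLACE(x.y, ' ', '/'), '-', '/')) AS s(v);
```

Two-digit years are still resolved by the server's century rule, so a correction like the DATEADD step above would still be needed on top of this.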




Answer 2:


If you own the database but cannot change it, I would run a stored procedure that sanitizes all values to ONE common format, and make sure that only entries in that format can be inserted or updated. If you can't control the CRUD operations, I would just fetch the "dates" as they are and perform the conversion to a DateTime in your business logic layer.

Perhaps this is not an answer to your question, but personally I like to keep all queries as simple as possible by keeping conversion and other logic outside the database.




Answer 3:


I'd suggest you do the following:

  1. Find whoever designed that table and shoot them
  2. Write a CLR function to parse the value to a date, probably using regex pattern matching
  3. Create a view that returns all of the same columns, but your function result instead of the varchar field

To be honest, that data looks like rubbish, and I doubt you could rely on it at all. It's possible there are values such as:

  • 11190
  • 1111990

Should these be 1990-11-01 or 1990-01-11? I think the CLR function will get you the most data in the most stable manner.
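
To make the ambiguity concrete, here is a small sketch (assuming SQL Server 2012 or later for TRY_CONVERT; the variable name and column aliases are made up) that splits 1111990 both ways. Both readings are valid dates, so no amount of parsing can tell which one was meant.

```sql
DECLARE @v VARCHAR(8) = '1111990';

SELECT
    -- one-digit month, two-digit day: '1/11/1990'
    TRY_CONVERT(date, STUFF(STUFF(@v, 2, 0, '/'), 5, 0, '/'), 101) AS one_digit_month,
    -- two-digit month, one-digit day: '11/1/1990'
    TRY_CONVERT(date, STUFF(STUFF(@v, 3, 0, '/'), 5, 0, '/'), 101) AS two_digit_month;
```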




Answer 4:


Here is my solution to this 3-year-old question. I did not have any spaces in my data, but you can use this as a basis and use the REPLACE function to strip them out when you evaluate. Here you go, Internet. Thanks for all the help over the last 10 years. This is rather specific to SQL data import/export, but will hopefully help someone who is otherwise stuck in manual ETL mode.

    CASE
        WHEN DOB LIKE '__/__/____' THEN [DOB]                                   -- PROPER FORMAT
        WHEN DOB LIKE '_/__/____'  THEN '0' + [DOB]                             -- NEED TO ADD A ZERO TO THE MONTH
        WHEN DOB LIKE '__/_/____'  THEN LEFT(DOB,3) + '0' + RIGHT(DOB,6)        -- NEED TO ADD A ZERO TO THE DAY
        WHEN DOB LIKE '_/_/____'   THEN '0' + LEFT(DOB,2) + '0' + RIGHT(DOB,6)  -- NEED TO ADD A ZERO TO THE MONTH AND DAY
        -- YYYYMMDD: COMPARE ONLY THE YEAR PART, OTHERWISE DATES IN 2016 SORT AFTER '2016' AND ARE MISSED
        WHEN LEN(DOB) = 8 AND DOB NOT LIKE '%[^0-9]%' AND LEFT(DOB,4) BETWEEN '1900' AND '2016'
            THEN SUBSTRING(DOB,5,2) + '/' + RIGHT(DOB,2) + '/' + LEFT(DOB,4)
        -- MMDDYYYY
        WHEN LEN(DOB) = 8 AND DOB NOT LIKE '%[^0-9]%' AND RIGHT(DOB,4) BETWEEN '1900' AND '2016'
            THEN LEFT(DOB,2) + '/' + SUBSTRING(DOB,3,2) + '/' + RIGHT(DOB,4)
        -- MM/DD/YY: A 2-DIGIT YEAR LATER THAN THE CURRENT ONE GETS 19, OTHERWISE 20
        WHEN DOB LIKE '__/__/__'
            THEN LEFT(DOB,6)
                 + CASE WHEN RIGHT(DOB,2) > RIGHT(CONVERT(CHAR(4), YEAR(GETDATE())), 2) THEN '19' ELSE '20' END
                 + RIGHT(DOB,2)
        ELSE NULL
    END AS [DOB_CONVERTER]

As Max Vernon points out, you must account for every pattern you want to match and fix. Error handling would be great for automating this. Until then, review the data once it is cleaned: load it into a temp table and look for bad values using similar LIKE patterns, for example WHERE [DOB_CONVERTER] IS NULL to find values that matched no pattern, or WHERE RIGHT(LEFT(REPLACE([DOB],'/',''),4),2) > 31 to find impossible day values.
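
A minimal sketch of that review step, assuming SQL Server 2012 or later so that TRY_CONVERT can flag impossible dates. The temp table name #dob_check and its sample rows are hypothetical; in practice the table would hold the raw column next to the result of the CASE expression above.

```sql
-- Hypothetical staging table: raw value plus the output of the CASE expression.
CREATE TABLE #dob_check (DOB VARCHAR(32), DOB_CONVERTER VARCHAR(10));

INSERT #dob_check VALUES
('01/15/1980', '01/15/1980'),  -- fine
('13/45/1980', '13/45/1980'),  -- matched a pattern, but is not a real date
('abc',        NULL);          -- matched no pattern

SELECT DOB,
       DOB_CONVERTER,
       CASE WHEN DOB_CONVERTER IS NULL THEN 'no pattern matched'
            ELSE 'not a valid date' END AS problem
FROM #dob_check
WHERE DOB IS NOT NULL
  AND LTRIM(RTRIM(DOB)) <> ''                           -- blanks are expected to stay NULL
  AND TRY_CONVERT(date, DOB_CONVERTER, 101) IS NULL;    -- NULL input or unparseable text

DROP TABLE #dob_check;
```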

The Microsoft documentation on pattern searches was helpful: https://technet.microsoft.com/en-us/library/ms187489(v=sql.105).aspx



Source: https://stackoverflow.com/questions/12200367/standardizing-differently-formatted-varchar-field-to-date-in-sql-server
