Why does SSIS TOKEN function fail to count adjacent column delimiters?

Submitted by 醉酒当歌 on 2019-11-29 11:32:14

Reason for the issue:

The TOKEN function in SSIS uses the implementation of the strtok function in C++. I found this while reading the book Microsoft® SQL Server® 2012 Integration Services, where it is mentioned as a note on page 113 (I like this book! Lots of nice information.).

I searched for the implementation of the strtok function and found the following links.

INFO: strtok(): C Function -- Documentation Supplement - The code sample in this link shows that the function does ignore consecutive delimiter characters.

The answers to the following SO questions point out that the strtok function is designed to ignore consecutive delimiters.

Need to know when no data appears between two token separators using strtok()

strtok_s behaviour with consecutive delimiters

I think the TOKEN and TOKENCOUNT functions are working as designed, but whether that is how SSIS should behave might be a question for the Microsoft SSIS team.
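To make the strtok behaviour concrete, here is a minimal Python sketch that emulates it. The helper name strtok_style_split is hypothetical, and this is only an emulation of the documented strtok semantics (runs of delimiters count as one separator), not the actual SSIS implementation.

```python
def strtok_style_split(text, delimiters):
    """Emulate C strtok: consecutive delimiters act as a single separator,
    and leading or trailing delimiters never produce empty tokens."""
    tokens = []
    current = []
    for ch in text:
        if ch in delimiters:
            # Close the current token only if it has content.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


record = "5^^^^Emu^E5"
print(strtok_style_split(record, "^"))  # ['5', 'Emu', 'E5']
print(record.split("^"))                # ['5', '', '', '', 'Emu', 'E5']
```

The second print shows what a position-preserving split returns for the same record: six segments instead of three tokens.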

Original Post - Above section is an update:

I created a simple package in SSIS 2012 based on your data inputs. As you described in your question, the TOKEN function does not behave as intended, and I agree that it doesn't seem to work. This post is not an answer to your original issue.

Here is an alternative way to write the expression in a relatively simpler fashion. This will only work if the last segment in your input record will always have a value (say A1, B2, C3 etc.).

Expression can be rewritten as:

This statement takes the input record as the first parameter and the delimiter caret (^) as the second parameter. The third parameter calculates the total number of segments in the record when split by the delimiter. If the last segment has data (along with at least one earlier segment), the count is at least two, so you can subtract 1 to fetch the penultimate segment.

(DT_STR,50,1252)TOKEN(OldImportRecord,"^",TOKENCOUNT(OldImportRecord,"^") - 1)

I created a simple package with a data flow task. The OLE DB source retrieves the data, and the derived column transformation parses and splits it as per the screenshot below. The output is then inserted into the destination table. You can see the source and destination tables in the last screenshot. The destination table has two columns. The first column stores the penultimate segment, and the second stores the segment count based on the delimiter (which, again, isn't correct). Notice that the last record didn't fetch the correct result. If the last record didn't have the value 8, the above expression would fail because the third parameter would evaluate to a zero index.
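The behaviour described above can be reproduced outside SSIS with a small Python sketch. The functions ssis_like_tokens, ssis_like_token, and penultimate are hypothetical names, and returning None for an out-of-range index is an assumption made for illustration; it does not model how SSIS itself reports that case.

```python
def ssis_like_tokens(text, delimiter="^"):
    # Assumption: for a single-character delimiter, dropping empty pieces
    # matches the strtok-style behaviour that TOKEN is reported to have.
    return [piece for piece in text.split(delimiter) if piece]


def ssis_like_token(text, delimiter, occurrence):
    # TOKEN uses a 1-based occurrence. None here is illustrative only.
    parts = ssis_like_tokens(text, delimiter)
    if 1 <= occurrence <= len(parts):
        return parts[occurrence - 1]
    return None


def penultimate(record):
    # Mirrors TOKEN(record, "^", TOKENCOUNT(record, "^") - 1).
    count = len(ssis_like_tokens(record, "^"))
    return ssis_like_token(record, "^", count - 1), count


print(penultimate("1^Apple^0001^01/01/2010^Anteater^A1"))  # ('Anteater', 6)
print(penultimate("5^^^^Emu^E5"))                           # ('Emu', 3)
print(penultimate("8^^^^Sparrow^"))                         # ('8', 2)
print(penultimate("^^^^Sparrow^"))                          # (None, 1)
```

The last two calls correspond to the problem rows: with the trailing segment empty, the expression returns 8 instead of Sparrow, and without the leading 8 the index drops to zero.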

Hope that helps to simplify your expression.

If you don't hear from anyone else, I would recommend logging this issue on the Microsoft Connect website.

Create table and populate scripts:

CREATE TABLE [dbo].[SourceTable](
    [OldImportRecord] [varchar](50) NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[DestinationTable](
    [NewImportRecord] [varchar](50) NOT NULL,
    [CaretCount] [int] NOT NULL
) ON [PRIMARY]
GO

INSERT INTO dbo.SourceTable (OldImportRecord) VALUES 
    ('1^Apple^0001^01/01/2010^Anteater^A1'),
    ('2^Banana^0002^03/15/2010^Bear^B2'),
    ('3^Cranberry^0003^4/15/2010^Crow^C3'),
    ('4^^0004^6/15/2010^Duck^D4'),
    ('5^^^^Emu^E5'),
    ('6^^^^Geese^F6'),
    ('^^^^Pheasant^G7'),
    ('8^^^^Sparrow^');
GO

Derived column transformation inside data flow task:

Data in source and destination tables:

Not only does TOKEN skip adjacent delimiters, it also skips leading and trailing delimiters. So, using your example, if you had a "good" field that looks like this:

1^Apple^0001^01/01/2010^Anteater^A1

Followed by one with adjacent, leading, and trailing delimiters like this:

^^^0004^6/15/2010^Duck^

TOKENCOUNT would only find two delimiters (three tokens), and you'd end up with 0004 assigned to Token1, 6/15/2010 to Token2, and Duck to Token3.

I used a different kind of replace. Rather than placing spaces between adjacent delimiters, which wouldn't help with leading or trailing ones, I used REPLACE to surround the delimiters with a character I absolutely wouldn't find in my text. The following expression works well for me. It's wordy, but it is what it is.

(DT_STR,255,1252)REPLACE(TOKEN(REPLACE(OldImportRecord,"^","~^~"),"^",1),"~","")

Of course, you'd replace the number 1 with whatever Token you wanted and adjust the cast according to your needs. Hope that helps.
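Here is a rough Python emulation of why the padding trick keeps positions intact. The names ssis_like_token and positional_token are hypothetical, the tokenizer is an assumed stand-in for TOKEN's strtok-style behaviour, and the sketch relies on the pad character (~) never appearing in the data.

```python
def ssis_like_token(text, delimiter, occurrence):
    # Assumed stand-in for TOKEN: empty pieces are skipped, index is 1-based.
    parts = [piece for piece in text.split(delimiter) if piece]
    if 1 <= occurrence <= len(parts):
        return parts[occurrence - 1]
    return None  # illustrative only; not a claim about SSIS behaviour


def positional_token(record, occurrence, delimiter="^", pad="~"):
    # Mirrors REPLACE(TOKEN(REPLACE(record,"^","~^~"),"^",n),"~","").
    # Padding makes every segment non-empty, so nothing gets skipped.
    padded = record.replace(delimiter, pad + delimiter + pad)
    raw = ssis_like_token(padded, delimiter, occurrence)
    return None if raw is None else raw.replace(pad, "")


record = "^^^0004^6/15/2010^Duck^"
print(repr(positional_token(record, 1)))  # ''
print(repr(positional_token(record, 4)))  # '0004'
print(repr(positional_token(record, 6)))  # 'Duck'
```

Empty segments come back as empty strings at their original positions, so 0004 is found at position 4 rather than position 1.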
