Flat file destination columns data types validation

無奈伤痛 2021-01-14 17:39

A source database field of type INT is read through an OLE DB Source. It is eventually written to a Flat File Destination. The destination Flat File Connection Manager > Adv

2 answers
  • 2021-01-14 18:33

    After re-reading the question to make sure it matched my proof-edits, I realized that my first answer did not actually address your question; sorry about that. I have left the first answer in case it is helpful.

    SSIS does not appear to enforce destination metadata; however, it does enforce source metadata. I created a test file with values ranging from -127 to 400 and tested the following scenarios:

    • Test 1: Source and destination flat file connection managers with signed 1 byte data type.
    • Result 1: Failed
    • Test 2: Source is 4 byte signed and destination is 1 byte signed.
    • Result 2: Pass
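
    To make the Test 1 failure concrete, here is a minimal C# sketch that runs outside SSIS. It only counts how many values in the test range cannot be held by a signed 1-byte integer; it does not reproduce how SSIS validates anything, and the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class SignedByteCheck
{
    // True when the value fits a signed 1-byte integer (-128 to 127).
    public static bool FitsSignedByte(int value) =>
        value >= sbyte.MinValue && value <= sbyte.MaxValue;

    // Counts the values in [first, last] that a signed 1-byte column cannot hold.
    public static int CountOutOfRange(int first, int last) =>
        Enumerable.Range(first, last - first + 1).Count(v => !FitsSignedByte(v));

    public static void Main()
    {
        // Same range as the test file described above: -127 to 400.
        Console.WriteLine(CountOutOfRange(-127, 400));
    }
}
```

    Every value from 128 upwards falls outside the signed 1-byte range, which is consistent with a source typed as 1-byte signed failing while a 4-byte signed source passes.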

    SSIS's pipeline metadata validation only checks that the metadata of the input matches the width of the pipeline column; it does not appear to care what the output is. It does, however, let you set the destination metadata to whatever the downstream system (e.g., SQL Server) expects, so that it can check and warn you when the two do not match.

    This was an unexpected result; like you, I expected it to fail. Still, the fact that it did not fail makes intuitive sense: since we are writing to a CSV file, there is no way to control what the required metadata is. But if we hook this up to a SQL Server destination and the metadata doesn't match, then SQL Server will frown upon the out-of-bounds data (see my other answer).

    I would still set the metadata of the output to match what it is in the pipeline, because this matters for distinguishing string from numeric data types. If you set a datetime as an integer, there will be no text qualifier, which may cause an error in the next input process. Conversely, setting an integer to a varchar causes the reverse problem: the value would get a text qualifier.

    I think the fact that destination metadata is not enforced is a bit of a weak link in SSIS. It can be mitigated by setting the destination metadata to match the pipeline buffer, which happens automatically provided the destination is the last task dropped onto the design surface. That said, if you update the pipeline metadata after development is complete, you are in for a real treat getting it updated throughout the entire pipeline: some tasks have to be opened and closed, while others have to be deleted and re-created before they pick up the new metadata.

    Additional Information

    TL;DR: TinyInt is stored as an unsigned data type in SQL Server, which means it supports values between 0 and 255. So a value greater than 127 is acceptable, up to 255. Anything above 255 will result in an error.

    The byte size determines the maximum number of possible combinations, while signed/unsigned indicates whether or not the range is split between positive and negative values.

    • 1 byte = TinyInt in SQL Server
    • 1 byte is 8 bits = 256 combinations
    • Signed Range: -128 to 127
    • Unsigned Range: 0 to 255
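
    The same arithmetic generalizes to any bit width: a signed type covers -2^(bits-1) to 2^(bits-1) - 1, and an unsigned type covers 0 to 2^bits - 1. The C# sketch below simply evaluates those two formulas; the class and method names are hypothetical and are not part of SSIS or SQL Server.

```csharp
using System;
using System.Numerics;

public static class IntegerRanges
{
    // Signed range for a given bit width: -2^(bits-1) .. 2^(bits-1) - 1.
    public static (BigInteger Min, BigInteger Max) Signed(int bits) =>
        (-BigInteger.Pow(2, bits - 1), BigInteger.Pow(2, bits - 1) - 1);

    // Unsigned range for a given bit width: 0 .. 2^bits - 1.
    public static (BigInteger Min, BigInteger Max) Unsigned(int bits) =>
        (BigInteger.Zero, BigInteger.Pow(2, bits) - 1);

    public static void Main()
    {
        foreach (int bits in new[] { 8, 16, 32, 64 })
        {
            var s = Signed(bits);
            var u = Unsigned(bits);
            Console.WriteLine($"{bits}-bit signed: {s.Min} to {s.Max}, unsigned: {u.Min} to {u.Max}");
        }
    }
}
```

    For 8 bits this gives the two ranges listed above; BigInteger is used only so that the 64-bit unsigned maximum does not overflow.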

    It is important to note that SQL Server does not support signing the data types directly. What I mean here is that there is no way to set the integer data types (i.e., TinyInt, Int, and BigInt) as signed or unsigned.

    • TinyInt is unsigned
    • Int and BigInt are signed

    See reference below: Max Size of SQL Server Auto-Identity Field

    If we attempt to set a TinyInt to any value that is outside of the Unsigned Range (e.g., -1 or 256), then we get the following error message:

    This is why you were able to set a value greater than 127.

    Int Error Message:

    BigInt Error Message:

    With respect to Identity columns, if we declare an Identity column as Int (i.e., 32 bits ~= 4.3 billion combinations) and set the seed to 0 with an increment of 1, then SQL Server will only count up to 2,147,483,647, the maximum signed value, before it stops. That leaves us short by half the range. If we set the seed to -2,147,483,648 (don't forget to include 0 in the range), then SQL Server will increment through the full range of combinations before stopping.
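
    As a quick check of that arithmetic, the C# sketch below counts how many values an Int identity with an increment of 1 can hand out from a given seed. It is plain integer math, not a SQL Server API, and the names are hypothetical.

```csharp
using System;

public static class IdentityCapacity
{
    // Number of values available from the seed up to the Int maximum,
    // assuming an increment of 1. Widened to long to avoid overflow.
    public static long ValuesFrom(int seed) => (long)int.MaxValue - seed + 1;

    public static void Main()
    {
        Console.WriteLine(ValuesFrom(0));            // seed 0: half of the range
        Console.WriteLine(ValuesFrom(int.MinValue)); // seed at the minimum: full range
    }
}
```

    Seeding at the minimum value doubles the number of available identity values, at the cost of negative keys.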

    References:

    SSIS Data Types and Limitations

    Max Size of SQL Server Auto-Identity Field

  • 2021-01-14 18:38

    Data types validation

    I think this issue is related to the connection manager that is used. Data type validation outside the pipeline is not done by Integration Services; it is done by the service provider:

    • OLEDB for Excel and Access
    • SQL Database Engine for SQL Server
    • ...

    When it comes to the Flat File Connection Manager, it doesn't guarantee any data type consistency, since all values are stored as text. For example, add a Flat File Connection Manager, select a text file that contains names, change the column data types to Date, and go to the Columns preview tab: it will show all columns without any issue. It only takes care of the row delimiter, column delimiter, text qualifier, and the other common properties used to read from a flat file (similar to the TextFieldParser class in VB.NET).

    The only case where data types may cause an exception is when you use a Flat File Source. The Flat File Source creates External Columns with the metadata defined in the Flat File Connection Manager and links them to the original columns (you can see this in the Advanced Editor of the Flat File Source). When SSIS reads from the Flat File Source, the External Columns throw the exception.
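
    The asymmetry between writing and reading can be illustrated with a small standalone C# sketch. It is only an analogy under the assumption that a flat file is plain delimited text; it does not use any SSIS API, and the class and method names are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class FlatFileTextDemo
{
    // Writing: every integer simply becomes text, so no declared column type is checked.
    public static string ToDelimitedLine(params int[] values) =>
        string.Join(",", values.Select(v => v.ToString(CultureInfo.InvariantCulture)));

    // Reading: a typed parse is the first point where an out-of-range value is rejected.
    public static bool TryReadSignedByte(string field, out sbyte value) =>
        sbyte.TryParse(field, NumberStyles.Integer, CultureInfo.InvariantCulture, out value);

    public static void Main()
    {
        string line = ToDelimitedLine(-127, 127, 400);
        foreach (string field in line.Split(','))
        {
            bool ok = TryReadSignedByte(field, out sbyte parsed);
            Console.WriteLine(ok ? $"{field} -> {parsed}" : $"{field} -> does not fit a signed byte");
        }
    }
}
```

    Writing 400 succeeds because the file only stores characters; the value is rejected only when something reads it back with a 1-byte signed type.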

    Binary output

    You should convert the column into binary within the package and map it to the destination column. For example, you can use a Script Component to do that:

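    public override void myInput_ProcessInputRow(myInputBuffer Row)
    {
        // Encode the string input column "name" as UTF-8 bytes and
        // assign the result to the binary output column "ByteValues".
        Row.ByteValues = System.Text.Encoding.UTF8.GetBytes(Row.name);
    }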
    

    I haven't tested whether this works with a Derived Column or Data Conversion transformation.

    References

    • Converting Input to (DT_BYTES,20)
    • DT Bytes in SSIS