Microsoft tech references for BULK INSERT may have some gaps … can't get them to work

广开言路 2021-01-29 14:51

Huge edit -- I removed the ';' characters and replaced them with 'GO' and ... the secondary key and URL worked, except I got this: Cannot bulk load. The file "06May2013_usr_

1 Answer
  • 2021-01-29 15:06

    I feel obliged to post at least some info, even if it's not a full answer.

    I was getting this error:

    Msg 4860, Level 16, State 1, Line 58 Cannot bulk load. The file "container/folder/file.txt" does not exist or you don't have file access rights.

    I believe the problem might have been that I generated my SAS key with a start time of right now on my local clock, but the start time is treated as UTC. Here in Australia, that means the key only becomes valid in about ten hours. So I generated a new key with a start date a month earlier, and it worked.
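
    To see how far your local clock is from UTC before picking a start time, a quick check like the one below may help. It is only a sketch: the time zone name 'AUS Eastern Standard Time' and the two-day back-dating are assumptions chosen for illustration, and AT TIME ZONE needs SQL Server 2016 or later (or Azure SQL Database).

    ```sql
    -- Compare the current UTC time with local time in an assumed time zone,
    -- and compute a start time that is safely in the past.
    SELECT
        SYSUTCDATETIME() AS UtcNow,
        -- 'AUS Eastern Standard Time' is an assumption; substitute your own zone
        SYSDATETIMEOFFSET() AT TIME ZONE 'AUS Eastern Standard Time' AS LocalNow,
        -- Back-date by two days (arbitrary) so the key is already valid in UTC
        DATEADD(DAY, -2, SYSUTCDATETIME()) AS BackdatedStartUtc;
    ```

    In the generated token, the start and expiry times appear as the st= and se= parameters, both expressed in UTC, so that is where to look if a freshly generated key is rejected.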

    The SAS (Shared Access Signature) is a big string that is created as follows:

    1. In the Azure portal, go to your storage account
    2. Press Shared Access Signature
    3. Fill in the fields (make sure your start date is a few days in the past; you can leave Allowed IP addresses blank)
    4. Press Generate SAS
    5. Copy the string in the SAS Token field
    6. Remove the leading ? before pasting it into your SQL script

    Below is my full script with comments.

    -- Target staging table
    IF object_id('recycle.SampleFile') IS NULL
        CREATE TABLE recycle.SampleFile
        (
        Col1 VARCHAR(MAX)
        );
    
    
    -- more info here
    -- https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/02/23/loading-files-from-azure-blob-storage-into-azure-sql-database/
    
    -- You can use this check to create the master key only if it doesn't exist yet
    -- (an existing database master key is listed as ##MS_DatabaseMasterKey##)
    SELECT * FROM sys.symmetric_keys WHERE name LIKE '%DatabaseMasterKey%';
    
    
    -- Run once to create a database master key
    -- Can't create credentials until a master key has been generated
    -- Here, zzz is a password that you make up and store for later use
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'zzz';
    
    
    
    
    -- Create a database credential object that can be reused for external access to Azure Blob
    CREATE DATABASE SCOPED CREDENTIAL BlobTestAccount 
    WITH 
    -- Must be SHARED ACCESS SIGNATURE to access blob storage
    IDENTITY= 'SHARED ACCESS SIGNATURE',
    -- Generated from Shared Access Signature area in Storage account
    -- Make sure the start date is at least a few days before
    -- otherwise UTC can mess you up because it might not be valid yet
    -- Don't include the ? or the endpoint. It starts with 'sv=', NOT '?' or 'https'
    SECRET = 'sv=2016-05-31&zzzzzzzzzzzz';
    
    
    -- Create the external data source
    -- Note that LOCATION starts with https. I've seen examples without it, but those didn't work for me
    CREATE EXTERNAL DATA SOURCE BlobTest
    WITH ( 
        TYPE = BLOB_STORAGE, 
        LOCATION = 'https://yourstorageaccount.blob.core.windows.net',
        CREDENTIAL= BlobTestAccount);
    
    
    
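    -- Load the file. The path is relative to the data source LOCATION:
    -- container first, then any folders, then the file name
    BULK INSERT recycle.SampleFile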
    FROM 'container/folder/file'
    WITH ( DATA_SOURCE = 'BlobTest');
    
    
    
    
    -- If you're fancy you can use these to work out if your things exist first
    SELECT * FROM sys.database_scoped_credentials;
    SELECT * FROM sys.external_data_sources;
    
    -- Clean up when no longer needed (drop the data source before the credential it uses)
    DROP EXTERNAL DATA SOURCE BlobTest;
    DROP DATABASE SCOPED CREDENTIAL BlobTestAccount;
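
    The catalog views queried in the script can also make the setup safe to re-run. The following is a minimal sketch of that idea, not a tested deployment script. It reuses the placeholder names and values from the script (BlobTestAccount, BlobTest, 'zzz', the dummy SAS token), and the exact-name checks are my assumption about how you would want to match.

    ```sql
    -- Create the master key only if the database doesn't have one yet
    IF NOT EXISTS (SELECT 1 FROM sys.symmetric_keys
                   WHERE name = '##MS_DatabaseMasterKey##')
        CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'zzz';

    -- Create the credential only if it is missing
    IF NOT EXISTS (SELECT 1 FROM sys.database_scoped_credentials
                   WHERE name = 'BlobTestAccount')
        CREATE DATABASE SCOPED CREDENTIAL BlobTestAccount
        WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
             SECRET = 'sv=2016-05-31&zzzzzzzzzzzz';  -- placeholder token, no leading ?

    -- Create the external data source only if it is missing
    IF NOT EXISTS (SELECT 1 FROM sys.external_data_sources
                   WHERE name = 'BlobTest')
        CREATE EXTERNAL DATA SOURCE BlobTest
        WITH (TYPE = BLOB_STORAGE,
              LOCATION = 'https://yourstorageaccount.blob.core.windows.net',
              CREDENTIAL = BlobTestAccount);
    ```

    Note that this only skips objects that already exist. If the SAS token has changed, the existing credential still holds the old secret and would need to be altered or recreated.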
    

    One thing this won't do that ADF (Azure Data Factory) does is pick up a file based on a wildcard.

    That is, if I have a file called ABC_20170501_003.TXT, I need to explicitly list that name in the BULK INSERT load script, whereas in ADF I can just specify ABC_20170501 and it automatically wildcards the rest.

    Unfortunately there is no (easy) way to enumerate files in blob storage from SQL Server. I eventually got around this by using Azure Automation to run a PowerShell script that enumerates the files and registers them in a table that SQL Server can see. This seems complicated, but Azure Automation is actually a very useful tool to learn and use, and it works very reliably.
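
    Once the file names are in a table, the wildcard behaviour can be approximated on the SQL side. Here is a rough sketch under stated assumptions: the table recycle.BlobFileList and its columns are hypothetical (I'm not showing my actual table), and the prefix is just the example file name from above. BULK INSERT only accepts a literal file name, which is why the statement is built as dynamic SQL.

    ```sql
    -- Hypothetical table that the external job fills with blob paths
    IF OBJECT_ID('recycle.BlobFileList') IS NULL
        CREATE TABLE recycle.BlobFileList
        (
            FilePath NVARCHAR(400) NOT NULL PRIMARY KEY, -- e.g. 'container/folder/ABC_20170501_003.TXT'
            LoadedAt DATETIME2 NULL                      -- NULL until the file has been loaded
        );

    DECLARE @FilePath NVARCHAR(400), @Sql NVARCHAR(MAX);

    -- STATIC takes a snapshot, so the UPDATE inside the loop can't disturb the cursor
    DECLARE file_cursor CURSOR LOCAL STATIC FOR
        SELECT FilePath
        FROM recycle.BlobFileList
        WHERE FilePath LIKE 'container/folder/ABC[_]20170501%' -- [_] matches a literal underscore
          AND LoadedAt IS NULL
        ORDER BY FilePath;

    OPEN file_cursor;
    FETCH NEXT FROM file_cursor INTO @FilePath;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Build the statement with the path as a literal;
        -- doubling single quotes keeps a stray quote from breaking it
        SET @Sql = N'BULK INSERT recycle.SampleFile FROM '''
                 + REPLACE(@FilePath, '''', '''''')
                 + N''' WITH (DATA_SOURCE = ''BlobTest'');';

        EXEC sys.sp_executesql @Sql;

        -- Mark the file as loaded so it is skipped next time
        UPDATE recycle.BlobFileList
        SET LoadedAt = SYSUTCDATETIME()
        WHERE FilePath = @FilePath;

        FETCH NEXT FROM file_cursor INTO @FilePath;
    END;

    CLOSE file_cursor;
    DEALLOCATE file_cursor;
    ```

    Files are loaded one at a time in name order, and because the loop knows which file it is loading, the file name could also be recorded alongside the data if the staging table had a column for it.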

    More opinions on ADF:

    I couldn't find a way to pass the filename that I loaded (or other info) into the database.

    Do not use ADF if you need data to be loaded in the order it appears in the file (i.e. as captured by an identity field). ADF will try to do things in parallel. In fact, my ADF did insert things in order for about a week (as recorded by the identity), then one day it just started inserting rows out of order.

    The timeslice concept is useful in limited circumstances (when you have cleanly delineated data in cleanly delineated files that you want to drop neatly into a table). In any other circumstances it is complicated, unwieldy, and difficult to understand and use. In my experience, real-world data needs more complicated rules to work out and apply the correct merge keys.

    I don't know the cost difference between importing files via ADF and via BULK INSERT, but ADF is slow. I don't have the patience to hack through Azure blades to find metrics right now, but you're talking 5 minutes in ADF vs 5 seconds in BULK INSERT.

    UPDATE:

    Try Azure Data Factory V2. It is vastly improved, and you are no longer bound to timeslices.
