How can I set an expression to the FileSpec property on Foreach File enumerator?

傲寒 2021-02-09 00:40

I'm trying to create an SSIS package to process files from a directory that contains many years' worth of files. The files are all named numerically, so to save processing ever

3 answers
  •  孤独总比滥情好
    2021-02-09 01:17

    From investigating how the ForEach loop works in SSIS (with a view to creating my own to solve the issue), it appears, as far as I could see, to enumerate the entire file collection first, before any mask is applied. It's hard to tell exactly what's going on without seeing the underlying code for the ForEach loop, but this would explain the slow performance when dealing with over 100k files.

    While @Siva's solution is fantastically detailed and definitely an improvement over my initial approach, it is essentially just the same process, except using an Expression Task to test the filename, rather than a Script Task (this does seem to offer some improvement).

    So, I decided to take a totally different approach and rather than use a file-based ForEach loop, enumerate the collection myself in a Script Task, apply my filtering logic, and then iterate over the remaining results. This is what I did:

    [Screenshot: sample Control Flow showing a Script Task that enumerates the files and feeds into a ForEach From Variable Enumerator]

    In my Script Task, I use the DirectoryInfo.EnumerateFiles method, which is the recommended approach for large file collections. It is not asynchronous in itself, but it enumerates lazily: entries are streamed as the directory is read, so I can start applying my logic straight away rather than waiting for the entire collection to be built (as GetFiles would require).

    Here's the code:

    // Requires at the top of the script file:
    //   using System.Collections.Generic;
    //   using System.IO;
    public void Main()
    {
        string sourceDir = Dts.Variables["SourceDirectory"].Value.ToString();
        int minJobId = (int)Dts.Variables["MinIndexId"].Value;
    
        // Enumerate the file collection (using EnumerateFiles so filtering can start immediately)
        List<string> activeFiles = new List<string>();
    
        System.Threading.Tasks.Task listTask = System.Threading.Tasks.Task.Factory.StartNew(() =>
        {
            DirectoryInfo dir = new DirectoryInfo(sourceDir);
            foreach (FileInfo file in dir.EnumerateFiles("*.txt"))
            {
                // The file name (minus extension) is the numeric job id;
                // anything that does not parse as an integer is skipped.
                string fileName = Path.GetFileNameWithoutExtension(file.Name);
                int jobId;
                if (int.TryParse(fileName, out jobId) && jobId > minJobId)
                    activeFiles.Add(file.FullName);
            }
        });
    
        // Wait here for completion
        listTask.Wait();
        Dts.Variables["ActiveFilenames"].Value = activeFiles;
        Dts.TaskResult = (int)ScriptResults.Success;
    }
    

    So, I enumerate the collection, applying my logic as files are discovered and immediately adding the file path to my list for output. Once complete, I then assign this to an SSIS Object variable named ActiveFilenames which I'll use as the collection for my ForEach loop.
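
    The filtering step can also be pulled out into a small helper that runs outside SSIS, which makes it easier to check on its own. This is only a sketch: the class and method names (JobFileFilter, NewerThan) are made up for illustration and are not part of the package above, and it assumes that file names which are not purely numeric should simply be skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class JobFileFilter
{
    // Hypothetical helper: lazily yields the paths whose numeric file name
    // (without extension) is greater than minJobId. Names that do not parse
    // as an integer are skipped (an assumption, not confirmed by the source).
    public static IEnumerable<string> NewerThan(IEnumerable<string> paths, int minJobId)
    {
        foreach (string path in paths)
        {
            string name = Path.GetFileNameWithoutExtension(path);
            int jobId;
            if (int.TryParse(name, out jobId) && jobId > minJobId)
                yield return path;
        }
    }
}
```

    Inside the Script Task it could then be fed straight from the directory listing, for example new List<string>(JobFileFilter.NewerThan(Directory.EnumerateFiles(sourceDir, "*.txt"), minJobId)), so that nothing is materialised until the filter has been applied.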

    I configured the ForEach loop as a ForEach From Variable Enumerator, which now iterates over a much smaller collection (a post-filtered list, compared to what I can only assume was an unfiltered list or something similar in SSIS's built-in ForEach File Enumerator).

    So the tasks inside my loop can be dedicated to processing the data, since the collection has already been filtered before hitting the loop. Although this doesn't seem to be doing much differently from either my initial package or Siva's example, in production (for this particular case, anyway) filtering the collection up front and streaming the enumeration seems to provide a massive boost over using the built-in ForEach File Enumerator.
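
    As a rough illustration of the streaming point, the sketch below contrasts Directory.GetFiles, which returns only once the whole array has been built, with Directory.EnumerateFiles, which yields entries as they are read and can therefore be cut short. The class and method names are invented for the example, and it says nothing about what the built-in ForEach File Enumerator does internally.

```csharp
using System;
using System.IO;
using System.Linq;

public static class EnumerationSketch
{
    // Eager: GetFiles builds the complete string[] before returning.
    public static int CountEager(string dir)
    {
        string[] all = Directory.GetFiles(dir, "*.txt");
        return all.Length;
    }

    // Lazy: EnumerateFiles yields paths as the directory is read,
    // so Take(n) stops the enumeration after n entries.
    public static string[] FirstFew(string dir, int n)
    {
        return Directory.EnumerateFiles(dir, "*.txt").Take(n).ToArray();
    }
}
```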

    I'm going to continue investigating the ForEach loop container and see if I can replicate this logic in a custom component. If I get this working I'll post a link in the comments.
