SSIS Derived Column - Parse Text between break returns

I have a text field from a SQL Server Source. It is a phone number field that typically has this format:

Home: 555-555-1212
Work: 555-555-1212
Cell: 555-555-1212


                      
1 Answer

南笙
2021-01-23 23:42

I look at your data and I see

Home:|555-555-1212|Work:|555-555-1212|Cell:|555-555-1212|Emergency:|555-555-1212

I'm using the pipe character, |, as a placeholder for where I would segment that string, which is basically wherever you have whitespace (space, tab, newline, etc.).

There are two approaches to this. I'll start with the easy one.

Script Component

String.Split is your friend here. Called with no arguments, it splits the string on whitespace characters, which is what we want for that source data.

I added a new Script Component, acting as a Transformation, and created 4 output columns, each a string (DT_STR) of length 12 with code page 1252: Home, Work, Cell, and Emergency. I populate them like so:

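public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Split() with no arguments breaks the string on every whitespace character.
    // The indexes below assume CRLF line breaks: "\r\n" counts as two separators,
    // so an empty element sits between each number and the next label.
    string[] split = Row.PhoneData.Split();

    Row.Home = split[1];
    Row.Work = split[4];
    Row.Cell = split[7];
    Row.Emergency = split[10];
}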
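
To see why the indexes are 1, 4, 7, and 10 rather than 1, 3, 5, and 7, it can help to dump the array outside of SSIS. The following is a minimal console sketch, not part of the package; the sample value and the `SplitDemo` class name are made up for illustration, and it assumes the line breaks in the column are CRLF.

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical sample value; assumes the source column uses CRLF line breaks.
    public const string Sample =
        "Home: 555-555-1212\r\nWork: 555-555-1212\r\n" +
        "Cell: 555-555-1212\r\nEmergency: 555-555-1212";

    public static void Main()
    {
        // No arguments: split on every whitespace character, keeping empty entries.
        string[] split = Sample.Split();

        // Print each element with its index so the empty entries are visible.
        for (int i = 0; i < split.Length; i++)
        {
            Console.WriteLine("[" + i + "] '" + split[i] + "'");
        }
    }
}
```

If the data turns out to use bare LF line breaks, the empty entries disappear and the numbers land at 1, 3, 5, and 7 instead, so it is worth checking a real row before hard-coding the positions.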


Derived Column

I'm not going to build out a full-blown implementation of this. The above is much simpler, but I run into situations where ETL devs say they aren't allowed to use Script Tasks/Components, usually because people reached for them first instead of last.

The approach here is to have lots of Derived Column components on your Data Flow. It won't hurt performance and can in fact make things easier. It will definitely make your debugging easier, and you'll have lots of that to do.

DER Find Colons

This would add 4 columns into the data flow: HomeColonPosition, WorkColonPosition, etc. You've already started down this path, but build it out into the actual data flow: you'll need to reference these positions later, and it's easier to fix the one calculation that populates a column than a wrong calculation that's been copied everywhere. Keep these in their own Derived Column, ahead of the one that extracts the numbers, because an expression can't reference a column created in the same component.

Note that the third argument to FINDSTRING is the occurrence to find, not a starting position (that's how T-SQL's CHARINDEX works), so there's no need to nest calls or chain from the previous colon's position.

Thus, instead of Work being

FINDSTRING(PhoneData, ":", FINDSTRING(PhoneData, ":", 1) + 1)


it would just be

FINDSTRING(PhoneData, ":", 2)


Just knowing the position of the 4 colons in that string, I can figure out where the phone numbers are (maybe). The starting point is the colon's position + 2 (to skip the colon and the space), and the number runs 12 characters from there.
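
The colon arithmetic is easy to get off by one, so it can be worth checking outside SSIS first. The sketch below is only a rough C# stand-in for the expressions, assuming FINDSTRING takes an occurrence number and returns a 1-based position (0 when not found) and SUBSTRING takes a 1-based start; the `ColonMath` class and its edge-case handling are illustrative assumptions, not SSIS's actual implementation. In the data flow itself the equivalent would be along the lines of SUBSTRING(PhoneData, WorkColonPosition + 2, 12).

```csharp
using System;

public static class ColonMath
{
    // Rough stand-in for SSIS FINDSTRING: 1-based position of the nth
    // occurrence of search, or 0 when there is no such occurrence.
    public static int FindString(string input, string search, int occurrence)
    {
        int index = -1;
        for (int i = 0; i < occurrence; i++)
        {
            index = input.IndexOf(search, index + 1, StringComparison.Ordinal);
            if (index < 0) return 0;
        }
        return index + 1;
    }

    // Rough stand-in for SSIS SUBSTRING with a 1-based start position.
    // Out-of-range handling here is an assumption made for the sketch.
    public static string Substring(string input, int position, int length)
    {
        int start = position - 1;
        if (start < 0 || start >= input.Length) return string.Empty;
        return input.Substring(start, Math.Min(length, input.Length - start));
    }

    public static void Main()
    {
        // Hypothetical sample value with CRLF line breaks.
        string phoneData = "Home: 555-555-1212\r\nWork: 555-555-1212";

        int workColon = FindString(phoneData, ":", 2);
        // Skip the colon and the space, then take 12 characters.
        Console.WriteLine(Substring(phoneData, workColon + 2, 12));
    }
}
```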

Where this approach gets ugly, much as with the script approach, is when that data isn't consistent.
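
If the Script Component is an option, one way to cope with inconsistent data is to stop relying on fixed positions and key off the labels instead. This is a minimal sketch under the assumption that each number sits on its own line in the form "Label: value"; the `PhoneParser` helper is hypothetical and not part of the original package.

```csharp
using System;
using System.Collections.Generic;

public static class PhoneParser
{
    // Returns label -> number for every "Label: value" line that is found.
    // Lines without a label or without a value are skipped.
    public static Dictionary<string, string> Parse(string phoneData)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        if (string.IsNullOrWhiteSpace(phoneData)) return result;

        // Split on either line-break character and drop the empty entries,
        // so CRLF and bare LF input behave the same.
        string[] lines = phoneData.Split(
            new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            int colon = line.IndexOf(':');
            if (colon <= 0) continue; // no label on this line

            string label = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim();
            if (value.Length > 0) result[label] = value;
        }
        return result;
    }
}
```

Inside Input0_ProcessInputRow you would then look up "Home", "Work", and so on with TryGetValue, and set the matching Row.Home_IsNull style property to true when a label is missing, so a reordered or absent line no longer shifts every other number into the wrong column.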
                                                                        
                                                        
            
            
              
                