How to redirect rejected rows to another file?

醉酒成梦 2021-01-24 04:17

This is my source CSV file:

col1,col2,col3,col4,col5,col6
1,A,AA,X,Y,H
2,B,,,CC,D, -- reject this row because (CC)it should be in col3
3,E,FF,Y,L
4,G         


        
1 answer
  •  情歌与酒
    2021-01-24 05:04

    If you are new to VBScript, you should start your coding with a plan (the main task, the subtasks, the ideas to solve each task) and a skeleton .vbs that makes it easy to experiment with the methods used to solve the (sub)tasks.

    In your case the main task is to "filter bad lines in a source file to a destination file". This task is solved if you can "read the lines of the source file", "recognize the bad ones", and "write them to the destination file".

    The default way to read a file's lines is:

      Dim tsIn : Set tsIn = goFS.OpenTextFile("..\data\21755767.csv")
      Do Until tsIn.AtEndOfStream
         Dim sLine : sLine = tsIn.ReadLine()
      Loop
      tsIn.Close
    

    "default" means: You must have very good/specific reasons not to choose this idiom (e.g.: using .ReadAll() on a short file for in-place-editing or debug-display) or to deviate from it (e.g.: you can't rely on .OpenTextFile's default arguments if your file is UTF-16 encoded). For some atrocities - e.g.

      Do While Not tsIn.AtEndOfStream = "False"
    

    there is no excuse at all.
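
    For the UTF-16 case mentioned above, a minimal sketch of deviating from the default arguments could look like the following. The file name 21755767-utf16.csv is hypothetical (an assumed UTF-16 encoded copy of the source file); the constants name the documented numeric values of .OpenTextFile's iomode and format parameters.

    ```vbscript
    Option Explicit

    Const ForReading   = 1   ' iomode: open the file for reading only
    Const TristateTrue = -1  ' format: open the file as Unicode (UTF-16)

    Dim goFS : Set goFS = CreateObject("Scripting.FileSystemObject")
    ' Hypothetical file name; assumed to be a UTF-16 encoded copy of the source
    Dim tsIn : Set tsIn = goFS.OpenTextFile("..\data\21755767-utf16.csv", ForReading, False, TristateTrue)
    Do Until tsIn.AtEndOfStream
       WScript.Echo tsIn.ReadLine()
    Loop
    tsIn.Close
    ```

    The loop itself is unchanged; only the call that opens the stream differs from the default idiom.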

    Writing (some) lines to another file should look like this:

      Dim tsOut : Set tsOut = goFS.CreateTextFile("..\data\21755767-bads.csv")
      Dim tsIn  : Set tsIn  = goFS.OpenTextFile("..\data\21755767.csv")
      Do Until tsIn.AtEndOfStream
         Dim sLine : sLine = tsIn.ReadLine()
         If True Then
            tsOut.WriteLine sLine
         End If
      Loop
      tsIn.Close
      tsOut.Close
    

    Using .CreateTextFile(JustTheFileSpec) instead of .OpenTextFile(lots, of, other, args) is the simplest, clearest, and most error-safe approach for the standard case: a new (perhaps empty) destination file for each run of the script.

    As mentioned above, appending a

      WScript.Echo  goFS.OpenTextFile("..\data\21755767-bads.csv").ReadAll()
    

    for display is ok.

    The idea for the filter subtask is based on the observations:

    1. the header line contains the correct number of fields/commas
    2. the offending lines contain a bad number of commas

    Then it's easy to combine the results of the work above into:

      Dim tsOut   : Set tsOut = goFS.CreateTextFile("..\data\21755767-bads.csv")
      Dim tsIn    : Set tsIn  = goFS.OpenTextFile("..\data\21755767.csv")
      Dim sLine   : sLine     = tsIn.ReadLine()
      Dim nUBSeps : nUBSeps   = UBound(Split(sLine, ","))
      Do Until tsIn.AtEndOfStream
         sLine = tsIn.ReadLine()
         If nUBSeps <> UBound(Split(sLine, ",")) Then
            tsOut.WriteLine sLine
         End If
      Loop
      tsIn.Close
      tsOut.Close
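
    If the rows that pass the check should be kept as well, the same loop can feed two destination files. This is only a sketch under assumptions: the file name 21755767-goods.csv is hypothetical, the source file is assumed to contain at least a header line, and counting commas is assumed to be sufficient (it would misjudge quoted fields that contain commas).

    ```vbscript
    Option Explicit

    Dim goFS : Set goFS = CreateObject("Scripting.FileSystemObject")

    Dim tsBad   : Set tsBad  = goFS.CreateTextFile("..\data\21755767-bads.csv")
    Dim tsGood  : Set tsGood = goFS.CreateTextFile("..\data\21755767-goods.csv") ' hypothetical name
    Dim tsIn    : Set tsIn   = goFS.OpenTextFile("..\data\21755767.csv")
    Dim sLine   : sLine      = tsIn.ReadLine()             ' header line (assumed to exist)
    Dim nUBSeps : nUBSeps    = UBound(Split(sLine, ","))   ' expected separator count
    tsGood.WriteLine sLine                                 ' keep the header in the good file
    Do Until tsIn.AtEndOfStream
       sLine = tsIn.ReadLine()
       If nUBSeps = UBound(Split(sLine, ",")) Then
          tsGood.WriteLine sLine                           ' same separator count as the header
       Else
          tsBad.WriteLine sLine                            ' rejected row
       End If
    Loop
    tsIn.Close
    tsGood.Close
    tsBad.Close
    ```

    Empty lines end up in the bads file too, because Split("") returns an empty array whose UBound is -1.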
    

    The full script:

    Option Explicit ' (1)
    
    Dim goFS : Set goFS = CreateObject("Scripting.FileSystemObject") ' (2)
    
    WScript.Quit demoReadFile() ' (3)
    WScript.Quit demoReadWriteFile()
    WScript.Quit demoFilterBads()
    
    Function demoReadFile() ' (4)
      demoReadFile = 0
      Dim tsIn : Set tsIn = goFS.OpenTextFile("..\data\21755767.csv")
      Do Until tsIn.AtEndOfStream
         Dim sLine : sLine = tsIn.ReadLine()
         WScript.Echo tsIn.Line - 1, sLine
      Loop
      tsIn.Close
    End Function
    
    Function demoReadWriteFile() ' (5)
      demoReadWriteFile = 0
      Dim tsOut : Set tsOut = goFS.CreateTextFile("..\data\21755767-bads.csv")
      Dim tsIn  : Set tsIn  = goFS.OpenTextFile("..\data\21755767.csv")
      Do Until tsIn.AtEndOfStream
         Dim sLine : sLine = tsIn.ReadLine()
         If True Then
            tsOut.WriteLine sLine
         End If
      Loop
      tsIn.Close
      tsOut.Close
      WScript.Echo  goFS.OpenTextFile("..\data\21755767-bads.csv").ReadAll()
    End Function
    
    Function demoFilterBads() ' (6)
      demoFilterBads = 0
      Dim tsOut   : Set tsOut = goFS.CreateTextFile("..\data\21755767-bads.csv")
      Dim tsIn    : Set tsIn  = goFS.OpenTextFile("..\data\21755767.csv")
      Dim sLine   : sLine     = tsIn.ReadLine()
      Dim nUBSeps : nUBSeps   = UBound(Split(sLine, ","))
      Do Until tsIn.AtEndOfStream
         sLine = tsIn.ReadLine()
         If nUBSeps <> UBound(Split(sLine, ",")) Then
            tsOut.WriteLine sLine
         End If
      Loop
      tsIn.Close
      tsOut.Close
      WScript.Echo  goFS.OpenTextFile("..\data\21755767-bads.csv").ReadAll()
    End Function
    

    sample output:

    demoReadFile()

    cscript 21755767.vbs
    1 col1,col2,col3,col4,col5
    2 1,A,AA,X,Y
    3 2,B,,,CC,D
    4 3,E,FF,Y,
    5 4,G,,,XX,P
    

    demoFilterBads()

    cscript 21755767.vbs
    2,B,,,CC,D
    4,G,,,XX,P
    

    Such a script could start from a skeleton/template like:

    Option Explicit ' (1)
    
    Dim goFS : Set goFS = CreateObject("Scripting.FileSystemObject") ' (2)
    
    WScript.Quit step00() ' (3)
    WScript.Quit step01()
    
    ...
    
    Function step00() ' (4)
      step00 = 0
      ...
    End Function
    
    1. All your scripts should start with "Option Explicit" to guard against misspelled variable names
    2. If you allow global variables at all, then goFS is a good candidate. If not, create just one FSO and pass it to the Subs/Functions/Methods that need it. Never create a new FSO each time you need its methods/properties.
    3. Use comments or reordering to call the function you currently work with
    4. 'sample' function; write a lot of them to check/elaborate your ideas
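
    For the alternative named in note 2 (no global variables), a sketch might create the FSO once and hand it to the functions that need it. The names main and countLines are hypothetical and not part of the script above; the sketch reads its own source file so that it runs without extra data.

    ```vbscript
    Option Explicit

    WScript.Quit main()

    Function main()
      main = 0
      Dim oFS : Set oFS = CreateObject("Scripting.FileSystemObject") ' created once
      ' The script reads itself so the sketch runs without extra data files
      WScript.Echo countLines(oFS, WScript.ScriptFullName), "lines"
    End Function

    ' Hypothetical helper: receives the FSO instead of creating its own
    Function countLines(oFS, sFSpec)
      countLines = 0
      Dim tsIn : Set tsIn = oFS.OpenTextFile(sFSpec)
      Do Until tsIn.AtEndOfStream
         tsIn.SkipLine
         countLines = countLines + 1
      Loop
      tsIn.Close
    End Function
    ```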

    Update wrt comment:

    Add a utility function:

    Function qq(s) : qq = """" & s & """" : End Function
    

    and an experiment/explore function:

    Function demoFilterSteps()
      demoFilterSteps = 0
      Dim sLine
      For Each sLine In Split("col1,col2,col3,col4,col5 1,A,AA,X,Y 2,B,,,CC,D")
          WScript.Echo 0, qq(sLine)
          Dim aParts  : aParts  = Split(sLine, ",")
          Dim nUBSeps : nUBSeps = UBound(aParts)
          WScript.Echo 1, nUBSeps, qq(Join(aParts, "-"))
    
          WScript.Echo
      Next
      nUBSeps = 4            ' correct
      sLine   = "2,B,,,CC,D" ' bad
      Dim sExpr : sExpr = "nUBSeps <> UBound(Split(sLine, "",""))"
      WScript.Echo 2, nUBSeps, qq(sLine), sExpr, CStr(Eval(sExpr))
    End Function
    

    output:

    cscript 21755767.vbs
    0 "col1,col2,col3,col4,col5"
    1 4 "col1-col2-col3-col4-col5"
    
    0 "1,A,AA,X,Y"
    1 4 "1-A-AA-X-Y"
    
    0 "2,B,,,CC,D"
    1 5 "2-B---CC-D"
    
    2 4 "2,B,,,CC,D" nUBSeps <> UBound(Split(sLine, ",")) True
    

    This shows:

    1. Splitting the header line results in a nUBSeps of 4 (4 separators between 5 fields)
    2. A good line results in a nUBSeps of 4 too - not a surprise
    3. A bad line gives a nUBSeps different from (<>) 4; 5 in this sample
    4. Assuming nUBSeps is 4 (correct), the expression nUBSeps <> UBound(Split(sLine, ",")) evaluates to True, when sLine holds a bad line - so that line should be written to the destination file
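
    The four observations can be folded into one small predicate. This is a sketch, not part of the script above: the function name isBadLine is hypothetical, and the test lines are taken from the sample output shown earlier.

    ```vbscript
    Option Explicit

    ' Hypothetical helper: True if sLine's separator count differs from nUBSeps
    Function isBadLine(sLine, nUBSeps)
      isBadLine = (nUBSeps <> UBound(Split(sLine, ",")))
    End Function

    ' Separator count of the header line: 4 separators between 5 fields
    Dim nUBSeps : nUBSeps = UBound(Split("col1,col2,col3,col4,col5", ","))
    Dim sLine
    ' Split without a delimiter argument splits on spaces, giving three test lines
    For Each sLine In Split("1,A,AA,X,Y 2,B,,,CC,D 3,E,FF,Y,")
        WScript.Echo sLine, CStr(isBadLine(sLine, nUBSeps))
    Next
    ```

    Note that the line 3,E,FF,Y, passes this check: its last field is empty, but it still contains four commas. A check on the separator count alone cannot detect missing values.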
