Changing a pipe delimited file to comma delimited in VB.net

[亡魂溺海] 提交于 2019-12-24 01:18:47

问题


So I have a set of pipe delimited inputs which are something like this:

"787291 | 3224325523" | 37826427 | 2482472 | "46284729|46246" | 24682 | 82524 | 6846419 | 68247

and I am converting them to comma delimited using the code given below:

 Dim line As String
    Dim fields As String()
    Using sw As New StreamWriter("c:\test\output.txt")
        Using tfp As New FileIO.TextFieldParser("c:\test\test.txt")
            tfp.TextFieldType = FileIO.FieldType.Delimited
            tfp.Delimiters = New String() {"|"}
            tfp.HasFieldsEnclosedInQuotes = True
            While Not tfp.EndOfData
                fields = tfp.ReadFields
                line = String.Join(",", fields)
                sw.WriteLine(line)
            End While
        End Using
    End Using

So far so good. It only considers the delimiters that are present outside the quotes and changes them to the comma delimiter. But trouble starts when I have input with a stray quotation like below:

"787291 | 3224325523" | 37826427 | 2482472 | "46284729|46246" | 24682 | "82524 | 6846419 | 68247

Here the code gives

MalformeLineExcpetion

Which I realize is due to the stray quotation in my input and since i am like a total noob in RegEx so i am not able to use it here(or I am incapable of). If anyone has any idea, it would be much appreciated.


回答1:


Here is the coded procedure described in the comments:

  • Read all the lines of the original input file,
  • fix the faulty lines (with Regex or anything else that fits),
  • use TextFieldParser to perform the parsing of the correct input
  • Join() the input parts created by TextFieldParser using , as separator
  • save the fixed, reconstructed input lines to the final output file

I'm using Wiktor Stribiżew Regex pattern: it looks like it should work given the description of the problem.

Note:
Of course I don't know whether a specific Encoding should be used.
Here, the Encoding is the default UTF-8 no-BOM, in and out.

"FaultyInput.txt" is the corrupted source file.
"FixedInput.txt" is the file containing the input lines fixed (hopefully) by the Regex. You could also use a MemoryStream.
"FixedOutput.txt" is the final CSV file, containing comma separated fields and the correct values.

These files are all read/written in the executable startup path.

Dim input As List(Of String) = File.ReadAllLines("FaultyInput.txt").ToList()
For line As Integer = 0 To input.Count - 1
    input(line) = Regex.Replace(input(line), "(""\b.*?\b"")|""", "$1")
Next

File.WriteAllLines("FixedInput.txt", input)

Dim output As List(Of String) = New List(Of String)
Using tfp As New FileIO.TextFieldParser("FixedInput.txt")
    tfp.TextFieldType = FileIO.FieldType.Delimited
    tfp.Delimiters = New String() {"|"}
    tfp.HasFieldsEnclosedInQuotes = True
    While Not tfp.EndOfData
        Dim fields As String() = tfp.ReadFields
        output.Add(String.Join(",", fields))
    End While
End Using

File.WriteAllLines("FixedOutput.txt", output)
'Eventually...
'File.Delete("FixedInput.txt")



回答2:


Sub ReadMalformedCSV()
    Dim s$
    Dim pattern$ = "(?x)" + vbCrLf +
                    "\b            #word boundary" + vbCrLf +
                    "(?'num'\d+)   #any number of digits" + vbCrLf +
                    "\b            #word boundary"
    '// Use "ReadLines" as it will lazily read one line at time
    For Each line In File.ReadLines("c:\test\output.txt")
        s = String.Join(",", Regex.Matches(line, pattern).
                                   Select(Function(e) e.Groups("num").Value))
        WriteLine(s)
    Next
End Sub


来源:https://stackoverflow.com/questions/53883750/changing-a-pipe-delimited-file-to-comma-delimited-in-vb-net

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!