问题
So I have a set of pipe delimited inputs which are something like this:
"787291 | 3224325523" | 37826427 | 2482472 | "46284729|46246" | 24682 | 82524 | 6846419 | 68247
and I am converting them to comma delimited using the code given below:
Dim line As String
Dim fields As String()
Using sw As New StreamWriter("c:\test\output.txt")
Using tfp As New FileIO.TextFieldParser("c:\test\test.txt")
tfp.TextFieldType = FileIO.FieldType.Delimited
tfp.Delimiters = New String() {"|"}
tfp.HasFieldsEnclosedInQuotes = True
While Not tfp.EndOfData
fields = tfp.ReadFields
line = String.Join(",", fields)
sw.WriteLine(line)
End While
End Using
End Using
So far so good. It only considers the delimiters that are present outside the quotes and changes them to the comma delimiter. But trouble starts when I have input with a stray quotation like below:
"787291 | 3224325523" | 37826427 | 2482472 | "46284729|46246" | 24682 | "82524 | 6846419 | 68247
Here the code gives
MalformeLineExcpetion
Which I realize is due to the stray quotation in my input and since i am like a total noob in RegEx so i am not able to use it here(or I am incapable of). If anyone has any idea, it would be much appreciated.
回答1:
Here is the coded procedure described in the comments:
- Read all the lines of the original input file,
- fix the faulty lines (with Regex or anything else that fits),
- use
TextFieldParser
to perform the parsing of the correct input Join()
the input parts created byTextFieldParser
using,
as separator- save the fixed, reconstructed input lines to the final output file
I'm using Wiktor Stribiżew Regex pattern: it looks like it should work given the description of the problem.
Note:
Of course I don't know whether a specific Encoding should be used.
Here, the Encoding is the default UTF-8 no-BOM
, in and out.
"FaultyInput.txt"
is the corrupted source file."FixedInput.txt"
is the file containing the input lines fixed (hopefully) by the Regex. You could also use a MemoryStream
."FixedOutput.txt"
is the final CSV
file, containing comma separated fields and the correct values.
These files are all read/written in the executable startup path.
Dim input As List(Of String) = File.ReadAllLines("FaultyInput.txt").ToList()
For line As Integer = 0 To input.Count - 1
input(line) = Regex.Replace(input(line), "(""\b.*?\b"")|""", "$1")
Next
File.WriteAllLines("FixedInput.txt", input)
Dim output As List(Of String) = New List(Of String)
Using tfp As New FileIO.TextFieldParser("FixedInput.txt")
tfp.TextFieldType = FileIO.FieldType.Delimited
tfp.Delimiters = New String() {"|"}
tfp.HasFieldsEnclosedInQuotes = True
While Not tfp.EndOfData
Dim fields As String() = tfp.ReadFields
output.Add(String.Join(",", fields))
End While
End Using
File.WriteAllLines("FixedOutput.txt", output)
'Eventually...
'File.Delete("FixedInput.txt")
回答2:
Sub ReadMalformedCSV()
Dim s$
Dim pattern$ = "(?x)" + vbCrLf +
"\b #word boundary" + vbCrLf +
"(?'num'\d+) #any number of digits" + vbCrLf +
"\b #word boundary"
'// Use "ReadLines" as it will lazily read one line at time
For Each line In File.ReadLines("c:\test\output.txt")
s = String.Join(",", Regex.Matches(line, pattern).
Select(Function(e) e.Groups("num").Value))
WriteLine(s)
Next
End Sub
来源:https://stackoverflow.com/questions/53883750/changing-a-pipe-delimited-file-to-comma-delimited-in-vb-net