问题
I have a text file that needs to be converted into a csv file. My plan is to:
- parse the file line by line
- search and replace commas inside double quotes with a space
- then delete all double quotes
- append the line to a new csv file
Question: I need a function that will recognize the comma inside a double quote and replace it.
Here is a sample line:
"MRS Brown","4611 BEAUMONT ST"," ","WARRIOR RUN, PA"
回答1:
Your file seems to be already in a CSV complaint format. Any good CSV reader would be able to read it correctly.
If your problem is just reading the field values correctly, then you need to read it the correct way.
Here is one way to do it:
using Microsoft.VisualBasic.FileIO;
private void button1_Click(object sender, EventArgs e)
{
TextFieldParser tfp = new TextFieldParser("C:\\Temp\\Test.csv");
tfp.Delimiters = new string[] { "," };
tfp.HasFieldsEnclosedInQuotes = true;
while (!tfp.EndOfData)
{
string[] fields = tfp.ReadFields();
// do whatever you want to do with the fields now...
// e.g. remove the commas and double-quotes from the fields.
for (int i = 0; i < fields.Length;i++ )
{
fields[i] = fields[i].Replace(","," ").Replace("\"","");
}
// this is to show what we got as the output
textBox1.AppendText(String.Join("\t", fields) + "\n");
}
tfp.Close();
}
EDIT:
I just noticed that the question has been filed under C#, VB.NET-2010. Here is the VB.NET version, just in case you are coding in VB.
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
Dim tfp As New FileIO.TextFieldParser("C:\Temp\Test.csv")
tfp.Delimiters = New String() {","}
tfp.HasFieldsEnclosedInQuotes = True
While Not tfp.EndOfData
Dim fields() As String = tfp.ReadFields
'' do whatever you want to do with the fields now...
'' e.g. remove the commas and double-quotes from the fields.
For i As Integer = 0 To fields.Length - 1
fields(i) = fields(i).Replace(",", " ").Replace("""", "")
Next
'' this is to show what we got as the output
TextBox1.AppendText(Join(fields, vbTab) & vbCrLf)
End While
tfp.Close()
End Sub
回答2:
Here's a simple function that will remove commas embedded between two double quotes in a string. You can pass in a long string that has multiple occurrences of "abc,123",10/13/12,"some description"...etc. It will also remove the double quotes.
Private Function ParseCommasInQuotes(ByVal arg As String) As String
Dim foundEndQuote As Boolean = False
Dim foundStartQuote As Boolean = False
Dim output As New StringBuilder()
'44 = comma
'34 = double quote
For Each element As Char In arg
If foundEndQuote Then
foundStartQuote = False
foundEndQuote = False
End If
If element.Equals(Chr(34)) And (Not foundEndQuote) And foundStartQuote Then
foundEndQuote = True
Continue For
End If
If element.Equals(Chr(34)) And Not foundStartQuote Then
foundStartQuote = True
Continue For
End If
If (element.Equals(Chr(44)) And foundStartQuote) Then
'skip the comma...its between double quotes
Else
output.Append(element)
End If
Next
Return output.ToString()
End Function
回答3:
Thanks to Baz, The Glockster Answer in VB, I just converted it in C# and its works well. With this code you don't need any third party parser.
string line = reader.ReadLine();
line = ParseCommasInQuotes(line);
private string ParseCommasInQuotes(string arg)
{
bool foundEndQuote = false;
bool foundStartQuote = false;
StringBuilder output = new StringBuilder();
//44 = comma
//34 = double quote
foreach (char element in arg)
{
if (foundEndQuote)
{
foundStartQuote = false;
foundEndQuote = false;
}
if (element.Equals((Char)34) & (!foundEndQuote) & foundStartQuote)
{
foundEndQuote = true;
continue;
}
if (element.Equals((Char)34) & !foundStartQuote)
{
foundStartQuote = true;
continue;
}
if ((element.Equals((Char)44) & foundStartQuote))
{
//skip the comma...its between double quotes
}
else
{
output.Append(element);
}
}
return output.ToString();
}
回答4:
I did not understand your question before. Now I'm pretty sure I got it right:
TextFieldParser parser = new TextFieldParser(@"c:\file.csv");
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData)
{
//Processing row
string[] fields = parser.ReadFields();
foreach (string field in fields)
{
//TODO: Do whatever you need
}
}
parser.Close();
回答5:
var result = Regex.Replace(input,
@"[^\""]([^\""])*[^\""]",
m => m.Value.Replace(",", " ") );
回答6:
It doesn't sound as though what you are describing will end up as a csv file but to answer your question I would do this.
First you need to get the text file into some usable code that can be looped through like this:
public static List<String> GetTextListFromDiskFile(String fileName)
{
List<String> list = new List<String>();
try
{
//load the file into the streamreader
System.IO.StreamReader sr = new System.IO.StreamReader(fileName);
//loop through each line of the file
while (sr.Peek() >= 0)
{
list.Add(sr.ReadLine());
}
sr.Close();
}
catch (Exception ex)
{
list.Add("Error: Could not read file from disk. Original error: " + ex.Message);
}
return list;
}
Then loop through the list and using a simple foreach loop and run the replace on the list like this:
foreach (String item in list)
{
String x = item.Replace("\",\"", "\" \"");
x = x.Replace("\"", "");
}
After you do this then you need to create you csv file line by line. I would use a StringBuilder again and then just do an sb.AppendLine(x) to create the String that will become the text file then write it out to the disk using something like this.
public static void SaveFileToDisk(String filePathName, String fileText)
{
using (StreamWriter outfile = new StreamWriter(filePathName))
{
outfile.Write(fileText);
}
}
回答7:
This worked for me. Hope it helps someone else.
Private Sub Command1_Click()
Open "c:\\dir\file.csv" For Input As #1
Open "c:\\dir\file2.csv" For Output As #2
Do Until EOF(1)
Line Input #1, test$
99
c = InStr(test$, """""")
If c > 0 Then
test$ = Left$(test$, c - 1) + Right$(test$, Len(test$) - (c + 1))
GoTo 99
End If
Print #2, test$
Loop
End Sub
回答8:
I would do it all before you start to process it line by line. Also, check out CsvHelper. It's fast and easy. Just take your results and throw them into a TextReader and then pass it to CvsReader.
Here is your comma (in double quotes) and then subsequent double quote stripper.
using (TextReader reader = File.OpenText(file))
{
// remove commas and double quotes inside file
var pattern = @"\""(.+?,.+)+\""";
var results = Regex.Replace(reader.ReadToEnd(), pattern, match => match.Value.Replace(",", " "));
results = results.Replace("\"", "");
}
来源:https://stackoverflow.com/questions/9889225/parse-text-file-and-remove-commas-inside-double-quotes