Question
I am fetching data with the tSoap component. The result comes back as XML that contains comma-separated values: columns are separated by commas and rows by '\n'.
After that I use the tExtractXMLField component to extract the data from the response.
However, some strings in the data contain '\n', and each of these is treated as the start of a new row. I tried using the tReplace component with a regex to remove the '\n' characters inside the quotes, but the data is too large and the result is a StackOverflowError.
I also tried the tNormalize component with the CSV option to separate the rows, but the problem persists.
Can you please help me with this? Thanks in advance.
The response I get from the SOAP request is:
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
<env:Header/>
<env:Body>
<ns2:getReportResultCsvResponse xmlns:ns2="http://service.admin.ws.five9.com/">
<return>TIMESTAMP,CALL ID,NOTES
"Mon, 17 Apr 2017 10:05:38",4223519,
"Mon, 17 Apr 2017 10:05:40",4223520,
"Mon, 17 Apr 2017 10:05:41",4223521,"Alexandria..
Monday -- 55 partial
Bal -- 224 May 1
Visa"
"Mon, 17 Apr 2017 10:05:42",4223522,
"Mon, 17 Apr 2017 10:05:43",4223523,
"Mon, 17 Apr 2017 10:11:04",4223524,
"Mon, 17 Apr 2017 10:05:43",4223524,
"Mon, 17 Apr 2017 10:05:45",4223525,</return>
</ns2:getReportResultCsvResponse>
</env:Body>
</env:Envelope>
As shown above, the "notes" column contains '\n' characters between the quotes, and this breaks the data extraction. How can I resolve this issue?
Answer 1:
In fact, your file is a CSV file embedded in an XML file.
Because the "notes" field is enclosed in double quotes, one solution is to transform the file into pure CSV. Then, thanks to the appropriate "CSV options" setting, the "\n" problem disappears automagically.
Here is what the job looks like:
tFileInputFullRow reads the input file as it comes, each line in a single field named "line" by default. Just set Header to 4 and Footer to 3 to ignore most of the XML part (assuming the file structure is always the same).
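As a rough illustration of what Header = 4 and Footer = 3 leave behind, here is a minimal Java sketch. It is not Talend code: the class `EnvelopeTrimSketch` and its `trim` method are hypothetical names used only for this example, and the sample lines are shortened from the response in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, only to illustrate the Header/Footer settings.
final class EnvelopeTrimSketch {

    // Drops the first `header` lines and the last `footer` lines.
    static List<String> trim(List<String> lines, int header, int footer) {
        if (lines.size() <= header + footer) {
            return Collections.emptyList();
        }
        return new ArrayList<>(lines.subList(header, lines.size() - footer));
    }

    public static void main(String[] args) {
        List<String> response = Arrays.asList(
            "<env:Envelope xmlns:env=\"http://schemas.xmlsoap.org/soap/envelope/\">",
            "<env:Header/>",
            "<env:Body>",
            "<ns2:getReportResultCsvResponse xmlns:ns2=\"http://service.admin.ws.five9.com/\">",
            "<return>TIMESTAMP,CALL ID,NOTES",
            "\"Mon, 17 Apr 2017 10:05:45\",4223525,</return>",
            "</ns2:getReportResultCsvResponse>",
            "</env:Body>",
            "</env:Envelope>");

        // Only the CSV lines remain, still wrapped in the <return> tag.
        for (String line : trim(response, 4, 3)) {
            System.out.println(line);
        }
    }
}
```

The first remaining line still starts with the opening "return" tag and the last one still ends with the closing tag, because they share a line with the data.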
Pass the result to a tMap just to remove the remaining XML "return" tags, which the previous step does not remove because they are not on separate lines.
Here is the tMap with the replaceAll used to remove this tag:
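The exact expression is not given in the text. A minimal sketch of what it could look like, assuming the input flow is named `row1` (an assumption) and that both the opening and closing tags should be dropped, is `row1.line.replaceAll("</?return>", "")` in the tMap output column. The same call as plain Java, wrapped in a hypothetical `ReturnTagStripper` class so it can be tried outside Talend:

```java
// Hypothetical wrapper around the replaceAll call; not the author's exact expression.
final class ReturnTagStripper {

    // Removes <return> and </return> wherever they appear in the line.
    static String strip(String line) {
        return line.replaceAll("</?return>", "");
    }

    public static void main(String[] args) {
        // First CSV line of the response: the tag precedes the column names.
        System.out.println(strip("<return>TIMESTAMP,CALL ID,NOTES"));
        // Last CSV line of the response: the tag follows the data.
        System.out.println(strip("\"Mon, 17 Apr 2017 10:05:45\",4223525,</return>"));
    }
}
```

This pattern is applied one line at a time and has no nested repetition, so it should not run into the StackOverflowError that a regex over the whole payload can trigger.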
After the tMap, write the flow to a pure CSV file using tFileOutputDelimited. Leave all options at their proposed default values.
Now start a second subjob with tFileInputDelimited to read the CSV file. Define the schema with the 3 columns "Timestamp", "CallId" and "Notes". Set the field separator to "," and, for the magic, tick "CSV options"; nothing else is needed.
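To see why a quote-aware reader solves the problem, here is a small Java sketch of the usual CSV quoting rule: commas and newlines inside double quotes belong to the field, and a doubled quote is an escaped quote. It only illustrates the rule and is not Talend's implementation; `QuotedCsvSketch` and `parse` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of quote-aware CSV splitting.
final class QuotedCsvSketch {

    // Splits CSV text into records; separators inside quotes stay in the field.
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);              // commas and newlines are kept
                } else if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    field.append('"');            // doubled quote = literal quote
                    i++;
                } else {
                    inQuotes = false;             // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;                  // opening quote
            } else if (c == ',') {
                record.add(field.toString());     // end of field
                field.setLength(0);
            } else if (c == '\n') {
                record.add(field.toString());     // end of field and of record
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else if (c != '\r') {
                field.append(c);
            }
        }
        // Flush a final record that is not terminated by a newline.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "TIMESTAMP,CALL ID,NOTES\n"
            + "\"Mon, 17 Apr 2017 10:05:40\",4223520,\n"
            + "\"Mon, 17 Apr 2017 10:05:41\",4223521,\"Alexandria..\n"
            + "Monday -- 55 partial\nBal -- 224 May 1\nVisa\"\n";

        List<List<String>> rows = parse(csv);
        System.out.println(rows.size() + " records");
        System.out.println(rows.get(2).get(2));   // the multi-line notes field
    }
}
```

Because the parser walks the text in a single loop instead of relying on a regex, its stack use does not grow with the size of the data.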
To display only the record with "\n" in the "notes" field, I set Header to 3 and Limit to 1 (which is why there is just 1 row after the tFileInputDelimited).
Here is the result:
As you can see, the "notes" field is spread over 4 lines, as expected, because of the "\n" characters.
Regards,
TRF
Source: https://stackoverflow.com/questions/43583532/talend-newline-character-in-middle-of-csv-column