Question
This is a continuation of another question. I'm getting this error when I try to parse my XML file:
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 68; columnNumber: 12; Content is not allowed in trailing section.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$TrailingMiscDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at convert.ExcelXmlReader.getAndParseFile(ExcelXmlReader.java:55)
at convert.ExcelXmlReader.main(ExcelXmlReader.java:24)
The "lineNumber: 68; columnNumber: 12;" part matches the very last '>' in my XML file. Deleting the whitespace after it does not make the error go away. I also ran the file through an XML validator, but it reported nothing. I tried solutions from other Stack Overflow questions (looking for stray characters after the end of the XML, making sure all the tags are closed), but none of them worked for me.
Does anybody have any hints on where I should go from here? This is the file after processing:
<?xml version="1.0" encoding="utf-16"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>marc</Author>
<LastAuthor>ESDI</LastAuthor>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>7560</WindowHeight>
<WindowWidth>12300</WindowWidth>
<WindowTopX>360</WindowTopX>
<WindowTopY>135</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s21">
<NumberFormat ss:Format="Short Date"/>
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table x:FullColumns="1" x:FullRows="1">
<Row>
<Cell><Data ss:Type="String">Crt. Dte</Data></Cell>
<Cell><Data ss:Type="String">WR Status</Data></Cell>
<Cell><Data ss:Type="String">Request Plant</Data></Cell>
<Cell><Data ss:Type="String">Request #</Data></Cell>
<Cell><Data ss:Type="String">Item#</Data></Cell>
<Cell><Data ss:Type="String">Request Cost Center</Data></Cell>
<Cell><Data ss:Type="String">WR Description</Data></Cell>
<Cell><Data ss:Type="String">W/O No</Data></Cell>
<Cell><Data ss:Type="String">Charge Plant</Data></Cell>
<Cell><Data ss:Type="String">Charge Cost Center</Data></Cell>
<Cell><Data ss:Type="String">Equip NO</Data></Cell>
<Cell><Data ss:Type="String">Equipment Name</Data></Cell>
<Cell><Data ss:Type="String">Required Date</Data></Cell>
<Cell><Data ss:Type="String">WO Type</Data></Cell>
<Cell><Data ss:Type="String">Exec. C/C</Data></Cell>
<Cell><Data ss:Type="String">Exec. Plant</Data></Cell>
<Cell><Data ss:Type="String">Plant1</Data></Cell>
<Cell><Data ss:Type="String">Area</Data></Cell>
<Cell><Data ss:Type="String">Confirmed</Data></Cell>
<Cell><Data ss:Type="String">WO Status</Data></Cell>
<Cell><Data ss:Type="String">W/R Requester</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Selected/>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>
Current code for the parsing. Most of the other code is in the previous question linked above.
private static void getAndParseFile() throws Exception {
System.out.println("getAndParseFile");
String fileName="C:\\Users\\windowsUserName\\Downloads\\F7BAH1P_List.xml";
File file = new File(fileName);
removeLineFromFile(file.getAbsolutePath());
System.out.println("Finished Removing Lines");
String fileContent = IOUtils.toString(new FileInputStream(file));
fileContent = fileContent.substring(0, fileContent.lastIndexOf('>')+1);
fileContent = fileContent.replaceAll("&#","");
PrintWriter pw = null;
pw = new PrintWriter(new FileWriter("C:\\Users\\windowsUserName\\Downloads\\tempfile.txt"));
pw.println(fileContent);
pw.flush();
ByteArrayInputStream bis = new ByteArrayInputStream(Charset.forName("UTF-16").encode(fileContent).array());
SAXParserFactory parserFactor = SAXParserFactory.newInstance();
SAXParser parser = parserFactor.newSAXParser();
SAXHandler handler = new SAXHandler();
parser.parse(bis, handler);
}
removeLineFromFile removes two <Row></Row> elements from the beginning and two from the end of the XML file; these rows are blank or contain counter/title data.
private static void removeLineFromFile(String file) {
BufferedReader br = null;
PrintWriter pw = null;
try {
File inFile = new File(file);
if (!inFile.isFile()) {
return;
}
br = new BufferedReader(new FileReader(file));
String line = null;
int totalRows=0;
boolean continueMethod = false;
//Count total number of rows in file
while ((line = br.readLine()) != null) {
//check if file is already formatted
if (line.contains("List for Work")){
continueMethod = true;
}
if (line.toLowerCase().contains("</row>")){
++totalRows;
}
}
if (continueMethod)
{
//Create a temporary file to hold the file with deleted lines.
File tempFile = new File(inFile.getAbsolutePath() + ".tmp");
pw = new PrintWriter(new FileWriter(tempFile));
line = null;
br.close();
br = null;
br = new BufferedReader(new FileReader(file));
boolean ignoreMe = false;
int rowCounter = 0;
int rowCloser = 0;
//begin cycling through file and writing to new one.
while((line = br.readLine()) != null)
{
//if runs into a row, count it.
if (line.toLowerCase().contains("<row>")){
rowCounter++;
}
if (line.toLowerCase().contains("</row>")){
rowCloser++;
}
//Delete the first two, and last two lines
if ((rowCounter == 1 ) || (rowCounter == 2) || (rowCounter == (totalRows-1)) || (rowCounter == totalRows))
{
ignoreMe = true;
//If it reached the last closing tag, exit out of this to allow it to write the rest of the file.
if (rowCloser==totalRows)
rowCounter++;
}
else
{
ignoreMe = false;
}
//copy over other lines
if (!ignoreMe)
{
pw.println(line);
pw.flush();
}
}
br.close();
pw.close();
//Delete the original file
if (!inFile.delete()) {
System.out.println("Could not delete original file");
return;
}
//Rename the new file to the filename the original file had.
if (!tempFile.renameTo(inFile))
System.out.println("Could not rename temp file");
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
Here is the XML file before it goes through removeLineFromFile:
<?xml version="1.0" encoding="utf-16"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>marc</Author>
<LastAuthor>ESDI</LastAuthor>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>7560</WindowHeight>
<WindowWidth>12300</WindowWidth>
<WindowTopX>360</WindowTopX>
<WindowTopY>135</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s21">
<NumberFormat ss:Format="Short Date"/>
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table x:FullColumns="1" x:FullRows="1">
<Row>
<Cell><Data ss:Type="String">List for Work Request(F7BAH1P)</Data></Cell>
</Row>
<Row>
</Row>
<Row>
<Cell><Data ss:Type="String">Crt. Dte</Data></Cell>
<Cell><Data ss:Type="String">WR Status</Data></Cell>
<Cell><Data ss:Type="String">Request Plant</Data></Cell>
<Cell><Data ss:Type="String">Request #</Data></Cell>
<Cell><Data ss:Type="String">Item#</Data></Cell>
<Cell><Data ss:Type="String">Request Cost Center</Data></Cell>
<Cell><Data ss:Type="String">WR Description</Data></Cell>
<Cell><Data ss:Type="String">W/O No</Data></Cell>
<Cell><Data ss:Type="String">Charge Plant</Data></Cell>
<Cell><Data ss:Type="String">Charge Cost Center</Data></Cell>
<Cell><Data ss:Type="String">Equip NO</Data></Cell>
<Cell><Data ss:Type="String">Equipment Name</Data></Cell>
<Cell><Data ss:Type="String">Required Date</Data></Cell>
<Cell><Data ss:Type="String">WO Type</Data></Cell>
<Cell><Data ss:Type="String">Exec. C/C</Data></Cell>
<Cell><Data ss:Type="String">Exec. Plant</Data></Cell>
<Cell><Data ss:Type="String">Plant1</Data></Cell>
<Cell><Data ss:Type="String">Area</Data></Cell>
<Cell><Data ss:Type="String">Confirmed</Data></Cell>
<Cell><Data ss:Type="String">WO Status</Data></Cell>
<Cell><Data ss:Type="String">W/R Requester</Data></Cell>
</Row>
<Row>
</Row>
<Row>
<Cell><Data ss:Type="String">Count: 244</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Selected/>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>
Answer 1:
You may get parse errors if your file's actual encoding does not match the encoding in the XML declaration:
<?xml version="1.0" encoding="utf-16"?>
FileWriter and FileReader assume that the platform's default character encoding is acceptable (UTF-8 on my system), so you cannot rely on them to process UTF-16 encoded files in a portable manner. Here is what their documentation says (FileWriter first, then FileReader):
Convenience class for writing character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are acceptable. To specify these values yourself, construct an OutputStreamWriter on a FileOutputStream.
Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.
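So you need to do what the documentation suggests: wrap a FileInputStream in an InputStreamReader (and a FileOutputStream in an OutputStreamWriter) and pass the charset explicitly.
Here's some quick test code that demonstrates your issue with three different implementations of your removeLineFromFile method: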
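As a minimal sketch of what the documentation suggests, the following copies a file line by line while naming the charset on both the reading and the writing side. The class name Utf16Copy and the method copy are illustrative names, not part of the question's code, and the sketch assumes the input really is UTF-16.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class Utf16Copy {
    // Copies src to dst line by line, decoding and encoding as UTF-16
    // instead of relying on the platform default charset.
    public static void copy(File src, File dst) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream(src), StandardCharsets.UTF_16));
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream(dst), StandardCharsets.UTF_16))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the input file, args[1] the output file (hypothetical usage).
        copy(new File(args[0]), new File(args[1]));
    }
}
```

The same pairing of InputStreamReader and OutputStreamWriter is what the working variant in the test code relies on; try-with-resources is used here only so both streams are closed even if an exception is thrown.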
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class Encoding {
private static File removeLineFromFile2(String file) {
File ret = null;
BufferedReader br = null;
PrintWriter pw = null;
try {
File inFile = new File(file);
if (!inFile.isFile()) {
return ret;
}
ret = inFile;
br = new BufferedReader(new InputStreamReader(
new FileInputStream(file), "UTF-16"));
String line = null;
int totalRows=0;
boolean continueMethod = false;
//Count total number of rows in file
while ((line = br.readLine()) != null) {
//check if file is already formatted
if (line.contains("List for Work")){
continueMethod = true;
}
if (line.toLowerCase().contains("</row>")){
++totalRows;
}
}
if (continueMethod)
{
//Create a temporary file to hold the file with deleted lines.
File tempFile = new File(inFile.getAbsolutePath() + ".2.tmp");
// Write with an explicit charset so the bytes match the XML declaration
pw = new PrintWriter(new OutputStreamWriter(
new FileOutputStream(tempFile), "UTF-16"));
line = null;
br.close();
br = null;
br = new BufferedReader(new InputStreamReader(
new FileInputStream(file), "UTF-16"));
boolean ignoreMe = false;
int rowCounter = 0;
int rowCloser = 0;
//begin cycling through file and writing to new one.
while((line = br.readLine()) != null)
{
//if runs into a row, count it.
if (line.toLowerCase().contains("<row>")){
rowCounter++;
}
if (line.toLowerCase().contains("</row>")){
rowCloser++;
}
//Delete the first two, and last two lines
if ((rowCounter == 1 ) || (rowCounter == 2) || (rowCounter == (totalRows-1)) || (rowCounter == totalRows))
{
ignoreMe = true;
//If it reached the last closing tag, exit out of this to allow it to write the rest of the file.
if (rowCloser==totalRows)
rowCounter++;
}
else
{
ignoreMe = false;
}
//copy over other lines
if (!ignoreMe)
{
pw.println(line);
pw.flush();
}
}
br.close();
pw.close();
System.out.println("Temp file is: " + tempFile.getAbsolutePath());
ret = tempFile;
}
} catch (Exception ex) {
ex.printStackTrace();
}
return ret;
}
private static File removeLineFromFile1(String file) {
File ret = null;
BufferedReader br = null;
PrintWriter pw = null;
try {
File inFile = new File(file);
if (!inFile.isFile()) {
return ret;
}
ret = inFile;
br = new BufferedReader(new InputStreamReader(
new FileInputStream(file), "UTF-16"));
String line = null;
int totalRows=0;
boolean continueMethod = false;
//Count total number of rows in file
while ((line = br.readLine()) != null) {
//check if file is already formatted
if (line.contains("List for Work")){
continueMethod = true;
}
if (line.toLowerCase().contains("</row>")){
++totalRows;
}
}
if (continueMethod)
{
//Create a temporary file to hold the file with deleted lines.
File tempFile = new File(inFile.getAbsolutePath() + ".1.tmp");
// Wrong on purpose: FileWriter encodes with the platform default charset, not UTF-16
pw = new PrintWriter(new FileWriter(tempFile));
line = null;
br.close();
br = null;
br = new BufferedReader(new InputStreamReader(
new FileInputStream(file), "UTF-16"));
boolean ignoreMe = false;
int rowCounter = 0;
int rowCloser = 0;
//begin cycling through file and writing to new one.
while((line = br.readLine()) != null)
{
//if runs into a row, count it.
if (line.toLowerCase().contains("<row>")){
rowCounter++;
}
if (line.toLowerCase().contains("</row>")){
rowCloser++;
}
//Delete the first two, and last two lines
if ((rowCounter == 1 ) || (rowCounter == 2) || (rowCounter == (totalRows-1)) || (rowCounter == totalRows))
{
ignoreMe = true;
//If it reached the last closing tag, exit out of this to allow it to write the rest of the file.
if (rowCloser==totalRows)
rowCounter++;
}
else
{
ignoreMe = false;
}
//copy over other lines
if (!ignoreMe)
{
pw.println(line);
pw.flush();
}
}
br.close();
pw.close();
System.out.println("Temp file is: " + tempFile.getAbsolutePath());
ret = tempFile;
}
} catch (Exception ex) {
ex.printStackTrace();
}
return ret;
}
private static File removeLineFromFile(String file) {
File ret = null;
BufferedReader br = null;
PrintWriter pw = null;
try {
File inFile = new File(file);
if (!inFile.isFile()) {
return ret;
}
ret = inFile;
// Original approach: FileReader decodes with the platform default charset
br = new BufferedReader(new FileReader(file));
String line = null;
int totalRows=0;
boolean continueMethod = false;
//Count total number of rows in file
while ((line = br.readLine()) != null) {
//check if file is already formatted
if (line.contains("List for Work")){
continueMethod = true;
}
if (line.toLowerCase().contains("</row>")){
++totalRows;
}
}
if (continueMethod)
{
//Create a temporary file to hold the file with deleted lines.
File tempFile = new File(inFile.getAbsolutePath() + ".tmp");
pw = new PrintWriter(new FileWriter(tempFile));
line = null;
br.close();
br = null;
br = new BufferedReader(new FileReader(file));
boolean ignoreMe = false;
int rowCounter = 0;
int rowCloser = 0;
//begin cycling through file and writing to new one.
while((line = br.readLine()) != null)
{
//if runs into a row, count it.
if (line.toLowerCase().contains("<row>")){
rowCounter++;
}
if (line.toLowerCase().contains("</row>")){
rowCloser++;
}
//Delete the first two, and last two lines
if ((rowCounter == 1 ) || (rowCounter == 2) || (rowCounter == (totalRows-1)) || (rowCounter == totalRows))
{
ignoreMe = true;
//If it reached the last closing tag, exit out of this to allow it to write the rest of the file.
if (rowCloser==totalRows)
rowCounter++;
}
else
{
ignoreMe = false;
}
//copy over other lines
if (!ignoreMe)
{
pw.println(line);
pw.flush();
}
}
br.close();
pw.close();
System.out.println("Temp file is: " + tempFile.getAbsolutePath());
ret = tempFile;
}
} catch (Exception ex) {
ex.printStackTrace();
}
return ret;
}
private static void parse(File file) {
try {
System.out.println("Parsing " + file.getAbsolutePath());
SAXParserFactory parserFactor = SAXParserFactory.newInstance();
SAXParser parser = parserFactor.newSAXParser();
DefaultHandler handler = new DefaultHandler();
parser.parse(file, handler);
} catch (Exception ex) {
System.out.println("An exception occurred: " + ex.getMessage());
} finally {
System.out.println("Done with " + file.getAbsolutePath());
}
}
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
System.out.println("getAndParseFile");
String fileName=args[0];
File file = new File(fileName);
File f2 = removeLineFromFile2(file.getAbsolutePath());
File f1 = removeLineFromFile1(file.getAbsolutePath());
File f = removeLineFromFile(file.getAbsolutePath());
System.out.println("Finished Removing Lines");
parse(f2);
parse(f1);
parse(f);
}
}
removeLineFromFile2 represents what you need to be doing. removeLineFromFile1 represents what happens if you read the file correctly but write it with the wrong encoding (which I suspect is what is happening in your case). removeLineFromFile is your implementation, which does nothing on my system. The output on my system:
getAndParseFile
Temp file is: \path\to\sample-utf16.xml.2.tmp
Temp file is: \path\to\sample-utf16.xml.1.tmp
Finished Removing Lines
Parsing \path\to\sample-utf16.xml.2.tmp
Done with \path\to\sample-utf16.xml.2.tmp
Parsing \path\to\sample-utf16.xml.1.tmp
An exception occurred: Content is not allowed in prolog.
Done with \path\to\sample-utf16.xml.1.tmp
Parsing \path\to\sample-utf16.xml
Done with \path\to\sample-utf16.xml
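A likely reason the original removeLineFromFile does nothing in this run is that the "List for Work" check never matches when UTF-16 bytes are decoded with a different charset, so no temp file is written. The sketch below illustrates that effect in isolation; the class WrongDecoding and the method markerSurvives are hypothetical names, and UTF-8 stands in for the platform default here.

```java
import java.nio.charset.StandardCharsets;

public class WrongDecoding {
    // Encodes text as UTF-16, then decodes the bytes as UTF-8 (standing in for
    // a UTF-8 platform default) and reports whether the marker is still found.
    public static boolean markerSurvives(String text, String marker) {
        byte[] utf16Bytes = text.getBytes(StandardCharsets.UTF_16);
        String misread = new String(utf16Bytes, StandardCharsets.UTF_8);
        return misread.contains(marker);
    }

    public static void main(String[] args) {
        String line = "<Cell><Data ss:Type=\"String\">List for Work Request(F7BAH1P)</Data></Cell>";
        // Prints false: every character is separated by a NUL after the bad decode.
        System.out.println(markerSurvives(line, "List for Work"));
    }
}
```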
All of the above assumes your input file really is encoded in UTF-16, as its XML declaration says. I suspect this is not the case. If you created the file yourself, it was probably saved in a different encoding. Try opening it in Notepad++ (or a similar tool) and check the encoding through the Encoding menu (it should say UCS-2 or UTF-16, not ANSI, UTF-8, etc.).
Your code should always explicitly specify the encoding of the files it expects.
Source: https://stackoverflow.com/questions/35951215/content-is-not-allowed-in-trailing-section-when-parsing-with-sax-java
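If you would rather check from Java than from an editor, a rough heuristic is to look at the first bytes of the file for a byte order mark. This is only a sketch: the class BomSniffer and the method guessEncoding are hypothetical names, a file without a BOM is reported as unknown because a missing BOM does not prove any particular encoding, and the UTF-16LE check does not rule out UTF-32LE, whose BOM starts with the same two bytes.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class BomSniffer {
    // Returns a label based on the byte order mark at the start of head, if any.
    public static String guessEncoding(byte[] head) {
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8 (BOM)";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE (BOM)";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE (BOM)";
        }
        return "unknown (no BOM)";
    }

    public static void main(String[] args) throws IOException {
        byte[] head = new byte[3];
        int n;
        try (InputStream in = new FileInputStream(args[0])) {
            // A single read may return fewer than 3 bytes, or -1 for an empty file.
            n = in.read(head);
        }
        System.out.println(guessEncoding(Arrays.copyOf(head, Math.max(n, 0))));
    }
}
```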