问题
I am using JTidy to convert from HTML to XHTML but I found in my XHTML file this tag
.
Can i prevent it ?
this is my code
//from html to xhtml
try
{
fis = new FileInputStream(htmlFileName);
}
catch (java.io.FileNotFoundException e)
{
System.out.println("File not found: " + htmlFileName);
}
Tidy tidy = new Tidy();
tidy.setShowWarnings(false);
tidy.setXmlTags(false);
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setXHTML(true);//
tidy.setMakeClean(true);
Document xmlDoc = tidy.parseDOM(fis, null);
try
{
tidy.pprint(xmlDoc,new FileOutputStream("c.xhtml"));
}
catch(Exception e)
{
}
回答1:
I had only success, when the input is treated as XML as well. So either set xmltags to true
tidy.setXmlTags(true);
and live with the errors and warnings or do the conversion twice. First conversion to sanitize the html (html to xhtml) and a second conversion from xhtml to xhtml with set xmltags, thus no errors and warnings occur.
String htmlFileName = "test.html";
try( InputStream in = Thread.currentThread().getContextClassLoader().getResourceAsStream(htmlFileName);
FileOutputStream fos = new FileOutputStream("tmp.xhtml");) {
Tidy tidy = new Tidy();
tidy.setShowWarnings(true);
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setXHTML(true);
tidy.setMakeClean(true);
Document xmlDoc = tidy.parseDOM(in, fos);
} catch (Exception e) {
e.printStackTrace();
}
try( InputStream in = new FileInputStream("tmp.xhtml");
FileOutputStream fos = new FileOutputStream("c.xhtml");) {
Tidy tidy = new Tidy();
tidy.setShowWarnings(true);
tidy.setXmlTags(true);
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setXHTML(true);
tidy.setMakeClean(true);
Document xmlDoc = tidy.parseDOM(in, null);
tidy.pprint(xmlDoc, fos);
} catch (Exception e) {
e.printStackTrace();
}
I used the latest jtidy version 938.
回答2:
i created a function that parse the the xhtml code and remove the unwelcome tags and to add a link to the css File "tableStyle.css"
public static String xhtmlparser(){
String Cleanline="";
try {
// the file url
FileInputStream fstream = new FileInputStream("c.xhtml");
// Use DataInputStream to read binary NOT text.
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine = null;
int linescounter=0;
while ((strLine = br.readLine()) != null) {// read every line in the file
String m=strLine.replaceAll(" ", "");
linescounter++;
if(linescounter==5)
m=m+"\n"+ "<link rel="+ "\"stylesheet\" "+"type="+ "\"text/css\" "+"href= " +"\"tableStyle.css\""+ "/>";
Cleanline+=m+"\n";
}
}
catch(IOException e){}
return Cleanline;
}
but as a performance issue is it good?
by the way it works will
回答3:
You can use the following method to get xhtml from html
public static String getXHTMLFromHTML(String inputFile,
String outputFile) throws Exception {
File file = new File(inputFile);
FileOutputStream fos = null;
InputStream is = null;
try {
fos = new FileOutputStream(outputFile);
is = new FileInputStream(file);
Tidy tidy = new Tidy();
tidy.setXHTML(true);
tidy.parse(is, fos);
} catch (FileNotFoundException e) {
e.printStackTrace();
}finally{
if(fos != null){
try {
fos.close();
} catch (IOException e) {
fos = null;
}
fos = null;
}
if(is != null){
try {
is.close();
} catch (IOException e) {
is = null;
}
is = null;
}
}
return outputFile;
}
来源:https://stackoverflow.com/questions/15063870/jtidy-java-api-toconvert-html-to-xhtml