Question
I have an internal file (DataMashup) from a Power BI Desktop report (.pbix) that I am trying to decode. My aim is to create a Power BI Desktop report and its data model programmatically, in any programming language; I am starting with Java.
The file appears to be encoded in some way.
I tried to detect the file's encoding and the result was windows-1254, but decoding with that charset does not produce readable output.
File f = new File("example.txt");
String[] charsetsToBeTested = {"UTF-8", "windows-1254", "ISO-8859-7"};
CharsetDetector cd = new CharsetDetector();
Charset charset = cd.detectCharset(f, charsetsToBeTested);
if (charset != null) {
    try {
        InputStreamReader reader = new InputStreamReader(new FileInputStream(f), charset);
        int c = 0;
        while ((c = reader.read()) != -1) {
            System.out.print((char) c);
        }
        reader.close();
    } catch (FileNotFoundException fnfe) {
        fnfe.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
} else {
    System.out.println("Unrecognized charset.");
}
Unzipping the file does not work either:
public void unZipIt(String zipFile, String outputFolder)
{
    byte buffer[] = new byte[1024];
    try
    {
        File folder = new File(outputFolder);
        if (!folder.exists())
        {
            folder.mkdir();
        }
        ZipInputStream zis = new ZipInputStream(new FileInputStream(zipFile));
        System.out.println(zis);
        // Note: this call consumes the first entry, so the loop below starts at the second one.
        System.out.println(zis.getNextEntry());
        for (ZipEntry ze = zis.getNextEntry(); ze != null; ze = zis.getNextEntry())
        {
            String fileName = ze.getName();
            System.out.println(ze);
            File newFile = new File(outputFolder + File.separator + fileName);
            System.out.println("file unzip : " + newFile.getAbsoluteFile());
            (new File(newFile.getParent())).mkdirs();
            FileOutputStream fos = new FileOutputStream(newFile);
            int len;
            while ((len = zis.read(buffer)) > 0)
            {
                fos.write(buffer, 0, len);
            }
            fos.close();
        }
        zis.closeEntry();
        zis.close();
        System.out.println("Done");
    }
    catch (IOException ex)
    {
        ex.printStackTrace();
    }
}
Answer 1:
The file contains a binary header followed by XML that declares UTF-8. The header data seems to hold a file name (Config/Package.xml), so assuming a zip format is understandable. A zip file would also have binary data at the end of the file.
Maybe the file was downloaded over FTP in text mode, so that a text conversion ("\n" to "\r\n") was applied. That would corrupt a zip. Renaming the file to .zip might help with testing it in zip tools.
Try the .tar format first, by appending .tar to the file name. That would be logical, as the XML is not compressed.
Otherwise, if the content is always UTF-8 XML:
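// Requires: java.nio.charset.StandardCharsets, java.nio.file.Files, java.nio.file.Path, java.nio.file.Paths
Path f = Paths.get("example.txt");
String start = "<?xml";
String end = ">";
byte[] bytes = Files.readAllBytes(f);
// ISO-8859-1 maps every byte to exactly one char, so string indices equal byte offsets.
String s = new String(bytes, StandardCharsets.ISO_8859_1);
int startI = s.indexOf(start);
int endI = s.lastIndexOf(end) + end.length();
if (startI < 0 || endI <= startI) {
    throw new IllegalStateException("No XML document found in " + f);
}
// Decode only the XML part as UTF-8.
String xml = new String(bytes, startI, endI - startI, StandardCharsets.UTF_8);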
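If the zip guess is right and the archive merely sits behind a binary header, ZipInputStream will find no entries when it reads from byte 0. A minimal sketch for checking this, assuming the embedded archive starts at the first occurrence of the standard ZIP local file header signature (bytes 0x50 0x4B 0x03 0x04, "PK\3\4"): locate that signature and list the entries from that offset. The class and method names (EmbeddedZipSketch, indexOfZipSignature, listEmbeddedEntries) are made up for illustration, and whether DataMashup really has this layout is an assumption to verify on your own file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of any Power BI API.
class EmbeddedZipSketch {

    // ZIP local file header signature: 'P' 'K' 0x03 0x04.
    private static final byte[] ZIP_SIGNATURE = {0x50, 0x4B, 0x03, 0x04};

    /** Returns the offset of the first ZIP local file header, or -1 if there is none. */
    static int indexOfZipSignature(byte[] data) {
        for (int i = 0; i + ZIP_SIGNATURE.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < ZIP_SIGNATURE.length; j++) {
                if (data[i + j] != ZIP_SIGNATURE[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    /** Lists the entry names of a ZIP archive that starts somewhere inside data. */
    static List<String> listEmbeddedEntries(byte[] data) {
        List<String> names = new ArrayList<>();
        int offset = indexOfZipSignature(data);
        if (offset < 0) {
            return names; // no signature found: probably not a zip at all
        }
        // Skip the leading header bytes and read the rest as a ZIP stream.
        try (ZipInputStream zis = new ZipInputStream(
                new ByteArrayInputStream(data, offset, data.length - offset))) {
            for (ZipEntry ze = zis.getNextEntry(); ze != null; ze = zis.getNextEntry()) {
                names.add(ze.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

Usage would be along the lines of EmbeddedZipSketch.listEmbeddedEntries(Files.readAllBytes(Paths.get("example.txt"))). An empty list means no signature was found or no entry could be read, in which case the plain XML extraction above is the fallback.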
Source: https://stackoverflow.com/questions/49048268/how-to-decode-get-encoding-of-file-power-bi-desktop-file