I've got a file from a vendor that has 115 fixed-width fields per line. What's the best way of parsing that file into the 115 fields so I can use them in my code?
My first thought is just to make constants for each field, like NAME_START_POSITION and NAME_LENGTH, and use substring. That just seems ugly, so I'm curious whether there are any other recommended ways of doing this. None of the couple of libraries a Google search turned up seemed any better either. Thanks.
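One way to keep the start/length idea without 230 separate constants is to put the layout in a single enum. The following is only a minimal sketch: the field names, offsets, and widths (VendorField, NAME, ACCOUNT, STATUS) are hypothetical and would have to come from the vendor's spec.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical layout: names, offsets and widths are illustrative only.
enum VendorField {
    NAME(0, 10),
    ACCOUNT(10, 6),
    STATUS(16, 1);

    final int start;   // zero-based offset of the first character
    final int length;  // number of characters in the field

    VendorField(int start, int length) {
        this.start = start;
        this.length = length;
    }

    // Cuts this field out of a line and trims the padding spaces.
    String extract(String line) {
        return line.substring(start, start + length).trim();
    }
}

final class VendorLineParser {
    // Minimum line width implied by the layout (end of the right-most field).
    static int recordLength() {
        int end = 0;
        for (VendorField f : VendorField.values()) {
            end = Math.max(end, f.start + f.length);
        }
        return end;
    }

    // Parses one line into a map keyed by field.
    static Map<VendorField, String> parse(String line) {
        if (line.length() < recordLength()) {
            throw new IllegalArgumentException(
                "Line has " + line.length() + " chars, expected at least " + recordLength());
        }
        Map<VendorField, String> values = new EnumMap<>(VendorField.class);
        for (VendorField f : VendorField.values()) {
            values.put(f, f.extract(line));
        }
        return values;
    }
}
```

With this shape, adding a field is one enum entry, and callers read values with something like VendorLineParser.parse(line).get(VendorField.NAME).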
I've played around with fixedformat4j and it is quite nice. It's easy to configure converters and the like.
uniVocity-parsers comes with a FixedWidthParser and FixedWidthWriter that can support tricky fixed-width formats, including lines with different fields, paddings, etc.
// creates the sequence of field lengths in the file to be parsed
FixedWidthFields fields = new FixedWidthFields(4, 5, 40, 40, 8);
// creates the default settings for a fixed width parser
FixedWidthParserSettings settings = new FixedWidthParserSettings(fields); // many settings here, check the tutorial.
//sets the character used for padding unwritten spaces in the file
settings.getFormat().setPadding('_');
// creates a fixed-width parser with the given settings
FixedWidthParser parser = new FixedWidthParser(settings);
// parses all rows in one go.
List<String[]> allRows = parser.parseAll(new File("path/to/fixed.txt"));
Here are a few examples for parsing all sorts of fixed-width inputs.
And here are some other examples for writing in general, plus other examples specific to the fixed-width format.
Disclosure: I'm the author of this library. It's open-source and free (Apache 2.0 License).
Here is a basic implementation I use:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
public class FlatFileParser {
public static void main(String[] args) {
File inputFile = new File("data.in");
File outputFile = new File("data.out");
int columnLengths[] = {7, 4, 10, 1};
String charset = "ISO-8859-1";
String delimiter = "~";
System.out.println(
convertFixedWidthFile(inputFile, outputFile, columnLengths, delimiter, charset)
+ " lines written to " + outputFile.getAbsolutePath());
}
/**
* Converts a fixed width file to a delimited file.
* <p>
* This method ignores (consumes) newline and carriage return
* characters. Lines returned is based strictly on the aggregated
* lengths of the columns.
*
* A RuntimeException is thrown if run-off characters are detected
* at eof.
*
* @param inputFile the fixed width file
* @param outputFile the generated delimited file
* @param columnLengths the array of column lengths
* @param delimiter the delimiter used to split the columns
* @param charsetName the charset name of the supplied files
* @return the number of completed lines
*/
public static final long convertFixedWidthFile(
File inputFile,
File outputFile,
int columnLengths[],
String delimiter,
String charsetName) {
InputStream inputStream = null;
Reader inputStreamReader = null;
OutputStream outputStream = null;
Writer outputStreamWriter = null;
String newline = System.getProperty("line.separator");
String separator;
int data;
int currentIndex = 0;
int currentLength = columnLengths[currentIndex];
int currentPosition = 0;
long lines = 0;
try {
inputStream = new FileInputStream(inputFile);
inputStreamReader = new InputStreamReader(inputStream, charsetName);
outputStream = new FileOutputStream(outputFile);
outputStreamWriter = new OutputStreamWriter(outputStream, charsetName);
while((data = inputStreamReader.read()) != -1) {
if(data != 13 && data != 10) {
outputStreamWriter.write(data);
if(++currentPosition > (currentLength - 1)) {
currentIndex++;
separator = delimiter;
if(currentIndex > columnLengths.length - 1) {
currentIndex = 0;
separator = newline;
lines++;
}
outputStreamWriter.write(separator);
currentLength = columnLengths[currentIndex];
currentPosition = 0;
}
}
}
if(currentIndex > 0 || currentPosition > 0) {
String line = "Line " + ((int)lines + 1);
String column = ", Column " + ((int)currentIndex + 1);
String position = ", Position " + ((int)currentPosition);
throw new RuntimeException("Incomplete record detected. " + line + column + position);
}
return lines;
}
catch (Throwable e) {
throw new RuntimeException(e);
}
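finally {
try {
// Guard against null: a reader or writer that failed to open
// would otherwise cause a NullPointerException here.
if (inputStreamReader != null) {
inputStreamReader.close();
}
if (outputStreamWriter != null) {
outputStreamWriter.close();
}
}
catch (Throwable e) {
throw new RuntimeException(e);
}
}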
}
}
Most suitable for Scala, but you could probably use it from Java.
I was so fed up with the fact that there is no proper library for the fixed-length format that I created my own. You can check it out here: https://github.com/atais/Fixed-Length
In basic usage, you create a case class and describe it as an HList (Shapeless):
case class Employee(name: String, number: Option[Int], manager: Boolean)
object Employee {
import com.github.atais.util.Read._
import cats.implicits._
import com.github.atais.util.Write._
import Codec._
implicit val employeeCodec: Codec[Employee] = {
fixed[String](0, 10) <<:
fixed[Option[Int]](10, 13, Alignment.Right) <<:
fixed[Boolean](13, 18)
}.as[Employee]
}
And you can easily decode your lines now or encode your object:
import Employee._
Parser.decode[Employee](exampleString)
Parser.encode(exampleObject)
The Apache Commons CSV project can handle fixed-width files.
Looks like the fixed width functionality didn't survive promotion from the sandbox.
Here is plain Java code to read a fixed-width file:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
public class FixedWidth {
public static void main(String[] args) throws FileNotFoundException, IOException {
// String S1="NHJAMES TURNER M123-45-67890004224345";
String FixedLengths = "2,15,15,1,11,10";
List<String> items = Arrays.asList(FixedLengths.split("\\s*,\\s*"));
File file = new File("src/sample.txt");
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
String line1;
while ((line1 = br.readLine()) != null) {
// process the line.
int n = 0;
String line = "";
for (String i : items) {
// System.out.println("Before"+n);
if (i == items.get(items.size() - 1)) {
line = line + line1.substring(n, n + Integer.parseInt(i)).trim();
} else {
line = line + line1.substring(n, n + Integer.parseInt(i)).trim() + ",";
}
// System.out.println(
// S1.substring(n,n+Integer.parseInt(i)));
n = n + Integer.parseInt(i);
// System.out.println("After"+n);
}
System.out.println(line);
}
}
}
}
/* The method takes three parameters: the input holding the fixed-length records, the number of columns per record (taken from the schema, say 10), and the delimiter. */
public class Testing {
public static void main(String as[]) throws InterruptedException {
fixedLengthRecordProcessor("1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10", 10, ",");
}
public static void fixedLengthRecordProcessor(String input, int reclength, String delimiter) {
String[] values = input.split(delimiter);
String record = "";
int recCounter = 0;
for (Object O : values) {
if (recCounter == reclength) {
System.out.println(record.substring(0, record.length() - 1));// process
// your
// record
record = "";
record = record + O.toString() + ",";
recCounter = 1;
} else {
record = record + O.toString() + ",";
recCounter++;
}
}
System.out.println(record.substring(0, record.length() - 1)); // process
// your
// record
}
}
If your string is called inStr, convert it to a char array and use the String(char[] value, int offset, int count) constructor. Note that the third argument is a character count, not an end index:
char[] inStrChar = inStr.toCharArray();
String charFirst10 = new String(inStrChar, 0, 10);
String char10to20 = new String(inStrChar, 10, 10);
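The same constructor can be driven by a list of widths instead of hand-written offsets. This is a rough sketch only; CharSlicer and slice are hypothetical names, and it assumes every line is at least as long as the sum of the widths.

```java
final class CharSlicer {
    // Splits inStr into consecutive fields of the given widths.
    static String[] slice(String inStr, int... widths) {
        char[] chars = inStr.toCharArray();
        String[] fields = new String[widths.length];
        int offset = 0;
        for (int i = 0; i < widths.length; i++) {
            // String(char[] value, int offset, int count): the last argument
            // is how many characters to copy, not where to stop.
            fields[i] = new String(chars, offset, widths[i]);
            offset += widths[i];
        }
        return fields;
    }
}
```

A call such as CharSlicer.slice(inStr, 10, 10) returns the first ten characters and the next ten as two separate strings. A line that is too short fails with an IndexOutOfBoundsException rather than returning a truncated field.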
Another library that can be used to parse a fixed-width text source: https://github.com/org-tigris-jsapar/jsapar
It allows you to define a schema in XML or in code and parse fixed-width text into Java beans or fetch values from an internal format.
You can use \t+ as your delimiter. Try something like:
String fields[] = line.split("\t+");
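A small sketch of what that split does, assuming the file really is tab-separated rather than space-padded (TabSplitter is a hypothetical name). Because \t+ matches a run of tabs, consecutive tabs collapse into one separator, so an empty field in the middle of a line disappears from the result.

```java
final class TabSplitter {
    // Splits a line on runs of one or more tab characters.
    static String[] split(String line) {
        return line.split("\t+");
    }
}
```

For a line like "a\t\tb\tc" this yields three fields (a, b, c), not four, which matters if field positions carry meaning.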
Source: https://stackoverflow.com/questions/1609807/whats-the-best-way-of-parsing-a-fixed-width-formatted-file-in-java