Parsing a fixed-width formatted file in Java

遥遥无期 2020-11-28 10:51

I've got a file from a vendor that has 115 fixed-width fields per line. How can I parse that file into the 115 fields so I can use them in my code?

My first thought

10 Answers
  • 2020-11-28 11:09

    The Apache Commons CSV project could handle fixed-width files while it was in the sandbox.

    It looks like the fixed-width functionality didn't survive promotion from the sandbox.

  • 2020-11-28 11:10

    This is most suitable for Scala, but you could probably use it from Java.

    I was so fed up with the fact that there is no proper library for fixed length format that I have created my own. You can check it out here: https://github.com/atais/Fixed-Length

    Basic usage: define a case class and describe its field layout as a codec built on an HList (Shapeless):

    case class Employee(name: String, number: Option[Int], manager: Boolean)
    
    object Employee {
    
        import com.github.atais.util.Read._
        import cats.implicits._
        import com.github.atais.util.Write._
        import Codec._
    
        implicit val employeeCodec: Codec[Employee] = {
          fixed[String](0, 10) <<:
            fixed[Option[Int]](10, 13, Alignment.Right) <<:
            fixed[Boolean](13, 18)
        }.as[Employee]
    }
    

    And you can easily decode your lines now or encode your object:

    import Employee._
    Parser.decode[Employee](exampleString)
    Parser.encode(exampleObject)
    
  • 2020-11-28 11:11

    Another library that can be used to parse a fixed width text source: https://github.com/org-tigris-jsapar/jsapar

    It lets you define a schema in XML or in code and parse fixed-width text into Java beans or fetch values from an internal format.

    Disclosure: I am the author of the jsapar library. If it does not fulfill your needs, on this page you can find a comprehensive list of other parsing libraries. Most of them are only for delimited files but some can parse fixed width as well.

  • 2020-11-28 11:12

    uniVocity-parsers comes with a FixedWidthParser and FixedWidthWriter that can support tricky fixed-width formats, including lines with different fields, paddings, etc.

    import java.io.File;
    import java.util.List;
    
    import com.univocity.parsers.fixed.FixedWidthFields;
    import com.univocity.parsers.fixed.FixedWidthParser;
    import com.univocity.parsers.fixed.FixedWidthParserSettings;
    
    // creates the sequence of field lengths in the file to be parsed
    FixedWidthFields fields = new FixedWidthFields(4, 5, 40, 40, 8);
    
    // creates the default settings for a fixed width parser
    FixedWidthParserSettings settings = new FixedWidthParserSettings(fields); // many settings here, check the tutorial.
    
    // sets the character used for padding unwritten spaces in the file
    settings.getFormat().setPadding('_');
    
    // creates a fixed-width parser with the given settings
    FixedWidthParser parser = new FixedWidthParser(settings);
    
    // parses all rows in one go.
    List<String[]> allRows = parser.parseAll(new File("path/to/fixed.txt"));
    

    Here are a few examples for parsing all sorts of fixed-width inputs.

    And here are some other examples for writing in general, plus further examples specific to the fixed-width format.

    Disclosure: I'm the author of this library. It's open-source and free (Apache 2.0 License).

  • 2020-11-28 11:17

    Here is a basic implementation I use:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.io.OutputStreamWriter;
    import java.io.Reader;
    import java.io.Writer;
    
    public class FlatFileParser {
    
      public static void main(String[] args) {
        File inputFile = new File("data.in");
        File outputFile = new File("data.out");
        int columnLengths[] = {7, 4, 10, 1};
        String charset = "ISO-8859-1";
        String delimiter = "~";
    
        System.out.println(
            convertFixedWidthFile(inputFile, outputFile, columnLengths, delimiter, charset)
            + " lines written to " + outputFile.getAbsolutePath());
      }
    
      /**
       * Converts a fixed width file to a delimited file.
       * <p>
       * This method ignores (consumes) newline and carriage return
       * characters. Lines returned is based strictly on the aggregated
       * lengths of the columns.
       *
       * A RuntimeException is thrown if run-off characters are detected
       * at eof.
       *
       * @param inputFile the fixed width file
       * @param outputFile the generated delimited file
       * @param columnLengths the array of column lengths
       * @param delimiter the delimiter used to split the columns
       * @param charsetName the charset name of the supplied files
       * @return the number of completed lines
       */
      public static final long convertFixedWidthFile(
          File inputFile,
          File outputFile,
          int columnLengths[],
          String delimiter,
          String charsetName) {
    
        String newline = System.getProperty("line.separator");
        String separator;
        int data;
        int currentIndex = 0;
        int currentLength = columnLengths[currentIndex];
        int currentPosition = 0;
        long lines = 0;
    
        // try-with-resources closes every stream that was opened, even on failure
        try (InputStream inputStream = new FileInputStream(inputFile);
             Reader inputStreamReader = new InputStreamReader(inputStream, charsetName);
             OutputStream outputStream = new FileOutputStream(outputFile);
             Writer outputStreamWriter = new OutputStreamWriter(outputStream, charsetName)) {
    
          while((data = inputStreamReader.read()) != -1) {
            // skip carriage return (13) and line feed (10)
            if(data != 13 && data != 10) {
              outputStreamWriter.write(data);
              // end of the current column reached
              if(++currentPosition > (currentLength - 1)) {
                currentIndex++;
                separator = delimiter;
                // end of the record reached: start a new output line
                if(currentIndex > columnLengths.length - 1) {
                  currentIndex = 0;
                  separator = newline;
                  lines++;
                }
                outputStreamWriter.write(separator);
                currentLength = columnLengths[currentIndex];
                currentPosition = 0;
              }
            }
          }
          // leftover characters at eof mean the last record is incomplete
          if(currentIndex > 0 || currentPosition > 0) {
            String line = "Line " + (lines + 1);
            String column = ", Column " + (currentIndex + 1);
            String position = ", Position " + currentPosition;
            throw new RuntimeException("Incomplete record detected. " + line + column + position);
          }
          return lines;
        }
        catch (java.io.IOException e) {
          // fully qualified so that no additional import is required
          throw new RuntimeException(e);
        }
      }
    }
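
    If the goal is to use the fields in code rather than to write a delimited file, the same column-length idea can be applied to each line in memory. The following is only a minimal sketch and is not part of the answer above: the class name FixedWidthLines, the methods splitLine and readAll, the sample data, and the decision to skip empty lines and reject short lines are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the answer above): splits each line of a
// fixed-width source into fields using an array of column lengths.
final class FixedWidthLines {

  // Cuts one line into fields; throws if the line is shorter than the layout.
  static String[] splitLine(String line, int[] columnLengths) {
    String[] fields = new String[columnLengths.length];
    int start = 0;
    for (int i = 0; i < columnLengths.length; i++) {
      int end = start + columnLengths[i];
      if (end > line.length()) {
        throw new IllegalArgumentException(
            "Line too short for column " + (i + 1) + ": length " + line.length());
      }
      fields[i] = line.substring(start, end);
      start = end;
    }
    return fields;
  }

  // Reads every non-empty line from the reader and splits it into fields.
  static List<String[]> readAll(Reader source, int[] columnLengths) {
    List<String[]> rows = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(source)) {
      String line;
      while ((line = reader.readLine()) != null) {
        if (!line.isEmpty()) {
          rows.add(splitLine(line, columnLengths));
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return rows;
  }

  public static void main(String[] args) {
    int[] columnLengths = {7, 4, 10, 1};
    // Sample records built from padded pieces so the widths are easy to check.
    String data = "ACME   " + "0042" + "Widget    " + "Y" + "\n"
        + "GLOBEX " + "0007" + "Gadget    " + "N" + "\n";
    for (String[] row : readAll(new StringReader(data), columnLengths)) {
      System.out.println(String.join("|", row));
    }
  }
}
```

    Padding is kept as-is in this sketch; calling trim() on each field is a separate choice that depends on whether leading or trailing spaces are significant in the vendor's format.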
    
  • 2020-11-28 11:21

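    If your string is called inStr, convert it to a char array and use the String(char[], offset, count) constructor. Note that the third argument is a character count, not an end index:

    char[] intStrChar = inStr.toCharArray();
    // characters 0..9 (10 characters starting at offset 0)
    String charfirst10 = new String(intStrChar, 0, 10);
    // characters 10..19 (10 characters starting at offset 10)
    String char10to20 = new String(intStrChar, 10, 10);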
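
    With 115 fields, hand-written offsets quickly become hard to maintain. One option is to describe each field once by name and length, derive the start offsets from the running total, and slice every line against that table. This is a sketch under assumptions: the class FieldLayout, the sample field names, and the choice to trim each value are hypothetical and are not taken from the vendor format or from any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical layout table: each field is declared once as name + length,
// and start offsets are derived by accumulating the lengths.
final class FieldLayout {
  private final String[] names;
  private final int[] starts;
  private final int[] lengths;

  FieldLayout(String[] names, int[] lengths) {
    if (names.length != lengths.length) {
      throw new IllegalArgumentException("names and lengths must have the same size");
    }
    this.names = names.clone();
    this.lengths = lengths.clone();
    this.starts = new int[lengths.length];
    int offset = 0;
    for (int i = 0; i < lengths.length; i++) {
      starts[i] = offset;
      offset += lengths[i];
    }
  }

  // Returns field name -> trimmed value, in layout order.
  Map<String, String> parse(String line) {
    char[] chars = line.toCharArray();
    Map<String, String> values = new LinkedHashMap<>();
    for (int i = 0; i < names.length; i++) {
      if (starts[i] + lengths[i] > chars.length) {
        throw new IllegalArgumentException("Line too short for field " + names[i]);
      }
      // String(char[], offset, count): the third argument is a count, not an end index.
      values.put(names[i], new String(chars, starts[i], lengths[i]).trim());
    }
    return values;
  }

  public static void main(String[] args) {
    FieldLayout layout = new FieldLayout(
        new String[] {"name", "number", "manager"},
        new int[] {10, 3, 5});
    // Sample record built from padded pieces: 10 + 3 + 5 characters.
    String line = "Alice     " + " 42" + "true ";
    System.out.println(layout.parse(line));
  }
}
```

    Keeping the layout in one table means a change in the vendor's specification touches a single array rather than dozens of offset literals.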
    