Convert a fixed-width file from text to CSV

梦谈多话 2021-02-01 09:59

I have a large data file in text format and I want to convert it to CSV by specifying the length of each column.

number of columns = 5

column length

[4          


        
4 answers
  • 2021-02-01 10:35

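    GNU awk (gawk) supports this directly with FIELDWIDTHS. Assigning $1 to itself forces gawk to rebuild the record using OFS, and the trailing 1 prints every line (a bare '$1=$1' pattern would skip lines whose first field is empty or evaluates to 0), e.g.:

    gawk '{ $1 = $1 } 1' FIELDWIDTHS='4 2 5 1 1' OFS=, infile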
    

    Output:

    aasd,fh,90135,1,2
    ajsh,dj, 2445,d,f
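
For readers without gawk, the same width list can be turned into slice offsets by taking running sums. This is only a minimal Python sketch of that idea, not part of the gawk answer; the helper name `split_fixed` is made up for illustration.

```python
from itertools import accumulate


def split_fixed(line, widths):
    """Split a line into fields using a list of column widths (hypothetical helper)."""
    ends = list(accumulate(widths))    # running sums, e.g. [4, 6, 11, 12, 13]
    starts = [0] + ends[:-1]           # each field starts where the previous one ended
    return [line[s:e] for s, e in zip(starts, ends)]


widths = [4, 2, 5, 1, 1]
for row in ["aasdfh9013512", "ajshdj 2445df"]:
    print(",".join(split_fixed(row, widths)))
# aasd,fh,90135,1,2
# ajsh,dj, 2445,d,f
```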
    
  • 2021-02-01 10:37

    I would use sed and capture groups of the given lengths:

    $ sed -r 's/^(.{4})(.{2})(.{5})(.{1})(.{1})$/\1,\2,\3,\4,\5/' file
    aasd,fh,90135,1,2
    ajsh,dj, 2445,d,f
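
Because the expression is anchored at both ends, a line whose length is not exactly 13 characters does not match and sed prints it unchanged. A rough Python sketch of the same capture-group approach makes that behaviour visible; the function name `to_csv_line` is an assumption for illustration only.

```python
import re

# Same grouping as the sed expression: 4, 2, 5, 1 and 1 characters.
PATTERN = re.compile(r"(.{4})(.{2})(.{5})(.{1})(.{1})")


def to_csv_line(line):
    """Join the captured groups with commas, or return the line untouched."""
    m = PATTERN.fullmatch(line)
    if m is None:
        return line                 # like sed: no match, the line passes through as-is
    return ",".join(m.groups())


print(to_csv_line("aasdfh9013512"))   # aasd,fh,90135,1,2
print(to_csv_line("too short"))       # too short
```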
    
  • 2021-02-01 10:38

    Here's a solution that works with regular awk (does not require gawk).

    awk -v OFS=',' '{print substr($0,1,4), substr($0,5,2), substr($0,7,5), substr($0,12,1), substr($0,13,1)}'
    

    It uses awk's substr function to define each field's start position and length. OFS defines what the output field separator is (in this case, a comma).

    (Side note: This only works if the source data does not have any commas. If the data has commas, then you have to escape them to be proper CSV, which is beyond the scope of this question.)

    Demo:

    echo 'aasdfh9013512
    ajshdj 2445df' | 
    awk -v OFS=',' '{print substr($0,1,4), substr($0,5,2), substr($0,7,5), substr($0,12,1), substr($0,13,1)}'
    

    Output:

    aasd,fh,90135,1,2
    ajsh,dj, 2445,d,f
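
On the side note about commas: if the data may contain commas, one option is to hand the quoting to a CSV library instead of joining fields by hand. The following is a sketch in Python rather than awk, using the standard `csv` module; the function name `fixed_to_csv` and the sample row are assumptions made for the example.

```python
import csv
import io


def fixed_to_csv(lines, widths):
    """Slice each line by column width and let csv.writer handle the quoting."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        line = line.rstrip("\n")
        fields, pos = [], 0
        for w in widths:
            fields.append(line[pos:pos + w])
            pos += w
        writer.writerow(fields)     # fields containing a comma get quoted
    return out.getvalue()


# Hypothetical row whose first field contains a comma.
print(fixed_to_csv(["a,sdfh9013512\n"], [4, 2, 5, 1, 1]), end="")
# "a,sd",fh,90135,1,2
```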
    
  • 2021-02-01 10:40

    If anyone is still looking for a solution, I have developed a small script in Python. It's easy to use, provided you have Python 3.5.

    https://github.com/just10minutes/FixedWidthToDelimited/blob/master/FixedWidthToDelimiter.py

    """
    This script converts a fixed-width file into a delimited file. Tested on Python 3.5 only.
    Sample run (the order of the arguments doesn't matter):
    python ConvertFixedToDelimiter.py -i SrcFile.txt -o TrgFile.txt -c Config.txt -d "|"
    Inputs are as follows:
    1. Input file - Mandatory (argument -i) - The file that contains the fixed-width data
    2. Config file - Optional (argument -c; if not provided, the script looks for Config.txt in the current directory and exits if it is not found)
        Should have the format
        FieldName,fieldLength
        e.g.:
        FirstName,10
        SecondName,8
        Address,30
        etc.
    3. Output file - Optional (argument -o; if not provided, the input file name plus "Delimited.txt" is used)
    4. Delimiter - Optional (argument -d; if not provided, the default is "|" (pipe))
    """
    from collections import OrderedDict
    import argparse
    from argparse import ArgumentParser
    import os.path
    import sys
    
    
    def slices(s, args):
        position = 0
        for length in args:
            length = int(length)
            yield s[position:position + length]
            position += length
    
    def extant_file(x):
        """
        'Type' for argparse - checks that file exists but does not open.
        """
        if not os.path.exists(x):
            # Argparse uses the ArgumentTypeError to give a rejection message like:
            # error: argument input: x does not exist
            raise argparse.ArgumentTypeError("{0} does not exist".format(x))
        return x
    
    
    
    
    
    parser = ArgumentParser(description="Please provide your Inputs as -i InputFile -o OutPutFile -c ConfigFile")
    parser.add_argument("-i", dest="InputFile", required=True,    help="Provide your Input file name here, if file is on different path than where this script resides then provide full path of the file", metavar="FILE", type=extant_file)
    parser.add_argument("-o", dest="OutputFile", required=False,    help="Provide your Output file name here, if file is on different path than where this script resides then provide full path of the file", metavar="FILE")
    parser.add_argument("-c", dest="ConfigFile", required=False,   help="Provide your Config file name here,File should have value as fieldName,fieldLength. if file is on different path than where this script resides then provide full path of the file", metavar="FILE",type=extant_file)
    parser.add_argument("-d", dest="Delimiter", required=False,   help="Provide the delimiter string you want",metavar="STRING", default="|")
    
    args = parser.parse_args()
    
    # Input file is mandatory
    InputFile = args.InputFile
    # Delimiter defaults to "|"
    DELIMITER = args.Delimiter
    
    # Output file check
    if args.OutputFile is None:
        OutputFile = str(InputFile) + "Delimited.txt"
        print("Setting output file as " + OutputFile)
    else:
        OutputFile = args.OutputFile
    
    #Config file check
    if args.ConfigFile is None:
        if not os.path.exists("Config.txt"):
            print("No config file was provided; exiting the script")
            sys.exit(1)  # non-zero status signals the failure to the caller
        else:
            ConfigFile = "Config.txt"
            print ("Taking Config.txt file on this path as Default Config File")
    else:
        ConfigFile = args.ConfigFile
    
    fieldNames = []
    fieldLength = []
    myvars = OrderedDict()
    
    
    with open(ConfigFile) as myfile:
        for line in myfile:
            if not line.strip():
                continue  # skip blank lines in the config file
            name, var = line.partition(",")[::2]
            myvars[name.strip()] = int(var)
    for key, value in myvars.items():
        fieldNames.append(key)
        fieldLength.append(value)
    
    with open(OutputFile, 'w') as f1:
        fieldNames = DELIMITER.join(map(str, fieldNames))
        f1.write(fieldNames + "\n")
        with open(InputFile, 'r') as f:
            for line in f:
                # Strip the trailing newline so it never ends up inside a field
                rec = list(slices(line.rstrip("\n"), fieldLength))
                myLine = DELIMITER.join(map(str, rec))
                f1.write(myLine + "\n")
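
A quick way to see how the `slices` generator treats short lines and trailing newlines is to call it directly. This sketch copies `slices` from the script so it runs on its own; the sample strings and the width list are assumptions taken from the other answers, not from the script's repository.

```python
def slices(s, args):
    # Copied from the script so this snippet is self-contained.
    position = 0
    for length in args:
        length = int(length)
        yield s[position:position + length]
        position += length


widths = [4, 2, 5, 1, 1]

# A full-length line: the newline sits past the last field and is ignored.
print(list(slices("aasdfh9013512\n", widths)))
# ['aasd', 'fh', '90135', '1', '2']

# A short line: the newline lands inside the second field.
print(list(slices("aasd\n", widths)))
# ['aasd', '\n', '', '', '']

# Stripping the newline first leaves only empty trailing fields.
print(list(slices("aasd\n".rstrip("\n"), widths)))
# ['aasd', '', '', '', '']
```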
    