Reading and sorting a variable-length CSV file

Posted by 五迷三道 on 2019-12-12 04:14:46

Question


We are using an OpenVMS system, and I believe it uses the COBOL compiler from HP.

We have a data file with a lot of records (500 MB or more) that are variable length and comma-delimited. I would like to parse each record and extract the corresponding fields for processing. After that, I might want to sort the records by some particular fields. Is this possible with COBOL?

I have only seen sorting done with fixed-length records.


Answer 1:


Variable-length records are no problem. I am not sure exactly how this is done in VMS COBOL, but the IBM way of writing it is:

    FILE SECTION.
    FD  THE-FILE RECORD IS VARYING DEPENDING ON REC-LENGTH.
    01  THE-RECORD PICTURE X(5000).
    WORKING-STORAGE SECTION.
    01  REC-LENGTH PICTURE 9(5) COMPUTATIONAL.

When you read the file, REC-LENGTH will contain the length of the record just read; when you write a record, a record of length REC-LENGTH is written.

To handle the delimited records, you will probably need to use the UNSTRING verb to convert them into a fixed format. This is pretty verbose (but then this is COBOL).

    UNSTRING THE-RECORD DELIMITED BY ","
        INTO field1, field2, field3, field4, field5 etc....
    END-UNSTRING

Once the record is in fixed format you can use the SORT as normal.




Answer 2:


The COBOL SORT verb will do what you need.

If the SD file contains variable-length records, all of the KEY data items must lie within the first n character positions of the record, where n is the minimum record size specified for the file. In other words, the keys have to be in the fixed part.

However, you can get around this easily with an input procedure, which lets you build a virtual file that has its keys in the right place. In the input procedure, you reformat each variable-length, comma-delimited record into one that has its keys at the front, then RELEASE it to the sort.




Answer 3:


If my memory is correct, VMS has a SORT/MERGE utility that you could use after you have processed the file into a fixed file format (variable-length may also be possible). A standalone SORT utility typically performs better than an in-line COBOL SORT and can be a better design if the sort criteria change in the future.




Answer 4:


There is no need to write a solution in COBOL, at least not to sort the file. The UNIX sort utility should do it just fine: call sort -t ',' -n, perhaps with a couple of other options.
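As a rough sketch of what those "other options" might look like, assume a POSIX sort is available and the sort key is the second comma-separated field, holding a number. The file names records.csv and sorted.csv and the sample rows are made up for illustration. The -k option restricts the comparison to one field, so the records can differ in length and in the number of trailing fields.

```shell
# Hypothetical sample data: variable-length, comma-delimited records.
cat > records.csv <<'EOF'
widget,30,blue,extra notes
gadget,4
gizmo,112,red
EOF

# -t ','  : use a comma as the field separator
# -k2,2n  : sort on field 2 only, comparing it numerically
sort -t ',' -k2,2n records.csv > sorted.csv
cat sorted.csv
```

With this sample input, the rows come out in the order gadget (4), widget (30), gizmo (112). Note that sort splits on every comma, so this approach does not cope with quoted fields that contain commas themselves.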



Source: https://stackoverflow.com/questions/6488443/reading-and-sorting-a-variable-length-csv-file
