Generating Record Layouts for EBCDIC Data Files.

问题

We are attempting to write a tool in Perl which is expected to parse a fixed length EBCDIC data file and generate the record layout by looking at the hex value of each byte in the record.

It is assumed that each data file, which is written by a Cobol program whose source code we do not have, can have multiple record layouts. The aim of this tool is to perform data migration (EBCDIC to ASCII) by generating layout which would then be fed to a converter.

The problem is that there are hundreds of permutations and combinations that may arise with each byte. I thought that comparing the hex value of the corresponding byte in the record below the current one might give us some clue as to what this might be. But even in this case there is no concrete solution that one might arrive at. Decisions need to be taken at every juncture which might affect the end result.

Could someone please let me know for any said patterns that I can look for? For example, for all COMP-3s each nibble can possibly represent a value from 0-9 and hence the hex value of the byte might be something like, [0-9][0-9]. Essentially for data migration one need not bother about COMPs and COMP-3s as their value would not be affected in the migration. But identifying what is the DISPLAY fields are is also turning out to be a huge task. Can someone throw some ideas or point me in some direction that I can further explore?

Any help would be highly appreciated. I am really stuck in a mire here.

Thanks, Aditya.

回答1:

There are many enterprise transformation tools that will do exactly what you need. Alternatively, it is easy to parse the ADATA records from the compiled copybooks to get the exact byte positions and representations of every field.

Can I hazard a guess? Do you have nobody skilled in Cobol? It isn't that hard to process Cobol copybooks, certainly not as hard as it is to use a write only language like Perl.

Do you have syncsort or DFsort available? It will do what you ask with a simple config file...

回答2:

I guess you have to go with probabilities, and hope the data is varied enough to get a lot out of that.

Any field that only contains EBCDIC values of alpha-numeric plus punctuation
Numeric DISPLAY fields will be the easiest, containing just EBCDIC 0-9. Note that if signed then the first number will be changed to a letter, like A is -1 I think.
Pretty random distribution of values, leading with hex 0's, will likely be binary numeric "COMP" fields.
COMP-3 fields are one decimal digit in each hex digit of data. So if all the hex digits happen to be 0-9, that's a strong sign of a comp-3 field. Except the last hex digit of the field, which will contain a C for positive, D for negative, and F for unsigned.
Some programs use spaces on numeric fields, so if a field contains all sorts of binary, and also hex 40 (spaces), it's probably best to toss the hex 40 out of the mix. It might tell you a group of bytes is one field if they are all spaces together, or all data together.

As for multiple layouts, that's tough. A common convention for records that can have multiple layouts is to have a limited set of values for "what type of data is this" near the front of the record. Like significantID, recordType, data. So the significantID should increase steadily, while the recordType fields will vary between just a few values and re-cycle.

回答3:

The FileWizard in RecordEditor / JRecord can search for Mainframe Cobol fields in a Files. The FileWizard results can be stored in a Xml file for use in other languages or you can use the copy Function to copy from Ebcdic to either Ascii fixed or CSV formats.

There is some out of date documentation on the File Wizard

来源：https://stackoverflow.com/questions/7352034/generating-record-layouts-for-ebcdic-data-files

标签

migration

cobol