问题
I have a cobol "tape format" dump which has a mixture of text and number fields. I'm reading the file in C# as a binary array (array of byte). I have the copy book and the formats are lining up fine on the text fields. There are a number of COMP-3 fields as well. The data in those fields doesnt seem to match any BCD format. I know what the data should be and I have the raw bytes of the COMP-3. I tried converting to EBCDIC first which yielded no better results. Any thoughts on how a COMP-3 number can be otherwise internally stored? Below are three examples of the PIC, the raw data and the expected number. I know I have the field positions correct because there is alpha data on either side of the numbers and that all lines up correctly.
First Example: The PIC of the field is 9(9) COMP-3 There are 5 bytes to the data, the hex values are 02 01 20 91 22 The resulting data should be a date (00CCYYMMDD). This particular date should be 3-17-14.
Second Example: The PIC of the field is S9(3) COMP-3 There are 2 bytes to the data, the hex values are 0A 14 The resulting value should be between 900 and 999 My understanding is that the "S" means that the last nibble should be 0xC or 0xD to indicate + or -
Third Example: The PIC of the field is S9(15)V99 COMP-3 There are 9 bytes to the data, the hex values are 00 00 00 00 00 00 01 80 0C The resulting value should be 12.00
Ok so thanks to the people who responded as they pointed me in the right direction. This is indeed an ASCII/EBCDIC representation issue. The BCD is stored in EBCDIC. Using an ASCII to EBCDIC conversion table yields properly formatted BCD digits:
I used this link to map the data: http://shop.alterlinks.com/ascii-table/ascii-ebcdic-us.php
My data: 0A 14 Converted: 25 3C (turns out that 253 is a valid value, spec was wrong) C = +, all good
My data: 01 80 0C (excluding leading zeros) Converted: 01 20 0C 12.00 C = +, implied 2 digits in format, all good
My data: 02 01 20 91 22 Converted: 02 01 40 31 7F 2014/03/17 (F is unused nibble), all good
回答1:
There is no such thing as a COBOL "tape format"
although the phrase may mean something to the person who gave you the data.
The clue to your problem is that you can read the text. Connect that to the EBCDIC tag and your reference to C#.
So, you are reading data which is originally source from a Mainframe, most likely an IBM Mainframe, which uses EBCDIC instead of ASCII.
COBOL does not have native support for BCD.
What some kind soul has done for you is "convert" the data from EBCDIC to ASCII. Otherwise you wouldn't even recognise the "text".
Unfortunately, what that means for any binary or packed-decimal or floating-point fields (you won't see much of the last, but they are COMP-1/COMP-2) is that "convert" means "potentially scrambled", because the coversion is assuming individual bytes, with simple byte values, whereas all of those fields have conventional coding, either through multiple bytes or non-EBCDIC values or both.
So: COMP-3 PIC 9(9). As you say, five bytes. It is unsigned, so the rightmost nybble will be F (all bits on). You are slightly out with your positions due to the sign position being occupied, even for an unsigned field.
On the Mainframe, it contains a value X'020140317F'
. Only that field in its entirety can make any sense as to its value. However, the EBCDIC to ASCII conversion has made it X'0201209122'.
How?
Look up the EBCDIC value of X'02'
and X'01'
. They don't change. Look up the value of X'40'
, whoops, that's a space, change it to ASCII X'20'
. Look up the value of X'31'
. Actually nothing special there, and it has converted to something higher than X'7F'
, but if you look at the translation table used, I guess you'll see why it happens. The X'7F'
is a double-quote, so gets changed to X'22'
.
The other values you show suffer the same problem.
You should only ever take data from a Mainframe in character-only format. There are many answers here on this, you should look at the related
to the right.
Have a look at this recent question: Convert COMP and COMP-3 Packed Decimal into readable value with C
回答2:
OK, let's have a look at your first example. Given the format and value the original BCD-content should have been something like
02 01 40 31 7F
When transforming that from EBCDIC to ASCII we run into trouble with the first, second and fourth byte because they are control-characters - so here we would need some more details on how the ASCII->EBCDIC-converter worked. Looking at the two remaining bytes those would be changed
EBCDIC ASCII CHARACTER
40 -> 20 (blank)
7F -> 22 "
So assuming the first two bytes remain unchanged and the third gets converted like 31->91
we end up with
02 01 20 91 22
which is what you got. So it looks like some kind of EBCDIC->ASCII-conversion took place. If that is the case it might be that you can't repair the data since the transformation may not be one-one and thus not reversible.
Looking at the second example and using
EBCDIC ASCII CHARACTER
25 -> 0A (LF)
3C -> 14 (DC4)
you would have started with 25 3C
which would fit the format but not the range you gave.
In the third example the original 01 20 0C
could be converted to 01 80 0C
since 20
also is an EBCDIC control-character with no direct ASCII-equivalent.
But given all other examples I would assume there is some codepage-conversion issue. If you used some kind of filetransfer to move the data fromm the (supposed) mainframe make sure it is set to binary-mode and don't do any character-conversion before you split the file into fields and know what's meant to be a character and what not.
EDIT: You can find a list of several EBCDIC and ASCII-based codepages here or look here for the same as one pdf.
回答3:
I'm coming to this a bit late, but have a couple of suggestions that might make your life easier...
First, see if you can get your mainframe conterparts to convert all non-character (i.e. binary numeric and packed decimal) data to display format (e.g. PIC X) before you download it. Then you only need to deal with the "printable" range of numeric characters representing 0 through 9. Printable character only code-page conversions are fairly standaard and tend not to screw up as much. Reformatting data given a copybook is not a difficult prospect for anybody proficient in a mainframe environment. Unfortunately, sometimes you get the "runaround" and a claim is made that it is extremely costly or, takes special software, or any one of a hundred other bogus excuses.
If you get the "runaround" then the next best thing is to is download the file in binary format and do your own codepage conversion for the character data (fairly straight forward). Next deal with the binary data based on your copybook definitions. With a few Googles you should be able to find enough information to get through converting the PACKED-DECIMAL (COMP-3) data to whatever you need.
Here are a couple of links to get you started:
Numeric Data Formats
Packed Decimal
I do not recommend trying to reverse engineer the code page conversions applied by your file transfer package in order to decode the packed decimal and other binary data.
回答4:
Ok so thanks to both people who responded as they pointed me in the right direction. This is indeed an ASCII/EBCDIC representation issue. The BCD is stored in EBCDIC. Using an ASCII to EBCDIC conversion table yields properly formatted BCD digits:
I used this link to map the data: http://shop.alterlinks.com/ascii-table/ascii-ebcdic-us.php
My data: 0A 14
Converted: 25 3C (turns out that 253 is a valid value, spec was wrong) C = +, all good
My data: 01 80 0C (excluding leading zeros)
Converted: 01 20 0C 12.00 C = +, implied 2 digits in format, all good
My data: 02 01 20 91 22
Converted: 02 01 40 31 7F 2014/03/17 (F is unused nibble), all good
Thanks again for the two above answers which led me in the right direction.
回答5:
You can avoid the above issues by having the data converted into a modern method for transferring data: XML.
来源:https://stackoverflow.com/questions/22812755/cobol-comp-3-number-format-issue