I am working with some old data imports and came across a bunch of data from an external source that reports financial numbers with a signed overpunch. I've seen a lot, but
Presumably the specification for the file, or for your program, tells you how to deal with this? No?
As Bruce Martin has said, a true Overpunch goes back to the days of punched-cards. You punched the final digit of a number, then re-punched (overpunched) the same position on the card.
The link to the Wiki that you included in your question is fine for that. But I'm pretty sure the source of your data is not punched-cards.
Although part of this answer presumes you are using a Mainframe, the solution proposed is machine-independent.
The source of your data is a Mainframe? We don't know, although it is important information. For the moment, let's assume it is so.
Unless it is very old data which is unchanging, it has been processed on the Mainframe in the last 20 years. Unless the compiler used (assuming the data has come from a COBOL program) is very, very old, you need to know the setting of the compiler option NUMPROC. Here's why: http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/igy3pg50/2.4.36?DT=20090820210412
Default is: NUMPROC(NOPFD)
Abbreviations are: None
The compiler accepts any valid sign configuration: X'A', X'B', X'C', X'D', X'E', or X'F'. NUMPROC(NOPFD) is the recommended option in most cases.
NUMPROC(PFD) improves the performance of processing numeric internal decimal and zoned decimal data. Use this option only if your program data agrees exactly with the following IBM system standards:
Zoned decimal, unsigned: High-order 4 bits of the sign byte contain X'F'.
Zoned decimal, signed overpunch: High-order 4 bits of the sign byte contain X'C' if the number is positive or 0, and X'D' if it is not.
Zoned decimal, separate sign: Separate sign contains the character '+' if the number is positive or 0, and '-' if it is not.
Internal decimal, unsigned: Low-order 4 bits of the low-order byte contain X'F'.
Internal decimal, signed: Low-order 4 bits of the low-order byte contain X'C' if the number is positive or 0, and X'D' if it is not.
Data produced by COBOL arithmetic statements conforms to the above IBM system standards. However, using REDEFINES and group moves could change data so that it no longer conforms. If you use NUMPROC(PFD), use the INITIALIZE statement to initialize data fields, rather than using group moves.
Using NUMPROC(PFD) can affect class tests for numeric data. You should use NUMPROC(NOPFD) or NUMPROC(MIG) if a COBOL program calls programs written in PL/I or FORTRAN.
Sign representation is affected not only by the NUMPROC option, but also by the installation-time option NUMCLS.
Use NUMPROC(MIG) to aid in migrating OS/VS COBOL programs to Enterprise COBOL. When NUMPROC(MIG) is in effect, the following processing occurs:
Preferred signs are created only on the output of MOVE statements and arithmetic operations. No explicit sign repair is done on input. Some implicit sign repair might occur during conversion. Numeric comparisons are performed by a decimal comparison, not a logical comparison.
What does that mean to you? If NUMPROC(NOPFD) is being used, you may see X'A' through X'F' in the high-order nybble of the final byte of the field. If NUMPROC(PFD) is being used, you shouldn't see anything other than X'C' or X'D' in that position.
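To make the nibble discussion concrete, here is a minimal Python sketch that inspects the sign of a zoned-decimal field in raw EBCDIC bytes. The zone assignments follow the IBM convention quoted above (X'B' and X'D' negative, the other valid zones positive); this is an illustration, not production validation code.

```python
def zoned_sign(raw: bytes) -> str:
    """Return the sign of a zoned-decimal field held in raw EBCDIC bytes.

    The sign is the high-order 4 bits (the zone nibble) of the final byte.
    X'B' and X'D' are negative; X'A', X'C', X'E', X'F' are positive.
    """
    zone = raw[-1] >> 4
    if zone in (0xB, 0xD):
        return '-'
    if zone in (0xA, 0xC, 0xE, 0xF):
        return '+'
    raise ValueError(f"invalid sign nibble X'{zone:X}'")

# 1234566 with a positive overpunch on the 6: EBCDIC F1 F2 F3 F4 F5 F6 C6
print(zoned_sign(bytes([0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xC6])))  # +
# 125 negative: EBCDIC F1 F2 D5
print(zoned_sign(bytes([0xF1, 0xF2, 0xD5])))                          # -
```

With NUMPROC(PFD) in effect you would expect only the X'C'/X'D' (and X'F' for unsigned) zones to appear; with NUMPROC(NOPFD), any of the valid zones might turn up, which is exactly why checking code like this has to accept all of them.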
Note that if the file you are receiving has been generated by the installed Mainframe SORT product, you have the same potential issue.
may and shouldn't are not good things to see in a specification.
Is your data remotely business-critical in a financial environment? Then you almost certainly have issues of audit and compliance. It runs something like this:
Auditor, "What do you do with the data when you receive it?"
You, "The first thing I do is change it."
Auditor, "Really? How do you verify the data once you have changed it?"
You, "Errr..."
You might get lucky and never have an auditor look at it.
All those non-deterministic words aren't very good for programming.
So how do you get around it?
There should be no fields in the data that you receive which have embedded signs. There should be no numeric fields which are not represented as character data (no binary, packed, or floating-point). If a field is signed, the sign should be presented separately. If a field has decimal places, an actual . or , (depending on the home country of the system) should be provided, or, as an alternative, a separate field with a scaling-factor.
Is this difficult for your Mainframe people to do? Not remotely. Insist on it. If they will not do it, document it such that problems arising are not yours, but theirs.
If all numeric data presented to you is plain character data (plus, minus, comma, digits 0 to 9) then you will have absolutely no problem in understanding the data, whether it is any variant of EBCDIC or any variant of ASCII.
Be aware that any fields with decimal places coming from COBOL are exact decimal amounts. Do not store or use them in anything other than fields in your language which can process exact decimal amounts.
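In Python, for instance (used here purely as an illustration), that means parsing such fields into decimal.Decimal, never into a binary float:

```python
from decimal import Decimal

# A COBOL field with two decimal places is an exact decimal amount;
# parse it into an exact decimal type, never a binary float.
amount = Decimal('+12345.60')

# Decimal arithmetic is exact: no rounding drift, however many additions.
total = amount + Decimal('0.10')
print(total)              # 12345.70

# The same values in binary floating point are only approximations:
print(0.1 + 0.2 == 0.3)   # False
```

The equivalent in other languages is whatever exact-decimal type they offer (BigDecimal in Java, decimal in C#, and so on); the point is simply never to round-trip money through binary floating point.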
You don't provide any sample data. So here's a sample:
123456{
This should be represented to you as:
+1234560
If it has two decimal places:
+12345.60
or
+12345602 (where the trailing 2 is a scaling-factor, which you validate)
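If you do end up having to decode the overpunched form yourself rather than receiving plain character data, the sample above can be decoded with a small lookup. This sketch assumes the file has already been translated from EBCDIC to ASCII, so the preferred overpunch characters arrive as '{' and 'A'-'I' (positive) and '}' and 'J'-'R' (negative); the scaling-factor layout is the hypothetical one described above and must be validated against your actual file specification.

```python
from decimal import Decimal

# Preferred-sign overpunch characters after EBCDIC-to-ASCII translation:
# X'C0'..X'C9' -> '{', 'A'..'I' (positive 0-9)
# X'D0'..X'D9' -> '}', 'J'..'R' (negative 0-9)
OVERPUNCH = {c: ('+', str(i)) for i, c in enumerate('{ABCDEFGHI')}
OVERPUNCH.update({c: ('-', str(i)) for i, c in enumerate('}JKLMNOPQR')})

def decode_overpunch(field: str) -> int:
    """Turn e.g. '123456{' into 1234560 (sign carried by the last char)."""
    sign, last_digit = OVERPUNCH[field[-1]]
    return int(sign + field[:-1] + last_digit)

print(decode_overpunch('123456{'))   # 1234560

def apply_scale(field: str) -> Decimal:
    """Apply a trailing scaling-factor digit, e.g. '+12345602' -> 12345.60."""
    scale = int(field[-1])
    return Decimal(field[:-1]) * Decimal(10) ** -scale

print(apply_scale('+12345602'))      # 12345.60
```

Note that this only covers the preferred X'C'/X'D' signs; with NUMPROC(NOPFD) data the other zones translate to different characters, which is one more reason to insist on plain character data in the first place.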
If numeric data is to be transferred from external systems, it should always be done in character format. It will make everything so much easier to code, understand, maintain, and audit.