C#, VS 2010
I need to determine if a float value is NaN.
Testing a float for NaN using
float.IsNaN(aFloatNumber)
crashes
works OK in the constructor of theclass for a custom control
This is the only real hint towards the underlying problem. Code that runs on a thread can manipulate two stacks inside the processor. One is the normal one that everybody knows about and gave this web site its name. There is however another one, well hidden inside the FPU (Floating Point Unit). It stores intermediate operand values while making floating point calculations. It is 8 levels deep.
Any kind of mishap inside the FPU is not supposed to generate runtime exceptions. The CLR assumes that the FPU is configured with its defaults for the FPU control word, the hardware exceptions it can generate are supposed to be disabled.
That does have a knack for going wrong when your program uses code that came from the 1990s, back when enabling FPU exceptions still sounded like a good idea. Code generated by Borland tooling are notorious for doing this for example. Its C runtime module reprograms the FPU control word and unmasks the hardware exceptions. The kind of exceptions you can get for that can be very mysterious, using NaN in your code is a good way to trigger such an exception.
This should be at least partially visible with the debugger. Set a breakpoint on the "still good" code and use the Debug + Windows + Registers debugger window. Right-click it and select "Floating point". You'll see all of the registers that are involved with floating point calculations, ST0 through ST7 are the stack registers for example. The important one here is marked CTRL
, its normal value in a .NET process is 027F
. The last 6 bits in that value are the exception masking bits (0x3F), all turned on to prevent hardware exceptions.
Single step through the code and the expectation is that you see the CTRL value change. As soon as it does then you'll have found the evil code. If you enable unmanaged debugging then you should also see the load notification in the Output window and see it appear in the Debug + Windows + Module window.
Undoing the damage that the DLL did is fairly awkward. You'd have to pinvoke _control87() in msvcrt.dll for example to restore the CTRL word. Or a simple trick that you can use, you can intentionally throw an exception. The exception handling logic inside the CLR resets the FPU control word. So with some luck, this kind of code is going to solve your problem:
InitializeComponent();
try { throw new Exception("Please ignore, resetting FPU"); }
catch {}
You may have to move it, next best guess is the Load event. The debugger should tell you where.
The solution:
First of all, thank you to @Matt for pointing me in the right direction, and @Hans Passant for providing the workaround.
The application talks to a CAN-USB adapter from Chinese manufacturer QM_CAN.
The problem is in their driver.
The DLL statements and Driver import:
// DLL Statement
IntPtr QM_DLL;
TYPE_Init_can Init_can;
TYPE_Quit_can Quit_can;
TYPE_Can_send Can_send;
TYPE_Can_receive Can_receive;
delegate int TYPE_Init_can(byte com_NUM, byte Model, int CanBaudRate, byte SET_ID_TYPE, byte FILTER_MODE, byte[] RXF, byte[] RXM);
delegate int TYPE_Quit_can();
delegate int TYPE_Can_send(byte[] IDbuff, byte[] Databuff, byte FreamType, byte Bytes);
delegate int TYPE_Can_receive(byte[] IDbuff, byte[] Databuff, byte[] FreamType, byte[] Bytes);
// Driver
[DllImport("kernel32.dll")]
static extern IntPtr LoadLibrary(string lpFileName);
[DllImport("kernel32.dll")]
static extern IntPtr GetProcAddress(IntPtr hModule, string lpProcName);
The call to the offending code, including Hans' workaround:
private void InitCanUsbDLL() // Initiate the driver for the CAN-USB dongle
{
// Here is an example of dynamically loaded DLL functions
QM_DLL = LoadLibrary("QM_USB.dll");
if (QM_DLL != IntPtr.Zero)
{
IntPtr P_Init_can = GetProcAddress(QM_DLL, "Init_can");
IntPtr P_Quit_can = GetProcAddress(QM_DLL, "Quit_can");
IntPtr P_Can_send = GetProcAddress(QM_DLL, "Can_send");
IntPtr P_Can_receive = GetProcAddress(QM_DLL, "Can_receive");
// The next line results in a FPU stack overflow if float.NaN is called by a handler
Init_can = (TYPE_Init_can)Marshal.GetDelegateForFunctionPointer(P_Init_can, typeof(TYPE_Init_can));
// Reset the FPU, otherwise we get a stack overflow when we work with float.NaN within a event handler
// Thanks to Matt for pointing me in the right direction and to Hans Passant for this workaround:
// http://stackoverflow.com/questions/25205112/testing-for-a-float-nan-results-in-a-stack-overflow/25206025
try { throw new Exception("Please ignore, resetting FPU"); }
catch { }
Quit_can = (TYPE_Quit_can)Marshal.GetDelegateForFunctionPointer(P_Quit_can, typeof(TYPE_Quit_can));
Can_send = (TYPE_Can_send)Marshal.GetDelegateForFunctionPointer(P_Can_send, typeof(TYPE_Can_send));
Can_receive = (TYPE_Can_receive)Marshal.GetDelegateForFunctionPointer(P_Can_receive, typeof(TYPE_Can_receive));
}
}
The reason that the application crashed when a reference was made to float.NaN in the event handler and not in the constructor was a simple matter of timing: the constructor is called before InitCanUsbDLL(), but the event handler was called long after InitCanUsbDLL() corrupted the FPU registers.
I just wrote an example to reproduce the error: 1. Create a native C/C++ DLL which exports this function:
extern "C" __declspec(dllexport) int SetfloatingControlWord(void)
{
//unmask all the floating excetpions
int err = _controlfp_s(NULL, 0, _MCW_EM);
return err;
}
2. Create a C# console program, which call the function SetfloatingControlWord, after that, do some floating operation such as NaN compare, then it leads to stack overflow.
[DllImport("floatcontrol.dll")]
public static extern Int32 SetfloatingControlWord();
static void Main(string[] args)
{
int n = SetfloatingControlWord();
float fff = 0F;
int iii = fff.CompareTo(float.NaN);
}
I encountered the same problem years ago, also, I noticed that after an .NET exception throws, everything works fine, it took me a while to figure out why and trace the code which changed the FPU.
As the doc of function _controlfp_s says: By default, the run-time libraries mask all floating-point exceptions. The common language runtime (CLR) only supports the default floating-point precision, so CLR doesn't handle these kind exceptions.
As MSDN says:By default, the system has all FP exceptions turned off. Therefore, computations result in NAN or INFINITY, rather than an exception.
After NaN was introduced in IEEE 754 1985, it suppose that application software no longer need to handle the floating point exceptions.