I have a PCIe device with a userspace driver. I'm writing commands to the device through a BAR. The commands are latency-sensitive and the amount of data is small (~64 bytes), so I
I don't know if this will help, but this is how I got write-combining working on PCIe. Granted, it was in kernel space, but this complies with the Intel documentation. It's worth trying if you're stuck.
Globally defined:
unsigned int __attribute__ ((aligned(0x20))) srcArr[ARR_SIZE];
In your function:
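int i;
int *pDestAddr; /* set this to the write-combining mapping of the BAR */
unsigned int *pSrcAddr = srcArr;
for (i = 0; i < ARR_SIZE; i++) {
    /* non-temporal 32-bit store; requires #include <emmintrin.h> */
    _mm_stream_si32(pDestAddr + i, (int)pSrcAddr[i]);
}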
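For anyone who wants to try the loop outside a driver first, here is a minimal self-contained sketch of the same idea. The helper name stream_copy32 is made up for illustration, and the trailing _mm_sfence() is my addition rather than part of the snippet above; on x86 a store fence is the usual way to make earlier non-temporal stores globally visible before you continue. It assumes an x86 compiler with SSE2 enabled.

```c
#include <emmintrin.h>  /* _mm_stream_si32 (SSE2) */
#include <xmmintrin.h>  /* _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the original post): copy `count` 32-bit
 * words from src to dst using non-temporal stores, then issue a store fence.
 * In a real driver, dst would point into the write-combining mapping of the
 * BAR; here it can be any writable, 4-byte-aligned buffer.
 */
static void stream_copy32(int *dst, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* One 32-bit non-temporal store per word. */
        _mm_stream_si32(dst + i, (int)src[i]);
    }
    /* Assumption: fence here so the streamed stores are ordered before
     * whatever the caller does next. */
    _mm_sfence();
}
```

For a ~64-byte command, count would be 16. Whether the stores actually reach the device as a single combined burst depends on the CPU and on how the BAR is mapped, so treat this as a starting point and check it on your hardware.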