I am working on simulating a custom PCI device in a QEMU environment. Performance is critical in my scenario, and I need to reduce run time as much as I can (Currently my custom benc