I'm testing the performance of cudaMalloc and finding that its latency is not stable. Here is a small test:
int main(int argc, char** argv) {