I am trying to implement an algorithm for a system in which the camera gets 1000 fps, and I really need some help and advice as I'm new to real-time image processing.
As you know, OpenCV implements several of its features on the GPU as well, using the CUDA framework.
You can write your own CUDA code/functions to operate on the data and convert it to the OpenCV format without any problems. I demonstrate how to do this in cuda-grayscale. I guess that example answers most of your questions.
Note that OpenCV 2.3.1 uses CUDA 4.0, and OpenCV 2.4 only works with CUDA 4.1.
Regarding this statement:
I want to make sure that while I am getting 1000 fps I can pass them to the GPU for processing
It's most likely that you won't be able to process the frames as fast as they arrive from the camera. If you don't want to drop any frames, you can forget about real time (I'm assuming you are not working with incredibly small images, e.g. 10x15).
If you really need to work with 1000 fps, you'll have to implement a buffering mechanism to store the frames that come from the device. And this is where we start talking about a multithreaded system: the main thread of your application will be responsible for grabbing the frames from the camera and storing them in a buffer, and a second thread will read from the buffer and perform the processing on the frames.
For information on how to implement the buffering mechanism, check:
How to implement a circular buffer of cv::Mat objects (OpenCV)?
Thread safe implementation of circular buffer
C + OpenCV: IplImage with circular buffer