How to compile an OpenCL project with kernels

自闭症患者 2021-02-01 20:32

I am a complete beginner with OpenCL. I searched the internet and found some "helloworld" demos for OpenCL projects. Usually in this sort of minimal project, there is a *.c

2 Answers
  • 2021-02-01 20:40

    In OpenCL, the .cl files that contain device kernel code are usually compiled and built at run time. This means that somewhere in your host program you have to compile and build the device program before you can use it. This feature enables maximum portability.

    Let's consider an example I collected from two books. Below is a very simple OpenCL kernel that adds corresponding elements of two global arrays and stores the sums in a third global array. I save this code in a file named vector_add_kernel.cl.

    kernel void vecadd( global int* A, global int* B, global int* C ) {
        const int idx = get_global_id(0);
        C[idx] = A[idx] + B[idx];
    }
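
    To make clear what the kernel computes, here is a minimal host-side sketch of the same arithmetic in plain C++. The function name vecadd_reference is hypothetical and not part of the original program, and the sketch assumes A and B have the same length. Each loop iteration does what one work-item does on the device.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of the OpenCL program.
// Assumes A and B have the same length.
// Each loop iteration mirrors one work-item whose
// get_global_id(0) equals idx.
std::vector<int> vecadd_reference(const std::vector<int>& A,
                                  const std::vector<int>& B) {
    std::vector<int> C(A.size());
    for (std::size_t idx = 0; idx < A.size(); ++idx) {
        C[idx] = A[idx] + B[idx];
    }
    return C;
}
```

    On the device there is no loop: the runtime launches one work-item per index, and get_global_id(0) tells each work-item which index it owns.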
    

    Below is the host code, written in C++, that uses the OpenCL C++ API. I save it in a file named ocl_vector_addition.cpp in the same directory as my .cl file.

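    #include <iostream>
    #include <fstream>
    #include <iterator>
    #include <string>
    #include <vector>
    #include <utility>
    #include <memory>
    #include <cstdlib>
    
    // Make the C++ wrapper throw cl::Error instead of returning error codes.
    #define __CL_ENABLE_EXCEPTIONS
    #if defined(__APPLE__) || defined(__MACOSX)
    #include <OpenCL/cl.hpp>
    #else
    #include <CL/cl.hpp>
    #endif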
    
    int main( int argc, char** argv ) {
    
        const int N_ELEMENTS=1024*1024;
        unsigned int platform_id=0, device_id=0;
    
        try{
            std::unique_ptr<int[]> A(new int[N_ELEMENTS]); // Or you can use simple dynamic arrays like: int* A = new int[N_ELEMENTS];
            std::unique_ptr<int[]> B(new int[N_ELEMENTS]);
            std::unique_ptr<int[]> C(new int[N_ELEMENTS]);
    
            for( int i = 0; i < N_ELEMENTS; ++i ) {
                A[i] = i;
                B[i] = i;
            }
    
            // Query for platforms
            std::vector<cl::Platform> platforms;
            cl::Platform::get(&platforms);
    
            // Get a list of devices on this platform
            std::vector<cl::Device> devices;
            platforms[platform_id].getDevices(CL_DEVICE_TYPE_GPU|CL_DEVICE_TYPE_CPU, &devices); // Select the platform.
    
            // Create a context
            cl::Context context(devices);
    
            // Create a command queue
            cl::CommandQueue queue = cl::CommandQueue( context, devices[device_id] );   // Select the device.
    
            // Create the memory buffers
            cl::Buffer bufferA=cl::Buffer(context, CL_MEM_READ_ONLY, N_ELEMENTS * sizeof(int));
            cl::Buffer bufferB=cl::Buffer(context, CL_MEM_READ_ONLY, N_ELEMENTS * sizeof(int));
            cl::Buffer bufferC=cl::Buffer(context, CL_MEM_WRITE_ONLY, N_ELEMENTS * sizeof(int));
    
            // Copy the input data to the input buffers using the command queue.
            queue.enqueueWriteBuffer( bufferA, CL_FALSE, 0, N_ELEMENTS * sizeof(int), A.get() );
            queue.enqueueWriteBuffer( bufferB, CL_FALSE, 0, N_ELEMENTS * sizeof(int), B.get() );
    
            // Read the program source
            std::ifstream sourceFile("vector_add_kernel.cl");
            std::string sourceCode( std::istreambuf_iterator<char>(sourceFile), (std::istreambuf_iterator<char>()));
            cl::Program::Sources source(1, std::make_pair(sourceCode.c_str(), sourceCode.length()));
    
            // Make program from the source code
            cl::Program program=cl::Program(context, source);
    
            // Build the program for the devices
            program.build(devices);
    
            // Make kernel
            cl::Kernel vecadd_kernel(program, "vecadd");
    
            // Set the kernel arguments
            vecadd_kernel.setArg( 0, bufferA );
            vecadd_kernel.setArg( 1, bufferB );
            vecadd_kernel.setArg( 2, bufferC );
    
            // Execute the kernel
            cl::NDRange global( N_ELEMENTS );
            cl::NDRange local( 256 );
            queue.enqueueNDRangeKernel( vecadd_kernel, cl::NullRange, global, local );
    
            // Copy the output data back to the host
            queue.enqueueReadBuffer( bufferC, CL_TRUE, 0, N_ELEMENTS * sizeof(int), C.get() );
    
            // Verify the result
            bool result=true;
            for (int i=0; i<N_ELEMENTS; i ++)
                if (C[i] !=A[i]+B[i]) {
                    result=false;
                    break;
                }
            if (result)
                std::cout<< "Success!\n";
            else
                std::cout<< "Failed!\n";
    
        }
        catch(const cl::Error& err) {   // Catch by const reference to avoid copying the exception.
            std::cout << "Error: " << err.what() << "(" << err.err() << ")" << std::endl;
            return( EXIT_FAILURE );
        }
    
        std::cout << "Done.\n";
        return( EXIT_SUCCESS );
    }
    

    I compile this code on a machine with Ubuntu 12.04 like this:

    g++ ocl_vector_addition.cpp -lOpenCL -std=c++11 -o ocl_vector_addition.o
    

    This produces a file named ocl_vector_addition.o. Despite the .o suffix, it is a fully linked executable rather than an object file, and running it prints the success message. If you look at the compilation command, you will see that we passed nothing about the .cl file. We only used the -lOpenCL flag to link against the OpenCL library. Also, don't get distracted by the -std=c++11 flag: because I used std::unique_ptr in the host code, I needed it for a successful compile.

    So where is the .cl file used? If you look at the host code, you'll find four parts, which I repeat below with numbers:

    // 1. Read the program source
    std::ifstream sourceFile("vector_add_kernel.cl");
    std::string sourceCode( std::istreambuf_iterator<char>(sourceFile), (std::istreambuf_iterator<char>()));
    cl::Program::Sources source(1, std::make_pair(sourceCode.c_str(), sourceCode.length()));
    
    // 2. Make program from the source code
    cl::Program program=cl::Program(context, source);
    
    // 3. Build the program for the devices
    program.build(devices);
    
    // 4. Make kernel
    cl::Kernel vecadd_kernel(program, "vecadd");
    

    In the 1st step, we read the content of the file that holds our device code into a std::string named sourceCode. Then we make a pair of the string and its length and save it in source, which has the type cl::Program::Sources. In the 2nd step, we create a cl::Program object named program for the context and load the source code into it. The 3rd step is the one in which the OpenCL code gets compiled (and linked) for the devices. Once the device code is built, the 4th step creates a cl::Kernel object named vecadd_kernel and associates it with the kernel named vecadd in the program. That is pretty much the set of steps involved in compiling a .cl file in a program.

    The program I showed creates the device program from the kernel source code. Another option is to use binaries instead. Using a binary program improves application loading time and allows binary distribution of the program, but it limits portability, since binaries that work on one device may not work on another. Creating a program from source code and from a binary are also called online and offline compilation, respectively (more information here). I skip the details here since the answer is already too long.
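
    One caveat about the 1st step: the host code does not check whether the file was opened, so a missing .cl file silently yields an empty source string and the failure only shows up later. A minimal sketch of a safer read, using only the standard library, could look like this. The helper name loadKernelSource is hypothetical and not part of the OpenCL API.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole kernel file into a string and
// throw if the file cannot be opened, instead of returning "".
std::string loadKernelSource(const std::string& path) {
    std::ifstream file(path.c_str());
    if (!file.is_open()) {
        throw std::runtime_error("cannot open kernel file: " + path);
    }
    return std::string(std::istreambuf_iterator<char>(file),
                       std::istreambuf_iterator<char>());
}
```

    In the host code above, the two lines that declare sourceFile and sourceCode could then become a single call such as std::string sourceCode = loadKernelSource("vector_add_kernel.cl"); the rest of the steps stay the same.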

  • 2021-02-01 20:54

    My answer comes four years late. Nevertheless, I have something to add that complements @Farzad's answer, as follows.

    Confusingly, in OpenCL practice, the verb to compile is used to mean two different, incompatible things:

    • In one usage, to compile means what you already think that it means. It means to build at build-time, as from *.c sources to produce *.o objects for build-time linking.
    • However, in another usage, which may be unfamiliar to you, to compile means to translate at run time, as from *.cl sources, producing GPU machine code.

    One happens at build-time. The other happens at run-time.

    It might have been less confusing had two different verbs been introduced, but that is not how the terminology has evolved. Conventionally, the verb to compile is used for both.

    If unsure, then try this experiment: rename your *.cl file so that your other source files cannot find it, then build.

    See? It builds fine, doesn't it?

    This is because the *.cl file is not consulted at build time. Only later, when you try to execute the binary executable, does the program fail.

    If it helps, you can think of the *.cl file as though it were a data file or a configuration file or even a script. It isn't literally a data file, a configuration file or a script, perhaps, for it does eventually get compiled to a kind of machine code, but the machine code is GPU code and it is not made from the *.cl program text until run-time. Moreover, at run-time, your C compiler as such is not involved. Rather, it is your OpenCL library that does the building.
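
    One way to see the "data file" view, as a sketch assuming a C++11 compiler: the kernel text can just as well sit in the host program as a raw string literal. The constant name kVecAddSource is hypothetical. The host compiler only stores the characters and never parses them as OpenCL C.

```cpp
#include <string>

// Hypothetical name. To the host compiler this is only text:
// it is stored in the executable and not checked as OpenCL C.
const std::string kVecAddSource = R"CLC(
kernel void vecadd( global int* A, global int* B, global int* C ) {
    const int idx = get_global_id(0);
    C[idx] = A[idx] + B[idx];
}
)CLC";
```

    A typo inside that string would not stop the host build. It would only surface at run time, when the OpenCL library builds the program.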

    It took me a fairly long time to straighten these concepts in my mind, mostly because—like you—I had long been familiar with the stages of the C/C++ build cycle; and, therefore, I had thought that I knew what words like to compile meant. Once your mind has the words and concepts straight, the various OpenCL documentation begins to make sense, and you can start work.
