I have a program which I'm compiling like this:
(...) Some ifort *.f -c
nvcc -c src/bicgstab.cu -o bicgstab.o -I/home/ricardo/apps/cusp/cusplibrary
(...) Some m
ipo gets fairly complicated when there are multiple files involved: it effectively reruns the compiler on all modules at link time. I'm not an expert on this, but that sounds like something fairly difficult to wade through.
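Roughly, when you compile with -ipo (which -fast turns on), the object files hold intermediate code rather than final machine code, and the heavy lifting happens when ifort links them. With a couple of made-up file names:

$ ifort -c -ipo a.f90 b.f90
$ ifort -ipo a.o b.o -o prog    # cross-module compilation/optimization happens here

That second step is where things get tangled when some of the objects in the link didn't come from ifort.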
One possible option is to compile your CUDA code into a shared library (.so) and link against that. That should keep the Intel compiler toolchain from trying to recompile and re-optimize the code generated by nvcc/gcc. I think this will limit you to "single-file optimizations"; I don't know whether that will significantly affect your performance.
Using my example here, I would modify the compile commands as follows:
$ nvcc -Xcompiler="-fPIC" -shared bicgstab.cu -o bicgstab.so -I/home-2/robertc/misc/cusp/cusplibrary-master
$ ifort -c -fast bic.f90
$ ifort bic.o bicgstab.so -L/shared/apps/cuda/CUDA-v6.0.37/lib64 -lcudart -o program
ipo: remark #11001: performing single-file optimizations
ipo: remark #11006: generating object file /tmp/ipo_ifortxEdpin.o
$
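For reference, the nvcc step above assumes bicgstab.cu exposes a C-callable entry point that the Fortran code can link to. I don't have your source, so the routine below is just a made-up placeholder showing the shape of that interface, not your actual bicgstab code:

// bicgstab.cu -- hypothetical interface; your real routine and arguments will differ
#include <cuda_runtime.h>

__global__ void scale(double *x, double a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// extern "C" stops C++ name mangling; the trailing underscore matches ifort's
// default external-name convention, so the Fortran side can simply
// "call gpu_scale(x, a, n)" (or declare it with bind(C) and drop the underscore)
extern "C" void gpu_scale_(double *x, double *a, int *n)
{
    double *d_x;
    cudaMalloc(&d_x, *n * sizeof(double));
    cudaMemcpy(d_x, x, *n * sizeof(double), cudaMemcpyHostToDevice);
    scale<<<(*n + 255) / 256, 256>>>(d_x, *a, *n);
    cudaMemcpy(x, d_x, *n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
}

Note that Fortran passes arguments by reference, which is why everything arrives as a pointer.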
You don't indicate where in your compile process you are adding the -fast switch(es). If it's only on the ifort compile commands, I believe the above approach will work. If you also want/need it on the link command, then ifort will want to build an entirely statically linked executable (and do inter-module optimization...), which won't work with the above process.
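Mapped onto your build, with -fast kept only on the ifort compile step, that would look something like the following (the CUDA library path is a placeholder for whatever you're currently linking against):

$ ifort -c -fast *.f
$ nvcc -Xcompiler="-fPIC" -shared src/bicgstab.cu -o bicgstab.so -I/home/ricardo/apps/cusp/cusplibrary
$ ifort *.o bicgstab.so -L<your cuda install>/lib64 -lcudart -o program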