Question
I'm trying to use allocatable arrays inside "device" data structures that reside in GPU memory. Code (pasted below) compiles, but gives a segfault. Am I doing something obviously wrong?
Module file is called 'gpu_modules.F90', given below:
!=============
! This module contains definitions for data structures and the data
! stored on the device
!=============
module GPU_variables
use cudafor
type :: data_str_def
!=============
! single number quantities
!=============
integer :: i, j
real(kind=8) :: a
!=============
! Arrays
!=============
real(kind=8), allocatable :: b(:)
real(kind=8), allocatable :: c(:,:)
real(kind=8), allocatable :: d(:,:,:)
real(kind=8), allocatable :: e(:,:,:,:)
end type data_str_def
!=============
! Actual data is here
!=============
type(data_str_def), device, allocatable :: data_str(:)
contains
!=============
! subroutine to allocate memory
!=============
subroutine allocate_mem(n1)
implicit none
integer, intent(in) :: n1
call deallocate_mem()
write(*,*) 'works here'
allocate(data_str(n1))
write(*,*) 'what about allocating memory?'
allocate(data_str(n1) % b(10))
write(*,*) 'success!'
return
end subroutine allocate_mem
!=============
! subroutine to deallocate memory
!=============
subroutine deallocate_mem()
implicit none
if(allocated(data_str)) deallocate(data_str)
return
end subroutine deallocate_mem
end module GPU_variables
Main program is 'gpu_test.F90', given below:
!=============
! main program
!=============
program gpu_test
use gpu_variables
implicit none
!=============
! local variables
!=============
integer :: i, j, n
!=============
! allocate data
!=============
n = 2 ! number of data structures
call allocate_mem(n)
!=============
! deallocate device data structures and exit
!=============
call deallocate_mem()
end program
Compilation command (from current folder) is:
pgfortran -Mcuda=cc5x *.F90
Terminal output:
$ ./a.out
works here
what about allocating memory?
Segmentation fault (core dumped)
Any help, insight, or solution would be appreciated. And no, using pointers is not a viable option.
Edit: another detail that may be relevant: I'm using pgfortran version 16.10
Answer 1:
The segmentation fault occurs because allocating data_str(n1)%b requires the host to access the memory of data_str. Since data_str is in device memory, not host memory, that access fails. In theory, the compiler could create a host temp, allocate it, and then copy it to the descriptor for data_str(n1)%b, but that's not part of today's CUDA Fortran.
You can work around this case by creating the temp yourself:
subroutine allocate_mem(n1)
implicit none
integer, intent(in) :: n1
type(data_str_def) :: data_str_h
call deallocate_mem()
write(*,*) 'works here'
allocate(data_str(n1))
write(*,*) 'what about allocating memory?'
! allocate b through a host temporary, then copy the temporary to the device element
allocate(data_str_h% b(10))
data_str(n1) = data_str_h
write(*,*) 'success!'
return
end subroutine allocate_mem
BTW, is your intention that components b, c, d, and e are allocated in host memory or device memory? I don't see the device attribute on them, so in the above, they'd go to host memory.
Answer 2:
I posted this question on the PGI forums, and someone from PGI confirmed that the feature is not supported the way I'm trying to use it.
http://www.pgroup.com/userforum/viewtopic.php?t=5661
His recommendation was to use the "managed" attribute or use fixed-sized arrays inside the data structure.
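For illustration, here is a minimal sketch of the fixed-size variant. It assumes the extent of b (10, as in the question) is known at compile time. The names gpu_variables_fixed, data_str_fixed and nb are hypothetical, and I have not tested this against pgfortran 16.10. With no allocatable components, the host never has to touch a descriptor that lives in device memory, and the whole array can be filled on the host and copied over by plain assignment.

```fortran
module gpu_variables_fixed
  use cudafor
  implicit none

  ! hypothetical compile-time extent for component b
  integer, parameter :: nb = 10

  type :: data_str_fixed
    integer :: i, j
    real(kind=8) :: a
    real(kind=8) :: b(nb)   ! fixed size instead of allocatable
  end type data_str_fixed

  ! the array of structures itself is still allocatable on the device
  type(data_str_fixed), device, allocatable :: data_str(:)

contains

  subroutine allocate_mem(n1)
    integer, intent(in) :: n1
    type(data_str_fixed), allocatable :: data_str_h(:)   ! host staging copy
    integer :: k

    call deallocate_mem()
    allocate(data_str(n1))     ! device allocation, as in the question
    allocate(data_str_h(n1))   ! host allocation

    ! fill the host copy; components are only ever touched on the host here
    do k = 1, n1
      data_str_h(k) % i = k
      data_str_h(k) % j = 0
      data_str_h(k) % a = 0.0d0
      data_str_h(k) % b = 0.0d0
    end do

    ! host-to-device transfer by assignment
    data_str = data_str_h
    deallocate(data_str_h)
  end subroutine allocate_mem

  subroutine deallocate_mem()
    if (allocated(data_str)) deallocate(data_str)
  end subroutine deallocate_mem

end module gpu_variables_fixed
```

The main program from the question should work unchanged apart from the module name in the use statement. The trade-off is that every element carries the full nb-sized array, so this only makes sense when a reasonable upper bound on the size is known.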
Source: https://stackoverflow.com/questions/45233207/allocatable-arrays-in-cuda-fortran-device-data-structures