Question
I have been tasked with writing a Fortran 95 program that will read character input from a file, and then (to start with) simply spit it back out again. The tricky part is that these lines of input are of varying length (no maximum length given) and there can be any number of lines within the file.
I've used
do
read( 1, *, iostat = IO ) DNA ! reads to EOF -- GOOD!!
if ( IO < 0 ) exit ! if EOF is reached, exit do
I = I + 1
NumRec = I ! used later for total no. of records
allocate( Seq(I) )
Seq(I) = DNA
print*, I, Seq(I)
X = Len_Trim( Seq(I) ) ! length of individual sequence
print*, 'Sequence size: ', X
print*
end do
However, my initial statements list
character(100), dimension(:), allocatable :: Seq
character(100) DNA
and the appropriate integers etc.
I guess what I'm asking is whether there is any way to NOT list the size of the character strings in the first instance. Say I've got a string of DNA that is 200+ characters, and then another that is only 25: is there a way that the program can just read what there is and not need to include all the additional blanks? Can this be done without needing to use len_trim, since it can't be referenced in the declaration statements?
Answer 1:
To progressively read a record in Fortran 95, use non-advancing input. For example:
CHARACTER(10) :: buffer
INTEGER :: size
READ (unit, "(A)", ADVANCE='NO', SIZE=size, EOR=10, END=20) buffer
will read up to 10 characters worth (the length of buffer) each time it is called. The file position will only advance to the next record (the next line) once the entire record has been read by a series of one or more non-advancing reads.
Barring an end of file condition, the size variable will be defined with the actual number of characters read into buffer each time the read statement is executed.
The EOR and END specifiers are used to control execution flow (execution will jump to the appropriately labelled statement) when end of record or end of file conditions occur respectively. You can also use an IOSTAT specifier to detect these conditions, but the particular negative values to use for the two conditions are processor dependent.
You can sum size within a particular record to work out the length of that particular record.
Wrap such a non-advancing read in a loop that appropriately detects for end of file and end of record and you have the incremental reading part.
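As a rough illustration of that loop, here is a minimal Fortran 95 sketch. The module name record_length_sketch, the subroutine measure_record and its arguments are hypothetical names chosen for this example, and the 10-character buffer simply mirrors the declaration above.

```fortran
module record_length_sketch
  implicit none
contains
  ! Sums SIZE over a series of non-advancing reads to get the length of
  ! the current record. On return the file is positioned at the start of
  ! the next record; at_end is .true. if end of file was reached instead.
  subroutine measure_record(unit, length, at_end)
    integer, intent(in)  :: unit
    integer, intent(out) :: length
    logical, intent(out) :: at_end
    character(10) :: buffer
    integer :: nread

    length = 0
    at_end = .false.
    do
      ! Branches to 10 at end of record and to 20 at end of file.
      read (unit, "(A)", advance='NO', size=nread, eor=10, end=20) buffer
      length = length + nread      ! a full buffer was read; keep going
    end do
10  length = length + nread        ! last (possibly empty) chunk of the record
    return
20  at_end = .true.
  end subroutine measure_record
end module record_length_sketch
```

Calling measure_record repeatedly until at_end comes back .true. walks through the file one record at a time.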
In Fortran 95, the length specification for a local character variable must be a specification expression - essentially an expression that can be safely evaluated prior to the first executable statement of the scope that contains the variable's declaration. Constants represent the simplest case, but a specification expression in a procedure can involve dummy arguments of that procedure, amongst other things.
Reading the entire record of arbitrary length in is then a multi stage process:
- Determine the length of the current record by using a series of incremental reads. These incremental reads for a particular record finish when the end of record condition occurs, at which time the file position will have moved to the next record.
- Backspace the file back to the record of interest.
- Call a procedure, passing the length of the current record as an argument. Inside that procedure have a character variable whose length is given by the corresponding dummy argument.
- Inside that called procedure, read the current record into that character variable using normal advancing input.
- Carry out further processing on that character variable!
Note that each record ends up being read twice - once to determine its length, the second to actually read the data into the correctly "lengthed" character variable.
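The steps above could be sketched in Fortran 95 roughly as follows. All names here (two_pass_sketch, record_length, echo_record, echo_all) are made up for illustration, and the "further processing" is just a print, since the question only asks to echo each line.

```fortran
module two_pass_sketch
  implicit none
contains

  ! Pass 1: sum SIZE over non-advancing reads, then BACKSPACE so the file
  ! is again positioned at the start of the record just measured.
  subroutine record_length(unit, length, at_end)
    integer, intent(in)  :: unit
    integer, intent(out) :: length
    logical, intent(out) :: at_end
    character(10) :: buffer
    integer :: nread

    length = 0
    at_end = .false.
    do
      read (unit, "(A)", advance='NO', size=nread, eor=10, end=20) buffer
      length = length + nread
    end do
10  length = length + nread
    backspace (unit)
    return
20  at_end = .true.
  end subroutine record_length

  ! Pass 2: an automatic character variable takes its length from the
  ! dummy argument, so a normal advancing read fills it exactly.
  subroutine echo_record(unit, length)
    integer, intent(in) :: unit, length
    character(len=length) :: line

    read (unit, "(A)") line
    print *, length, line
  end subroutine echo_record

  ! Driver: echo every record on the unit and count them.
  subroutine echo_all(unit, nrec)
    integer, intent(in)  :: unit
    integer, intent(out) :: nrec
    integer :: length
    logical :: at_end

    nrec = 0
    do
      call record_length(unit, length, at_end)
      if (at_end) exit
      call echo_record(unit, length)
      nrec = nrec + 1
    end do
  end subroutine echo_all
end module two_pass_sketch
```

With the file already opened on some unit, call echo_all(unit, nrec) prints each line at its true length and leaves the record count in nrec.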
Alternative approaches exist that use allocatable (or automatic) character arrays of length one. The overall strategy is the same. Look at the code of the Get procedures in the common ISO_VARYING_STRING implementation for an example.
Fortran 2003 introduces deferred length character variables, which can have their length specified by an arbitrary expression in an allocate statement or, for allocatable variables, by the length of the right hand side in an assignment statement. This (in conjunction with other "allocatable" enhancements) allows the progressive read that determines the record length to also build the character variable that holds the contents of the record. Your supervisor needs to bring his Fortran environment up to date.
Answer 2:
Here's a function for Fortran 2003, which sets an allocatable string (InLine) of exactly the length of the input string (optionally trimmed), or returns .false. if end of file
function ReadLine(aunit, InLine, trimmed) result(OK)
  integer, intent(IN) :: aunit
  character(LEN=:), allocatable, optional :: InLine
  logical, intent(in), optional :: trimmed
  integer, parameter :: line_buf_len = 1024*4
  character(LEN=line_buf_len) :: InS
  logical :: OK, set
  integer :: status, size

  OK = .false.
  set = .true.
  do
    ! Read the next chunk of the current record without advancing past it.
    read (aunit, '(a)', advance='NO', iostat=status, size=size) InS
    ! Stop at end of file, and also on a read error (status > 0),
    ! which would otherwise keep the loop running.
    OK = (status == 0) .or. IS_IOSTAT_EOR(status)
    if (.not. OK) return
    if (present(InLine)) then
      if (set) then
        InLine = InS(1:size)
        set = .false.
      else
        InLine = InLine // InS(1:size)
      end if
    end if
    ! End of record: the whole line has been collected.
    if (IS_IOSTAT_EOR(status)) exit
  end do
  if (present(trimmed) .and. present(InLine)) then
    if (trimmed) InLine = trim(adjustl(InLine))
  end if
end function ReadLine
For example to do something with all lines in a file with unit "aunit" do
character(LEN=:), allocatable :: InLine
do while (ReadLine(aunit, InLine))
[.. something with InLine]
end do
Answer 3:
I have used the following. Let me know if it is better or worse than yours.
!::::::::::::::::::::: SUBROUTINE OR FUNCTION :::::::::::::::::::::::::::::::::::::::
!__________________ SUBROUTINE lineread(filno,cargout,ios) __________________________
subroutine lineread(filno,cargout,ios)
Use reallocate,ErrorMsg,SumStr1,ChCount
! This subroutine reads
! 1. the next row in a file, skipping blank lines and lines that begin with one of !#*
! 2. the part of the string up to the first !#* sign, or to the end of the string
!
! input Arguments:
! filno (integer) input file number
!
! output Arguments:
! cargout (character) output character string, converted so that all unnecessary spaces/tabs/control characters are removed.
implicit none
integer,intent(in)::filno
character*(*),intent(out)::cargout
integer,intent(out)::ios
integer::nlen=0,i,ip,ich,isp,nsp,size
character*11,parameter::sep='=,;()[]{}*~'
character::ch,temp*100
character,pointer::crad(:)
nullify(crad)
cargout=''; nlen=0; isp=0; nsp=0; ich=-1; ios=0
Do While(ios/=-1) !The eof() isn't standard Fortran.
READ(filno,"(A)",ADVANCE='NO',SIZE=size,iostat=ios,ERR=9,END=9)ch ! start reading file
! read(filno,*,iostat=ios,err=9)ch;
if(size>0.and.ios>=0)then
ich=iachar(ch)
else
READ(filno,"(A)",ADVANCE='no',SIZE=size,iostat=ios,EOR=9); if(nlen>0)exit
end if
if(ich<=32)then ! tab(9) or space(32) character
if(nlen>0)then
if(isp==2)then
isp=0;
else
isp=1;
end if
end if; cycle;
elseif(ich==33.or.ich==35.or.ich==38)then !if char is comment !# or continue sign &
READ(filno,"(A)",ADVANCE='yes',SIZE=size,iostat=ios,EOR=9)ch; if(nlen>0.and.ich/=38)exit;
else
ip=scan(ch,sep);
if(isp==1.and.ip==0)then; nlen=nlen+1; crad=>reallocate(crad,nlen); nsp=nsp+1; endif
nlen=nlen+1; crad=>reallocate(crad,nlen); crad(nlen)=ch;
isp=0; if(ip==1)isp=2;
end if
end do
9 if(size*ios>0)call ErrorMsg('Met error in reading file in [lineread]',-1)
! ios<0: Indicating an end-of-file or end-of-record condition occurred.
if(nlen==0)return
!write(6,'(a,l)')SumStr1(crad),eof(filno)
!do i=1,nlen-1; write(6,'(a,$)')crad(i:i); end do; if(nlen>0)write(6,'(a)')crad(i:i)
cargout=SumStr1(crad)
nsp=nsp+1; i=ChCount(SumStr1(crad),' ',',')+1;
if(len(cargout)<nlen)then
call ErrorMsg(SumStr1(crad)// " is too long!",-1)
!elseif(i/=nsp.and.nlen>=0)then
! call ErrorMsg(SumStr1(crad)// " has unrecognizable data number!",-1)
end if
end subroutine lineread
Answer 4:
I'm using Fortran 90 to do this:
X = Len_Trim( Seq(I) ) ! length of individual sequence
write(*,'(a<X>)') Seq(I)(1:X)
You can simply declare Seq to be a large character string and then trim it as you write it out. I don't know how kosher this solution is, but it certainly works for my purpose. I know that some compilers do not support "variable format expressions", but there are various workarounds to do the same thing almost as simply.
GNU Fortran variable expression workaround.
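For reference, here is a small sketch of two such workarounds that stay within standard Fortran 95. The '(a<X>)' form is a vendor extension, not standard Fortran. The names trimmed_write_sketch, width_format and print_trimmed are invented for this example. The first workaround relies on the plain A edit descriptor taking its width from the item being written; the second builds the format string at run time with an internal write.

```fortran
module trimmed_write_sketch
  implicit none
contains

  ! Builds a format such as "(a25)" at run time, as a portable stand-in
  ! for '(a<X>)'. A width of zero is not valid, so it is clamped to 1.
  function width_format(x) result(fmt)
    integer, intent(in) :: x
    character(len=16) :: fmt

    write (fmt, '(a,i0,a)') '(a', max(x, 1), ')'
  end function width_format

  ! Writes s twice without its trailing blanks, once per workaround.
  subroutine print_trimmed(s)
    character(len=*), intent(in) :: s
    integer :: x

    x = len_trim(s)
    write (*, '(a)') s(1:x)             ! plain A: width taken from the item
    write (*, width_format(x)) s(1:x)   ! same line when x >= 1
  end subroutine print_trimmed
end module trimmed_write_sketch
```

In the snippet above, write(*,'(a)') Seq(I)(1:X) would therefore print the same line without needing the extension.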
Source: https://stackoverflow.com/questions/14765382/reading-a-character-string-of-unknown-length