CoArray Fortran example

Here, we want to calculate $$\pi$$ with a Monte-Carlo algorithm. Each image will compute its own Monte Carlo average, and the global average will be computed at the end.

The area inside a unit circle is $$\pi$$. The red square is the square containing all points with coordinates in the $$([0,1],[0,1])$$ range. The grey area represents the set of points that are in the $$([0,1],[0,1])$$ range and which are at a distance less than one to the center of the circle. For the computation of $$\pi$$ with a Monte Carlo algorithm, each sample will consist in drawing two uniform random numbers in the $$[0,1]$$ range (one for the $$x$$ coordinate and one for the $$y$$ coordinate). Is the distance of the point to the center is less than one, we increment a counter. Our estimate of $$\pi$$ will be $$4 N{\rm inside} / N{\rm total}$$.

Single core program

Let us first write a single core program. We write the providers in the pi.irp.f file. The N_steps entity defines the number of Monte-Carlo steps to compute the value of $$\pi$$ in a single process.

BEGIN_PROVIDER [ integer*8, N_steps ]
 implicit none
 BEGIN_DOC
 ! Total number of MC steps 
 END_DOC
 N_steps = 0000000_8
END_PROVIDER

The N_blocks entity is the total number of independent calculations of $$\pi$$ one will do in a single process.

BEGIN_PROVIDER [ integer, N_blocks ]
 implicit none
 BEGIN_DOC
 ! Total number of blocks, each containing N_steps steps.
 END_DOC
 N_blocks = 100
END_PROVIDER

One will need to initialize the seed of the Fortran random number generator:

subroutine init_seed(i)
  implicit none
  integer, intent(in) :: i
  BEGIN_DOC
! Initializes the random seed with the current this_image()
  END_DOC
  integer :: seed(12)
  seed(:) = i
  call random_seed(put=seed)
end

pi_block is the Monte-Carlo evaluation of $$\pi$$ over N_steps.

BEGIN_PROVIDER [ double precision, pi_block ]
 implicit none
 BEGIN_DOC
 ! Value of pi computed over N_steps
 END_DOC

 integer                        :: i_step
 integer*8                      :: count_in
 double precision               :: x,y

 count_in = 0_8

 do i_step=1,N_steps
   call random_number(x)
   call random_number(y)
   if ( (x*x + y*y) <= 1.d0) then
     count_in += 1_8
   endif
 enddo
 pi_block = 4.d0*dble(count_in)/dble(N_steps)

END_PROVIDER

Let us now write the main program in test_mono.irp.f. It will print the running average and error bar of $$\pi$$ on the standard output. At the end of each loop cycle, the pi_block entity is freed, such that it will be freshly provided at the beginning of the next loop iteration.

program test_mono
  implicit none
  BEGIN_DOC
! Test the single core program
  END_DOC
  integer          :: i
  double precision :: pi_sum, pi_sum2, n
  double precision :: pi_average, pi_variance, pi_error

  call init_seed(1)

  pi_sum  = 0.d0
  pi_sum2 = 0.d0
  n = 0.d0
  do i=1,N_blocks
    PROVIDE pi_block
    n += 1.d0
    pi_sum  += pi_block
    pi_sum2 += pi_block*pi_block
    pi_average = pi_sum / n
    pi_variance = pi_sum2/n - pi_average**2
    pi_error = sqrt(pi_variance/(n-1.d0))
    print *,  pi_average, '+/-', pi_error
    FREE pi_block
  enddo
end

The output of this program gives:

$ ./test_mono
   3.14213880000000      +/-                     NaN
   3.14194260000000      +/-  1.961999945451743E-004
   3.14176573333333      +/-  2.100316592883950E-004
   3.14156600000000      +/-  2.488976753433852E-004
   3.14144144000000      +/-  2.295326231094694E-004
...
   3.14160352500000      +/-  5.354283981217712E-005
   3.14160270103093      +/-  5.299438256043764E-005
   3.14159646938776      +/-  5.281972755126230E-005
   3.14160685252525      +/-  5.330451270701332E-005
   3.14161012000000      +/-  5.286984052818029E-005

Parallel program

Now, we will write the prallel version of the program. First we will add to the Makefile the --coarray option to irpf90 and the -coarray option to ifort.

We can now write the parallel main program. We use a temporary array that will fetch all the remote values of pi_block. After a synchronization directive (SYNC ALL), the master process can compute the running average and error bar, and print the result.

program test_caf
  implicit none
  BEGIN_DOC
! Test the single core program
  END_DOC

  integer          :: i,j
  double precision :: pi_sum, pi_sum2, n
  double precision :: pi_average, pi_variance, pi_error
  double precision, allocatable :: pi_block_local(:)

  allocate (pi_block_local(num_images()))

  call init_seed(11*this_image())

  pi_sum  = 0.d0
  pi_sum2 = 0.d0
  n = 0.d0
  do i=1,N_blocks
    PROVIDE pi_block
    do j=1,num_images()
      pi_block_local(j) = pi_block[j]
    enddo
    SYNC ALL
    if (this_image() == 1) then
      do j=1,num_images()
        n += 1.d0
        pi_sum  += pi_block_local(j)
        pi_sum2 += pi_block_local(j)*pi_block_local(j)
      enddo
      pi_average = pi_sum / n
      pi_variance = pi_sum2/n - pi_average**2
      pi_error = sqrt(pi_variance/(n-1.d0))
      print *,  pi_average, '+/-', pi_error
    endif
    FREE pi_block
  enddo

  deallocate(pi_block_local)
end

Using 4 processes, the output of the program gives:

$ ./test_caf
   3.14237790000000      +/-  4.966975835348314E-004
   3.14234140000000      +/-  2.348424884354168E-004
   3.14224363333333      +/-  1.958218077039208E-004
   3.14210150000000      +/-  1.786455096347535E-004
   3.14193388000000      +/-  1.646706423564424E-004
...
   3.14161792187500      +/-  2.869715284522186E-005
   3.14161840309278      +/-  2.842813192882745E-005
   3.14161785918367      +/-  2.815996517045344E-005
   3.14161942727273      +/-  2.794672232344559E-005
   3.14162559000000      +/-  2.785054179473510E-005

One can see that the error bar is twice smaller than in the single core program. This reflects the fact that there are four times more samples.