=================================
Homework 6
=================================


Due 23:59, Nov 22, 2020
++++++++++++++++++++++++++++++++++++++++++++++++++

Parallel computing I: differentiation and integration with OpenMP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


**Note 1.** Before attempting this homework, make sure you have studied the sections `Parallel computing`__, `OpenMP`__, and
`Stampede2`__.

**Note 2.** It is particularly important that you follow the rules
of thumb in `Good citizenship`__.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


In this homework you will use your code from `homework 4`__ and build a program that uses shared memory to differentiate and integrate. Your compiler most likely supports OpenMP, so you should be able to do most of the development locally and run only the timing tests on Stampede2.
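
For example, with gfortran (the file name here is illustrative), OpenMP is switched on and off with a single flag, so the same source serves both builds::

    gfortran -O3 hw6.f90 -o hw6_serial          # serial build
    gfortran -O3 -fopenmp hw6.f90 -o hw6_omp    # OpenMP build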


* First, consider the function :math:`u(x,y) = \sin(x) \, \cos(y)` and the
  mapping :math:`x(r,s) = r + 0.1 \, s` and :math:`y(r,s) = s`. As in
  homework 4, :math:`(x,y) \in \Omega` and :math:`(r,s) \in
  \Omega_{\text{R}} = [-1,1]^2`. The goal is to approximate :math:`u_{x} +
  u_{y}` using a second-order finite difference method. For
  simplicity, set the number of gridpoints the same in
  both directions in :math:`\Omega_{\text{R}}`, that is :math:`n_r =
  n_s`. Let :math:`n = n_r = n_s`. This gives the same grid
  spacing in both directions (:math:`h = h_r = h_s = 2/n`). You should be
  able to reuse your code from homework 4 to compute the first derivatives, and then measure the error:

  .. math::
       :nowrap:
  
       \begin{equation*}
       e_2(h) = \left(\int_{\Omega} \left( u_x (x,y) + u_y (x,y) -
       u_{x,\rm exact} (x,y) - u_{y,\rm exact} (x,y) \right)^2 \, dx \, dy \right)^{1/2}.
       \end{equation*}

  
  

  Check that your serial code converges as expected. Note that
  :math:`u_{x,\rm exact} = \cos(x) \, \cos(y)` and
  :math:`u_{y,\rm exact} = -\sin(x) \, \sin(y)`.
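
  As a cross-check on the serial code, here is a minimal sketch (all names
  are illustrative, and the loops stand in for your homework 4 routines). It
  uses centered differences in the interior, second-order one-sided stencils
  at the boundaries, and the chain rule
  :math:`u_x + u_y = 0.9\,u_r + u_s`, which follows from the inverse metrics
  :math:`r_x = 1`, :math:`r_y = -0.1`, :math:`s_x = 0`, :math:`s_y = 1`
  of this particular mapping::

      program convergence_check
        implicit none
        integer, parameter :: dp = kind(1.0d0), n = 100
        real(dp), parameter :: h = 2.0_dp/n
        real(dp), allocatable :: u(:,:), ur(:,:), us(:,:), err(:,:)
        real(dp) :: r, s, x, y
        integer :: i, j

        allocate(u(0:n,0:n), ur(0:n,0:n), us(0:n,0:n), err(0:n,0:n))

        ! grid function on the mapped grid
        do j = 0, n
           do i = 0, n
              r = -1.0_dp + i*h; s = -1.0_dp + j*h
              x = r + 0.1_dp*s;  y = s
              u(i,j) = sin(x)*cos(y)
           end do
        end do

        ! d/dr: centered inside, one-sided (still second order) at the edges
        do j = 0, n
           do i = 1, n-1
              ur(i,j) = (u(i+1,j) - u(i-1,j))/(2.0_dp*h)
           end do
           ur(0,j) = (-3.0_dp*u(0,j) + 4.0_dp*u(1,j) - u(2,j))/(2.0_dp*h)
           ur(n,j) = ( 3.0_dp*u(n,j) - 4.0_dp*u(n-1,j) + u(n-2,j))/(2.0_dp*h)
        end do

        ! d/ds: the same stencils in the other direction
        do i = 0, n
           do j = 1, n-1
              us(i,j) = (u(i,j+1) - u(i,j-1))/(2.0_dp*h)
           end do
           us(i,0) = (-3.0_dp*u(i,0) + 4.0_dp*u(i,1) - u(i,2))/(2.0_dp*h)
           us(i,n) = ( 3.0_dp*u(i,n) - 4.0_dp*u(i,n-1) + u(i,n-2))/(2.0_dp*h)
        end do

        ! pointwise error in u_x + u_y = 0.9*ur + us
        do j = 0, n
           do i = 0, n
              r = -1.0_dp + i*h; s = -1.0_dp + j*h
              x = r + 0.1_dp*s;  y = s
              err(i,j) = 0.9_dp*ur(i,j) + us(i,j) &
                       - (cos(x)*cos(y) - sin(x)*sin(y))
           end do
        end do

        ! discrete L2 norm; the Jacobian of this mapping is 1, so dxdy = drds
        print '(a,es12.4)', 'e2(h) = ', sqrt(sum(err**2)*h*h)
      end program convergence_check

  Halving :math:`h` should reduce :math:`e_2(h)` by roughly a factor of four.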
       
  

* Next, use OpenMP constructs to parallelize your serial code. Make
  sure you do this in a non-intrusive way, so that the code can still
  be compiled in serial mode (see the sketch below). Try to make your
  program as parallel as possible; I will take this into account when
  grading.
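
  A minimal sketch of the non-intrusive style (names are illustrative):
  OpenMP directives are comments to a serial compiler, and lines beginning
  with the conditional sentinel ``!$`` are compiled only when OpenMP is
  enabled::

      program hello_hybrid
        !$ use omp_lib                       ! compiled only with -fopenmp
        implicit none
        integer, parameter :: dp = kind(1.0d0)
        real(dp) :: a(1000)
        integer :: j, nthreads

        nthreads = 1                         ! serial fallback
        !$ nthreads = omp_get_max_threads()  ! OpenMP build overrides
        print *, 'running with up to', nthreads, 'thread(s)'

        ! to a serial compiler, the directive below is just a comment
        !$omp parallel do
        do j = 1, size(a)
           a(j) = sin(real(j, dp))
        end do
        !$omp end parallel do
        print *, 'a(1) =', a(1)
      end program hello_hybrid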
  
* Demonstrate that your parallel code gives the same result as the
  serial code independent of the number of threads used.
  
* Use the ``omp_get_wtime()`` function to time the computational part
  of your code (exclude ``allocate`` statements, but include the
  assignments where you use workshare constructs); see the sketch below.
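
  Since ``omp_get_wtime()`` is unavailable in a serial build, one option is
  to pair it with a serial fallback behind the ``!$`` sentinel (a sketch,
  with illustrative names)::

      program timing_demo
        !$ use omp_lib
        implicit none
        integer, parameter :: dp = kind(1.0d0), n = 10000000
        real(dp), allocatable :: a(:)
        real(dp) :: t0, t1
        integer :: j

        allocate(a(n))            ! allocation stays outside the timed section

        call cpu_time(t0)         ! serial fallback
        !$ t0 = omp_get_wtime()   ! wall-clock time in the OpenMP build

        !$omp parallel do
        do j = 1, n
           a(j) = sqrt(real(j, dp))
        end do
        !$omp end parallel do

        call cpu_time(t1)
        !$ t1 = omp_get_wtime()
        print *, 'elapsed (s):', t1 - t0
      end program timing_demo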

  
* Strong scaling: Compute and display the speedup and efficiency (with
  1-16 cores) for a fixed problem size. Try a small grid (20 by 20) and
  a larger grid (800 by 800). Note that the speedup is the ratio of the
  serial runtime to the time taken by the parallel algorithm to solve
  the same problem on :math:`l` cores, and the efficiency is the ratio
  of the speedup to the number of cores, as formalized below.
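
  In symbols, with :math:`T_l` denoting the runtime on :math:`l` cores,

  .. math::
       :nowrap:

       \begin{equation*}
       S_l = \frac{T_1}{T_l}, \qquad E_l = \frac{S_l}{l}.
       \end{equation*}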
  
  
  
* Weak scaling: Compute the speedup and efficiency (with 1-16 cores) for
  grids with a fixed number of gridpoints per core. This means that if you
  use 1 core for an :math:`n \times n` grid, you will need a
  :math:`\lfloor \sqrt{2} n \rceil \times \lfloor \sqrt{2} n \rceil` grid when using 2 cores, a
  :math:`\lfloor \sqrt{3} n \rceil \times \lfloor \sqrt{3} n \rceil` grid when using 3 cores, etc. The notation :math:`\lfloor m \rceil` means the nearest integer to :math:`m`. You can take :math:`n=200`; for example, the 2-core grid is then :math:`\lfloor \sqrt{2} \cdot 200 \rceil = 283` points in each direction. The full sequence of grid sizes is computed in the sketch below.
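
  The whole sequence of grid sizes is one ``nint`` away (a sketch; the
  program name is illustrative)::

      program weak_grid_sizes
        implicit none
        integer, parameter :: dp = kind(1.0d0), n = 200
        integer :: l

        do l = 1, 16
           print '(a,i3,a,i4)', 'cores:', l, '   gridpoints per side:', &
                 nint(sqrt(real(l, dp))*n)
        end do
      end program weak_grid_sizes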
	
  
* As usual, arrange your results neatly in your report and comment on
  them. This time, also discuss the ways you made your code parallel
  and what, if anything, you could improve.

__ http://math.unm.edu/~motamed/Teaching/Fall20/HPSC/parallel.html
__ http://math.unm.edu/~motamed/Teaching/Fall20/HPSC/openmp.html
__ http://math.unm.edu/~motamed/Teaching/Fall20/HPSC/stampede2.html
__ http://math.unm.edu/~motamed/Teaching/Fall20/HPSC/stampede2.html#good-citizenship
__ http://math.unm.edu/~motamed/Teaching/Fall20/HPSC/hw4.html