# Homework 6¶

## Due 23.59, Nov/29/2017¶

### Parallel computing I: differentiation and integration with OpenMP¶

Note 1. Before attempting this homework, make sure you have studied the sections Parallel computing, OpenMP, and Stampede2.

Note 2. It is particularly important that you follow the rules of thumb in Good citizenship.

In this homework you will use your code from homework 4 and build a program that uses shared memory to differentiate and integrate. Most likely your computer has OpenMP so you should be able to do most of the development locally and just run the timing tests on Stampede2.

• First, Consider the function $$u(x,y) = sin(x) \, cos(y)$$ and the mapping $$x(r,s) = r + 0.1 \, s$$ and $$y(r,s) = s$$. As in homework 4, $$(x,y) \in \Omega$$ and $$(r,s) \in \Omega_{\text{R}} = [-1,1]^2$$. The goal is to approximate $$u_{x} + u_{y}$$ using a second-order finite difference method. For simplicity, set the number of gridpoints the same in both directions in $$\Omega_{\text{R}}$$, that is $$n_r = n_s$$. Let $$n = n_r = n_s$$. This will give you the same grid size in both directions ($$h = h_r = h_s = 2/n$$). You should be able to use your code from homework 4, where you compute the Laplacian and measure the error:

\begin{equation*} e_2(h) = \left(\int_{\Omega} \left( u_x (x,y) + u_y (x,y) - u_{x,\rm exact} (x,y) - u_{y,\rm exact} (x,y) \right)^2 dxdy \right)^{1/2}, \end{equation*}

Check that your serial code converges as expected. Note that $$u_{x,\rm exact} = cos(x) \, cos(y)$$ and $$u_{y,\rm exact} = - sin(x) \, sin(y)$$.

• Next, use OpenMP constructs to parallelize your serial code. Make sure you do this in a non-intrusive way so that the code will still be possible to compile in serial mode. Try to make your program as parallel as possible. I will take this into account when grading.

• Demonstrate that your parallel code gives the same result as the serial code independent of the number of threads used.

• Use the omp_get_wtime() function to time the computational part of your code (try to exclude allocate statements but include assignments, where you have the workshare constructs).

• Strong scaling: Compute and display the speedup and efficiency (with 1-16 cores) for fixed problem size. Try a small grid (20 by 20) and a larger grid (800 by 800). Note that speedup is the ratio of the serial runtime to the time taken by the parallel algorithm to solve the same problem on $$l$$ cores. The efficiency is defined as the ratio of speedup to the number of cores.

• Weak scaling: Compute the speedup and efficiency (with 1-16 cores) for grids with a fixed number of gridpoints per core. This means if you use 1 core for an $$n \times n$$ grid, you will need to have a $$\lfloor \sqrt{2} n \rceil \times \lfloor \sqrt{2} n \rceil$$ grid when using 2 cores, a $$\lfloor \sqrt{3} n \rceil \times \lfloor \sqrt{3} n \rceil$$ grid when using 3 cores, etc. The notation $$\lfloor m \rceil$$ means the nearest integer to $$m$$. You can take $$n=200$$.

• As usual, arrange your results neatly in your report and comment on them. This time, also discuss the ways you made your code parallel and what, if anything, you could improve.