====================================
Perl
====================================

What is Perl?
++++++++++++++++++++++++++

Perl is a powerful and easy-to-use scripting language. A "script" is a program that automates a sequence of tasks (e.g. running numerical convergence tests) which a human operator could otherwise execute one by one. Scripting languages are well suited to processing text and interacting with the shell, in contrast to languages such as Fortran, which are used mainly for numerical computation. Over the years, Perl has evolved into a general-purpose programming language used for a wide range of tasks such as web development, network programming, GUI development, and more.

This section gives a quick overview of the parts of Perl that are relevant to this course. Much more information can be found on the internet; see e.g. `perldoc`__, `perltoc`__, and `tutorialspoint`__.

Before we start, make sure Perl is installed. In a shell terminal, type:

.. code-block:: none

		$ which perl
		/usr/bin/perl
		$ perl --version

		This is perl 5, version 18, subversion 2 (v5.18.2) built for darwin-thread-multi-2level
		
Good! Perl 5 is installed in ``/usr/bin/``, which is already in my search path (`PATH`__).
		

Perl scripts
++++++++++++++++++++++++

A Perl script is a plain text file, conventionally with the extension ``.pl``. Create a text file named ``hello.pl`` with the content:

.. code-block:: none

  #!/usr/bin/perl

  print "Hello World!\n";

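The first line, which starts with the shebang ``#!``, tells the system to run the script with the Perl interpreter in ``/usr/bin/`` (adjust the path to wherever Perl is installed on your computer). It matters only when the script is executed directly, e.g. as ``./hello.pl`` after making it executable. Alternatively, if ``perl`` is in your search path (which it should be), you can let ``env`` look it up:

.. code-block:: none

  #!/usr/bin/env perl

  print "Hello World!\n";

A bare ``#!perl``, as used in the examples below, is fine as long as scripts are started with ``perl script.pl``. On Unix-like systems it generally does not let the system locate the interpreter when a script is executed directly.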


To "run the script" you would simply go to a shell and type ``perl hello.pl``.



Scalar data
+++++++++++++++++++++++

A scalar is a single unit of data. It is either a `number` or a `string`:

* **Number literals** can be integers (e.g. ``6`` or ``123``) or floating point numbers (e.g. ``-1.23e-4`` or ``1.2``). 

* **String literals** are sequences of characters (e.g. ``'a1'`` or ``"Hello"``). They are usually alphanumeric values delimited by either single (') or double (") quotes.

  * `Single quotes`: A single-quoted string literal is just a collection of characters, e.g. ``'a1'`` or ``'Hello'``. Only two escape sequences are recognized inside it: ``\'`` (a single quote) and ``\\`` (a backslash).

    
  * `Double quotes`: A double-quoted string literal allows variable interpolation and supports ``\`` (backslash) escape sequences, such as ``\n`` (newline), ``\t`` (tab), ``\u`` (forces next character to uppercase), ``\l`` (forces next character to lowercase), ``\U`` (forces all following characters to uppercase), ``\L`` (forces all following characters to lowercase), ``\E`` (ends ``\U`` and ``\L``).


  For example, create a file named ``test1.pl``:

  .. code-block:: none

		    #!perl
		    
		    print 'Abc' . "\n" ;
		    print 'Abc\'s' . "\n" ;
		    print 'Abc\\' . "\n" ;

		    print "\n";

		    print "A\tB\n\LCDEF\E\n123\n";

  And then in the shell:

  .. code-block:: none
		     
		    $ perl test1.pl
		    Abc
		    Abc's
		    Abc\
		    
		    A       B
		    cdef
		    123

  Note that ``.`` concatenates two strings; see "miscellaneous operators" below.
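
The difference between the two quoting styles is easiest to see with a variable. The following is a minimal sketch: the variable names are made up for illustration, and the variables are declared with ``my`` under ``use strict`` and ``use warnings``, a habit the scripts in this section do not follow.

.. code-block:: perl

   use strict;
   use warnings;

   my $lang = 'perl';

   # Single quotes: no interpolation, and \n stays as two characters.
   my $single = 'I use $lang\n';

   # Double quotes: $lang is interpolated, \u uppercases the next character.
   my $double = "I use \u$lang\n";

   print $single, "\n";   # I use $lang\n
   print $double;         # I use Perl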
    
  
		    
 


List data
+++++++++++++++++++++++

A list is an ordered sequence of scalars. For example, ``(1,2,3)`` is a list of three numbers, ``('a','b','c')`` is a list of three strings, ``('Hello')`` is a list of one string, and ``( )`` is an empty list.


Scalar variables
+++++++++++++++++++++++

A scalar variable stores a single scalar value (a number or a string) and hence reserves some space in memory. The name of a scalar variable starts with a ``$`` sign. For example, consider ``$a = 1;`` in a Perl script. Here, ``1`` is a scalar number and ``$a`` is a scalar variable that holds the value 1. As another example, ``$name='Mohammad';`` stores the string ``'Mohammad'`` in the scalar variable ``$name``. Variable names are case sensitive, so ``$ab`` and ``$AB`` are two different variables, and it is fine to write, for instance, ``$ab=1`` and ``$AB=2`` in the same script.
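
As a small illustration of case sensitivity and of scalars holding either numbers or strings, consider the sketch below. The names ``$sum`` and ``$greeting`` are hypothetical, and ``my`` with ``use strict`` is an added precaution that the text above does not use.

.. code-block:: perl

   use strict;
   use warnings;

   my $ab   = 1;            # lower-case name
   my $AB   = 2;            # a different variable: names are case sensitive
   my $name = 'Mohammad';

   my $sum      = $ab + $AB;          # 3
   my $greeting = "Hello, $name!";    # interpolation inside double quotes

   print "$ab $AB $sum\n";   # 1 2 3
   print "$greeting\n";      # Hello, Mohammad!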

 

Array variables
+++++++++++++++++++++++

An array is a variable that stores an ordered list of scalar values. The name of an array variable starts with a ``@`` sign. A single element of an array is itself a scalar, so we refer to it with the ``$`` sign, followed by the variable name and the index of the element in square brackets ``[]``. Indexing starts at 0. For example, create a file named ``test2.pl``:

.. code-block:: none

		#!perl

		@ages = (20, 22, 25);             
		@names = ("Dan", "Maria", "Sanju");

		print "\$ages[0] = $ages[0]\n";       #or equivalently "\$ages[0] = " . $ages[0] . "\n";
		print "\$ages[1] = $ages[1]\n";
		print "\$ages[2] = $ages[2]\n";
		print "\$names[0] = $names[0]\n";
		print "\$names[1] = $names[1]\n";
		print "\$names[2] = $names[2]\n";

		print "\n";
		
		print "$names[0] is $ages[0] years old\n";
		print "$names[1] is $ages[1] years old\n";
		print "$names[2] is $ages[2] years old\n";
		

The ``\`` before ``$`` suppresses interpolation, so that the name of the variable is printed rather than its value. When executed (type ``perl test2.pl`` in a shell), the script produces the following result:

.. code-block:: none

		$ages[0] = 20
		$ages[1] = 22
		$ages[2] = 25
		$names[0] = Dan
		$names[1] = Maria
		$names[2] = Sanju

		Dan is 20 years old
		Maria is 22 years old
		Sanju is 25 years old


**Remark:** (lists vs. arrays) One of the most common sources of confusion is the difference between `lists` and `arrays`. Consider ``@vec = (1,2,3)``. Here, the right-hand side of ``=`` is a list, which we assign to the variable ``@vec``. That variable, whose name begins with the ``@`` sign, is an array. A list can therefore be assigned to an array. Moreover, arrays have names (starting with ``@``), but lists do not.
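
The distinction can be made concrete with a short sketch. It relies on ``scalar``, ``push``, and negative indices, which are standard Perl features not covered in the text above. The variable names other than ``@vec`` are hypothetical, and ``my`` with ``use strict`` is an added habit.

.. code-block:: perl

   use strict;
   use warnings;

   my @vec = (1, 2, 3);         # a list assigned to an array
   my ($x, $y) = (10, 20);      # a list assigned to a list of scalars

   my $count = scalar(@vec);    # number of elements: 3
   my $last  = $vec[-1];        # negative indices count from the end: 3

   push @vec, 4;                # arrays can grow; @vec is now (1, 2, 3, 4)

   print "count=$count last=$last x=$x y=$y\n";   # count=3 last=3 x=10 y=20
   print "@vec\n";                                # 1 2 3 4

The list ``(10, 20)`` exists only for the duration of the assignment, whereas the array ``@vec`` persists and can be modified later.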


Perl operators
++++++++++++++++++++++++

Perl supports many types of operators. We will review the four most frequently used types.


**Arithmetic operators** include addition (``+``), subtraction (``-``), multiplication (``*``), division (``/``), and exponentiation (``**``). By default, Perl performs floating-point arithmetic in double precision.



For example let ``$a = 10`` and ``$b = 2``. Then ``$a + $b`` will give 12 and ``$a ** $b`` will give 100.  
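
A short sketch of these operators follows. The names ``$x`` and ``$y`` stand in for ``$a`` and ``$b`` (which Perl also uses internally in ``sort``), the other names are hypothetical, and ``printf`` is a standard built-in not introduced above.

.. code-block:: perl

   use strict;
   use warnings;

   my ($x, $y) = (10, 2);

   my $sum   = $x + $y;     # 12
   my $power = $x ** $y;    # 100
   my $ratio = $x / 4;      # 2.5: division is not truncated to an integer
   my $root  = 2 ** 0.5;    # square root of 2 via a fractional exponent

   print "$sum $power $ratio\n";   # 12 100 2.5
   printf "%.4f\n", $root;         # 1.4142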
		
		
**Assignment operators:**

* ``=`` assigns values from right side operand to left side operand
* ``+=`` e.g. ``$b += $a`` is equivalent to ``$b = $b + $a``
* ``-=`` e.g. ``$b -= $a`` is equivalent to ``$b = $b - $a``
* ``*=`` e.g. ``$b *= $a`` is equivalent to ``$b = $b * $a``
* ``/=`` e.g. ``$b /= $a`` is equivalent to ``$b = $b / $a``
* ``**=`` e.g. ``$b **= $a`` is equivalent to ``$b = $b ** $a``


**Relational operators** are divided into two categories:

* Numeric relational operators (``==``, ``!=``, ``<``, ``>``, ``<=``, ``>=``)
* String relational operators (``eq``, ``ne``, ``lt``, ``gt``, ``le``, ``ge``)  

Example: suppose ``$a=10``, ``$b=20``, ``$c="xyz"``, ``$d="XYZ"``. Then ``($a == $b)`` is false, and ``($c ne $d)`` is true because string comparison is case sensitive.
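
Why two categories are needed becomes clear when the same values are compared both ways. The sketch below uses hypothetical variable names and the conditional operator ``? :`` (standard Perl, not covered above) to turn each comparison into a printable word.

.. code-block:: perl

   use strict;
   use warnings;

   my ($n1, $n2) = (10, 10.0);
   my ($s1, $s2) = ("10", "10.0");

   my $num_equal = ($n1 == $n2)     ? "true" : "false";   # compared as numbers
   my $str_equal = ($s1 eq $s2)     ? "true" : "false";   # compared as strings
   my $str_order = ("abc" lt "abd") ? "true" : "false";   # dictionary order
   my $trap      = ("10" lt "9")    ? "true" : "false";   # "1" sorts before "9"

   print "== $num_equal, eq $str_equal, lt $str_order, trap $trap\n";
   # == true, eq false, lt true, trap true

The last line shows the usual pitfall: string operators applied to numbers compare them character by character, not by value.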


**Miscellaneous Operators:**

* ``.`` (concatenation) concatenates two strings; see the examples above.
* ``x`` (repetition) returns a string consisting of the left operand repeated the number of times specified by the right operand. For example ``('+' x 5)`` will give ``+++++``.
* ``..`` (range) returns a list of values counting (up by ones) from the left value to the right value. For example, ``(4..9)`` will give ``(4, 5, 6, 7, 8, 9)``. 
* ``++`` (increment) increases an integer value by one. For example, if ``$a=7``, then after ``$a++`` the value of ``$a`` is 8.
* ``--`` (decrement) decreases an integer value by one. For example, if ``$a=7``, then after ``$a--`` the value of ``$a`` is 6.
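
The operators in this list can be combined in a few lines. This is only a sketch with hypothetical variable names; it also shows the prefix form ``--$i``, which the list above does not mention, next to the postfix form ``$i++``.

.. code-block:: perl

   use strict;
   use warnings;

   my $line  = '-' x 10;         # "----------"
   my $label = 'n' . '=' . 3;    # "n=3": the number 3 is converted to a string
   my @range = (4..9);           # (4, 5, 6, 7, 8, 9)

   my $i   = 7;
   my $old = $i++;               # postfix: $old gets 7, then $i becomes 8
   my $new = --$i;               # prefix: $i becomes 7 again, and so does $new

   print "$line\n$label\n@range\n$old $i $new\n";

The four printed lines are ``----------``, ``n=3``, ``4 5 6 7 8 9``, and ``7 7 7``.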



Loops
++++++++++++++++++++++++++++++

The most useful loops in Perl are ``while``, ``for``, and ``foreach`` loops.

The ``while`` and ``for`` loops in Perl behave much as they do in most other languages.

The syntax of a ``while`` loop is ``while(condition) {statements;}``.

The syntax of a ``for`` loop is ``for ( init; condition; increment ){statements;}``. 


.. code-block:: none

		$n = 5; $fact = 1; $i = 1;
		
		while ($i <= $n ) {
		         $fact *= $i;
		         $i += 1;
		}
   
		print "$n! = $fact \n";

This will display ``5! = 120`` in the terminal window.

.. code-block:: none
		
		for ($i = 1; $i <= 10; $i += 1) {
		      print "$i ";
		}
		
		print "\n";

This will display ``1 2 3 4 5 6 7 8 9 10`` in the terminal window.






The ``foreach`` loop iterates over a list (typically stored in an array variable), setting the iteration variable to each element of the list in turn.

The syntax for a ``foreach`` loop is ``foreach $i (list) {statements;}``.

.. code-block:: none

		@food = qw/ pancake  taco soup/ ;          # qw is the quote word operator
		@meal = ('breakfast', 'lunch', 'dinner');
		$i=0;
		
		foreach $a (@food) {
		       print "We have $a for $meal[$i] \n";
	               $i+=1;
		}

This will display

.. code-block:: none
		
		We have pancake for breakfast 
		We have taco for lunch 
		We have soup for dinner 


**Nested loops:** A loop can be nested inside another loop. For example the syntax for a "nested for loop" is

.. code-block:: none

		for ( init; condition; increment ){
		      for ( init; condition; increment ){statements;}
                      statements; 
		}
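
As a concrete instance of this skeleton, the following sketch builds a 3-by-3 multiplication table. The variable ``$table`` is hypothetical, ``.=`` (append to a string) works like the assignment operators listed earlier, and the loop counters are declared with ``my`` under ``use strict``, which the examples above do not do.

.. code-block:: perl

   use strict;
   use warnings;

   my $table = '';

   for (my $i = 1; $i <= 3; $i += 1) {
       for (my $j = 1; $j <= 3; $j += 1) {
           $table .= $i * $j . " ";   # append one product to the current row
       }
       $table .= "\n";                # the outer loop ends each row
   }

   print $table;

The inner loop runs to completion for every single step of the outer loop, so the script prints the rows ``1 2 3``, ``2 4 6``, and ``3 6 9``.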



Conditionals
++++++++++++++++++++++++++++++

The basic structure of the ``if-elsif-else`` statement is shown in the
following simple example.

.. code-block:: none

		if (1==2) {print "1=2\n";}
		elsif (1==3) {print "1=3\n";}
		else {print "I found out that 1 is not equal to 2 or 3! \n";}
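
A slightly more realistic sketch combines the conditional with a ``foreach`` loop to classify numbers by sign. The arrays ``@values`` and ``@signs`` are hypothetical, and ``push`` and ``my`` are standard Perl features that the text above does not introduce.

.. code-block:: perl

   use strict;
   use warnings;

   my @values = (-3, 0, 8);
   my @signs;

   foreach my $v (@values) {
       if    ($v < 0)  { push @signs, "negative"; }
       elsif ($v == 0) { push @signs, "zero"; }
       else            { push @signs, "positive"; }
   }

   print "@signs\n";   # negative zero positive

The branches are tested in order and only the first one whose condition holds is executed.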

		
The special variable $_
+++++++++++++++++++++++++++++

Some variables have a special meaning in Perl. The most commonly used special variable is ``$_``. Among other things, it is the default iteration variable of a ``foreach`` loop when no other variable is supplied, and many functions, such as ``print``, act on ``$_`` when called without an argument. In such cases you can either type ``$_`` or leave it out. For example, in the ``foreach`` example above, you may leave out the iteration variable ``$a``. Perl will then use ``$_`` in its place:


.. code-block:: none

		@food = qw/ pancake  taco soup/ ;          
		@meal = ('breakfast', 'lunch', 'dinner');
		$i=0;
		
		foreach (@food) {
		       print "We have ";
		       print ;
		       print " for $meal[$i] \n";
	               $i+=1;
		}

Here both ``foreach`` and the second ``print`` implicitly use ``$_``, which takes the place of ``$a`` in the earlier version. The output is the same as above.
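
Two further properties of ``$_`` are worth a quick sketch: inside ``foreach`` it is an alias for the current element, and built-ins other than ``print`` also default to it. The names ``@steps`` and ``$seen`` are hypothetical, and ``uc`` (uppercase) and ``.=`` are standard Perl features not covered above.

.. code-block:: perl

   use strict;
   use warnings;

   my @steps = (1, 2, 3);

   foreach (@steps) {
       $_ *= 10;       # $_ is an alias: this changes the array element itself
   }

   print "@steps\n";   # 10 20 30

   my $seen = '';
   foreach (qw/ a b c /) {
       $seen .= uc;    # uc with no argument works on $_
   }

   print "$seen\n";    # ABC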


File Input-Output
+++++++++++++++++++++++++++


Perl makes file input and output easy. We use the ``open`` command to open a file stream, "read" from or "write" to it, and, once we are done, use the ``close`` command to close the file.

The syntax for opening a file is

* In read-only mode: ``open(FILEHANDLE,"<filename");`` or ``open(FILEHANDLE,"filename");``

* In writing mode: ``open(FILEHANDLE,">filename");``

* To append to a file: ``open(FILEHANDLE,">>filename");`` 

All these commands open the file `filename`, which is located on your disk, and associate the filehandle `FILEHANDLE` with it. A filehandle, by convention written in capitals, is the name through which the script refers to the open file.


As an example, consider the following code:

.. code-block:: none

		#!perl
		
		#Part 1
		$myFile="./data1.txt";
		$outFile="./data2.txt";
		open(FILE,"<$myFile") || die "cannot open file $myFile!";
		open(OUTFILE,">$outFile") || die "cannot open file!";

		#Part 2
		while( $line = <FILE> )  # read one line at a time until the end of file
		{
			print OUTFILE $line;   # copy the line to the output file
			print $line;           # and echo it to the terminal
		}

		#Part 3
		close(OUTFILE);
		close(FILE);

This program first opens the file "data1.txt" for reading and the file "data2.txt" for writing. The ``die`` command (followed by a message) halts the program if a file cannot be opened, for example if "data1.txt" does not exist in the current working directory. The program then copies ``$myFile`` to ``$outFile`` line by line, printing each line to the terminal as well. Finally, it closes both files.

Another example:

.. code-block:: none

		#!perl

		open(FILE, ">data3.txt") || die "cannot open file data3.txt!";   # open a file for writing
		while (<>) {              # read one line at a time from the keyboard into $_
			print FILE $_;    # write the line to the file
		}
		close(FILE);              # close the file

End the keyboard input with ``Ctrl+D``.

Note that ``>`` creates a new file named "data3.txt" and writes the data into it. If the file already exists, its contents are discarded and replaced by the data you just wrote. To keep the existing contents and add to them, open the file in ``>>`` mode.
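
The three modes can be compared in one script. The sketch below deliberately uses a different style from the examples above: the three-argument form of ``open``, filehandles stored in ``my`` variables, and the error variable ``$!``. All three are standard Perl but are not part of this section, and the file name ``demo_notes.txt`` is hypothetical.

.. code-block:: perl

   use strict;
   use warnings;

   my $file = "demo_notes.txt";   # hypothetical file name

   # Write mode: create the file (or empty it) and write two lines.
   open(my $out, '>', $file) or die "cannot open $file for writing: $!";
   print $out "first line\n";
   print $out "second line\n";
   close($out);

   # Append mode: keep the existing lines and add one more.
   open($out, '>>', $file) or die "cannot open $file for appending: $!";
   print $out "third line\n";
   close($out);

   # Read mode: count the lines that are now in the file.
   open(my $in, '<', $file) or die "cannot open $file for reading: $!";
   my $count = 0;
   while (my $line = <$in>) {
       $count += 1;
   }
   close($in);

   print "$file has $count lines\n";   # demo_notes.txt has 3 lines
   unlink $file;                       # remove the demo file again

Keeping the mode separate from the file name avoids surprises when a file name itself starts with ``>`` or ``<``.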


Regular Expressions
++++++++++++++++++++++++++++++++++++++++

A regular expression (regex) is a pattern against which a string can be matched and, if desired, whose matches can be replaced by other text. For example, we may need to search a file for some pattern (e.g. a particular word) and then replace it with something else (e.g. another word).

The two main regex operators in Perl are **match** (``//``) and **substitute** (``s///``).

**The Match Operator** is used to match a string or statement to a regex. For example, to match the regex "green" against the default variable ``$_``, here set to ``"The tree is green"``, we write the following code:

.. code-block:: none

		#!/usr/bin/perl
		
		$_ = "The tree is green";
		
		if(/green/){
		   print "Found green!\n";
		}


The above code checks whether "green" appears in the default string ``$_``. If it does, the expression in the if-statement is true; otherwise it is false. Hence the above code will print ``"Found green!"``, because there is a "green" in the string ``$_``. Note that the two forward slashes are the delimiters of the regex (just as single quotes or double quotes are delimiters of regular strings).

Matching against the default variable ``$_`` is not the only way to use regex in Perl. We can also use the binding operator ``=~`` to match against the string on the left.

.. code-block:: none

		$str = 'The tree is green';
		if($str =~ /green/){
		   print "Found green!\n";
		}

On the left-hand side of the ``=~`` operator there is a string. On the right-hand side there is a regex (which is "green"). This code would also print ``"Found green!"``.
		
**Some useful special characters and modifiers:**

* ``.`` matches any single character except newline. For example, the regex ``/c.t/`` will match any string with 'c' followed by any character, followed by 't'. It will hence match e.g. "cat", "cut", "c t", and "c.t".

* ``*`` matches zero or more occurrences of the preceding expression. For
  example, in the pattern ``/xy*z/`` the ``x`` and the ``z`` are
  required, but the ``y`` can appear any number of times, including not
  at all. This pattern would match e.g. ``xz``, ``xyz``, ``xyyz``, ``xyyyyyyyyyyyyyyyyz``, etc.
  
* ``+`` matches one or more occurrences of the preceding expression. For example, ``/A+/`` matches ``A``, ``AA``, etc.

* ``{n}`` matches exactly n occurrences of the preceding expression. For example, ``/xy{3}z/`` matches ``xyyyz``, but not ``xyyz``.

* Parentheses ``()`` group several characters into a single item, so that a quantifier applies to the whole group. For example, ``/(OMG)+/`` would match ``OMGOMGOMG`` while ``/OMG+/`` would match ``OMGGGGGGGGG``.

* ``i``, placed after the closing delimiter, makes the match case-insensitive. For example, ``/(OMG)+/i`` would also match ``oMgomGomg``.

* ``g``, also placed after the closing delimiter, stands for "global". In a substitution it tells Perl to replace all matches, and
  not just the first one.


* ``\b`` matches a word boundary, which lets you match whole words only. For example, ``/\bOMG\b/`` would match ``OMG`` but not ``TOMG``.
  
There are more of these that you can find online.
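
The items listed above can be tried out directly. In the sketch below each test appends 1 (match) or 0 (no match) to the hypothetical array ``@results``. It uses ``push`` and the conditional operator ``? :``, standard Perl features not covered in this section.

.. code-block:: perl

   use strict;
   use warnings;

   my @results;

   push @results, ('cut'    =~ /c.t/)     ? 1 : 0;   # 1: "." matches the "u"
   push @results, ('xz'     =~ /xy*z/)    ? 1 : 0;   # 1: "*" allows zero "y"
   push @results, ('xz'     =~ /xy+z/)    ? 1 : 0;   # 0: "+" needs at least one "y"
   push @results, ('aaa'    =~ /a{3}/)    ? 1 : 0;   # 1: exactly three "a"
   push @results, ('omgOMG' =~ /(OMG)+/i) ? 1 : 0;   # 1: "i" ignores case
   push @results, ('TOMG'   =~ /\bOMG\b/) ? 1 : 0;   # 0: no boundary between "T" and "O"

   print "@results\n";   # 1 1 0 1 1 0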

**The Substitution Operator** replaces the matched text with some new text. As with matching, it can act on the default variable ``$_`` or, through the binding operator ``=~``, on another string:

.. code-block:: none

		$_ = "I have a cat on the mat.\n";
		s/cat/CAT/;
		print ;

This will print ``I have a CAT on the mat.``.

.. code-block:: none

		$str = "Sja sjosjuka sjoman skottes av sju skona " .
		       "sjukskoterskor pa det sjankande skeppet Shanghai.";
		$str =~ s/sja/sju/ig;    # i: ignore case, g: replace all matches
		print "$str\n";


This will print ``sju sjosjuka sjoman skottes av sju skona sjukskoterskor pa det sjunkande skeppet Shanghai.``.
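
The effect of the ``g`` modifier is easiest to see side by side. In this sketch the variable names are hypothetical and declared with ``my`` under ``use strict``. It also relies on ``s///`` returning the number of substitutions it made, a standard Perl behaviour not mentioned above.

.. code-block:: perl

   use strict;
   use warnings;

   my $text = "the cat sat on the mat";

   my $once = $text;
   $once =~ s/at/AT/;              # without g: only the first match is replaced

   my $all = $text;
   my $n = ($all =~ s/at/AT/g);    # with g: every match; $n holds the count

   print "$once\n";                    # the cAT sat on the mat
   print "$all ($n replacements)\n";   # the cAT sAT on the mAT (3 replacements)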

**A fun break**: The sentence stored in ``$str`` above is a Swedish tongue-twister:

.. code-block:: none

		Sju sjösjuka sjömän sköttes av sju sköna sjuksköterskor på det sjunkande skeppet Shanghai.

		Seven seasick sailors were nursed by seven beautiful nurses on the sinking ship of Shanghai.

		

It is used by Swedes to make someone who is learning Swedish as a second
language feel miserable and give up pronouncing some of the difficult
Swedish words. See the following video. It is perhaps more fun than Perl.


.. raw:: html
	
        <object width="400" height="300"><param name="movie"
        value="http://www.youtube.com/v/7Hr9N6UQGaQ&hl=en_US&fs=1&rel=0"></param><param
        name="allowFullScreen" value="true"></param><param
        name="allowscriptaccess" value="always"></param><embed
        src="http://www.youtube.com/v/7Hr9N6UQGaQ&hl=en_US&fs=1&rel=0"
        type="application/x-shockwave-flash" allowscriptaccess="always"
        allowfullscreen="true" width="400"
        height="300"></embed></object>


 
__ http://perldoc.perl.org/index.html
__ https://perldoc.pl/5.005/perltoc
__ http://www.tutorialspoint.com/perl
__ https://math.unm.edu/~motamed/Teaching/Fall20/HPSC/unix.html#path-and-the-search-path