# Unix intro

22.1.1 : Looking at files
22.1.1.1 : ls
22.1.1.2 : cat
22.1.1.3 : man
22.1.1.4 : touch
22.1.1.5 : cp, mv, rm
22.1.2 : Directories
22.1.3 : Permissions
22.1.4 : Wildcards
22.2 : Text searching and regular expressions
22.2.1 : Cutting up lines with cut
22.3 : Other useful commands: tar
22.4 : Command execution
22.4.1 : Search paths
22.4.2 : Command sequencing
22.4.2.1 : Simple sequencing
22.4.2.2 : Pipelining
22.4.2.3 : Backquoting
22.4.2.4 : Grouping in a subshell
22.4.3 : Exit status
22.4.4 : Processes and jobs
22.4.5 : Shell customization
22.5 : Input/output Redirection
22.5.1 : Input redirection
22.5.2 : Standard files
22.5.3 : Output redirection
22.6 : Shell environment variables
22.6.1 : Use of shell variables
22.6.2 : Exporting variables
22.7 : Control structures
22.7.1 : Conditionals
22.7.2 : Looping
22.8 : Scripting
22.8.1 : How to execute scripts
22.8.2 : Script arguments
22.9 : Expansion
22.9.1 : Arithmetic expansion
22.10 : Startup files
22.11 : Shell interaction
22.12 : The system and other users
22.12.1 : Groups
22.12.2 : The super user
22.13 : Other systems: ssh and scp
22.14 : The sed and awk tools
22.14.1 : Stream editing with sed
22.14.2 : awk
22.15 : Review questions

# 22 Unix intro

Unix is an operating system (OS), that is, a layer of software between the user or a user program and the hardware. It takes care of files and screen output, and it makes sure that many processes can exist side by side on one system. However, it is not immediately visible to the user. Most of the time that you use Unix, you are typing commands which are executed by an interpreter called the \indexterm{shell}. The shell makes the actual OS calls. There are several Unix shells available, but in this tutorial we will assume that you are using the \indexterm{sh} or bash shell, although many commands are common to the various shells in existence.

Most of this tutorial will work on any Unix-like platform. However, there is not just one Unix:

• Apple has Darwin which is close to BSD; IBM and HP have their own versions of Unix, and Linux is yet another variant. The differences between these are deep down and if you are taking this tutorial you probably won't see them for quite a while.
• Linux in turn comes in various distributions, such as \indexterm{Red Hat} or \indexterm{Ubuntu}. These mainly differ in the organization of system files and again you probably need not worry about them.
• Shells differ considerably. Here you will learn the \indextermdef{bash} shell, which is an improved version of the old \indexterm{sh} shell. For a variety of reasons, bash is to be preferred over the \indexterm{csh} or \indexterm{tcsh} shell. Other shells are the \indexterm{ksh} and \indexterm{zsh}, which is itself an improvement over the bash shell.

## 22.1 Files and such


Purpose In this section you will learn about the Unix file system, which consists of \indexterm{directories} that store \indexterm{files}. You will learn about \indexterm{executable} files and commands for displaying data files.

### 22.1.1 Looking at files


Purpose In this section you will learn commands for displaying file contents.

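$\begin{array}{ll}
\midrule
\text{command} & \text{function} \\
\midrule
\texttt{ls} & \text{list files or directories} \\
\texttt{touch} & \text{create new/empty file or update existing file} \\
\texttt{cat > filename} & \text{enter text into file} \\
\texttt{cp} & \text{copy files} \\
\texttt{mv} & \text{rename files} \\
\texttt{rm} & \text{remove files} \\
\texttt{file} & \text{report the type of file} \\
\texttt{cat filename} & \text{display file} \\
\texttt{head, tail} & \text{display part of a file} \\
\texttt{less, more} & \text{incrementally display a file} \\
\midrule
\end{array}$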

#### 22.1.1.1 ls

Without any argument, the \indextermunix{ls} command gives you a listing of files that are in your present location.

Exercise Type \indextermunix{ls}. Does anything show up?

Outcome If there are files in your directory, they will be listed; if there are none, no output will be given. This is standard Unix behavior: no output does not mean that something went wrong, it only means that there is nothing to report.

Exercise If the \indextermunix{ls} command shows that there are files, do ls name on one of those. By using an option, for instance ls -s name , you can get more information about that file, in this case its size.

Caution If you mistype a name, or specify a name of a non-existing file, you'll get an error message.

The \indextermunix{ls} command can give you all sorts of information. In addition to the above ls -s for the size, there is ls -l for the 'long' listing. It shows (things we will get to later such as) ownership and permissions, as well as the size and the date of the last modification.

Remark There are several dates associated with a file, corresponding to changes in content, changes in permissions, and access of any sort. The \indextermunix{stat} command gives all of them.

#### 22.1.1.2 cat

The \indextermunix{cat} command (short for 'concatenate') is often used to display files, but it can also be used to create some simple content.

Exercise Type cat > newfilename (where you can pick any filename) and type some text. Conclude with Control-d on a line by itself: press the Control key and hold it while you press the d key. Now use \indextermunix{cat} to view the contents of that file: cat newfilename .

Outcome In the first use of \indextermunix{cat}, text was copied from the terminal into a file (replacing any earlier contents of that file); in the second the file was cat'ed to the terminal output. You should see on your screen precisely what you typed into the file.

Caution Be sure to type Control-d as the first thing on the last line of input. If you really get stuck, Control-c will usually get you out. Try this: start creating a file with cat > filename and hit Control-c in the middle of a line. What are the contents of your file?

Remark Instead of Control-d you will often see the notation  ^D . The capital letter is for historic reasons: you use the control key and the lowercase letter.

#### 22.1.1.3 man

The primary (though not always the most easily understood) source of information about unix commands is the \indextermunixdef{man} command, for 'manual'. The descriptions available this way are referred to as the \indexterm{manual page}s.

Exercise Read the man page of the ls command: man ls . Find out the size and the time / date of the last change to some files, for instance the file you just created.

Outcome Did you find the ls -s and ls -l options? The first one lists the size of each file in blocks (often, but not always, kilobytes); the other gives all sorts of information about a file, including things you will learn about later.

The \indextermunix{man} command puts you in a mode where you can view long text documents. This viewer is common on Unix systems (it is available as the \indextermunix{more} or \indextermunix{less} system command), so memorize the following ways of navigating: Use the space bar to go forward and the u key to go back up. Use g to go to the beginning of the text, and G for the end. Use q to exit the viewer. If you really get stuck, Control-c will get you out.

Remark If you already know what command you're looking for, you can use man to get online information about it. If you forget the name of a command, \indextermunix{man}  -k keyword can help you find it.

#### 22.1.1.4 touch

The \indextermunix{touch} command creates an empty file, or updates the timestamp of a file if it already exists. Use ls -l to confirm this behavior.
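The two behaviors of touch can be checked with a short sketch like the following. It assumes that the mktemp command is available for making a scratch directory; the name emptyfile is only an illustration.

```shell
# Work in a scratch directory so that no existing files are affected.
workdir=$(mktemp -d)
cd "$workdir"

touch emptyfile      # emptyfile does not exist yet, so it is created
ls -l emptyfile      # the size column shows 0: the file is empty

touch emptyfile      # the file exists now, so only its timestamp is refreshed
ls -l emptyfile      # still size 0 (ls -l shows the time only to the minute)
```

Running touch a second time never changes the contents of the file, which is why it is safe to use on files that already exist.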

#### 22.1.1.5 cp, mv, rm

The \indextermunix{cp} command can be used for copying a file (or directories, see below): cp file1 file2 makes a copy of file1 and names it file2 .

Exercise Use cp file1 file2 to copy a file. Confirm that the two files have the same contents. If you change the original, does anything happen to the copy?

Outcome You should see that the copy does not change if the original changes or is deleted.

Caution If file2 already exists, cp overwrites it without warning. Use cp -i file1 file2 if you want to be asked for confirmation first.

A file can be renamed with \indextermunix{mv}, for 'move'.

Exercise Rename a file. What happens if the target name already exists?

Files are deleted with rm . This command is dangerous: there is no undo.
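As a minimal sketch of the three commands together, the following lines copy, rename, and delete files in a scratch directory. The file names and the use of mktemp are assumptions made for this illustration.

```shell
# Work in a scratch directory so that no existing files are affected.
workdir=$(mktemp -d)
cd "$workdir"

echo "original text" > file1
cp file1 file2               # file2 is an independent copy of file1
echo "changed text" > file1  # changing the original...
cat file2                    # ...leaves the copy as it was

mv file2 file3               # rename: file2 is gone, file3 has its contents
rm file1                     # delete: there is no undo
ls                           # only file3 is left
```

After these commands only file3 remains, and it still holds the text that file1 had at the moment of copying.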


There are more commands for displaying a file, parts of a file, or information about a file.

Exercise Do ls /usr/share/words or ls /usr/share/dict/words to confirm that a file with words exists on your system. Now experiment with the commands head , tail , more , and wc using that file.

Outcome \indextermunix{head} displays the first couple of lines of a file, \indextermunix{tail} the last, and \indextermunix{more} uses the same viewer that is used for man pages. Read the man pages for these commands and experiment with increasing and decreasing the amount of output. The \indextermunix{wc} ('word count') command reports the number of words, characters, and lines in a file.
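If your system has no words file, the same commands can be tried on a small file of your own. The sketch below assumes a made-up five-line file called numbers in a scratch directory; the -n option sets the number of lines that head and tail show.

```shell
# Work in a scratch directory so that no existing files are affected.
workdir=$(mktemp -d)
cd "$workdir"
printf 'one\ntwo\nthree\nfour\nfive\n' > numbers

first=$(head -n 2 numbers)   # the first two lines
last=$(tail -n 2 numbers)    # the last two lines
lines=$(wc -l < numbers)     # the number of lines only

echo "head gives:"; echo "$first"
echo "tail gives:"; echo "$last"
echo "number of lines: $lines"
wc numbers                   # lines, words, and characters, followed by the file name
```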

Another useful command is \indextermunix{file}: it tells you what type of file you are dealing with.

Exercise Do file foo for various 'foo': a text file, a directory, or the /bin/ls command.

Outcome Some of the information may not be intelligible to you, but the words to look out for are 'text', 'directory', or 'executable'.

At this point it is advisable to learn to use a text \indexterm{editor}, such as \indexterm{emacs} or vi.

### 22.1.2 Directories


Purpose Here you will learn about the Unix directory tree, how to manipulate it and how to move around in it.

$\begin{array}{ll}
\midrule
\text{command} & \text{function} \\
\midrule
\texttt{ls} & \text{list the contents of directories} \\
\texttt{mkdir} & \text{make new directory} \\
\texttt{cd} & \text{change directory} \\
\texttt{pwd} & \text{display present working directory} \\
\midrule
\end{array}$

A unix file system is a tree of directories, where a directory is a container for files or more directories. We will display directories as follows:

\dirdisplay{.1 /\DTcomment{The root of the directory tree}. .2 bin\DTcomment{Binary programs}. .2 home\DTcomment{Location of users directories}. }

The root of the Unix directory tree is indicated with a slash. Do ls / to see what files and directories there are in the root. Note that the root is not the location where you start when you reboot your personal machine, or when you log in to a server.

Exercise Do pwd right after logging in to find out where you start: this is your home directory.

Outcome You will typically see something like /home/yourname or /Users/yourname . This is system dependent.

Do ls to see the contents of the working directory. In the displays in this section, directory names will be followed by a slash:  dir/ but this character is not part of their name. You can get this output by using ls -F , and you can tell your shell to use this output consistently by stating alias ls='ls -F' at the start of your session. Example:

\dirdisplay{.1 /home/you/. .2 adirectory/. .2 afile. }

The command for making a new directory is \indextermunix{mkdir}.

Exercise Make a new directory with \indextermunix{mkdir}  newdir and view the current directory with ls .

Outcome You should see this structure: \dirdisplay{.1 /home/you/. .2 newdir/\DTcomment{the new directory}. }

The command for going into another directory, that is, making it your working directory, is \indextermunix{cd} ('change directory'). It can be used in the following ways:

• With an absolute path: one that starts at the root of the directory tree, that is, starts with  / . The cd command takes you to that location.
• With a relative path: one that does not start at the root. This form of the cd command takes you to <yourcurrentdir>/<relative path> .

Exercise Do cd newdir and find out where you are in the directory tree with pwd . Confirm with ls that the directory is empty. How would you get to this location using an absolute path?

Outcome pwd should tell you /home/you/newdir , and ls then has no output, meaning there is nothing to list. The absolute path is /home/you/newdir .

Exercise Let's quickly create a file in this directory: \indextermunix{touch} onefile , and another directory: mkdir otherdir . Do ls and confirm that there are a new file and directory.

Outcome You should now have: \dirdisplay{.1 /home/you/. .2 newdir/\DTcomment{you are here}. .3 onefile. .3 otherdir/. }

The ls command has a very useful option: with ls -a you see your regular files and hidden files, which have a name that starts with a dot. Doing ls -a in your new directory should tell you that there are the following files:

\dirdisplay{.1 /home/you/. .2 newdir/\DTcomment{you are here}. .3 .. .3 ... .3 onefile. .3 otherdir/. }

The single dot is the current directory, and the double dot is the directory one level back.

Exercise Predict where you will be after cd ./otherdir/.. and check to see if you were right.

Outcome The single dot sends you to the current directory, so that does not change anything. The otherdir part makes that subdirectory your current working directory. Finally, .. goes one level back. In other words, this command puts you right back where you started.
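The directory exercises of this section can be replayed in one go with a sketch like the following. It works in a scratch directory made with mktemp (an assumption of this example) rather than in your home directory.

```shell
# Work in a scratch directory so that your home directory stays clean.
workdir=$(mktemp -d)
cd "$workdir"

mkdir newdir            # make a directory...
cd newdir               # ...and make it the working directory (relative path)
touch onefile
mkdir otherdir
ls -a                   # shows . and .. as well as onefile and otherdir

start=$(pwd)            # remember where we are
cd ./otherdir/..        # into otherdir and straight back out again
end=$(pwd)
echo "before: $start"
echo "after:  $end"     # the same location as before
```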

Since your home directory is a special place, there are shortcuts for cd 'ing to it: cd without arguments, cd ~ , and cd $HOME all get you back to your home.

Go to your home directory, and from there do ls newdir to check the contents of the first directory you created, without having to go there.

Exercise What does ls .. do?

Outcome Recall that .. denotes the directory one level up in the tree: you should see your own home directory, plus the directories of any other users.

Exercise Can you use ls to see the contents of someone else's home directory? In the previous exercise you saw whether other users exist on your system. If so, do ls ../thatotheruser .

Outcome If this is your private computer, you can probably view the contents of the other user's directory. If this is a university computer or so, the other directory may very well be protected (permissions are discussed in the next section) and you get ls: ../otheruser: Permission denied .

Make an attempt to move into someone else's home directory with cd . Does it work?

You can make copies of a directory with cp , but you need to add a flag to indicate that you recursively copy the contents: cp -r . Make another directory somedir in your home so that you have

\dirdisplay{.1 /home/you/. .2 newdir/\DTcomment{you have been working in this one}. .2 somedir/\DTcomment{you just created this one}. }

What is the difference between cp -r newdir somedir and cp -r newdir thirddir where thirddir is not an existing directory name?

### 22.1.3 Permissions

Purpose In this section you will learn about how to give various users on your system permission to do (or not to do) various things with your files.

Unix files, including directories, have permissions, indicating 'who can do what with this file'. Actions that can be performed on a file fall into three categories:

• reading r : any access to a file (displaying it, getting information on it) that does not change the file;
• writing w : access to a file that changes its content, or even its metadata such as 'date modified';
• executing x : if the file is executable, to run it; if it is a directory, to enter it.
The people who can potentially access a file are divided into three classes too: the owner of the file (user), the members of the file's group (group), and everyone else (other). These nine permissions are rendered in sequence

$\begin{array}{ccc} \toprule user&group&other\\ \midrule rwx&rwx&rwx \\ \bottomrule \end{array}$

For instance rw-r--r-- means that the owner can read and write a file, the owner's group and everyone else can only read.

Permissions are also rendered numerically in groups of three bits, by letting $\mathtt{r}=4$, $\mathtt{w}=2$, $\mathtt{x}=1$:

$\begin{array}{c} \toprule rwx\\ \midrule 421 \\ \bottomrule \end{array}$

Common codes are $7=\mathtt{rwx}$ and $6=\mathtt{rw}$. You will find many files that have permissions $755$ which stands for an executable that everyone can run, but only the owner can change, or $644$ which stands for a data file that everyone can see but again only the owner can alter. You can set permissions by the \indextermunix{chmod} command:

chmod <permissions> file # just one file

chmod -R <permissions> directory # directory, recursively

Examples:

chmod 766 file # set to rwxrw-rw-

chmod g+w file # give group write permission

chmod g=rx file # set group permissions

chmod o-w file # take away write permission from others

chmod o= file # take away all permissions from others

chmod g+r,o-x file # give group read permission, remove other execute permission

The man page gives all options.

Exercise Make a file foo and do chmod u-r foo . Can you now inspect its contents? Make the file readable again, this time using a numeric code. Now make the file readable to your classmates. Check by having one of them read the contents.

Outcome

1. A file is only accessible by others if the surrounding folder is readable. Can you figure out how to do this?
2. When you've made the file 'unreadable' by yourself, you can still ls it, but not cat it: that will give a 'permission denied' message.

Make a file com with the following contents:

#!/bin/sh

echo "Hello world!"

This is a legitimate shell script. What happens when you type ./com ? Can you make the script executable?

In the three permission categories it is clear who 'you' and 'others' refer to. How about 'group'? We'll go into that in section 22.12.

Remark There are more obscure permissions. For instance the \indexterm{setuid} bit declares that the program should run with the permissions of the owner of the file, rather than the user executing it. This is useful for system utilities such as passwd or mkdir , which alter the password file and the directory structure, for which \indextermbus{root}{privileges} are needed. Thanks to the setuid bit, a user can run these programs, which are then so designed that a user can only make changes to their own password entry, and their own directories, respectively. The setuid bit is set with \indextermunix{chmod}: chmod 4ugo file , where ugo stands for the three permission digits described above.

### 22.1.4 Wildcards

You already saw that ls filename gives you information about that one file, and ls gives you all files in the current directory. To see files with certain conditions on their names, the wildcard mechanism exists. The following wildcards exist:

$\begin{array}{ll}
\toprule
\texttt{*} & \text{any number of characters} \\
\texttt{?} & \text{any single character} \\
\bottomrule
\end{array}$

Example:

%% ls

s sk ski skiing skill

%% ls ski*

ski skiing skill

The second command lists all files whose name starts with ski , followed by any number of other characters; below you will see that in different contexts ski* means ' sk followed by any number of i characters'. Confusing, but that's the way it is.

## 22.2 Text searching and regular expressions

Purpose In this section you will learn how to search for text in files.

For this section you need at least one file that contains some amount of text. You can for instance use a file of randomly generated text.

The \indextermunix{grep} command can be used to search for a text expression in a file.
Exercise Search for the letter q in your text file with grep q yourfile and search for it in all files in your directory with grep q * . Try some other searches.

Outcome In the first case, you get a listing of all lines that contain a q ; in the second case, grep also reports what file name the match was found in: qfile:this line has q in it .

Caution If the string you are looking for does not occur, grep will simply not output anything. Remember that this is standard behavior for Unix commands if there is nothing to report.

In addition to searching for literal strings, you can look for more general expressions.

$\begin{array}{ll}
\midrule
\texttt{^} & \text{the beginning of the line} \\
\texttt{\$} & \text{the end of the line} \\
\texttt{.} & \text{any character} \\
\texttt{*} & \text{any number of repetitions} \\
\texttt{[xyz]} & \text{any of the characters } \texttt{xyz} \\
\midrule
\end{array}$

This looks like the wildcard mechanism you just saw (section 22.1.4) but it's subtly different. Compare the example above with:

%% cat s

sk

ski

skill

skiing

%% grep "ski*" s

sk

ski

skill

skiing

In the second case you search for a string consisting of sk and any number of i characters, including zero of them.

Some more examples: you can find

• All lines that contain the letter 'q' with grep q yourfile ;
• All lines that start with an 'a' with grep "^a" yourfile (if your search string contains special characters, it is a good idea to use quote marks to enclose it);
• All lines that end with a digit with grep "[0-9]$" yourfile .
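The difference between shell wildcards and grep expressions can be seen side by side in a sketch like the following. The scratch directory, the file names, and the two extra lines in the file s are assumptions made for this illustration.

```shell
# Work in a scratch directory so that no existing files are affected.
workdir=$(mktemp -d)
cd "$workdir"

# A text file s, plus four empty files whose names match the lines in s.
printf 'sk\nski\nskill\nskiing\nline 7\naqua\n' > s
touch sk ski skiing skill

ls ski*                          # wildcard: names that start with ski
grep "ski*" s                    # expression: sk followed by zero or more i

wild=$(ls ski* | wc -l)          # number of names matched by the wildcard
regex=$(grep -c "ski*" s)        # number of lines matched (-c counts them)
echo "wildcard matches: $wild, grep matches: $regex"

starts_a=$(grep "^a" s)          # lines that start with an a
ends_digit=$(grep "[0-9]$" s)    # lines that end with a digit
echo "$starts_a"
echo "$ends_digit"
```

The wildcard skips the name sk because it requires the literal characters ski, while the grep expression accepts the line sk because the i may occur zero times.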

Exercise Construct the search strings for finding

Outcome For the first, use the range characters [] , for the second use the period to match any character.

Exercise Add a few lines x = 1 , x  = 2 , x   = 3 (that is, have different numbers of spaces between x and the equals sign) to your test file, and make grep commands to search for all assignments to  x .

The characters in the table above have special meanings. If you want to search for the actual character, you have to \indexterm{escape} it with a backslash.

Exercise Make a test file that has both abc and a.c in it, on separate lines. Try the commands grep "a.c" file , grep a\.c file , grep "a\.c" file .

Outcome You will see that the period needs to be escaped, and the search string needs to be quoted. In the absence of either, you will see that grep also finds the abc string.

### 22.2.1 Cutting up lines with cut


Another tool for editing lines is \indextermunix{cut}, which will cut up a line and display certain parts of it. For instance,

cut -c 2-5 myfile



will display the characters in positions 2 through 5 of every line of myfile . Make a test file and verify this example.

Maybe more useful, you can give cut a delimiter character and have it split a line on occurrences of that delimiter. For instance, your system will most likely have a file /etc/passwd that contains user information (this is traditionally the case; on Mac OS information about users is kept elsewhere and this file only contains system services), with every line consisting of fields separated by colons. For instance:

daemon:*:1:1:System Services:/var/root:/usr/bin/false

nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false



The seventh and last field is the login shell of the user. You can display users and their login shells with:

cut -d ":" -f 1,7 /etc/passwd



This tells cut to use the colon as delimiter, and to print fields 1 and 7.
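Since the contents of /etc/passwd differ from system to system, a reproducible way to try both forms of cut is to put the two example lines in a file of your own. The file name passwd.sample and the scratch directory are assumptions of this sketch.

```shell
# Work in a scratch directory so that no existing files are affected.
workdir=$(mktemp -d)
cd "$workdir"

# Two lines in the same colon-separated format as /etc/passwd.
printf 'daemon:*:1:1:System Services:/var/root:/usr/bin/false\n' > passwd.sample
printf 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\n' >> passwd.sample

chars=$(cut -c 2-5 passwd.sample)            # characters 2 through 5 of every line
fields=$(cut -d ":" -f 1,7 passwd.sample)    # fields 1 and 7, split on colons

echo "$chars"
echo "$fields"
```

Note that cut keeps the delimiter between the selected fields, so the second command prints lines of the form name:shell.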

## 22.3 Other useful commands: tar


The \indextermunix{tar} command stands for 'tape archive', that is, it was originally meant to package files on a tape. (The 'archive' part derives from the \indextermunix{ar} command.) These days, it's used to package files together for distribution on web sites and such: if you want to publish a library of hundreds of files this bundles them into a single file.

The two most common options are for creating and for extracting an archive:

1. tar fc package.tar directory_with_stuff

pronounced 'tar file create', and

2. tar fx package.tar # this creates the directory that was packaged

pronounced 'tar file extract'.

Text files can often be compressed to a large extent, so adding the z option for compression with \indextermunix{gzip} is a good idea:

tar fcz package.tar.gz directory_with_stuff

tar fx package.tar.gz



Naming the 'gzipped' file package.tgz is also common.
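A complete round trip, from packaging a directory to unpacking it elsewhere, could look like the sketch below. The directory and file names are made up for the example, and the t option for listing the contents of an archive is an addition that the text above does not cover.

```shell
# Work in a scratch directory so that no existing files are affected.
workdir=$(mktemp -d)
cd "$workdir"

mkdir directory_with_stuff
echo "some content" > directory_with_stuff/afile

tar fcz package.tar.gz directory_with_stuff   # create a gzipped archive
tar ft package.tar.gz                         # list what is in the archive

mkdir unpack
cd unpack
tar fx ../package.tar.gz                      # extract: recreates directory_with_stuff here
cat directory_with_stuff/afile
```

Extracting in a separate directory makes it easy to confirm that the archive really contains everything, without overwriting the original files.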

## 22.4 Command execution


### 22.4.1 Search paths


Purpose In this section you will learn how Unix determines what to do when you type a command name.

If you type a command such as ls , the shell does not just rely on a list of commands: it will actually go searching for a program by the name ls . This means that you can have multiple different commands with the same name, and which one gets executed depends on which one is found first.

Exercise What you may think of as 'Unix commands' are often just executable files in a system directory. Do \indextermunix{which}  ls , and do an ls -l on the result.

Outcome The location of ls is something like /bin/ls . If you ls that, you will see that it is probably owned by root. Its executable bits are probably set for all users.

The list of locations where unix searches for commands is the search path, which is stored in the \indexterm{environment variable} (for more details see below) \indextermunixdef{PATH}.

Exercise Do echo $PATH . Can you find the location of ls ? Are there other commands in the same location? Is the current directory ' . ' in the path? If not, do export PATH=".:$PATH" . Now create an executable file ls in the current directory (see above for the basics), and do ls .

Outcome The path will be a list of colon-separated directories, for instance /usr/bin:/usr/local/bin:/usr/X11R6/bin . If the working directory is in the path, it will probably be at the end: /usr/X11R6/bin:. but most likely it will not be there. If you put ' . ' at the start of the path, unix will find the local ls command before the system one. (This experiment does not work with cd : that command is built into the shell, so it is never looked up in the path.)

Some people consider having the working directory in the path a security risk. If your directory is writable, someone could put a malicious script named ls (or any other system command) in your directory, and you would execute it unwittingly.
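The effect of the search path can also be tried without shadowing any system command. The following sketch adds a scratch directory to the front of the path and runs a script from it; the script name hello_local is hypothetical, and command -v is used as the shell's built-in counterpart of which.

```shell
# Work in a scratch directory so that no existing files are affected.
workdir=$(mktemp -d)
cd "$workdir"

# Create a small executable script; the name hello_local is made up for this sketch.
printf '#!/bin/sh\necho "this is the local command"\n' > hello_local
chmod +x hello_local

PATH="$workdir:$PATH"             # put the scratch directory at the front of the search path
found=$(command -v hello_local)   # reports what the shell would run, much like which
echo "the shell finds: $found"
hello_local                       # runs without a ./ prefix because its directory is in the path
```

The change to PATH only lasts for the current shell session, so nothing needs to be undone afterwards.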

It is possible to define your own commands as aliases of existing commands.

Exercise Do alias chdir=cd and convince yourself that now chdir works just like cd . Do alias rm='rm -i' ; look up the meaning of this in the man pages. Some people find this alias a good idea; can you see why?

Outcome The -i 'interactive' option for rm makes the command ask for confirmation before each delete. Since unix does not have a trashcan that needs to be emptied explicitly (as on Windows or the Mac OS), this can be a good idea.

### 22.4.2 Command sequencing


There are various ways of having multiple commands on a single commandline.

#### 22.4.2.1 Simple sequencing


First of all, you can type

command1 ; command2



This is convenient if you repeat the same two commands a number of times: you only need to up-arrow once to repeat them both.

There is a problem: if you type

cc -o myprog myprog.c ; ./myprog



and the compilation fails, the program will still be executed, using an old version of the executable if that exists. This is very confusing.

A better way is:

cc -o myprog myprog.c && ./myprog



which only executes the second command if the first one was successful.
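The difference between the two forms can be tried without a compiler. In the sketch below the standard commands false and true stand in for a failing and a succeeding compilation; the variable names are made up, and the results are captured with the $( ) mechanism that is explained later in this section.

```shell
# 'false' stands in for a failing compilation, 'true' for a successful one.
with_semicolon=$(false ; echo "program ran")   # ';' ignores the failure
with_and=$(true && echo "program ran")         # '&&' continues after success

echo "after ';' and a failure:  $with_semicolon"
echo "after '&&' and a success: $with_and"

false && echo "this line is never printed"     # '&&' stops after a failure
echo "done"
```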

#### 22.4.2.2 Pipelining


Instead of taking input from a file, or sending output to a file, it is possible to connect two commands together, so that the second takes the output of the first as input. The syntax for this is cmdone | cmdtwo ; this is called a pipeline. For instance, grep a yourfile | grep b finds all lines that contain both an a and a  b .
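The two-stage search just described can be tried on a small file. The contents of yourfile below are made up for this sketch.

```shell
# Work in a scratch directory so that no existing files are affected.
workdir=$(mktemp -d)
cd "$workdir"
printf 'alpha\nbeta\nabba\ngamma\ncab\n' > yourfile

grep a yourfile                        # first stage alone: every line has an a
both=$(grep a yourfile | grep b)       # second stage keeps only lines that also have a b
echo "lines with both an a and a b:"
echo "$both"
```

The second grep never sees the file itself: it only reads the lines that the first grep writes to its output.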

Exercise Construct a pipeline that counts how many lines there are in your file that contain the string th . Use the wc command (see above) to do the counting.

#### 22.4.2.3 Backquoting


There are a few more ways to combine commands. Suppose you want to present the result of wc a bit nicely. Type the following command

echo The line count is wc -l foo



where foo is the name of an existing file. The way to get the actual line count echoed is by the \indextermdef{backquote}:

echo The line count is `wc -l foo`



Anything in between backquotes is executed before the rest of the command line is evaluated.

Exercise The way wc is used here, it prints the file name. Can you find a way to prevent that from happening?

There is another mechanism for out-of-order evaluation:

echo "There are $( cat Makefile | wc -l ) lines"

This mechanism makes it possible to nest commands, but for compatibility and legacy purposes backquotes may still be preferable when nesting is not needed.

#### 22.4.2.4 Grouping in a subshell

Suppose you want to apply output redirection to a couple of commands in a row:

configure ; make ; make install > installation.log 2>&1

This only catches the last command. You could for instance group the three commands in a subshell and catch the output of that:

( configure ; make ; make install ) > installation.log 2>&1

### 22.4.3 Exit status

Commands can fail. If you type a single command on the command line, you see the error, and you act accordingly when you type the next command. When that failing command happens in a script, you have to tell the script how to act accordingly. For this, you use the exit status of the command: this is a value (zero for success, nonzero otherwise) that is stored in an internal variable, and that you can access with $? .
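As a first look at the exit status, the following sketch saves $? after a command that succeeds and after one that fails. The variable names are made up, and the || operator, which runs its right-hand side only if the left-hand side fails, is an addition that the text above does not cover.

```shell
workdir=$(mktemp -d)      # an empty scratch directory

true                      # a command that always succeeds
status_ok=$?              # save the status right away: the next command overwrites $?
echo "exit status of true: $status_ok"

status_fail=0
# Listing a file that does not exist fails; || then stores the nonzero status.
ls "$workdir/no_such_file" 2> /dev/null || status_fail=$?
echo "exit status of the failing ls: $status_fail"
```

The exact nonzero value depends on the command and the system; scripts usually only test whether the status is zero or not.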

Example. Suppose we have a directory that is not writable

[testing] ls -ld nowrite/

dr-xr-xr-x  2 eijkhout  506  68 May 19 12:32 nowrite//

[testing] cd nowrite/



and try to create a file there:

[nowrite] cat ../newfile

#!/bin/bash

touch $1

echo "Created file: $1"

[nowrite] newfile myfile

[nowrite] ../newfile myfile

touch: myfile: Permission denied

Created file: myfile

[nowrite] ls

[nowrite]



The script reports that the file was created even though it wasn't.

Improved script:

[nowrite] cat ../betterfile

#!/bin/bash

touch $1

if [ $? -eq 0 ] ; then

echo "Created file: $1"

else

echo "Problem creating file: $1"

fi

[nowrite] ../betterfile myfile

touch: myfile: Permission denied

Problem creating file: myfile



### 22.4.4 Processes and jobs


$\begin{array}{ll}
\midrule
\texttt{ps} & \text{list (all) processes} \\
\texttt{kill} & \text{kill a process} \\
\texttt{CTRL-c} & \text{kill the foreground job} \\
\texttt{CTRL-z} & \text{suspend the foreground job} \\
\texttt{jobs} & \text{give the status of all jobs} \\
\texttt{fg} & \text{bring the last suspended job to the foreground} \\
\texttt{fg \%3} & \text{bring a specific job to the foreground} \\
\texttt{bg} & \text{run the last suspended job in the background} \\
\midrule
\end{array}$

The Unix operating system can run many programs at the same time, by rotating through the list and giving each only a fraction of a second to run each time. The command \indextermunix{ps} can tell you everything that is currently running.

Exercise Type ps . How many programs are currently running? By default ps gives you only programs that you explicitly started. Do ps guwax for a detailed list of everything that is running. How many programs are running? How many belong to the root user, how many to you?

Outcome To count the programs belonging to a user, pipe the ps command through an appropriate grep , which can then be piped to wc .

In this long listing of ps , the second column contains the process numbers. Sometimes it is useful to have those: if a program misbehaves you can kill it with

kill 12345

where 12345 is the process number.

The cut command explained above can cut certain positions from a line: type ps guwax | cut -c 10-14 .

To get dynamic information about all running processes, use the top command. Read the man page to find out how to sort the output by CPU usage.

Processes that are started in a shell are known as jobs. In addition to the process number, they have a job number. We will now explore manipulating jobs.

When you type a command and hit return, that command becomes, for the duration of its run, the \indexterm{foreground process}. Everything else that is running at the same time is a background process.

Make an executable file hello with the following contents:

#!/bin/sh

while [ 1 ] ; do

sleep 2

date

done



and type ./hello .

Exercise Type Control-z . This suspends the foreground process. It will give you a number like [1] or [2] indicating that it is the first or second program that has been suspended or put in the background. Now type bg to put this process in the background. Confirm that there is no foreground process by hitting return, and doing an ls .

Outcome After you put a process in the background, the terminal is available again to accept foreground commands. If you hit return, you should see the command prompt. However, the background process still keeps generating output.

Exercise Type jobs to see the processes in the current session. If the process you just put in the background was number 1, type fg %1 . Confirm that it is a foreground process again.

Outcome If a shell is executing a program in the foreground, it will not accept command input, so hitting return should only produce blank lines.

Exercise When you have made the hello script a foreground process again, you can kill it with Control-c . Try this. Start the script up again, this time as ./hello & which immediately puts it in the background. You should also get output along the lines of [1] 12345 which tells you that it is the first job you put in the background, and that 12345 is its process ID. Kill the script with kill %1 . Start it up again, and kill it by using the process number.

Outcome The command kill 12345 using the process number is usually enough to kill a running program. Sometimes it is necessary to use

kill -9 12345 .
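Inside a script, the same background-and-kill sequence might look like the sketch below. The special variable $! (the process ID of the most recent background command) and the wait command are not introduced in this tutorial; they are used here as an assumption to make the example self-contained.

```shell
# Start a long-running command in the background; $! holds its process ID.
sleep 30 &
pid=$!

# Ask the process to terminate, just as kill 12345 would (default signal: TERM).
kill "$pid"

# wait reports how the job ended; death by a signal shows up as 128 + signal number.
status=0
wait "$pid" 2>/dev/null || status=$?
echo "process $pid ended with status $status"
```

If a process ignores the default signal, replacing kill "$pid" by kill -9 "$pid" forces termination, at the price of giving the program no chance to clean up.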

### 22.4.5 Shell customization

Above it was mentioned that ls -F is an easy way to see which files are regular, executable, or directories; by typing alias ls='ls -F' the ls command will automatically be expanded to ls -F every time it is invoked. If you would like this behavior in every login session, you can add the alias command to your .profile file. Other shells than sh / bash have other files for such customizations.

## 22.5 Input/output Redirection

(label: sec:unixpipe)

Purpose In this section you will learn how to feed one command into another, and how to connect commands to input and output files.

So far, the unix commands you have used have taken their input from your keyboard, or from a file named on the command line; their output went to your screen. There are other possibilities for providing input from a file, or for storing the output in a file.

### 22.5.1 Input redirection

The grep command had two arguments, the second being a file name. You can also write grep string < yourfile , where the less-than sign means that the input will come from the named file, yourfile . This is known as input redirection .
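As a small illustration, the following sketch compares the two ways of handing a file to grep . The file contents are made up, and mktemp is assumed to be available for creating a scratch file.

```shell
# Scratch file with three lines of sample text.
tmp=`mktemp`
printf 'alpha\nbeta\nalphabet\n' > "$tmp"

from_arg=`grep alpha "$tmp"`        # file named on the command line
from_stdin=`grep alpha < "$tmp"`    # file supplied through input redirection

# With redirection the command never learns the file name,
# so wc prints only the count, not "count filename".
count=`wc -l < "$tmp"`

echo "$from_stdin"
rm -f "$tmp"
```

Both grep invocations produce the same two matching lines; the difference only becomes visible for commands such as wc that normally echo the file name.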

### 22.5.2 Standard files

Unix has three standard files that handle input and output:

| Standard file | Purpose |
|---|---|
| stdin | is the file that provides input for processes. |
| stdout | is the file where the output of a process is written. |
| stderr | is the file where error output is written. |

In an interactive session, all three files are connected to the user terminal. Using input or output redirection then means that the input is taken or the output sent to a different file than the terminal.

### 22.5.3 Output redirection

Just as with the input, you can redirect the output of your program. In the simplest case,

grep string yourfile > outfile

will take what normally goes to the terminal, and redirect the output to outfile . The output file is created if it didn't already exist, otherwise it is overwritten. (To append, use grep string yourfile >> outfile .)

Exercise Take one of the grep commands from the previous section, and send its output to a file. Check that the contents of the file are identical to what appeared on your screen before. Search for a string that does not appear in the file and send the output to a file. What does this mean for the output file?

Outcome Searching for a string that does not occur in a file gives no terminal output. If you redirect the output of this grep to a file, it gives a zero size file. Check this with ls and wc .
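The behavior described above can be checked with a short script. This is only a sketch: the file names and contents are invented, and a scratch directory from mktemp -d is assumed.

```shell
dir=`mktemp -d`
printf 'one\ntwo\nthree\n' > "$dir/yourfile"

grep t "$dir/yourfile" > "$dir/outfile"       # creates or overwrites: two, three
grep one "$dir/yourfile" >> "$dir/outfile"    # appends a third line: one
lines=`wc -l < "$dir/outfile"`

# No match: grep prints nothing (and exits with a non-zero status),
# so the redirection leaves a file of size zero.
grep nosuchstring "$dir/yourfile" > "$dir/empty" || true
size=`wc -c < "$dir/empty"`

echo "outfile has $lines lines, empty has $size bytes"
rm -rf "$dir"
```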

Sometimes you want to run a program, but ignore the output. For that, you can redirect your output to the system \indextermsub{null}{device}: /dev/null.

yourprogram >/dev/null



Here are some useful idioms:

| Idiom | Meaning |
|---|---|
| `program 2>/dev/null` | send only errors to the null device |
| `program >/dev/null 2>&1` | send output to dev-null, and errors to output. Note the counterintuitive sequence of specifications! |
| `program 2>&1 \| less` | send output and errors to less |
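To see these idioms at work, the sketch below uses a made-up function noisy that writes one line to each output stream; the notation >&2 (send this output to stderr) is an addition not covered in the text.

```shell
# Hypothetical stand-in for a program that writes to both streams.
noisy() {
    echo "regular output"
    echo "error output" >&2
}

only_out=`noisy 2>/dev/null`        # errors discarded, regular output captured
silent=`noisy >/dev/null 2>&1`      # output discarded, errors sent after it: nothing left
swapped=`noisy 2>&1 >/dev/null`     # wrong order: errors go where output used to go
both=`noisy 2>&1 | wc -l`           # both streams travel down the pipe: two lines

echo "only_out=[$only_out] silent=[$silent] swapped=[$swapped]"
```

The swapped case shows why the order matters: redirections are processed left to right, so 2>&1 copies the destination that stdout has at that moment.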

## 22.6 Shell environment variables

(label: tut:shellvars)

Above you encountered PATH , which is an example of a shell, or environment, variable. These are variables that are known to the shell and that can be used by all programs run by the shell. While PATH is a built-in variable, you can also define your own variables, and use those in shell scripting.

Shell variables are roughly divided in the following categories:

• see below.

You can see the full list of all variables known to the shell by typing \indextermunixdef{env}.

Remark This does not include variables you define yourself, unless you \indextermunix{export} them; see below.

Exercise Check on the value of the PATH variable by typing echo $PATH .

### 22.6.1 Use of shell variables

You retrieve the value of a shell variable by prefixing its name with a dollar sign. Type the following and inspect the output:

x=5

echo x

echo $x

You see that the shell treats everything as a string, unless you explicitly tell it to take the value of a variable, by putting a dollar in front of the name. A variable that has not been previously defined will print as a blank string.

Shell variables can be set in a number of ways. The simplest is by an assignment as in other programming languages.

When you do the next exercise, it is good to bear in mind that the shell is a text based language.

Exercise Type a=5 on the commandline. Check on its value with the echo command. Define the variable b to another integer. Check on its value. Now explore the values of a+b and $a+$b , both by echo 'ing them, or by first assigning them.

Outcome The shell does not perform integer addition here: instead you get a string with a plus-sign in it. (You will see how to do arithmetic on variables in section \ref{sec:arith-expansion}.)

Caution Beware not to have space around the equals sign; also be sure to use the dollar sign to print the value.

### 22.6.2 Exporting variables

A variable set this way will be known to all subsequent commands you issue in this shell, but not to commands in new shells you start up. For that you need the \indextermunix{export} command. Reproduce the following session (the square brackets form the command prompt):

[] a=20

[] echo $a

20

[] /bin/bash

[] echo $a

[] exit

exit

[] export a=21

[] /bin/bash

[] echo $a

21

[] exit
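The same experiment can be scripted, which avoids having to start and leave shells by hand. In this sketch sh -c '...' plays the role of the new shell; that substitution, and the variable names before , after and temp , are assumptions made for the illustration. The last assignment previews the one-command prefix form discussed next.

```shell
unset a b                           # start from a clean slate

a=20
before=`sh -c 'echo "$a"'`          # the child shell does not see a: empty string

export a=21
after=`sh -c 'echo "$a"'`           # exported, so the child shell sees 21

temp=`b=5 sh -c 'echo "$b"'`        # prefix assignment: b exists for this one command only

echo "before=[$before] after=[$after] temp=[$temp] b afterwards=[$b]"
```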



You can also temporarily set a variable. Replay this scenario:

1. Check that the variable b has no value:

[] echo $b

[]

2. Write a short script that prints this variable:

[] cat > echob

#!/bin/bash

echo $b

and of course make it executable: chmod +x echob .

3. Call the script, prefixing the command with a setting of b :

[] b=5 ./echob

5

The syntax where you set the value, as a prefix without using a separate command, sets the value just for that one command.

4. Show that the variable is still undefined afterwards:

[] echo $b

[]

That is, you defined the variable just for the execution of a single command.

In section \ref{sec:shell-control} you will see that the for construct also defines a variable; section \ref{sec:shell-scripting} shows some more built-in variables that apply in shell scripts.

If you want to un-set an environment variable, there is the \indextermunix{unset} command.

## 22.7 Control structures

(label: sec:shell-control)

Like any good programming system, the shell has some control structures. Their syntax takes a bit of getting used to. (Different shells have different syntax; in this tutorial we only discuss the bash shell.)

### 22.7.1 Conditionals

The conditional of the bash shell is predictably called if, and it can be written over several lines:

if [ "$PATH" = "" ] ; then

echo "Error: path is empty"

fi



or on a single line:

if [ `wc -l < file` -gt 100 ] ; then echo "file too long" ; fi



(The backquote is explained in section \ref{tut:unix-backquote}.) There are a number of tests defined, for instance -f somefile tests for the existence of a file. Change your script so that it will report -1 if the file does not exist.

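The syntax of this is finicky:

• if and elif are followed by a conditional, followed by a semicolon.
• The brackets of the conditional need to have spaces surrounding them.
• There is no semicolon after then or else : they are immediately followed by some command.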
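Putting the file test and the line count together might look as follows. This is a sketch under assumptions: the function name check_length is invented, and the exclamation mark (which negates a test) is an addition to what the text covers.

```shell
# Report on a file: -1 if it does not exist,
# otherwise whether it has more than 100 lines.
check_length() {
    if [ ! -f "$1" ] ; then          # ! negates the test: true if the file is absent
        echo -1
        return
    fi
    if [ `wc -l < "$1"` -gt 100 ] ; then
        echo "file too long"
    else
        echo "ok"
    fi
}

tmp=`mktemp`
printf 'a\nb\nc\n' > "$tmp"
check_length "$tmp"               # a three-line file
check_length /no/such/file        # a file that does not exist
rm -f "$tmp"
```

Note the spaces around every bracket and the semicolon before each then ; leaving any of them out gives a syntax error or a failed test.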

Exercise Bash conditionals have an \indextermunix{elif} keyword. Can you predict the error you get from this:

if [ something ] ; then

foo

else if [ something_else ] ; then

bar

fi



Code it out and see if you were right.

### 22.7.2 Looping

In addition to conditionals, the shell has loops. A for loop looks like

for var in listofitems ; do

something with $var

done

This does the following:

• for each item in listofitems , the variable var is set to that item, and
• the loop body is executed.

As a simple example:

for x in a b c ; do echo $x ; done

a

b

c



In a more meaningful example, here is how you would make backups of all your .c files:

for cfile in *.c ; do

cp $cfile $cfile.bak

done
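A variant of this loop that also copes with spaces in file names is sketched below. The scratch directory and the file names are invented for the illustration; the double quotes around the variable are the point of the example.

```shell
dir=`mktemp -d`
touch "$dir/main.c" "$dir/my prog.c"

for cfile in "$dir"/*.c ; do
    # The quotes keep a name such as "my prog.c" together as a single argument.
    cp "$cfile" "$cfile.bak"
done

nbak=`ls "$dir"/*.bak | wc -l`
echo "made $nbak backup files"
rm -rf "$dir"
```

Without the quotes, cp would receive my and prog.c as separate arguments and fail on that file.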



Shell variables can be manipulated in a number of ways. Execute the following commands to see that you can remove trailing characters from a variable:

[] a=b.c

[] echo ${a%.c}

b

(See section \ref{tut:unix-expansion} on expansion.) With this as a hint, write a loop that renames all your .c files to .x files.

The above construct loops over words, such as the output of ls . To do a numeric loop, use the command \indextermunixdef{seq}:

[] seq 1 5

1

2

3

4

5

Looping over a sequence of numbers then typically looks like

for i in `seq 1 ${HOWMANY}` ; do echo $i ; done

Note the \indexterm{backtick}, which is necessary to have the seq command executed before evaluating the loop.

## 22.8 Scripting

(label: sec:unix-script)

The unix shells are also programming environments. You will learn more about this aspect of unix in this section.

### 22.8.1 How to execute scripts

(label: sec:shell-scripting)

It is possible to write programs of unix shell commands. First you need to know how to put a program in a file and have it be executed. Make a file script1 containing the following two lines:

#!/bin/bash

echo "hello world"

and type ./script1 on the command line. Result? Make the file executable and try again.

In order to write scripts that you want to invoke from anywhere, people typically put them in a directory bin in their home directory. You would then add this directory to your \indexterm{search path}, contained in \indextermtt{PATH}; see section \ref{sec:PATH}.

### 22.8.2 Script arguments

You can invoke a shell script with options and arguments:

./my_script -a file1 -t -x file2 file3

You will now learn how to incorporate this functionality in your scripts.

First of all, all commandline arguments and options are available as variables $1 , $2 et cetera in the script, and the number of command line arguments is available as $# :

#!/bin/bash

echo "The first argument is $1"

echo "There were $# arguments in all"



Formally:

| Variable | Meaning |
|---|---|
| `$#` | number of arguments |
| `$0` | the name of the script |
| `$1`, `$2`, ... | the arguments |
| `$*`, `$@` | the list of all arguments |
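These variables behave the same way inside a shell function, which makes them easy to try out without creating a script file. In the sketch below the function names count and demo are invented, and the default word-splitting settings of the shell are assumed.

```shell
count() { echo $# ; }        # prints how many arguments it received

demo() {
    n=$#                              # number of arguments
    first=$1                          # the first argument
    star_unquoted=`count $*`          # quotes removed, words split again
    star_quoted=`count "$*"`          # everything glued into one word
    at_quoted=`count "$@"`            # original arguments preserved
}

demo "1 2" 3
echo "n=$n first=[$first] \$*: $star_unquoted \"\$*\": $star_quoted \"\$@\": $at_quoted"
```

With the two arguments "1 2" and 3 , the three ways of passing the list on deliver three, one, and two arguments respectively, which is why "$@" is usually the form you want.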

Exercise Write a script that takes as input a file name argument, and reports how many lines are in that file.

Edit your script to test whether the file has less than 10 lines (use the foo -lt bar test), and if it does, cat the file. Hint: you need to use backquotes inside the test.

Add a test to your script so that it will give a helpful message if you call it without any arguments.

The standard way to parse arguments is using the \indextermunixdef{shift} command, which pops the first argument off the list of arguments. Parsing the arguments in sequence then involves looking at $1 , shifting, and looking at the new $1 .
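A parsing loop built on shift could be sketched as follows. The function name parse_args and the options -v and -o are hypothetical, chosen only for this illustration; a real script would run the same loop at top level on its own arguments.

```shell
# Parse leading options with shift; stop at the first non-option argument.
parse_args() {
    verbose=no
    outfile=
    while [ $# -gt 0 ] ; do
        if [ "$1" = "-v" ] ; then
            verbose=yes
        elif [ "$1" = "-o" ] ; then
            if [ $# -lt 2 ] ; then
                echo "option -o needs a value" >&2
                return 1
            fi
            shift                 # the value follows the option
            outfile=$1
        else
            break                 # first argument that is not an option
        fi
        shift                     # done with this argument, move to the next
    done
    rest="$*"                     # whatever is left are the regular arguments
}

parse_args -v -o out.txt file1 file2
echo "verbose=$verbose outfile=$outfile rest=$rest"
```

Because each pass inspects only $1 and then shifts, the options may come in any order.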

Exercise Write a script say.sh that prints its text argument. However, if you invoke it with

./say.sh -n 7 "Hello world"



it should print it as many times as you indicated. Using the option -u :

./say.sh -u -n 7 "Goodbye cruel world"



should print the message in uppercase. Make sure that the order of the arguments does not matter, and give an error message for any unrecognized option.

The variables $@ and $* have a different behavior with respect to double quotes. Let's say we evaluate myscript "1 2" 3 , then

• Using $* is the list of arguments after removing quotes: myscript 1 2 3 .
• Using "$*" is the list of arguments, with quotes removed, in quotes: myscript "1 2 3" .
• Using "$@" preserves quotes: myscript "1 2" 3 .

## 22.9 Expansion

(label: tut:unix-expansion)

The shell performs various kinds of expansion on a command line, that is, replacing part of the commandline with different text.

Brace expansion:

[] echo a{b,cc,ddd}e

abe acce addde

This can for instance be used to delete all extensions of some base file name:

[] rm tmp.{c,s,o} # delete tmp.c tmp.s tmp.o

Tilde expansion gives your own, or someone else's home directory:

[] echo ~

/share/home/00434/eijkhout

[] echo ~eijkhout

/share/home/00434/eijkhout

Parameter expansion gives the value of shell variables:

[] x=5

[] echo $x

5



Undefined variables do not give an error message:

[] echo $y

There are many variations on parameter expansion. Above you already saw that you can strip trailing characters:

[] a=b.c

[] echo ${a%.c}

b
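The percent sign has a few relatives that are not covered in this tutorial: a doubled %% removes the longest matching tail, and # and ## do the same at the front of the value. The sketch below applies them to a made-up path name.

```shell
path=/home/user/project/main.tar.gz

base=${path##*/}      # longest match of */ removed from the front: main.tar.gz
short=${base%.*}      # shortest match of .* removed from the end:  main.tar
stem=${base%%.*}      # longest match of .* removed from the end:   main
ext=${base#*.}        # shortest match of *. removed from the front: tar.gz

echo "base=$base short=$short stem=$stem ext=$ext"
```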



Here is how you can deal with undefined variables:

[] echo ${y:-0}

0

The backquote mechanism (section \ref{tut:unix-backquote} above) is known as command substitution. It allows you to evaluate part of a command and use it as input for another. For example, if you want to ask what type of file the command ls is, do

[] file `which ls`

This first evaluates which ls , giving /bin/ls , and then evaluates file /bin/ls . As another example, here we backquote a whole pipeline, and do a test on the result:

[] echo 123 > w

[] cat w

123

[] wc -c w

4 w

[] if [ `cat w | wc -c` -eq 4 ] ; then echo four ; fi

four

### 22.9.1 Arithmetic expansion

(label: sec:arith-expansion)

Unix shell programming is very much oriented towards text manipulation, but it is possible to do arithmetic. Arithmetic substitution tells the shell to treat the expansion of a parameter as a number:

[] x=1

[] echo $((x*2))

2
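This resolves the earlier experiment in which adding two variables only produced a string with a plus sign in it. A brief sketch, with variable names chosen for the illustration:

```shell
a=5
b=7

as_text=$a+$b            # plain text substitution: the string 5+7
as_number=$((a+b))       # arithmetic expansion: the number 12

# A counter updated inside a loop.
i=0
while [ $i -lt 3 ] ; do
    i=$((i+1))
done

echo "as_text=$as_text as_number=$as_number i=$i"
```

Keep in mind that this arithmetic is integer-only: $((7/2)) gives 3.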



Integer ranges can be used as follows:

[] for i in {1..10} ; do echo $i ; done

1

2

3

4

5

6

7

8

9

10

## 22.10 Startup files

In this tutorial you have seen several mechanisms for customizing the behavior of your shell. For instance, by setting the PATH variable you can extend the locations where the shell looks for executables. Other environment variables (section \ref{tut:shellvars}) you can introduce for your own purposes. Many of these customizations will need to apply to every session, so you can have shell startup files that are read at the start of each session.

Popular things to do in a startup file are defining \indextermunix{alias}es:

alias grep='grep -i'

alias ls='ls -F'

and setting a custom commandline \indexterm{prompt}.

Unfortunately, there are several startup files, and which one gets read is a complicated function of circumstances. Here is a good common sense guideline\footnote{Many thanks to Robert McLay for figuring this out.}:

• Have a .profile that does nothing but read the .bashrc :

# ~/.profile

if [ -f ~/.bashrc ]; then

source ~/.bashrc

fi

• Let your .bashrc do the actual customizations:

# ~/.bashrc

# make sure your path is updated

if [ -z "$MYPATH" ]; then

export MYPATH=1

export PATH=$HOME/bin:$PATH

fi
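The purpose of the MYPATH guard is that the path is extended only once, even if the startup file is read several times, for instance in nested shells. The sketch below wraps the same lines in a function (the name setup_path is made up) and calls it twice to imitate that situation.

```shell
path_before=$PATH
unset MYPATH                      # pretend no startup file has run yet

setup_path() {
    if [ -z "$MYPATH" ]; then     # -z: true if the string is empty
        export MYPATH=1
        export PATH=$HOME/bin:$PATH
    fi
}

setup_path                        # first read of the startup file: path extended
setup_path                        # second read: the guard prevents a duplicate entry

echo "$PATH"
```

Because MYPATH is exported, shells started from this one inherit it and skip the update as well.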



## 22.11 Shell interaction

Interactive use of Unix, in contrast to script writing (section \ref{sec:unix-script}), is a complicated conversation between the user and the shell. You, the user, type a line, hit return, and the shell tries to interpret it. There are several cases.

• Your line contains one full command, such as ls foo : the shell will execute this command.
• You can put more than one command on a line, separated by semicolons: mkdir foo; cd foo . The shell will execute these commands in sequence.
• Your input line is not a full command, for instance while [ 1 ] . The shell will recognize that there is more to come, and use a different prompt to show you that it is waiting for the remainder of the command.
• Your input line would be a legitimate command, but you want to type more on a second line. In that case you can end your input line with a backslash character, and the shell will recognize that it needs to hold off on executing your command. In effect, the backslash will hide (\indexterm{escape}) the return.

When the shell has collected a command line to execute, by using one or more of your input lines or only part of one, as described just now, it will apply expansion to the command line (section \ref{tut:unix-expansion}). It will then interpret the commandline as a command and arguments, and proceed to invoke that command with the arguments as found.

There are some subtleties here. If you type ls *.c , then the shell will recognize the wildcard character and expand it to a command line, for instance ls foo.c bar.c . Then it will invoke the ls command with the argument list foo.c bar.c . Note that ls does not receive *.c as argument! In cases where you do want the unix command to receive an argument with a wildcard, you need to escape it so that the shell will not expand it. For instance, find . -name \*.c will make the shell invoke find with arguments . -name *.c .
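You can make the difference visible by counting the arguments that a command actually receives. In this sketch the helper count_args and the scratch directory are invented for the purpose.

```shell
dir=`mktemp -d`
here=$PWD
cd "$dir"
touch foo.c bar.c

count_args() { echo $# ; }              # prints how many arguments it received

expanded=`count_args *.c`               # the shell expands the wildcard: two arguments
escaped=`count_args \*.c`               # the backslash hides the star: one literal argument
found=`find . -name \*.c | wc -l`       # find receives *.c itself and does its own matching

echo "expanded=$expanded escaped=$escaped found=$found"
cd "$here"
rm -rf "$dir"
```

Single quotes, as in find . -name '*.c' , have the same protective effect as the backslash.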

## 22.12 The system and other users

(label: sec:users)

Unix is a multi-user operating system. Thus, even if you use it on your own personal machine, you are a user with an account \index{Unix!user account} and you may occasionally have to type in your username and password.

If you are on your personal machine, you may be the only user logged in. On university machines or other servers, there will often be other users. Here are some commands relating to them.

• finger otheruser gets information about another user; you can specify a user's login name here, or their real name, or other identifying information the system knows about.
• top shows which processes are running on the system; use top -u to get this sorted by the amount of cpu time they are currently taking. (On Linux, try also the vmstat command.)

### 22.12.1 Groups

In section \ref{sec:unix-permissions} you saw that there is a permissions category for 'group'. This allows you to open up files to your close collaborators, while leaving them protected from the wide world.

The command \indextermunix{groups} tells you all the groups you are in, and ls -l tells you what group a file belongs to. Analogous to chmod , you can use \indextermunix{chgrp} to change the group to which a file belongs, to share it with a user who is also in that group.

Creating a new group, or adding a user to a group needs system privileges. To create a group:

sudo groupadd new_group_name



To add a user to a group:

sudo usermod -a -G thegroup theuser



### 22.12.2 The super user

Even if you own your machine, there are good reasons to work as much as possible from a regular user account, and use root privileges only when strictly needed. (The root account is also known as the \indextermsubdef{super}{user}.) If you have root privileges, you can also use that to 'become another user' and do things with their privileges, with the sudo ('superuser do') command.

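• To execute a command as another user:

sudo -u otheruser command arguments

• To execute a command as the root user:

sudo command arguments

• To become another user:

sudo su - otheruser

• To become the super user:

sudo su -
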

## 22.13 Other systems: ssh and scp

No man is an island, and no computer is either. Sometimes you want to use one computer, for instance your laptop, to connect to another, for instance a supercomputer.

If you are already on a Unix computer, you can log into another with the 'secure shell' command \indextermunixdef{ssh}, a more secure variant of the old 'remote shell' command \indextermunixdef{rsh}:

ssh yourname@othermachine.otheruniversity.edu



where the yourname can be omitted if you have the same name on both machines.

To only copy a file from one machine to another you can use the 'secure copy' \indextermunixdef{scp}, a secure variant of 'remote copy' \indextermunixdef{rcp}. The scp command is much like cp in syntax, except that the source or destination can have a machine prefix.

To copy a file from the current machine to another, type:

scp localfile yourname@othercomputer:otherdirectory



where yourname can again be omitted, and otherdirectory can be an absolute path, or a path relative to your home directory:

# absolute path:

scp localfile yourname@othercomputer:/share/

# path relative to your home directory:

scp localfile yourname@othercomputer:mysubdirectory



Leaving the destination path empty puts the file in the remote home directory:

scp localfile yourname@othercomputer:



Note the colon at the end of this command: if you leave it out you get a local file with an 'at' sign in the name.

You can also copy a file from the remote machine. For instance, to copy a file, preserving the name:

scp yourname@othercomputer:otherdirectory/otherfile .



## 22.14 The sed and awk tools

Apart from fairly small utilities such as tr and cut , Unix has some more powerful tools. In this section you will see two tools for line-by-line transformations on text files. Of course this tutorial merely touches on the depth of these tools; for more information see \cite{AWK:awk,OReilly:sedawk}.

### 22.14.1 Stream editing with sed

Unix has various tools for processing text files on a line-by-line basis. The stream editor \indextermunix{sed} is one example. If you have used the vi editor, you are probably used to a syntax like s/foo/bar/ for making changes. With sed , you can do this on the commandline. For instance

sed 's/foo/bar/' myfile > mynewfile



will apply the substitute command s/foo/bar/ to every line of myfile . The output is shown on your screen so you should capture it in a new file; see section \ref{sec:unixpipe} for more on output redirection.

• If you have more than one edit, you can specify them with

sed -e 's/one/two/' -e 's/three/four/'

• If an edit needs to be done only on certain lines, you can specify that by prefixing the edit with the match string. For instance

sed '/^a/s/b/c/'

only applies the edit on lines that start with an a . (See section \ref{sec:regexp} for regular expressions.)

• Traditionally, sed could only function in a stream, so the output file always had to be different from the input. The GNU version, which is standard on Linux systems, has a flag -i which edits 'in place':

sed -e 's/ab/cd/' -e 's/ef/gh/' -i thefile
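The following sketch tries these variants on a few made-up lines of text. Since the -i flag differs between sed versions (the BSD version found on macOS requires an extra argument), the last part uses a temporary file and mv as a portable stand-in for in-place editing.

```shell
sample='apple foo
banana foo
avocado bar'

# Two edits in one invocation.
two_edits=`printf '%s\n' "$sample" | sed -e 's/foo/bar/' -e 's/banana/cherry/'`

# Edit restricted to lines starting with an a: the first o becomes a zero.
a_lines=`printf '%s\n' "$sample" | sed '/^a/s/o/0/'`

# Portable substitute for in-place editing: write a new file, then replace the old one.
f=`mktemp`
printf '%s\n' "$sample" > "$f"
sed 's/foo/bar/' "$f" > "$f.new" && mv "$f.new" "$f"
in_place=`cat "$f"`
rm -f "$f"

echo "$two_edits"
echo "$a_lines"
```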



### 22.14.2 awk

The \indextermunixdef{awk} utility also operates on each line, but it can be described as having a memory. An awk program consists of a sequence of pairs, where each pair consists of a match string and an action. The simplest awk program is

cat somefile | awk '{ print }'



where the match string is omitted, meaning that all lines match, and the action is to print the line. Awk breaks each line into fields separated by whitespace. A common application of awk is to print a certain field:

awk '{print $2}' file

prints the second field of each line.

Suppose you want to print all subroutines in a Fortran program; this can be accomplished with

awk '/subroutine/ {print}' yourfile.f

Exercise Build a command pipeline that prints of each subroutine header only the subroutine name. For this you first use sed to replace the parentheses by spaces, then awk to print the subroutine name field.

Awk has variables with which it can remember things. For instance, instead of just printing the second field of every line, you can make a list of them and print that later:

cat myfile | awk 'BEGIN {v="Fields:"} {v=v " " $2} END {print v}'
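To try this without a data file at hand, the sketch below feeds a few made-up lines to awk . The summation in the second command is an extra illustration of a variable that persists across lines; it is not part of the text above.

```shell
data='alice 10
bob 25
carol 7'

# Print the first field of every line.
names=`printf '%s\n' "$data" | awk '{print $1}'`

# Accumulate the second field in a variable and print it after the last line.
total=`printf '%s\n' "$data" | awk '{s += $2} END {print s}'`

# The list-building command from the text, applied to the same data.
fields=`printf '%s\n' "$data" | awk 'BEGIN {v="Fields:"} {v=v " " $2} END {print v}'`

echo "$names"
echo "total=$total"
echo "$fields"
```

The single quotes around the awk program matter: they keep the shell from interpreting $1 and $2 as its own script arguments.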



As another example of the use of variables, here is how you would print all lines in between a BEGIN and END line:

cat myfile | awk '/END/ {p=0} p==1 {print} /BEGIN/ {p=1} '



Exercise The placement of the match with BEGIN and END may seem strange. Rearrange the awk program, test it out, and explain the results you get.

## 22.15 Review questions

Exercise Devise a pipeline that counts how many users are logged onto the system, whose name starts with a vowel and ends with a consonant.

Exercise \label{tut:ex:plagiarism} Pretend that you're a professor writing a script for homework submission: if a student invokes this script it copies the student file to some standard location.

submit_homework myfile.txt



For simplicity, we simulate this by making a directory submissions and two different files student1.txt and student2.txt . After

submit_homework student1.txt

submit_homework student2.txt



there should be copies of both files in the submissions directory. Start by writing a simple script; it should give a helpful message if you use it the wrong way.

Try to detect if a student is cheating. Explore the \indextermunix{diff} command to see if the submitted file is identical to something already submitted: loop over all submitted files and

Now refine your test by catching if the cheating student randomly inserted some spaces.

For a harder test: try to detect whether the cheating student inserted newlines. This cannot be done with \indextermunix{diff}, but you could try \indextermunix{tr} to remove the newlines.