Machine Learning @ Coursera

Octave Tutorial

This cheat sheet provides an overview of the concepts, formulas, and definitions used in the »Machine Learning« online course at Coursera.

This document is a transcript of the Octave tutorial that was part of the second week. The regular cheat sheet is available here.


Octave Tutorial

Basic Operations

Elementary Operations
5 + 6 % addition
3 - 2 % subtraction
5 * 7 % multiplication
1 / 2 % division
2^6 % power
            
Logical Operations
1 == 2 % equality
1 ~= 2 % nonequality
1 && 0 % logical and
1 || 0 % or
xor(1,0) % xor
Prompt

The prompt can be modified by PS1('>> ').

Working with variables
Assignments
a = 3
a = 3; % semicolon suppresses the output
b = 'hi';
c = (3 >= 1);
Printing
a = pi;
a % => 3.1416

disp(a) % disp is used for more complex values

% formatted output
disp(sprintf('2 decimals: %0.2f', a))
% => 2 decimals: 3.14
disp(sprintf('6 decimals: %0.6f', a))
% => 6 decimals: 3.141593

format long
a % => 3.14159265358979

format short
a % => 3.1416
Matrices and Vectors
A = [1 2; 3 4; 5 6]

v = [1 2 3] % row vector
v = [1; 2; 3] % column vector
Ranges
% ranges
% elements range from 1.0, 1.1 ... to 2.0
v = 1:0.1:2

% (elements range from one to six)
v = 1:6
Zero and One Filled Matrices
% 2 by 3 matrix filled with ones
ones(2,3)
C = 2 * ones(2,3)

% would be the same as:
C  = [2 2 2; 2 2 2]

W = ones(1,3) % row vector of three ones
W = zeros(1,3) % same with zeros
Matrices with Random Values
rand(3,3) % 3 by 3 matrix with random values from [0, 1]

W = randn(1,3) % random values from a gaussian distribution

w = -6 + sqrt(10)*(randn(1,10000));
% vector of 10k elements

% you can view the distribution of the 
% values with a histogram
hist(w) % histogram of values of w

hist(w,50) % same but with 50 buckets
Identity Matrices
eye(4) % 4 by 4 identity matrix
Help

Typing help eye (or help followed by any other function name) shows the documentation for that function.

Moving Data Around

Directories and Files

pwd prints the current working directory, and ls lists all files and directories in the current working directory.

load featuresX.dat loads the file into the variable featuresX. who lists all variables in the current workspace; for a more detailed view use whos. To remove a specific variable use clear featuresX.

By using save hello.mat v; the content of the vector v gets saved to hello.mat in a compressed binary format. To save it in a human readable format use save hello.txt v -ascii.
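The commands above can be combined into one short session; featuresX.dat is the example file name used in the course and stands in for any data file you actually have:

```octave
pwd                      % print the current working directory
ls                       % list files and directories in it
load featuresX.dat       % creates the variable featuresX
who                      % list variables in the workspace
whos                     % detailed listing (size, bytes, class)
v = featuresX(1:10);     % take the first ten values
save hello.mat v;        % compressed binary format
save hello.txt v -ascii; % human readable format
clear featuresX          % remove a single variable
clear                    % remove all variables
```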

Matrix Sizing
A = [1 2; 3 4; 5 6]
size(A) % => [3 2] (rows, columns)
size(A,1) % => 3
size(A,2) % => 2

v = [1 2 3 4]
length(v) % => 4
length([1;2;3;4;5]) % => 5
Access Elements in a Matrix or Vector
A = [1 2; 3 4; 5 6]
A(1,2) % => 2

A(1,:) % everything from the first row

A(:,2) % everything from the second column

A([1 3], :) % everything from the first and third row

A(:,2) = [10; 11; 12] % replaces the second column with the values 10,11,12
Appending and Concatenating Matrices and Vectors
A = [A, [100; 200; 300]]; % append another column on the right
A = [1 2; 3 4; 5 6];
B = [10 11; 12 13; 14 15];

C = [A B]; % combine A and B to a new matrix (with 4 columns and 3 rows)
C = [A,B]; % the same

C = [A;B]; % stack A on top of B (2 columns and 6 rows)
Special Syntax: Put all Values of a Matrix into a Single Vector
A(:)
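A(:) unrolls the matrix column by column (column-major order) into a single column vector; a minimal sketch:

```octave
A = [1 2; 3 4; 5 6];
v = A(:) % => [1; 3; 5; 2; 4; 6], first column, then second
```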

Computation on Data

With the given matrices

A = [1 2; 3 4; 5 6]
B = [11 12; 13 14; 15 16]
C = [1 1; 2 2]

Matrix multiplication works as follows:

A * C

Now to multiply element wise:

A .* B

The dot denotes element wise operations, for example element wise squaring:

A .^ 2

With a vector $\vec v$ the element wise reciprocal (or inverse):

v = [1; 2; 3]
1 ./ v

% or for a matrix
1 ./ A
log(A) % element wise logarithm
exp(v) % element wise base e exponentiation
abs([-1; -2; -3]) % element wise absolute value
-v % element wise negation
v + ones(length(v), 1) % increment elements of v by one

% or easier
v + 1

Transposing a matrix:

A'
a = [1 15 2 0.5]
val = max(a) % val = 15

[val, ind] = max(a) % value and index of the maximum element
a < 3 % element wise lower than

To find all values that match a specific criteria one could use the find function:

find(a < 3)
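find also works on matrices; with two output arguments it returns the row and column indices of the matching elements (A = magic(3) is just an example matrix):

```octave
A = magic(3);
[r, c] = find(A >= 7)
% r and c hold the row and column index of each element >= 7
```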

Sum or product of all elements of a matrix or vector:

sum(a)
prod(a)

Round up or down all elements:

floor(a)
ceil(a)

Element wise maximum of two random 3 by 3 matrices:

max(rand(3), rand(3))

Now for column/row wise maximum of a matrix:

max(A, [], 1) % maximum of first dimension
max(A, [], 2) % maximum of second dimension

Overall maximum of a matrix:

max(max(A))

% or turn A into a vector
max(A(:))

Per column sum:

A = magic(9)
sum(A, 1) % sum of first dimension
sum(A, 2) % sum of second dimension

Diagonal sum:

sum(sum(A .* eye(9))) % first eliminate all non-diagonal elements, then sum
sum(sum(A .* flipud(eye(9)))) % other diagonal

Pseudo-inverse of a matrix:

A = magic(3)
pinv(A) % (pseudo-) inverse of A
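As a quick sanity check, multiplying A by its pseudo-inverse should give (approximately) the identity matrix:

```octave
A = magic(3);
temp = pinv(A);
temp * A % approximately the 3 by 3 identity matrix
```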

Plotting Data

Please note that the produced SVG can be problematic and might not be displayed correctly.

To review your code and visualize progress, Octave can easily plot your data:

% example data
t = [0:0.01:0.98];

y1 = sin(2*pi*4*t);
plot(t, y1);

Which gives the following plot:

first example plot
Plot of t against the sine of t
y2 = cos(2*pi*4*t);
plot(t, y2);

Which yields this plot:

second example plot
Plot of vector t and cosine of t

To overlay both plots, the hold on command prevents gnuplot from erasing the first plot when the second one is drawn.

plot(t, y1);
hold on
plot(t, y2, 'r');
xlabel('time')
ylabel('value')
legend('sin', 'cos')
title('my plot')

Which results in the following plot:

first and second example plot
Plot of the sine of t overlaid with the plot of the cosine of t

You can save this plot via cd $directory_to_save; print -dpng 'myPlot.png'.

And finally to exit the plot in a clean manner one would use close.


To work with several plots simultaneously, Octave can assign them to figures:

figure(1); plot(t, y1);
figure(2); plot(t, y2);

Or alternatively the plot can be split up into areas holding multiple plots:

subplot(1, 2, 1); %  divide plot into a 1x2 grid, access first element
plot(t, y1);
subplot(1,2,2);
plot(t, y2);
axis([0.5 1 -1 1])
subplot example
Plots of the sine and cosine of t shown side by side by dividing the figure into areas

Plots can also be used to visualize matrices:

A = magic(5)
imagesc(A)

Which produces this plot:

imagesc example
Plot of a magic matrix with size 5

An easier-to-read plot can be produced with imagesc(A), colorbar, colormap gray;, which yields the following plot:

second imagesc example
Plot of a magic matrix with size 5 in gray scale

Control Statements: for, while, if statements

For Statement
v = zeros(10,1);
for i=1:10,
    v(i) = 2^i;
end;
v % => powers of two from 2 to 1024

While Statement

i = 1;
while i <= 5,
    v(i) = 100;
    i = i+1;
end;

Break

i = 1;
while true,
    v(i) = 999;
    i = i+1;

    if i == 6,
        break;
    end;
end;

If Statement

v(1) = 2;
if v(1) == 1,
    disp('The value is one');
elseif v(1) == 2,
    disp('The value is two');
else
    disp('The value is not one or two');
end;

Function Statements

Octave functions are defined in files that have to be located in your current working directory, one function per file, with the file named after the function (here squareThisNumber.m). Take a file with the following function for example:

function y = squareThisNumber(x)

y = x^2;

If the function file is in Octave's current working directory, you can call the function directly in your Octave session.
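A quick call, assuming squareThisNumber.m is in the current directory:

```octave
squareThisNumber(5) % => 25
```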

Alternatively, add the directory containing the function file to the Octave search path:

addpath($absolute_path)

Octave allows us to define a function that returns multiple values:

function [y1, y2] = squareAndCubeThisNumber(x)

y1 = x^2;
y2 = x^3;
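Calling it with two output variables, assuming the file squareAndCubeThisNumber.m is on the path:

```octave
[a, b] = squareAndCubeThisNumber(5)
% a => 25, b => 125
```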

Now for a more complex example. The cost function $J(\vec \theta)$ is defined as follows in a separate file (costFunctionJ.m):

function J = costFunctionJ(X, y, theta)

% X is the "design matrix" containing our training examples
% y is the class labels

m = size(X,1);              % number of training examples
predictions = X*theta;      % predictions of hypothesis on
                            % all m examples
sqrErrors = (predictions-y).^2; % squared errors

J = 1/(2*m) * sum(sqrErrors);

To compute the error for a given dataset:

X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
theta = [0;1];
j = costFunctionJ(X,y,theta) % => 0, the hypothesis fits the data exactly

Vectorization

The hypothesis $h_\theta (x) = \sum^n_{j=0} \theta_j x_j$ can also be written as $\vec \theta^T \vec x$. With that you'd only need to compute the product of two vectors instead of the sum. Take this unvectorized hypothesis calculation for example:

prediction = 0.0;
for j = 1:n+1,
    prediction = prediction + theta(j) * x(j);
end;

For contrast look at the vectorized implementation:

prediction = theta' * x;

Now for a more sophisticated example the gradient descent algorithm (for all j):

$$\theta_j := \theta_j - \alpha \frac{1}{m}\sum^{m}_{i=1}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}_j$$

The vectorized implementation consists of $\theta := \theta - \alpha \delta$ where delta is a vector of the deltas:

$$\delta = \frac{1}{m} \sum^m_{i=1} \left( h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$$
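One vectorized gradient descent step can be sketched as follows, assuming a design matrix X with one training example per row, a label vector y, and a learning rate alpha:

```octave
% one gradient descent step, vectorized
% X: m by (n+1) design matrix, y: m by 1 labels
% theta: (n+1) by 1 parameters, alpha: learning rate
m = size(X, 1);
delta = (1/m) * X' * (X*theta - y); % vector of all partial derivatives
theta = theta - alpha * delta;      % simultaneous update of all theta_j
```

Note that X*theta computes the predictions $h_\theta(x^{(i)})$ for all $m$ examples at once, so no explicit loop over $j$ is needed.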