Machine Learning @ Coursera
Octave Tutorial
This cheat sheet provides an overview of the concepts, formulas, and definitions used in the »Machine Learning« online course at Coursera.
This document is a transcript of the Octave tutorial that was part of the second week. The regular cheat sheet is available here.
Octave Tutorial
Basic Operations
Elementary Operations
5 + 6 % addition
3 - 2 % subtraction
5 * 7 % multiplication
1 / 2 % division
2^6 % power
Logical Operations
1 == 2 % equality
1 ~= 2 % inequality
1 && 0 % logical and
1 || 0 % or
xor(1,0) % xor
Prompt
The prompt can be modified by PS1('>> ').
Working with variables
Assignments
a = 3
a = 3; % semicolon suppresses the output
b = 'hi';
c = (3 >= 1);
Printing
a = pi;
a % => 3.1416
disp(a) % disp is used for more complex values
% formatted output
disp(sprintf('2 decimals: %0.2f', a))
% => 2 decimals: 3.14
disp(sprintf('6 decimals: %0.6f', a))
% => 6 decimals: 3.141593
format long
a % => 3.14159265358979
format short
a % => 3.1416
Matrices and Vectors
A = [1 2; 3 4; 5 6]
v = [1 2 3] % row vector
v = [1; 2; 3] % column vector
Ranges
% ranges
% elements range from 1.0, 1.1 ... to 2.0
v = 1:0.1:2
% (elements range from one to six)
v = 1:6
Zero and One filled Matrices
% matrix filled with ones
ones(2,3)
C = 2 * ones(2,3)
% would be the same as:
C = [2 2 2; 2 2 2]
W = ones(1,3) % row vector of three ones
W = zeros(1,3) % same with zeros
Matrices with Random Values
rand(3,3) % 3 by 3 matrix with random values from [0, 1]
W = randn(1,3) % random values from a gaussian distribution
w = -6 + sqrt(10)*(randn(1,10000));
% vector of 10k elements
% you can view the distribution of the
% values with a histogram
hist(w) % histogram of values of w
hist(w,50) % same but with 50 buckets
Identity Matrices
eye(4) % 4 by 4 identity matrix
Help
Typing help eye (or any other function name) shows the help text for that function.
Moving Data Around
Directories and Files
pwd
prints the current working directory, and ls
lists all files and directories in the current working directory.
With load featuresX.dat the contents of the file are loaded into the variable featuresX.
who lists all the variables in the current workspace.
For a more detailed view of all the variables you can use whos.
To clear a specific variable you can use clear featuresX.
By using save hello.mat v; the content of the vector v gets saved to hello.mat in a compressed binary format. To save it in a human-readable format use save hello.txt v -ascii.
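For example, a quick save/load round trip (a minimal sketch, assuming a vector v in the workspace):
v = [1 2 3 4];
save hello.mat v; % write v to hello.mat in binary format
clear v; % remove v from the workspace
load hello.mat % restores the variable v
who % v is listed again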
Matrix Sizing
A = [1 2; 3 4; 5 6]
size(A) % => [3 2] (3 rows, 2 columns)
size(A,1) % => 3
size(A,2) % => 2
v = [1 2 3 4]
length(v) % => 4
length([1;2;3;4;5]) % => 5
Access Elements in a Matrix or Vector
A = [1 2; 3 4; 5 6]
A(1,2) % => 2
A(1,:) % everything from the first row
A(:,2) % everything from the second column
A([1 3], :) % everything from the first and third row
A(:,2) = [10; 11; 12] % replaces the second column with the values 10,11,12
Appending and Concatenating Matrices and Vectors
A = [A, [100; 200; 300]]; % append another column on the right
A = [1 2; 3 4; 5 6];
B = [10 11; 12 13; 14 15];
C = [A B]; % combine A and B to a new matrix (with 4 columns and 3 rows)
C = [A,B]; % the same
C = [A;B]; % stack A on top of B (with 2 columns and 6 rows)
Special Syntax: Put all Values of a Matrix into a Single Vector
A(:)
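For example (a minimal sketch using the matrix A from above), this stacks the columns of A on top of each other:
A = [1 2; 3 4; 5 6];
A(:) % => [1; 3; 5; 2; 4; 6] (one column vector)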
Computation on Data
Given the matrices
A = [1 2; 3 4; 5 6]
B = [11 12; 13 14; 15 16]
C = [1 1; 2 2]
matrix multiplication works as follows:
A * C
Now to multiply element wise:
A .* B
The dot denotes element wise operations, for example element wise squaring:
A .^ 2
For a vector $\vec v$, the element wise reciprocal:
v = [1; 2; 3]
1 ./ v
% or for a matrix
1 ./ A
log(A) % element wise logarithm
exp(v) % element wise base e exponentiation
abs([-1; -2; -3]) % element wise absolute value
-v % element wise negation
v + ones(length(v), 1) % increment elements of v by one
% or easier
v + 1
Transposing a matrix:
A'
a = [1 15 2 0.5]
val = max(a) % val = 15
[val, ind] = max(a) % value and index of the maximum element
a < 3 % element wise lower than
To find all values that match a specific criterion one can use the find
function:
find(a < 3) % => 1 3 4 (indices of the matching elements)
Sum or product of all elements of a matrix or vector:
sum(a)
prod(a)
Round up or down all elements:
floor(a)
ceil(a)
Element wise maximum of two random 3 by 3 matrixes:
max(rand(3), rand(3))
Now for column/row wise maximum of a matrix:
max(A, [], 1) % column wise maximum (along the first dimension)
max(A, [], 2) % row wise maximum (along the second dimension)
Overall maximum of a matrix:
max(max(A))
% or turn A into a vector
max(A(:))
Per column or per row sum:
A = magic(9)
sum(A, 1) % sum of each column (first dimension)
sum(A, 2) % sum of each row (second dimension)
Diagonal sum:
sum(sum(A .* eye(9))) % => 369; first eliminate all non-diagonal elements, then sum
sum(sum(A .* flipud(eye(9)))) % => 369; same for the other diagonal
Pseudo-inverse of a matrix:
A = magic(3)
pinv(A) % (pseudo-) inverse of A
Plotting Data
Please note that the produced SVG can be problematic and might not be displayed correctly.
To review your code and visualize progress, Octave can easily plot your data:
% example data
t = [0:0.01:0.98];
y1 = sin(2*pi*4*t);
plot(t, y1);
Which gives the following plot:
y2 = cos(2*pi*4*t);
plot(t, y2);
Which yields this plot:
Now to have an overlay of both plots, the hold on
command prevents
gnuplot from erasing the first plot when plotting the second.
plot(t, y1);
hold on
plot(t, y2, 'r');
xlabel('time')
ylabel('value')
legend('sin', 'cos')
title('my plot')
Which results in the following plot:
You can save this plot via cd $directory_to_save; print -dpng 'myPlot.png'.
And finally, to close the plot in a clean manner one would use close.
To simultaneously work with several plots, Octave can assign them to figures:
figure(1); plot(t, y1);
figure(2); plot(t, y2);
Or alternatively the plot can be split up into areas holding multiple plots:
subplot(1, 2, 1); % divide plot into a 1x2 grid, access first element
plot(t, y1);
subplot(1,2,2);
plot(t, y2);
axis([0.5 1 -1 1]) % set the x range to [0.5, 1] and the y range to [-1, 1]
Plots can also be used to visualize matrices:
A = magic(5)
imagesc(A)
Which produces this plot:
An easier to read plot however can be produced with imagesc(A), colorbar, colormap gray; this yields the following plot:
Control Statements: for, while, if statements
For Statement
v = zeros(10,1)
for i=1:10,
v(i) = 2^i
end;
While Statement
i = 1;
while i <= 5,
v(i) = 100;
i = i+1;
end;
Break
i = 1;
while true,
v(i) = 999;
i = i+1;
if i == 6,
break;
end;
end;
If Statement
v(1) = 2;
if v(1) == 1,
disp('The value is one');
elseif v(1) == 2,
disp('The value is two');
else
disp('The value is not one or two');
end;
Function Statements
Octave functions are defined in files, which have to be located in your cwd (current working directory). Take a file with the following function for example:
function y = squareThisNumber(x)
y = x^2;
If Octave's current working directory is the directory containing the function file, you can use the function directly in your Octave session.
Alternatively, you can add the directory containing the function file to the Octave search path:
addpath($absolute_path)
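Once the file is in the cwd or on the search path, the function can be called like any built-in (a minimal usage sketch):
squareThisNumber(5) % => 25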
Octave allows us to define a function that returns multiple values:
function [y1, y2] = squareAndCubeThisNumber(x)
y1 = x^2;
y2 = x^3;
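Calling it with two output variables captures both return values (a minimal usage sketch):
[a, b] = squareAndCubeThisNumber(2) % a => 4, b => 8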
Now for a more complex example. The cost function $J(\vec \theta)$ is defined as follows in a separate file:
function J = costFunctionJ(X, y, theta)
% X is the "design matrix" containing our training examples
% y is the class labels
m = size(X,1); % number of training examples
predictions = X*theta; % predictions of hypothesis on
% all m examples
sqrErrors = (predictions-y).^2; % squared errors
J = 1/(2*m) * sum(sqrErrors);
To compute the error for a given dataset:
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
theta = [0;1];
j = costFunctionJ(X,y,theta)
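Since theta = [0; 1] yields the hypothesis $h_\theta(x) = x$, which fits this dataset perfectly, j evaluates to 0. A different theta produces a nonzero cost, for example:
j = costFunctionJ(X, y, [0; 0]) % => 2.3333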
Vectorization
The hypothesis $h_\theta (x) = \sum^n_{j=0} \theta_j x_j$ can also be written as $\vec \theta^T \vec x$. With that you'd only need to compute the product of two vectors instead of the sum. Take this unvectorized hypothesis calculation for example:
prediction = 0.0;
for j = 1:n+1,
prediction = prediction + theta(j) * x(j);
end;
For contrast look at the vectorized implementation:
prediction = theta' * x;
Now for a more sophisticated example, the gradient descent update rule (simultaneously for all $j$):
$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum^m_{i=1} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$
The vectorized implementation is $\theta := \theta - \alpha \delta$, where $\delta$ is the vector of partial derivatives:
$$\delta = \frac{1}{m} \sum^m_{i=1} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$
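In Octave this amounts to one line per update step. A minimal sketch, assuming X, y and theta are defined as in the cost function example above, and a learning rate alpha (an assumed variable) has been chosen:
m = size(X, 1); % number of training examples
delta = (1/m) * X' * (X*theta - y); % vector of all partial derivatives
theta = theta - alpha * delta; % simultaneous update of all theta_j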