Supervised Classification

Every classifier must be initialized with a specific set of parameters. Training and testing are carried out by two distinct methods, compute() and predict(). Whenever possible, the real-valued prediction is stored in the realpred attribute.
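Because of this shared interface, any of the classifiers below can be swapped in behind the same three calls. A minimal sketch (the toy data is made up, mirroring the examples below):

>>> import numpy as np
>>> import mlpy
>>> xtr = np.array([[1.0, 2.0, 3.1, 1.0],
...                 [1.0, 2.0, 3.0, 2.0],
...                 [1.0, 2.0, 3.1, 1.0]])
>>> ytr = np.array([1, -1, 1])
>>> for clf in [mlpy.Svm(), mlpy.Knn(k=1), mlpy.Fda()]:
...     conv = clf.compute(xtr, ytr)  # training phase: returns 1 on success
...     cl = clf.predict(xtr)         # testing phase: predicted classes
...     rp = clf.realpred             # real-valued prediction, where available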

Support Vector Machines (SVMs)

class mlpy.Svm(kernel='linear', kp=0.1, C=1.0, tol=0.001, eps=0.001, maxloops=1000, cost=0.0, alpha_tversky=1.0, beta_tversky=1.0, opt_offset=True)

Support Vector Machines (SVM).

Example :
>>> import numpy as np
>>> import mlpy
>>> xtr = np.array([[1.0, 2.0, 3.0, 1.0],  # first sample
...                 [1.0, 2.0, 3.0, 2.0],  # second sample
...                 [1.0, 2.0, 3.0, 1.0]]) # third sample
>>> ytr = np.array([1, -1, 1])             # classes
>>> mysvm = mlpy.Svm()                     # initialize Svm class
>>> mysvm.compute(xtr, ytr)                # compute SVM
1
>>> mysvm.predict(xtr)                     # predict SVM model on training data
array([ 1, -1,  1])
>>> xts = np.array([4.0, 5.0, 6.0, 7.0])   # test point
>>> mysvm.predict(xts)                     # predict SVM model on test point
-1
>>> mysvm.realpred                         # real-valued prediction
-5.5
>>> mysvm.weights(xtr, ytr)                # compute weights on training data
array([ 0.,  0.,  0.,  1.])

Initialize the Svm class

Parameters :
kernel : string [‘linear’, ‘gaussian’, ‘polynomial’, ‘tr’, ‘tversky’]

kernel type

kp : float

kernel parameter (two sigma squared) for the gaussian and polynomial kernels

C : float

regularization parameter

tol : float

tolerance for testing KKT conditions

eps : float

convergence parameter

maxloops : integer

maximum number of optimization loops

cost : float [-1.0, ..., 1.0]

for cost-sensitive classification

alpha_tversky : float

positive multiplicative parameter for the norm of the first vector

beta_tversky : float

positive multiplicative parameter for the norm of the second vector

opt_offset : bool

compute the optimal offset
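An illustrative initialization sketch (the parameter values are arbitrary, chosen only to exercise the documented options):

>>> import mlpy
>>> svm_g = mlpy.Svm(kernel='gaussian', kp=2.0, C=10.0)    # gaussian kernel, kp = two sigma squared
>>> svm_c = mlpy.Svm(kernel='linear', cost=0.5)            # cost-sensitive classification
>>> svm_t = mlpy.Svm(kernel='tversky', alpha_tversky=1.0,  # tversky kernel with its
...                  beta_tversky=0.5)                     # norm multipliers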

compute(x, y)

Compute SVM model

Parameters :
x : 2d ndarray float (samples x feats)

training data

y : 1d ndarray integer (-1 or 1)

classes

Returns :
conv : integer

svm convergence (0: false, 1: true)

predict(p)

Predict SVM model on test point(s)

Parameters :
p : 1d or 2d ndarray float (samples x feats)

test point(s)

Returns :
cl : integer or 1d ndarray integer

class(es) predicted

Attributes :
Svm.realpred : float or 1d ndarray float

real valued prediction

weights(x, y)

Return feature weights

Parameters :
x : 2d ndarray float (samples x feats)

training data

y : 1d ndarray integer (-1 or 1)

classes

Returns :
fw : 1d ndarray float

feature weights
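Since higher weights indicate more relevant features, the weight vector can drive a simple feature ranking. A sketch continuing the example above:

>>> import numpy as np
>>> fw = mysvm.weights(xtr, ytr)      # feature weights
>>> ranking = np.argsort(fw)[::-1]    # feature indices, highest weight first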

Note

For the tr kernel (Terminated Ramp Kernel) see [Merler06].

k-Nearest Neighbor (KNN)

class mlpy.Knn(k, dist='se')

k-Nearest Neighbor (KNN).

Example:

>>> import numpy as np
>>> import mlpy
>>> xtr = np.array([[1.0, 2.0, 3.1, 1.0],  # first sample
...                 [1.0, 2.0, 3.0, 2.0],  # second sample
...                 [1.0, 2.0, 3.1, 1.0]]) # third sample
>>> ytr = np.array([1, -1, 1])             # classes
>>> myknn = mlpy.Knn(k = 1)                # initialize knn class
>>> myknn.compute(xtr, ytr)              # compute knn
1
>>> myknn.predict(xtr)                   # predict knn model on training data
array([ 1, -1,  1])
>>> xts = np.array([4.0, 5.0, 6.0, 7.0]) # test point
>>> myknn.predict(xts)                   # predict knn model on test point
-1
>>> myknn.realpred                       # real-valued prediction
0.0

Initialize the Knn class.

Parameters :
k : int (odd, >= 1)

number of NN

dist : string (‘se’ = SQUARED EUCLIDEAN, ‘e’ = EUCLIDEAN)

adopted distance

compute(x, y)

Store x and y data.

Parameters :
x : 2d ndarray float (samples x feats)

training data

y : 1d ndarray integer (-1 or 1 for binary classification; 1, ..., nclasses for multiclass classification)

classes

Returns :

1

Raises :
ValueError

if not (1 <= k <= #samples)

ValueError

if there aren't at least 2 classes

ValueError

if, in case of 2-class problems, the labels are not 1 and -1

ValueError

if, in case of n-class problems, the labels are not integers from 1 to n
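A minimal multiclass sketch (the data is made up; the labels follow the 1, ..., nclasses convention above):

>>> import numpy as np
>>> import mlpy
>>> xtr = np.array([[1.0, 1.0], [1.1, 0.9],   # class 1
...                 [5.0, 5.0], [5.1, 4.9],   # class 2
...                 [9.0, 9.0], [9.1, 8.9]])  # class 3
>>> ytr = np.array([1, 1, 2, 2, 3, 3])        # integer labels from 1 to 3
>>> myknn = mlpy.Knn(k=1)
>>> conv = myknn.compute(xtr, ytr)            # returns 1 on success
>>> myknn.predict(np.array([5.2, 5.2]))       # nearest training sample is in class 2
2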

predict(p)

Predict Knn model on test point(s).

Parameters :
p : 1d or 2d ndarray float (sample(s) x feats)

test sample(s)

Returns :

the predicted value(s) on success:
integer or 1d ndarray integer (-1 or 1) for binary classification;
integer or 1d ndarray integer (1, ..., nclasses) for multiclass classification;
0 on success with non-unique classification;
-2 otherwise

Raises :
StandardError

if no Knn model has been computed

Fisher Discriminant Analysis (FDA)

Described in [Mika01].

class mlpy.Fda(C=1)

Fisher Discriminant Analysis.

Example:

>>> import numpy as np
>>> import mlpy
>>> xtr = np.array([[1.0, 2.0, 3.1, 1.0],  # first sample
...                 [1.0, 2.0, 3.0, 2.0],  # second sample
...                 [1.0, 2.0, 3.1, 1.0]]) # third sample
>>> ytr = np.array([1, -1, 1])             # classes
>>> myfda = mlpy.Fda()                   # initialize fda class
>>> myfda.compute(xtr, ytr)              # compute fda
1
>>> myfda.predict(xtr)                   # predict fda model on training data
array([ 1, -1,  1])
>>> xts = np.array([4.0, 5.0, 6.0, 7.0]) # test point
>>> myfda.predict(xts)                   # predict fda model on test point
-1
>>> myfda.realpred                       # real-valued prediction
-42.51475717037367
>>> myfda.weights(xtr, ytr)              # compute weights on training data
array([  9.60629896,   9.77148463,   9.82027615,  11.58765243])

Initialize Fda class.

Parameters :
C : float

regularization parameter

compute(x, y)

Compute fda model.

Parameters :
x : 2d numpy array float (sample x feature)

training data

y : 1d numpy array integer (two classes, 1 or -1)

classes

Returns :

1

predict(p)

Predict fda model on test point(s).

Parameters :
p : 1d or 2d ndarray float (sample(s) x feats)

test sample(s)

Returns :
cl : integer or 1d numpy array integer

class(es) predicted

Attributes :
self.realpred : float or 1d numpy array float

real valued prediction

weights(x, y)

Return feature weights.

Parameters :
x : 2d ndarray float (samples x feats)

training data

y : 1d ndarray integer (-1 or 1)

classes

Returns :
fw : 1d ndarray float

feature weights

Spectral Regression Discriminant Analysis (SRDA)

Described in [Cai08].

class mlpy.Srda(alpha=1.0)

Spectral Regression Discriminant Analysis (SRDA).

Example:

>>> import numpy as np
>>> import mlpy
>>> xtr = np.array([[1.0, 2.0, 3.1, 1.0],  # first sample
...                 [1.0, 2.0, 3.0, 2.0],  # second sample
...                 [1.0, 2.0, 3.1, 1.0]]) # third sample
>>> ytr = np.array([1, -1, 1])             # classes
>>> mysrda = mlpy.Srda()                 # initialize srda class
>>> mysrda.compute(xtr, ytr)             # compute srda
1
>>> mysrda.predict(xtr)                  # predict srda model on training data
array([ 1, -1,  1])
>>> xts = np.array([4.0, 5.0, 6.0, 7.0]) # test point
>>> mysrda.predict(xts)                  # predict srda model on test point
-1
>>> mysrda.realpred                      # real-valued prediction
-6.8283034257748758
>>> mysrda.weights(xtr, ytr)             # compute weights on training data
array([ 0.10766721,  0.21533442,  0.51386623,  1.69331158])

Initialize the Srda class.

Parameters :
alpha : float (>= 0.0)

regularization parameter
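Since alpha controls the amount of regularization, refitting with several values shows its effect on the solution. An illustrative sketch (the values are arbitrary), reusing xtr and ytr from the example above:

>>> import mlpy
>>> for a in (0.1, 1.0, 10.0):
...     s = mlpy.Srda(alpha=a)
...     conv = s.compute(xtr, ytr)   # returns 1 on success
...     fw = s.weights(xtr, ytr)     # larger alpha means stronger regularization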

compute(x, y)

Compute Srda model and initialize the array of alphas a.

Parameters :
x : 2d ndarray float (samples x feats)

training data

y : 1d ndarray integer (-1 or 1)

classes

Returns :

1

Raises :
LinAlgError

if x is a singular matrix in __PenRegrModel

predict(p)

Predict Srda model on test point(s).

Parameters :
p : 1d or 2d ndarray float (sample(s) x feats)

test sample(s)

Returns :
cl : integer or 1d numpy array integer

class(es) predicted

Attributes :
self.realpred : float or 1d numpy array float

real valued prediction

weights(x, y)

Return feature weights.

Parameters :
x : 2d ndarray float (samples x feats)

training data

y : 1d ndarray integer (-1 or 1)

classes

Returns :
fw : 1d ndarray float

feature weights

Penalized Discriminant Analysis (PDA)

Described in [Ghosh03].

class mlpy.Pda(Nreg=3)

Penalized Discriminant Analysis (PDA).

Example:

>>> import numpy as np
>>> import mlpy
>>> xtr = np.array([[1.0, 2.0, 3.1, 1.0],  # first sample
...                 [1.0, 2.0, 3.0, 2.0],  # second sample
...                 [1.0, 2.0, 3.1, 1.0]]) # third sample
>>> ytr = np.array([1, -1, 1])             # classes
>>> mypda = mlpy.Pda()                   # initialize pda class
>>> mypda.compute(xtr, ytr)              # compute pda
1
>>> mypda.predict(xtr)                   # predict pda model on training data
array([ 1, -1,  1])
>>> xts = np.array([4.0, 5.0, 6.0, 7.0]) # test point
>>> mypda.predict(xts)                   # predict pda model on test point
-1
>>> mypda.realpred                       # real-valued prediction
-7.6106885609535624
>>> mypda.weights(xtr, ytr)              # compute weights on training data
array([  4.0468174 ,   8.0936348 ,  18.79228266,  58.42466988])

Initialize Pda class.

Parameters :
Nreg : int

number of regressions
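As a sketch (the value is arbitrary), increasing Nreg beyond the default simply means more regressions are computed when fitting:

>>> import mlpy
>>> mypda = mlpy.Pda(Nreg=5)         # five regressions instead of the default three
>>> conv = mypda.compute(xtr, ytr)   # xtr, ytr as in the example above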

compute(x, y)

Compute Pda model.

Parameters :
x : 2d ndarray float (samples x feats)

training data

y : 1d ndarray integer (-1 or 1)

classes

Returns :

1

Raises :
LinAlgError

if x is a singular matrix in __PenRegrModel

predict(p)

Predict Pda model on test point(s).

Parameters :
p : 1d or 2d ndarray float (sample(s) x feats)

test sample(s)

Returns :
cl : integer or 1d numpy array integer

class(es) predicted

Attributes :
self.realpred : float or 1d numpy array float

real valued prediction

weights(x, y)

Compute feature weights.

Parameters :
x : 2d ndarray float (samples x feats)

training data

y : 1d ndarray integer (-1 or 1)

classes

Returns :
fw : 1d ndarray float

feature weights

Diagonal Linear Discriminant Analysis (DLDA)

class mlpy.Dlda(nf=0, tol=10, overview=False, bal=False)

Diagonal Linear Discriminant Analysis.

Example:

>>> import numpy as np
>>> import mlpy
>>> xtr = np.array([[1.1, 2.4, 3.1, 1.0],  # first sample
...                 [1.2, 2.3, 3.0, 2.0],  # second sample
...                 [1.3, 2.2, 3.5, 1.0],  # third sample
...                 [1.4, 2.1, 3.2, 2.0]]) # fourth sample
>>> ytr = np.array([1, -1, 1, -1])         # classes
>>> mydlda = mlpy.Dlda(nf=2)               # initialize dlda class
>>> mydlda.compute(xtr, ytr)               # compute dlda
1
>>> mydlda.predict(xtr)                    # predict dlda model on training data
array([ 1, -1,  1, -1])
>>> xts = np.array([4.0, 5.0, 6.0, 7.0])   # test point
>>> mydlda.predict(xts)                    # predict dlda model on test point
-1
>>> mydlda.realpred                        # real-valued prediction
-21.999999999999954
>>> mydlda.weights(xtr, ytr)               # compute weights on training data
array([  2.13162821e-14,   0.00000000e+00,   0.00000000e+00,   4.00000000e+00])

Initialize Dlda class.

Parameters :
nf : int (1 <= nf <= #features)

number of best features to use in the model. If nf = 0, the system stops at the number of features corresponding to a peak of accuracy

tol : int

when nf = 0, the number of classification steps computed after the peak, to avoid stopping at a local maximum

overview : bool

set to True to print information about the accuracy of the classifier at every step of the computation

bal : bool

set to True if it is reasonable to assume that the class imbalance of the test set is similar to that of the training set
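For example, to let the model stop automatically at the accuracy peak while monitoring each step (a sketch; tol=20 is an arbitrary choice):

>>> import mlpy
>>> mydlda = mlpy.Dlda(nf=0, tol=20, overview=True)  # automatic stop at the accuracy
>>> conv = mydlda.compute(xtr, ytr)                  # peak, checking 20 further steps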

compute(x, y, mf=0)

Compute Dlda model.

Parameters :
x : 2d ndarray float (samples x feats)

training data

y : 1d ndarray integer (-1 or 1)

classes

mf : int

number of additional classification steps to compute on an already computed model

Returns :

1

Raises :
LinAlgError

if x is a singular matrix

predict(p)

Predict Dlda model on test point(s).

Parameters :
p : 1d or 2d ndarray float (sample(s) x feats)

test sample(s)

Returns :
cl : integer or 1d numpy array integer

class(es) predicted

Attributes :
self.realpred : float or 1d numpy array float

real valued prediction

weights(x, y)

Return feature weights.

Parameters :
x : 2d ndarray float (samples x feats)

training data

y : 1d ndarray integer (-1 or 1)

classes

Returns :
fw : 1d ndarray float

feature weights: greater than 0 for the features chosen for the classification and equal to 0 for all the others
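Because unselected features get weight 0, the subset actually used by the model can be recovered directly, e.g.:

>>> import numpy as np
>>> fw = mydlda.weights(xtr, ytr)    # weights from the example above
>>> selected = np.nonzero(fw)[0]     # indices of the features chosen by the model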

[Vapnik95] V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
[Cristianini] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press.
[Merler06] S. Merler and G. Jurman. Terminated Ramp - Support Vector Machine: a nonparametric data dependent kernel. Neural Networks, 19:1597-1611, 2006.
[Nasr09] R. Nasr, S. Swamidass and P. Baldi. Large scale study of multiple-molecule queries. Journal of Cheminformatics, 1(1):7, 2009.
[Mika01] S. Mika, A. Smola and B. Scholkopf. An improved training algorithm for kernel fisher discriminants. Proceedings of AISTATS 2001, 2001.
[Cristianini02] N. Cristianini, J. Shawe-Taylor and A. Elisseeff. On Kernel-Target Alignment. Advances in Neural Information Processing Systems, Volume 14, 2002.
[Cai08] D. Cai, X. He and J. Han. SRDA: An Efficient Algorithm for Large-Scale Discriminant Analysis. IEEE Transactions on Knowledge and Data Engineering, 20(1):1-12, Jan. 2008.
[Ghosh03] D. Ghosh. Penalized discriminant methods for the classification of tumors from gene expression data. Biometrics, 59:992-1000, Dec. 2003.