
Stochastic Gradient Descent: Python

# Stochastic Gradient Descent method
# dev is for evaluation
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
import numpy as np

# function to generate predictors and responses
def get_data():
    """
    Make a sample classification dataset Returns : Independent variable y, dependent variable x
    """
    no_features = 30
    redundant_features = int(0.1*no_features)
    informative_features = int(0.6*no_features)
    repeated_features = int(0.1*no_features)
    x,y = make_classification(n_samples=1000,n_features=no_features,flip_y=0.03,n_informative = informative_features, n_redundant = redundant_features ,n_repeated = repeated_features,random_state=7)
    return x,y


# function that will help to build and validate model
def build_model(x,y,x_dev,y_dev):
    # note: newer scikit-learn versions (1.3+) expect loss="log_loss" and penalty=None
    estimator = SGDClassifier(shuffle=True, loss="log", learning_rate="constant",
                              eta0=0.0001, fit_intercept=True, penalty="none")
    estimator.fit(x, y)
    train_predicted = estimator.predict(x)
    train_score = accuracy_score(y, train_predicted)
    dev_predicted = estimator.predict(x_dev)
    dev_score = accuracy_score(y_dev, dev_predicted)

    print("Training Accuracy = %0.2f Dev Accuracy = %0.2f" % (train_score, dev_score))

# main block to invoke the preceding functions
if __name__ == "__main__":
    x,y = get_data()
    # Divide the data into Train, dev and test
    x_train, x_test_all, y_train, y_test_all = train_test_split(x, y, test_size=0.3, random_state=9)
    x_dev, x_test, y_dev, y_test = train_test_split(x_test_all, y_test_all, test_size=0.3, random_state=9)
    build_model(x_train,y_train,x_dev,y_dev)


# How the model works
def get_data():
    """
    Make a sample classification dataset Returns : Independent variable y, dependent variable x
    """
    no_features = 30
    redundant_features = int(0.1*no_features)
    informative_features = int(0.6*no_features)
    repeated_features = int(0.1*no_features)
    x,y = make_classification(n_samples=500,n_features=no_features,flip_y=0.03,n_informative = informative_features, n_redundant = redundant_features ,n_repeated = repeated_features,random_state=7)
    return x,y

The first parameter is the number of instances required; in this case, we need 1,000 instances. The second parameter specifies how many attributes per instance are required; here we ask for 30. The third parameter, flip_y, randomly flips the class labels of 3 percent of the instances. This introduces some noise into the data. The next parameter specifies how many of those 30 features should be informative enough to be used in our classification; we specify that 60 percent of the features, that is, 18 out of 30, should be informative. The next parameter controls the redundant features: these are generated as linear combinations of the informative features in order to introduce correlation among the features. Finally, the repeated features are duplicate features, drawn randomly from both the informative and the redundant features.
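The feature breakdown implied by these percentages can be checked directly. The following is a small sketch, not part of the original recipe, that reuses the same make_classification call and prints the derived counts and the dataset shape (the values in the comments are what this configuration is expected to produce):

# sketch: confirm the feature breakdown used in get_data()
no_features = 30
informative_features = int(0.6 * no_features)   # expected: 18 informative
redundant_features = int(0.1 * no_features)     # expected: 3 redundant
repeated_features = int(0.1 * no_features)      # expected: 3 repeated
print(informative_features, redundant_features, repeated_features)

x, y = make_classification(n_samples=1000, n_features=no_features, flip_y=0.03,
                           n_informative=informative_features,
                           n_redundant=redundant_features,
                           n_repeated=repeated_features, random_state=7)
print(x.shape)            # expected: (1000, 30)
print(np.bincount(y))     # two roughly balanced class counts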

# Divide the data into train, dev, and test sets
x_train, x_test_all, y_train, y_test_all = train_test_split(x, y, test_size=0.3, random_state=9)
x_dev, x_test, y_dev, y_test = train_test_split(x_test_all, y_test_all, test_size=0.3, random_state=9)
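Applying test_size=0.3 twice means roughly 70 percent of the data is used for training, 21 percent becomes the dev set, and 9 percent is held back for the final test. A minimal sketch, assuming the 1,000-sample dataset and the split calls above, to confirm the sizes:

print(len(x_train), len(x_dev), len(x_test))   # expected: 700 210 90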

# make model
build_model(x_train,y_train,x_dev,y_dev)

Training Accuracy = 0.83 Dev Accuracy = 0.81

# In build_model, scikit-learn's SGDClassifier class is used to build the
# stochastic gradient descent model

estimator = SGDClassifier(shuffle=True, loss="log", learning_rate="constant",
                          eta0=0.0001, fit_intercept=True, penalty="none")

The first parameter is shuffle. As with the perceptron, after going through all the records once we need to shuffle the input records before starting the next iteration; the shuffle parameter controls this. Its default value is True, but we have included it here for explanation purposes. Our loss function is log loss: we want to perform logistic regression, and we specify this with the loss parameter. Our learning rate is a constant, which we declare with the learning_rate parameter; the actual value of the learning rate is supplied through the eta0 parameter. We then say that we need to fit the intercept, since we have not centered the data by its mean. Finally, the penalty parameter controls the type of shrinkage required; in our case we say that we don't need any shrinkage by passing the "none" string (penalty=None in newer scikit-learn versions).
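For comparison, the same classifier can also be run with shrinkage and a non-constant learning-rate schedule. The following is a hedged variation, not part of the original recipe: it uses scikit-learn's "optimal" learning-rate schedule and an L2 penalty (the loss name is "log_loss" in scikit-learn 1.3 and later, "log" before that), and the variable alt_estimator is introduced only for this sketch.

# sketch: same data, but with L2 shrinkage and the "optimal" learning-rate schedule
alt_estimator = SGDClassifier(shuffle=True, loss="log_loss", learning_rate="optimal",
                              penalty="l2", alpha=0.0001, fit_intercept=True,
                              random_state=7)
alt_estimator.fit(x_train, y_train)
alt_dev_score = accuracy_score(y_dev, alt_estimator.predict(x_dev))
print("Dev Accuracy (L2, optimal rate) = %0.2f" % alt_dev_score)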

# fit the model and evaluate it on the training and dev datasets:
estimator.fit(x, y)
train_predicted = estimator.predict(x)
train_score = accuracy_score(y, train_predicted)
dev_predicted = estimator.predict(x_dev)
dev_score = accuracy_score(y_dev, dev_predicted)
print("Training Accuracy = %0.2f Dev Accuracy = %0.2f" % (train_score, dev_score))

Training Accuracy = 0.84 Dev Accuracy = 0.83
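The test split created at the beginning is never touched while the model is being built. Once the dev accuracy looks acceptable, a final hold-out check might look like the following sketch, assuming the fitted estimator and the x_test, y_test variables from the earlier split are still in scope:

# sketch: final check on the untouched test split
test_predicted = estimator.predict(x_test)
print("Test Accuracy = %0.2f" % accuracy_score(y_test, test_predicted))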
