Implementation of Logistic Regression from Scratch using Python
Source: https://www.geeksforgeeks.org/implementation-of-logistic-regression-from-scratch-using-python/
Introduction:
Logistic regression is a supervised learning algorithm used when the target variable is categorical. The hypothesis function h(x) of linear regression predicts unbounded values. But in logistic regression the target variable is categorical, so we have to restrict the range of the predicted values. Consider a classification problem where we need to classify whether an e-mail is spam or not. The hypothesis function of linear regression cannot be used here, since it predicts unbounded values, while we have to predict either 0 or 1.
To do so, we apply the sigmoid activation function to the hypothesis function of linear regression. The resulting hypothesis function of logistic regression is:
h( x ) = sigmoid( wx + b )
Here, w is the weight vector.
x is the feature vector.
b is the bias.
sigmoid( z ) = 1 / ( 1 + e^( -z ) )
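As a quick illustration, here is a minimal NumPy sketch of this hypothesis function; the function names and array shapes are my own, chosen for the example, not part of the original article:

import numpy as np

def sigmoid(z):
    # squashes any real value into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

def hypothesis(X, w, b):
    # h(x) = sigmoid( wx + b ); X has shape (m, n), w has shape (n,)
    return sigmoid(X.dot(w) + b)

# example: two samples with three features each (illustrative values)
X = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
w = np.array([0.1, -0.2, 0.05])
b = 0.0
print(hypothesis(X, w, b))  # two values strictly between 0 and 1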
Mathematical Intuition:
The cost function of linear regression (mean squared error) cannot be used for logistic regression, because it would be a non-convex function of the weights. Optimization algorithms such as gradient descent are only guaranteed to converge to the global minimum for convex functions.
So, we use the following simplified cost function instead:
J = -y log( h(x) ) - ( 1 - y ) log( 1 - h(x) )
here, y is the real target value
h( x ) = sigmoid( wx + b )
For y = 0,
J = - log( 1 - h(x) )
and for y = 1,
J = - log( h(x) )
We use this cost function because, during training, we need to maximize the probability of the correct class, and this is achieved by minimizing this loss function.
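To make the cost concrete, here is a minimal NumPy sketch of this loss averaged over a batch; the function name and example values are illustrative assumptions:

import numpy as np

def binary_cross_entropy(y, h):
    # J = -y*log(h) - (1-y)*log(1-h), averaged over all samples
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

y = np.array([1, 0, 1])            # true labels
h = np.array([0.9, 0.2, 0.7])      # predicted probabilities
print(binary_cross_entropy(y, h))  # small, since predictions match labels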
Gradient Descent Calculation:
repeat until convergence {
    tmp_i = w_i - alpha * dw_i
    w_i = tmp_i
}
where alpha is the learning rate.
The chain rule is used to calculate the gradients, e.g., dw. Expanding the chain rule for dw and db gives:
dJ/dw = ( dJ/da ) * ( da/dz ) * ( dz/dw ) = ( a - y ) * x
dJ/db = ( dJ/da ) * ( da/dz ) * ( dz/db ) = ( a - y )
here, a = sigmoid( z ) and z = wx + b.
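A quick way to sanity-check the closed-form gradient dw = ( a - y ) * x is to compare it against a finite-difference estimate of the loss. The sketch below is my own illustration with made-up values, not part of the original article:

import numpy as np

def loss(w, b, x, y):
    # single-sample logistic loss J for parameters (w, b)
    a = 1 / (1 + np.exp(-(np.dot(w, x) + b)))
    return -y * np.log(a) - (1 - y) * np.log(1 - a)

x = np.array([0.5, -1.2])
y = 1.0
w = np.array([0.3, 0.7])
b = 0.1
a = 1 / (1 + np.exp(-(np.dot(w, x) + b)))

analytic_dw = (a - y) * x   # from the chain rule above
eps = 1e-6
numeric_dw = np.array([
    (loss(w + eps * np.eye(2)[i], b, x, y) -
     loss(w - eps * np.eye(2)[i], b, x, y)) / (2 * eps)
    for i in range(2)
])
print(analytic_dw, numeric_dw)  # the two should agree closely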
Implementation:
The diabetes dataset used for this implementation can be downloaded from the link.
It has 8 feature columns, such as 'Age' and 'Glucose' etc., and the target variable 'Outcome' for 108 patients. So in this article, we will train a logistic regression classifier model to predict the presence of diabetes in patients with such information.
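Before training, it can help to take a quick look at the file. A minimal sketch, assuming the CSV follows the usual Pima diabetes layout with the 'Outcome' column last:

import pandas as pd

# load the dataset and inspect its shape, columns, and class balance
df = pd.read_csv("diabetes.csv")
print(df.shape)                      # (number of patients, 9): 8 features + target
print(df.columns.tolist())
print(df["Outcome"].value_counts())  # assumes the target column is named 'Outcome'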
# Importing libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")

# to compare our model's accuracy with the sklearn model
from sklearn.linear_model import LogisticRegression

# Logistic Regression
class LogitRegression():

    def __init__(self, learning_rate, iterations):
        self.learning_rate = learning_rate
        self.iterations = iterations

    # Function for model training
    def fit(self, X, Y):
        # no_of_training_examples, no_of_features
        self.m, self.n = X.shape
        # weight initialization
        self.W = np.zeros(self.n)
        self.b = 0
        self.X = X
        self.Y = Y
        # gradient descent learning
        for i in range(self.iterations):
            self.update_weights()
        return self

    # Helper function to update weights in gradient descent
    def update_weights(self):
        # forward pass: A = sigmoid( X.W + b )
        A = 1 / (1 + np.exp(-(self.X.dot(self.W) + self.b)))
        # calculate gradients: dW = X^T (A - Y) / m, db = mean(A - Y)
        tmp = (A - self.Y.T)
        tmp = np.reshape(tmp, self.m)
        dW = np.dot(self.X.T, tmp) / self.m
        db = np.sum(tmp) / self.m
        # update weights
        self.W = self.W - self.learning_rate * dW
        self.b = self.b - self.learning_rate * db
        return self

    # Hypothesis function h( x )
    def predict(self, X):
        Z = 1 / (1 + np.exp(-(X.dot(self.W) + self.b)))
        # threshold the predicted probability at 0.5
        Y = np.where(Z > 0.5, 1, 0)
        return Y


# Driver code
def main():
    # Importing dataset
    df = pd.read_csv("diabetes.csv")
    X = df.iloc[:, :-1].values
    Y = df.iloc[:, -1:].values

    # Splitting dataset into train and test set
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=1/3, random_state=0)

    # Model training
    model = LogitRegression(learning_rate=0.01, iterations=1000)
    model.fit(X_train, Y_train)

    model1 = LogisticRegression()
    model1.fit(X_train, Y_train.ravel())

    # Prediction on test set
    Y_pred = model.predict(X_test)
    Y_pred1 = model1.predict(X_test)

    # measure performance: count correct predictions for both models
    correctly_classified = 0
    correctly_classified1 = 0
    count = np.size(Y_pred)
    for i in range(count):
        if Y_test[i] == Y_pred[i]:
            correctly_classified = correctly_classified + 1
        if Y_test[i] == Y_pred1[i]:
            correctly_classified1 = correctly_classified1 + 1

    print("Accuracy on test set by our model : ", (
        correctly_classified / count) * 100)
    print("Accuracy on test set by sklearn model : ", (
        correctly_classified1 / count) * 100)


if __name__ == "__main__":
    main()
Output:
Accuracy on test set by our model : 58.333333333333336
Accuracy on test set by sklearn model : 61.111111111111114
Note: The model above is trained to convey the mathematical intuition, not solely to maximize accuracy.