sklearn.StratifiedShuffleSplit() function in Python


Original: https://www.geeksforgeeks.org/sklearn-stratifiedshufflesplit-function-in-python/

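In this article, we will look at the StratifiedShuffleSplit cross-validator from the sklearn library. It provides train/test indices for splitting data into training and test sets.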

What is StratifiedShuffleSplit?

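StratifiedShuffleSplit is a combination of ShuffleSplit and StratifiedKFold. With StratifiedShuffleSplit, the proportion of each class label stays nearly the same in the training and test sets.

The main difference between StratifiedShuffleSplit and StratifiedKFold (with shuffle=True) lies in how the data is shuffled:

- StratifiedKFold shuffles the dataset only once, at the start, and then splits it into the specified number of folds. Every sample lands in exactly one test fold, so the test sets of different iterations never overlap.
- StratifiedShuffleSplit reshuffles the data before every split. As a result, the test sets of different iterations can overlap, and some samples may never be used for testing at all.

Within a single split, the training and test sets are disjoint in both cases.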

Syntax: sklearn.model_selection.StratifiedShuffleSplit(n_splits=10, *, test_size=None, train_size=None, random_state=None)

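Parameters:

n_splits: int, default=10

Number of re-shuffling and splitting iterations.

test_size: float or int, default=None

If float, it should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split. If int, it represents the absolute number of test samples. If None, it is set to the complement of the train size; if train_size is also None, it is set to 0.1.

train_size: float or int, default=None

If float, it should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the train split. If int, it represents the absolute number of train samples. If None, it is set to the complement of the test size.

random_state: int, RandomState instance or None, default=None

Controls the randomness of the training and test indices produced. Pass an int for reproducible output across multiple function calls.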
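The way test_size and train_size interact can be checked directly. This sketch uses hypothetical balanced data with 20 samples. It also uses a helper, `split_sizes`, which is illustrative and not part of sklearn; it returns the training and test set sizes of the first split.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical data: 20 samples, 10 per class; feature values do not matter here
X = np.zeros((20, 1))
y = np.array([0] * 10 + [1] * 10)

def split_sizes(**kwargs):
    """Return (n_train, n_test) for the first split made with the given parameters."""
    sss = StratifiedShuffleSplit(n_splits=1, random_state=0, **kwargs)
    train, test = next(sss.split(X, y))
    return len(train), len(test)

print(split_sizes(test_size=0.25))                  # float -> proportion: (15, 5)
print(split_sizes(test_size=4))                     # int -> absolute count: (16, 4)
print(split_sizes(test_size=0.25, train_size=0.5))  # 5 samples left unused: (10, 5)
print(split_sizes())                                # both None -> test_size=0.1: (18, 2)
```

When both sizes are given and they do not add up to the whole dataset, the leftover samples appear in neither the training set nor the test set of that split.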

Below is the implementation.

Step 1) Import the required modules.


# import the libraries
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn import preprocessing
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedShuffleSplit

Step 2) Load the dataset and identify the dependent and independent variables.

The dataset can be downloaded from here.


# load the dataset into a DataFrame
churn_df = pd.read_csv(r"ChurnData.csv")

# assign dependent and independent variables
X = churn_df[['tenure', 'age', 'address', 'income',
              'ed', 'employ', 'equip', 'callcard', 'wireless']]

y = churn_df['churn'].astype('int')
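If ChurnData.csv is not at hand, a synthetic frame with the same column names lets the remaining steps run. This is a stand-in only. The values, the 200-row size and the roughly 25% churn rate below are all invented and are not the real dataset. Storing `churn` as a float is an assumption that mirrors the `astype('int')` call.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ChurnData.csv: invented values in the same columns
rng = np.random.default_rng(0)
feature_cols = ['tenure', 'age', 'address', 'income',
                'ed', 'employ', 'equip', 'callcard', 'wireless']
churn_df = pd.DataFrame(rng.integers(0, 50, size=(200, len(feature_cols))),
                        columns=feature_cols)
# Invented imbalanced target: each row churns with probability 0.25
churn_df['churn'] = (rng.random(200) < 0.25).astype(float)

X = churn_df[feature_cols]
y = churn_df['churn'].astype('int')

# Share of each class in y: the proportions StratifiedShuffleSplit preserves per split
print(y.value_counts(normalize=True))
```

Checking `value_counts(normalize=True)` on the real target is a quick way to see how imbalanced it is, which is what makes stratified splitting worthwhile.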

Step 3) Preprocess the data.


# data pre-processing
X = preprocessing.StandardScaler().fit(X).transform(X)
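StandardScaler standardizes each column by subtracting its mean and dividing by its standard deviation. The sketch below checks this on a hypothetical 4×2 matrix, not the churn data. It also confirms that `fit(X).transform(X)` matches `fit_transform(X)`.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical two-column feature matrix (say, tenure and income)
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [4.0, 800.0]])

scaler = preprocessing.StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Same result computed by hand: subtract each column's mean, divide by its
# population standard deviation (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(X_scaled, manual))                                          # True
print(np.allclose(X_scaled, preprocessing.StandardScaler().fit_transform(X)))  # True
print(scaler.mean_, scaler.scale_)
```

In Step 3 the scaler is fit on the whole dataset before any split, so test rows contribute to the mean and standard deviation. If that matters, one option is to put StandardScaler and the classifier in an sklearn Pipeline, which refits the scaler on each training split.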

Step 4) Create an object of the StratifiedShuffleSplit class.


# use StratifiedShuffleSplit()
sss = StratifiedShuffleSplit(n_splits=4, test_size=0.5,
                             random_state=0)
sss.get_n_splits(X, y)

Output:
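get_n_splits() does not inspect X or y. It returns the n_splits value given to the constructor, which is also the number of (train, test) pairs that split() yields. Here is a quick check on hypothetical placeholder data (not the churn dataset):

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical placeholder data: 10 samples, 5 per class
X = np.zeros((10, 1))
y = np.array([0] * 5 + [1] * 5)

sss = StratifiedShuffleSplit(n_splits=4, test_size=0.5, random_state=0)
print(sss.get_n_splits(X, y))           # 4: simply returns n_splits
print(sum(1 for _ in sss.split(X, y)))  # 4: one (train, test) pair per iteration
print(sss.get_n_splits())               # 4: X and y are optional and ignored
```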

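Step 5) Call the split() method on the instance to divide the data into training and test samples. For each iteration, split() returns the indices of the training and test samples. A random forest classifier is trained on each split, and the accuracy of each prediction is compared.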


scores = []

# use a random forest classifier to get predictions for each split
rf = RandomForestClassifier(n_estimators=40, max_depth=7)
for train_index, test_index in sss.split(X, y):
    # X is a NumPy array after scaling; y is a pandas Series, so index it by position
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    rf.fit(X_train, y_train)
    pred = rf.predict(X_test)
    scores.append(accuracy_score(y_test, pred))

# get the accuracy of each prediction
print(scores)

Output:
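The manual loop in Step 5 can also be written with cross_val_score by passing the StratifiedShuffleSplit object as `cv`. The sketch below is a minimal comparison under these assumptions:

- make_classification generates hypothetical imbalanced data in place of the churn features.
- The forest is given `random_state=0`, which the article's code does not set, so that both approaches fit identical models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score

# Hypothetical imbalanced data standing in for the scaled churn features
X, y = make_classification(n_samples=200, n_features=9, weights=[0.75], random_state=0)

sss = StratifiedShuffleSplit(n_splits=4, test_size=0.5, random_state=0)
# random_state fixed here (an addition) so both approaches fit identical forests
rf = RandomForestClassifier(n_estimators=40, max_depth=7, random_state=0)

# Manual loop, as in Step 5
manual_scores = []
for train_index, test_index in sss.split(X, y):
    rf.fit(X[train_index], y[train_index])
    manual_scores.append(accuracy_score(y[test_index], rf.predict(X[test_index])))

# cross_val_score clones rf and fits it on every split produced by sss
cv_scores = cross_val_score(rf, X, y, cv=sss, scoring='accuracy')

print(np.allclose(manual_scores, cv_scores))  # True
print(f"mean accuracy {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")
```

Both approaches produce the same splits because, with an integer random_state, StratifiedShuffleSplit regenerates identical splits on every call to split(). cross_val_score is shorter and returns a NumPy array, which makes the mean and spread across splits easy to report.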

Copyright: MoonAPI (www.moonapi.com). Please credit the source when reposting.

Article link: https://moonapi.com/news/16437.html


GeeksForGeeks AI Chinese Tutorial (2022-06-29)
