Kaggle (4): Regression with an Abalone Dataset

import pandas as pd
import numpy as np
import xgboost
import lightgbm
import optuna
import catboost
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_log_error
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import VotingRegressor, StackingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

PROJECT DESCRIPTION

The goal is to predict the age of abalone from physical measurements. The age of an abalone is normally determined by cutting the shell through the cone, staining it, and counting the number of rings under a microscope.

PHYSICAL ATTRIBUTES

SEX: Male/Female/Infant
LENGTH: Longest shell measurement
DIAMETER: Diameter of the Abalone
HEIGHT: Height of the Abalone
WHOLE WEIGHT: Weight of the whole abalone
SHUCKED WEIGHT: Weight of the meat
VISCERA WEIGHT: Gut weight (internal organs)
SHELL WEIGHT: Shell Weight after drying
RINGS: Number of rings +1.5 gives Age of the Abalone
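
As a small illustration of the last attribute, a ring count can be turned into an approximate age in years by adding 1.5. The helper name `rings_to_age` is hypothetical and not part of the original notebook, and the ring counts are just example values.

```python
def rings_to_age(rings):
    """Approximate abalone age in years: number of rings plus 1.5."""
    return rings + 1.5

# A few example ring counts
sample_rings = [11, 6, 10, 9]
ages = [rings_to_age(r) for r in sample_rings]
print(ages)  # [12.5, 7.5, 11.5, 10.5]
```

The models in this post predict Rings directly; the offset only matters if you want to report an age.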

Load the Datasets

# original = pd.read_csv("/kaggle/input/abalone-dataset/abalone.csv")
# train = pd.read_csv("/kaggle/input/playground-series-s4e4/train.csv")
# test = pd.read_csv("/kaggle/input/playground-series-s4e4/test.csv")
original = pd.read_csv("./data/abalone.csv")
train = pd.read_csv("./data/train.csv")
test = pd.read_csv("./data/test.csv")

Make the data ready for tuning

train = train.drop("id", axis=1)
# Align the competition column names with the original abalone dataset
train = train.rename(columns={'Whole weight.1': 'Shucked weight', 'Whole weight.2': 'Viscera weight'})
test = test.rename(columns={'Whole weight.1': 'Shucked weight', 'Whole weight.2': 'Viscera weight'})
train = pd.concat([train, original], axis=0)
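
One detail of the concatenation step is worth knowing: stacking frames along axis 0 keeps each frame's own index labels, so the combined frame can contain repeated labels. A minimal sketch with two toy frames (made-up values standing in for the competition and original data) shows the effect and the `ignore_index=True` option:

```python
import pandas as pd

# Two toy frames standing in for the competition data and the original data
a = pd.DataFrame({"Length": [0.55, 0.63], "Rings": [11, 11]})
b = pd.DataFrame({"Length": [0.16], "Rings": [6]})

stacked = pd.concat([a, b], axis=0)                        # index labels: 0, 1, 0
renumbered = pd.concat([a, b], axis=0, ignore_index=True)  # index labels: 0, 1, 2

print(stacked.index.tolist())
print(renumbered.index.tolist())
print(stacked.index.is_unique, renumbered.index.is_unique)
```

Repeated labels do not affect the model fitting in this post, but they can surprise you if you later select rows with `.loc`.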

Get familiar with the Data

train.head()
  Sex  Length  Diameter  Height  Whole weight  Shucked weight  Viscera weight  Shell weight  Rings
0   F   0.550     0.430   0.150        0.7715          0.3285          0.1465        0.2400     11
1   F   0.630     0.490   0.145        1.1300          0.4580          0.2765        0.3200     11
2   I   0.160     0.110   0.025        0.0210          0.0055          0.0030        0.0050      6
3   M   0.595     0.475   0.150        0.9145          0.3755          0.2055        0.2500     10
4   I   0.555     0.425   0.130        0.7820          0.3695          0.1600        0.1975      9
print(f"The shape of training dataset is : {train.shape}")
print(f"The shape of testing dataset is : {test.shape}")
The shape of training dataset is : (94792, 9)
The shape of testing dataset is : (60411, 9)
test.head()
      id  Sex  Length  Diameter  Height  Whole weight  Shucked weight  Viscera weight  Shell weight
0  90615    M   0.645     0.475   0.155        1.2380          0.6185          0.3125        0.3005
1  90616    M   0.580     0.460   0.160        0.9830          0.4785          0.2195        0.2750
2  90617    M   0.560     0.420   0.140        0.8395          0.3525          0.1845        0.2405
3  90618    M   0.570     0.490   0.145        0.8740          0.3525          0.1865        0.2350
4  90619    I   0.415     0.325   0.110        0.3580          0.1575          0.0670        0.1050
train.groupby("Sex").count()["Length"]
Sex
F    27802
I    34435
M    32555
Name: Length, dtype: int64
test.groupby("Sex").count()["Length"]
Sex
F    17387
I    22241
M    20783
Name: Length, dtype: int64
np.sort(pd.unique(train.Rings))
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29], dtype=int64)

View the Distribution

train.hist(figsize=(12, 10), grid=True, bins=50)
plt.tight_layout()
plt.axis("off")
(0.0, 1.0, 0.0, 1.0)

test.hist(figsize=(12, 10), grid=True, bins=50)
plt.tight_layout()
plt.axis("off")
(0.0, 1.0, 0.0, 1.0)

CONTINUOUS COLUMN ANALYSIS

# Set up warnings to be ignored (optional)
warnings.filterwarnings("ignore")
pd.set_option('mode.use_inf_as_na', False)
# Work on a copy so the numeric Rings column in train is left untouched
train_str = train.copy()
train_str['Rings'] = train_str['Rings'].astype(str)
# List of continuous variables in your dataset
continuous_vars = ['Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight', 'Viscera weight', 'Shell weight']
# Set hue to your target column
target_column = 'Rings'
for column in continuous_vars:
    fig, axes = plt.subplots(1, 2, figsize=(18, 4))  # Create subplots with 1 row and 2 columns
    # Plot histogram with hue and explicit labels
    sns.histplot(data=train_str, x=column, hue=target_column, bins=50, kde=True, ax=axes[0], palette='muted', legend=False)
    axes[0].set_title(f'Histogram of {column} with {target_column} Hue')
    axes[0].set_xlabel(column)
    axes[0].set_ylabel('Count')
    axes[0].legend(title=target_column, loc='upper right')
    # Plot KDE plot with hue and explicit labels
    sns.kdeplot(data=train_str, x=column, hue=target_column, ax=axes[1], palette='muted', legend=False)
    axes[1].set_title(f'KDE Plot of {column} with {target_column} Hue')
    axes[1].set_xlabel(column)
    axes[1].set_ylabel('Density')
    axes[1].legend(title=target_column, loc='upper right')
    plt.tight_layout()  # Adjust spacing between subplots
    plt.show()

ANALYSIS BY QQ PLOT

import scipy.stats as stats
def qq_plot_with_skewness(data, quantitative_var):
    # Check if the variable is present in the DataFrame
    if quantitative_var not in data.columns:
        print(f"Error: '{quantitative_var}' not found in the DataFrame.")
        return
    f, ax = plt.subplots(1, 2, figsize=(18, 5.5))
    # Check for missing values
    if data[quantitative_var].isnull().any():
        print(f"Warning: '{quantitative_var}' contains missing values. Results may be affected.")
    # QQ plot
    stats.probplot(data[quantitative_var], plot=ax[0], fit=True)
    ax[0].set_title(f'QQ Plot for {quantitative_var}')
    # Skewness plot
    sns.histplot(data[quantitative_var], kde=True, ax=ax[1])
    ax[1].set_title(f'Distribution of {quantitative_var}')
    # Calculate skewness value
    skewness_value = stats.skew(data[quantitative_var])
    # Display skewness value on the plot
    ax[1].text(0.5, 0.5, f'Skewness: {skewness_value:.2f}', transform=ax[1].transAxes, horizontalalignment='center', verticalalignment='center', fontsize=16, color='red')
    plt.show()
# Example usage for each continuous variable
for var in continuous_vars:
    qq_plot_with_skewness(train, var)

Split the Dataset

sex_to_num = {
    "M": 0,
    "F": 1,
    "I": 2
}
# Encode Sex as an integer code in both frames
train["Sex"] = train["Sex"].replace(sex_to_num)
test["Sex"] = test["Sex"].replace(sex_to_num)
train.groupby("Sex").count()["Length"]
Sex
0    32555
1    27802
2    34435
Name: Length, dtype: int64
X = train.drop("Rings", axis=1)
y = train.Rings
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
X_valid, X_test, y_valid, y_test = train_test_split(X_test, y_test, test_size=0.5, random_state=42, stratify=y_test)

Here, the stratify parameter keeps the distribution of Rings values the same across the training, validation, and test splits.
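
To see what stratification does, here is a minimal sketch on synthetic data (the three label values and their frequencies are made up and only stand in for Rings). It repeats the same two-step 70/15/15 split and prints the share of one label in each part; the helper `class_share` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in for Rings: three label values with unequal frequencies
y = np.repeat([5, 9, 12], [200, 600, 200])
X = rng.normal(size=(y.size, 3))

# 70% train, then split the remaining 30% in half: 15% validation, 15% test
X_tr, X_rest, y_tr, y_rest = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
X_va, X_te, y_va, y_te = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42, stratify=y_rest)

def class_share(labels, value):
    """Fraction of entries in `labels` equal to `value`."""
    return float(np.mean(labels == value))

for name, part in [("train", y_tr), ("valid", y_va), ("test", y_te)]:
    print(name, len(part), round(class_share(part, 9), 3))
```

Each part keeps roughly the same share of the label 9 as the full data. Note that stratified splitting needs at least two samples of every label value, which matters for very rare ring counts.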

XGBoost

We will implement two XGBoost models.

def xgb_objective(trial):
    params = {
        "eta": trial.suggest_float("eta", 0.01, 1.0),
        "gamma": 0.0,
        "max_depth": trial.suggest_int("max_depth", 3, 20),
        "min_child_weight": trial.suggest_float("min_child_weight", 1., 50.),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "reg_lambda": trial.suggest_float("lambda", 1.0, 100.0),
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000)
    }
    xgb_reg = TransformedTargetRegressor(
        xgboost.XGBRegressor(**params, objective='reg:squarederror', grow_policy='lossguide',
                             tree_method="hist", random_state=42),
        func=np.log1p,
        inverse_func=np.expm1
    )
    xgb_reg.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
    val_scores = mean_squared_log_error(y_valid, xgb_reg.predict(X_valid), squared=False)
    return val_scores
sampler = optuna.samplers.TPESampler(seed=42)  # Using Tree-structured Parzen Estimator sampler for optimization
xgb_study = optuna.create_study(direction='minimize', study_name="XgbRegressor", sampler=sampler)
[I 2024-04-25 16:46:29,229] A new study created in memory with name: XgbRegressor

XGBoost 1st model

TUNE = False
if TUNE:
    xgb_study.optimize(xgb_objective, n_trials=500)

Set TUNE to True if you want to run hyperparameter tuning.

xgb_best_params_1 = {
    'eta': 0.1006321838798394,
    'max_depth': 6,
    'min_child_weight': 27.999752791085136,
    'subsample': 0.7344797943645852,
    'colsample_bytree': 0.5389765810810496,
    'lambda': 79.62358968148187,
    'n_estimators': 407
}
xgb_reg_1 = TransformedTargetRegressor(
    xgboost.XGBRegressor(**xgb_best_params_1, objective='reg:squarederror', grow_policy='lossguide',
                         tree_method="hist", random_state=42, gamma=0.0),
    func=np.log1p,
    inverse_func=np.expm1
)
xgb_reg_1.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
    regressor=XGBRegressor(base_score=None, booster=None, callbacks=None,
        colsample_bylevel=None, colsample_bynode=None,
        colsample_bytree=0.5389765810810496, early_stopping_rounds=None,
        enable_categorical=False, eta=0.1006321838798394, eval_metric=None,
        feature_types=None, gamma=0.0, gpu_id=None,
        grow_policy='lossguide', importance_type=None,
        interaction_constraints=None, lambda=79.62358968148187,
        learning_rate=None, max_bin=None, max_cat_threshold=None,
        max_cat_to_onehot=None, max_delta_step=None, max_depth=6,
        max_leaves=None, min_child_weight=27.999752791085136, missing=nan,
        monotone_constraints=None, n_estimators=407, n_jobs=None,
        num_parallel_tree=None, ...))
mean_squared_log_error(y_valid, xgb_reg_1.predict(X_valid), squared=False)
0.1484868328972631
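
The score above is the root mean squared logarithmic error (RMSLE). The sketch below uses made-up numbers to show how it is computed, and why wrapping a regressor with log1p and expm1 fits this metric: the RMSE measured on log1p-transformed values is the same quantity as the RMSLE on the original scale. The helper name `rmsle` is hypothetical and not part of the notebook.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, computed directly with numpy."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

# Made-up ring counts and predictions, for illustration only
y_true = np.array([11.0, 6.0, 10.0, 9.0])
y_pred = np.array([10.2, 6.5, 11.0, 8.4])

# What a model trained on log1p(y) works with: values in log space
z_true = np.log1p(y_true)
z_pred = np.log1p(y_pred)
rmse_in_log_space = float(np.sqrt(np.mean((z_pred - z_true) ** 2)))

# expm1 undoes log1p, so predictions come back on the original scale
assert np.allclose(np.expm1(z_pred), y_pred)

print(rmsle(y_true, y_pred), rmse_in_log_space)  # the two values coincide
```

Computing the metric this way also avoids relying on the `squared=False` argument, whose availability depends on the installed scikit-learn version.
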
feature_importance = xgb_reg_1.regressor_.feature_importances_
feature_names = X_train.columns
sorted_indices = feature_importance.argsort()
sorted_importance = feature_importance[sorted_indices]
sorted_features = feature_names[sorted_indices]
plt.figure(figsize=(10, 6))
colors = plt.cm.tab20c.colors[:len(sorted_features)]  
plt.barh(sorted_features, sorted_importance, color=colors)
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.title('XGBoost Feature Importance')
plt.gca().invert_yaxis() 
plt.tight_layout()  
plt.show()

XGBoost 2

xgb_best_params_2 = {
    'eta': 0.08999645298052271,
    'max_depth': 6,
    'min_child_weight': 2.088127882610971,
    'subsample': 0.7725806961689413,
    'colsample_bytree': 0.9163306027660207,
    'lambda': 5.356530752285997,
    'n_estimators': 652
}
xgb_reg_2 = TransformedTargetRegressor(
    xgboost.XGBRegressor(**xgb_best_params_2, objective='reg:squaredlogerror', grow_policy='depthwise',
                         tree_method="hist", random_state=42),
    func=np.log1p,
    inverse_func=np.expm1
)
xgb_reg_2.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
    regressor=XGBRegressor(base_score=None, booster=None, callbacks=None,
        colsample_bylevel=None, colsample_bynode=None,
        colsample_bytree=0.9163306027660207, early_stopping_rounds=None,
        enable_categorical=False, eta=0.08999645298052271,
        eval_metric=None, feature_types=None, gamma=None, gpu_id=None,
        grow_policy='depthwise', importance_type=None,
        interaction_constraints=None, lambda=5.356530752285997,
        learning_rate=None, max_bin=None, max_cat_threshold=None,
        max_cat_to_onehot=None, max_delta_step=None, max_depth=6,
        max_leaves=None, min_child_weight=2.088127882610971, missing=nan,
        monotone_constraints=None, n_estimators=652, n_jobs=None,
        num_parallel_tree=None, ...))
mean_squared_log_error(y_valid, xgb_reg_2.predict(X_valid), squared=False)
0.14881444008796907

LightGBM

def lgbm_objective(trial):
    # Define parameters to be optimized for the LGBMRegressor
    param = {
        "verbosity": -1,
        "random_state": 42,
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.05),
        "n_estimators": trial.suggest_int("n_estimators", 400, 1000),
        "lambda_l1": trial.suggest_float("lambda_l1", 0.005, 0.015),
        "lambda_l2": trial.suggest_float("lambda_l2", 0.02, 0.06),
        "max_depth": trial.suggest_int("max_depth", 6, 14),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.3, 0.9),
        "subsample": trial.suggest_float("subsample", 0.8, 1.0),
        "min_child_samples": trial.suggest_int("min_child_samples", 10, 70),
        "num_leaves": trial.suggest_int("num_leaves", 30, 100),
        "min_split_gain": trial.suggest_float("min_split_gain", 0.1, 1.0)
    }
    lgbm_reg = lightgbm.LGBMRegressor(**param)
    lgbm_reg.fit(X_train, y_train)
    score = mean_squared_log_error(y_valid, lgbm_reg.predict(X_valid), squared=False)
    return score
# Set up the sampler for Optuna optimization
sampler = optuna.samplers.TPESampler(seed=42)  # Using Tree-structured Parzen Estimator sampler for optimization
# Create a study object for Optuna optimization
lgbm_study = optuna.create_study(direction="minimize", sampler=sampler)
[I 2024-04-25 16:46:32,141] A new study created in memory with name: no-name-889f25f6-876c-4982-ba46-f71528b83793
if TUNE:
    # Run the optimization process
    lgbm_study.optimize(lambda trial: lgbm_objective(trial), n_trials=200)
    # Get the best parameters after optimization
    lgbm_best_params = lgbm_study.best_params
    print('='*50)
    print(lgbm_best_params)

LightGBM 1

lgbm_params_1 = {
    'learning_rate': 0.04090453688322824,
    'n_estimators': 788,
    'reg_lambda': 29.248167932522765,
    'reg_alpha': 0.4583079398945705,
    'max_depth': 19,
    'colsample_bytree': 0.5439642175304692,
    'subsample': 0.8659762900446526,
    'min_child_samples': 12,
    'num_leaves': 69,
    'random_state': 42,
    'n_jobs': -1,
    'verbose': -1
}
lgbm_reg_1 = TransformedTargetRegressor(
    lightgbm.LGBMRegressor(**lgbm_params_1),
    func=np.log1p,
    inverse_func=np.expm1
)
lgbm_reg_1.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
    regressor=LGBMRegressor(colsample_bytree=0.5439642175304692,
        learning_rate=0.04090453688322824, max_depth=19,
        min_child_samples=12, n_estimators=788, n_jobs=-1, num_leaves=69,
        random_state=42, reg_alpha=0.4583079398945705,
        reg_lambda=29.248167932522765, subsample=0.8659762900446526,
        verbose=-1))
mean_squared_log_error(y_valid, lgbm_reg_1.predict(X_valid), squared=False)
0.1477381305885499
feature_importance = lgbm_reg_1.regressor_.feature_importances_
feature_names = X_train.columns
sorted_indices = feature_importance.argsort()
sorted_importance = feature_importance[sorted_indices]
sorted_features = feature_names[sorted_indices]
# Plot feature importance
plt.figure(figsize=(12, 8))
colors = plt.cm.Paired.colors[:len(sorted_features)]
plt.barh(sorted_features, sorted_importance, color=colors)
plt.xlabel('Importance', fontsize=12)
plt.ylabel('Feature', fontsize=12)
plt.title('LightGBM Feature Importance', fontsize=14)
plt.gca().invert_yaxis()
# Annotate each bar with its importance value
for i, v in enumerate(sorted_importance):
    plt.text(v + 0.02, i, f'{v:.2f}', color='black', va='center', fontsize=10)
plt.tight_layout()
plt.show()

LightGBM 2

lgbm_params_2 = {
    'n_jobs': -1,
    'verbose': -1,
    'max_depth': 20,
    'num_leaves': 165,
    'subsample_freq': 1,
    'random_state': 42,
    'n_estimators': 1460,
    'min_child_samples': 25,
    'reg_lambda': 6.13475387151606,
    'subsample': 0.8036874216939632,
    'reg_alpha': 0.3152990674231573,
    'learning_rate': 0.009336479469693189,
    'colsample_bytree': 0.5780931837049811,
    'min_child_weight': 0.37333232256934057,
}
lgbm_reg_2 = TransformedTargetRegressor(
    lightgbm.LGBMRegressor(**lgbm_params_2),
    func=np.log1p,
    inverse_func=np.expm1
)
lgbm_reg_2.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
    regressor=LGBMRegressor(colsample_bytree=0.5780931837049811,
        learning_rate=0.009336479469693189, max_depth=20,
        min_child_samples=25, min_child_weight=0.37333232256934057,
        n_estimators=1460, n_jobs=-1, num_leaves=165, random_state=42,
        reg_alpha=0.3152990674231573, reg_lambda=6.13475387151606,
        subsample=0.8036874216939632, subsample_freq=1, verbose=-1))
mean_squared_log_error(y_valid, lgbm_reg_2.predict(X_valid), squared=False)
0.14758741259851116

CatBoost

def cb_objective(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.5),
        "max_depth": trial.suggest_int("depth", 4, 16),
        "l2_leaf_reg": trial.suggest_float("l2_leaf_reg", 1, 10),
        "n_estimators": trial.suggest_int("n_estimators", 100, 1500),
        "colsample_bylevel": trial.suggest_float("colsample_bylevel", 0.5, 1.0),
    }
    cb_reg = TransformedTargetRegressor(
        catboost.CatBoostRegressor(**params, random_state=42, grow_policy='SymmetricTree',
                                   random_strength=0, cat_features=["Sex"], loss_function="RMSE"),
        func=np.log1p,
        inverse_func=np.expm1
    )
    cb_reg.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
    val_scores = np.sqrt(mean_squared_log_error(y_valid, np.abs(cb_reg.predict(X_valid))))
    return val_scores
sampler = optuna.samplers.TPESampler(seed=42)  # Using Tree-structured Parzen Estimator sampler for optimization
cb_study = optuna.create_study(direction='minimize', study_name="CBRegressor", sampler=sampler)
[I 2024-04-25 16:46:44,532] A new study created in memory with name: CBRegressor
if TUNE:
    cb_study.optimize(cb_objective, n_trials=30)

CatBoost 1

cb_params_1 = {
    'grow_policy': 'SymmetricTree',
    'n_estimators': 1000,
    'learning_rate': 0.128912681527133,
    'l2_leaf_reg': 1.836927907521674,
    'max_depth': 6,
    'colsample_bylevel': 0.6775373040510968,
    'random_strength': 0,
    'boost_from_average': True,
    'loss_function': 'RMSE',
    'cat_features': ['Sex'],
    'verbose': False
}
cat_reg_1 = TransformedTargetRegressor(
    catboost.CatBoostRegressor(**cb_params_1),
    func=np.log1p,
    inverse_func=np.expm1
)
cat_reg_1.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
    regressor=<catboost.core.CatBoostRegressor object at 0x0000021D08EB7310>)
mean_squared_log_error(y_valid, cat_reg_1.predict(X_valid), squared=False)
0.14824841372252795

CatBoost 2

# Duplicate 'verbose' and 'random_state' keys removed; the values Python would keep are unchanged
cb_params_2 = {
    'depth': 15,
    'max_bin': 464,
    'verbose': False,
    'random_state': 42,
    'task_type': 'CPU',
    'min_data_in_leaf': 78,
    'loss_function': 'RMSE',
    'grow_policy': 'Lossguide',
    'bootstrap_type': 'Bernoulli',
    'subsample': 0.83862137638162,
    'l2_leaf_reg': 8.365422739510098,
    'random_strength': 3.296124856352495,
    'learning_rate': 0.09992185242598203,
}
cat_reg_2 = TransformedTargetRegressor(
    catboost.CatBoostRegressor(**cb_params_2),
    func=np.log1p,
    inverse_func=np.expm1
)
cat_reg_2.fit(X_train, y_train)
TransformedTargetRegressor(func=<ufunc 'log1p'>, inverse_func=<ufunc 'expm1'>,
                       regressor=<catboost.core.CatBoostRegressor object at 0x0000021D088EB4C0>)
mean_squared_log_error(y_valid, cat_reg_2.predict(X_valid), squared=False)
0.14774083919007364
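
Both CatBoost models are scored with RMSLE while being wrapped in a `log1p`/`expm1` target transform. A minimal sketch of why that pairing works: RMSE computed on `log1p`-transformed targets is the same number as RMSLE on the original scale. The toy arrays and the `rmsle` helper are illustrative assumptions, not part of the notebook. The helper takes the square root itself rather than relying on the `squared` keyword, which recent scikit-learn releases have deprecated.

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error


def rmsle(y_true, y_pred):
    """Root mean squared log error, computed without the `squared` keyword."""
    return float(np.sqrt(mean_squared_log_error(y_true, y_pred)))


# Toy values for illustration only (not taken from the competition data).
y_true = np.array([9.0, 11.0, 7.0, 15.0])
y_pred = np.array([10.0, 10.0, 8.0, 13.0])

# RMSE measured in log1p space ...
rmse_log_space = float(
    np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))
)

# ... agrees with RMSLE measured on the original scale.
assert np.isclose(rmsle(y_true, y_pred), rmse_log_space)
```

Under this view, a regressor that minimises RMSE on `log1p(Rings)` is optimising the competition metric directly, which is what `TransformedTargetRegressor(func=np.log1p, inverse_func=np.expm1)` arranges.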

Ensembling the Results Using VotingRegressor

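# The per-model weights and the two XGBoost models are left commented out,
# so the ensemble is an unweighted average of the LightGBM and CatBoost models.
# weights = [0.025, 0.025, 0.275, 0.275, 0.05, 0.35]
ensemble = VotingRegressor([
#         ("xgb_1", xgb_reg_1),
#         ("xgb_2", xgb_reg_2),
        ("lgbm_1", lgbm_reg_1),
        ("lgbm_2", lgbm_reg_2),
        ("cb_1", cat_reg_1),
        ("cb_2", cat_reg_2)]
)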
ensemble.fit(X, y)
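
`VotingRegressor` refits a clone of every listed estimator inside `fit` and predicts the (optionally weighted) mean of their outputs. The sketch below checks that on synthetic data with two stand-in models. `X_demo`, `y_demo`, `demo_weights`, and the choice of `LinearRegression` and `DecisionTreeRegressor` are assumptions for illustration, not the tuned models above.

```python
import numpy as np
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

demo_weights = [0.3, 0.7]
demo_ensemble = VotingRegressor(
    [
        ("lin", LinearRegression()),
        ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ],
    weights=demo_weights,
)
demo_ensemble.fit(X_demo, y_demo)

# Weighted mean of the fitted base models' predictions, computed by hand.
base_preds = np.column_stack(
    [est.predict(X_demo) for est in demo_ensemble.estimators_]
)
manual = np.average(base_preds, axis=1, weights=demo_weights)

assert np.allclose(manual, demo_ensemble.predict(X_demo))
```

Because the base models are cloned and refitted, calling `ensemble.fit(X, y)` trains fresh copies on the full data rather than reusing the models fitted on `X_train` earlier.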

Submit the Output

pred = ensemble.predict(test.drop("id", axis=1))
submission = pd.DataFrame(test.id)
submission["Rings"] = pred
submission.to_csv("submission.csv", index=False)
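
Before uploading, it can help to sanity-check the frame that gets written to `submission.csv`. A small sketch, assuming the expected columns are `id` and `Rings` as in the code above; `check_submission` is a hypothetical helper, not a Kaggle or pandas function.

```python
import numpy as np
import pandas as pd


def check_submission(sub: pd.DataFrame, n_expected: int) -> None:
    """Hypothetical helper: raise ValueError if the frame is not submission-ready."""
    if list(sub.columns) != ["id", "Rings"]:
        raise ValueError(f"unexpected columns: {list(sub.columns)}")
    if len(sub) != n_expected:
        raise ValueError(f"expected {n_expected} rows, got {len(sub)}")
    if sub["id"].duplicated().any():
        raise ValueError("duplicate ids")
    rings = sub["Rings"].to_numpy(dtype=float)
    # RMSLE is undefined for negative values, so reject them early.
    if not np.isfinite(rings).all() or (rings < 0).any():
        raise ValueError("Rings must be finite and non-negative")


# Toy frame standing in for the real submission.
demo = pd.DataFrame({"id": [1, 2, 3], "Rings": [9.2, 10.7, 7.4]})
check_submission(demo, n_expected=3)  # passes silently
```

With the notebook's variables, the call would look like `check_submission(submission, n_expected=len(test))`.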
