xgboost

Posted on 2020-8-28 13:13:12
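xgboost:
Learning task parameters (for the native xgboost API, not the sklearn wrapper)
1. objective [default: reg:squarederror (squared error)]
    a: Selects the objective (loss) function. The default is squared-error loss; many others exist, and the main ones are listed here.
    b: reg:squarederror       squared-error loss for regression
    c: reg:logistic           logistic loss; see logistic regression
    d: binary:logistic        logistic regression for binary classification; outputs a probability
    e: binary:hinge           hinge loss for binary classification; outputs 0 or 1 instead of a probability
    f: multi:softmax          softmax loss for multiclass classification; requires the num_class parameter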
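
To make the difference between these objectives concrete, here is a minimal pure-Python sketch of how the raw model score (the margin) turns into the value that predict() reports. The function name transform_margin is made up for illustration and is not part of xgboost; the zero threshold for binary:hinge is stated as an assumption about how the library thresholds the margin.

```python
import math

def transform_margin(objective, margin):
    """Illustrative only: map a raw score to the reported prediction."""
    if objective == "reg:squarederror":
        return margin  # regression: the raw score is the prediction
    if objective in ("reg:logistic", "binary:logistic"):
        return 1.0 / (1.0 + math.exp(-margin))  # sigmoid -> probability
    if objective == "binary:hinge":
        return 1.0 if margin > 0 else 0.0  # assumed threshold at zero
    if objective == "multi:softmax":
        # margin is a list with one score per class; report the best class index
        return float(max(range(len(margin)), key=lambda k: margin[k]))
    raise ValueError("objective not covered by this sketch: " + objective)

print(transform_margin("binary:logistic", 0.0))            # 0.5
print(transform_margin("binary:hinge", 0.3))               # 1.0
print(transform_margin("multi:softmax", [0.1, 2.0, -1.0])) # 1.0
```

For multi:softmax the list length is what num_class fixes in advance, which is why that parameter is mandatory.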

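2. eval_metric [default: depends on objective]
    a: The metric used to evaluate the model. The default follows from the objective; custom metrics can also be defined. Common ones:
    b: rmse : root mean square error     square root of the mean of the squared errors
    c: mae  : mean absolute error        mean of the absolute errors
    d: auc  : area under curve           area under the ROC curve
    e: aucpr: area under the PR curve    area under the precision-recall curve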
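
The metrics above are easy to check by hand. The following is a rough pure-Python sketch, not xgboost's own implementation: rmse and mae follow their definitions directly, and auc is computed as the fraction of (positive, negative) pairs in which the positive sample gets the higher score, which equals the area under the ROC curve. The sample numbers are invented for illustration.

```python
import math

def rmse(y_true, y_pred):
    # square root of the mean squared error
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    # mean of the absolute errors
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(labels, scores):
    # share of (positive, negative) pairs ranked correctly; ties count as half
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]
print(rmse(y_true, y_pred))                        # sqrt(1.25), about 1.118
print(mae(y_true, y_pred))                         # 0.75
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))    # 0.75
```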
pip install xgboost -i https://pypi.douban.com/simple --trusted-host pypi.douban.com
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.metrics import accuracy_score  # accuracy
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# ToArray comes from the author's local dataset4 module (not shown in this post)
from dataset4 import ToArray

# Load data
train = pd.read_csv(r'train.csv')
print("Shape:", train.shape)
train.head()

# Data preprocessing: separate the target and drop columns not used as features
train_X = train.drop(columns=['OutcomeType', 'OutcomeSubtype', 'AnimalID'])
Y = train['OutcomeType']

# Only the training set is used here; DateTime is dropped as well
stacked_df = train_X.drop(columns=['DateTime'])
stacked_df.head()

# Drop columns with too many nulls
for col in stacked_df.columns:
    n_null = stacked_df[col].isnull().sum()
    if n_null > 10000:
        print("dropping", col, n_null)
        stacked_df = stacked_df.drop(columns=[col])
stacked_df.head()

# Label encoding: fill missing values, then map each column to integer codes
for col in stacked_df.columns:
    if stacked_df.dtypes[col] == "object":
        stacked_df[col] = stacked_df[col].fillna("NA")
    else:
        stacked_df[col] = stacked_df[col].fillna(0)
    stacked_df[col] = LabelEncoder().fit_transform(stacked_df[col])

# Make all variables categorical
for col in stacked_df.columns:
    stacked_df[col] = stacked_df[col].astype('category')

# Keep the training rows (no test rows were stacked, so this is the whole frame)
X = stacked_df.iloc[:len(train)]
# Encode the target
Y = LabelEncoder().fit_transform(Y)

# Convert to arrays with the author's ToArray helper
XY = ToArray(X, Y)
X = XY.x
Y = XY.y

# Train/validation split
X_train, X_val, y_train, y_val = train_test_split(X, Y, test_size=0.10, random_state=0)

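# xgboost
# Algorithm parameters
params = {
    # Squared error treats the encoded labels as numbers, so predictions are continuous;
    # 'multi:softmax' with 'num_class' is the objective that returns class labels.
    'objective': 'reg:squarederror',
    'max_depth': 6,
    'eta': 1.0,
}

dtrain = xgb.DMatrix(X_train, label=y_train)  # build the xgboost data matrix
num_rounds = 100
model = xgb.train(params, dtrain, num_rounds)  # train the xgboost model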

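# Predict on the validation set
dval = xgb.DMatrix(X_val)
y_pred = model.predict(dval)

# JSON is used here instead of the original 'xgb.model' name; recent xgboost
# releases recommend the .json/.ubj model formats
model.save_model('xgb.json')

# Accuracy (left disabled: y_pred is continuous under reg:squarederror,
# so accuracy_score cannot compare it with the integer labels directly)
#accuracy = accuracy_score(y_val, y_pred)
#print('accuracy:%.2f%%'%(accuracy*100))

# Predict for a single hand-written sample with six feature values
x_input = np.array([[2351.0, 1.01, 3.0, 5.0, 1221.0, 130.0]]).astype(np.float32)
dtest = xgb.DMatrix(x_input)
y_test_pred = model.predict(dtest)
print('y_test_pred = ', y_test_pred[0])

# Reload the saved model and check that it gives the same prediction
tar = xgb.Booster(model_file='xgb.json')
x_test = xgb.DMatrix(x_input)
pre = tar.predict(x_test)
print('pre = ', pre[0])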