Feature Selection in R
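library(Boruta)       # Boruta all-relevant feature selection
library(mice)
library(missForest)   # random-forest-based missing-value imputation
library(caret)        # rfe(): recursive feature elimination
library(randomForest)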
The Boruta method
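How the Boruta algorithm works:
1. First, it adds randomness to the dataset by creating shuffled copies of all features (called shadow features).
2. Next, it trains a random forest on the extended dataset and uses a feature importance measure (by default a Z-score of Mean Decrease Accuracy) to evaluate each feature; a higher score means a more important feature.
3. In every iteration, it checks whether each real feature has a higher importance than the best shadow feature (i.e., whether its score exceeds the maximum shadow score), and progressively removes features it deems highly unimportant.
4. Finally, the algorithm stops when every feature has been either confirmed or rejected, or when it reaches a specified limit on the number of random forest runs (the maxRuns argument, 100 by default).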
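The four steps above can be illustrated with a short base R sketch. This is a simplified illustration, not the Boruta package's implementation: it assumes the absolute correlation with the response as a stand-in for random-forest importance, uses simulated data with one informative feature (`signal`) and one pure-noise feature (`noise`), and makes a single binomial test after a fixed number of iterations (without Boruta's multiple-comparison adjustment) instead of removing rejected features as it goes. The names `signal`, `noise`, `importance_of` and `decide` are hypothetical.

```r
set.seed(42)
n <- 500
x <- data.frame(signal = rnorm(n), noise = rnorm(n))
y <- 2 * x$signal + rnorm(n)   # only `signal` is related to the response

# Stand-in importance (assumption): |Pearson correlation| of each column with y
importance_of <- function(df, y) sapply(df, function(col) abs(cor(col, y)))

n_iter <- 20
hits <- setNames(integer(ncol(x)), names(x))
for (i in seq_len(n_iter)) {
  # Step 1: shadow features are shuffled copies of every real feature
  shadows <- as.data.frame(lapply(x, sample))
  names(shadows) <- paste0("shadow_", names(x))
  # Step 2: score real and shadow features together
  imp <- importance_of(cbind(x, shadows), y)
  shadow_max <- max(imp[names(shadows)])
  # Step 3: record a "hit" whenever a real feature beats the best shadow
  hits <- hits + (imp[names(x)] > shadow_max)
}

# Step 4: one-sided binomial tests of the hit counts against p = 0.5
decide <- function(h, n) {
  if (binom.test(h, n, p = 0.5, alternative = "greater")$p.value < 0.01) {
    "Confirmed"
  } else if (binom.test(h, n, p = 0.5, alternative = "less")$p.value < 0.01) {
    "Rejected"
  } else {
    "Tentative"
  }
}
decision <- sapply(hits, decide, n = n_iter)
print(hits)
print(decision)
```

Because `signal` is strongly related to `y`, it beats the best shadow in every iteration and is confirmed; the outcome for `noise` depends on the random draw, which is why Boruta repeats the comparison over many random forest runs.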
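# na.strings expects a character vector; blank numeric fields (e.g. Age) are read as NA as well
traindata <- read.csv("/home/xuelfiang/PycharmProjects/titanic/titanic.csv",
                      header = TRUE, stringsAsFactors = FALSE, na.strings = "NA")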
str(traindata)
## 'data.frame': 891 obs. of 12 variables:
## $ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Survived : int 0 1 1 1 0 0 0 0 1 1 ...
## $ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
## $ Name : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
## $ Sex : chr "male" "female" "female" "female" ...
## $ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
## $ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
## $ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
## $ Ticket : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
## $ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
## $ Cabin : chr "" "C85" "" "C123" ...
## $ Embarked : chr "S" "C" "S" "S" ...
summary(traindata)
##   PassengerId       Survived          Pclass          Name
## Min. : 1.0 Min. :0.0000 Min. :1.000 Length:891
## 1st Qu.:223.5 1st Qu.:0.0000 1st Qu.:2.000 Class :character
## Median :446.0 Median :0.0000 Median :3.000 Mode :character
## Mean :446.0 Mean :0.3838 Mean :2.309
## 3rd Qu.:668.5 3rd Qu.:1.0000 3rd Qu.:3.000
## Max. :891.0 Max. :1.0000 Max. :3.000
##
## Sex Age SibSp Parch
## Length:891 Min. : 0.42 Min. :0.000 Min. :0.0000
## Class :character 1st Qu.:20.12 1st Qu.:0.000 1st Qu.:0.0000
## Mode :character Median :28.00 Median :0.000 Median :0.0000
## Mean :29.70 Mean :0.523 Mean :0.3816
## 3rd Qu.:38.00 3rd Qu.:1.000 3rd Qu.:0.0000
## Max. :80.00 Max. :8.000 Max. :6.0000
## NA's :177
## Ticket Fare Cabin Embarked
## Length:891 Min. : 0.00 Length:891 Length:891
## Class :character 1st Qu.: 7.91 Class :character Class :character
## Mode :character Median : 14.45 Mode :character Mode :character
## Mean : 32.20
## 3rd Qu.: 31.00
## Max. :512.33
##
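Impute missing values with missForest. missForest only handles factor and numeric variables, so the character columns Sex and Embarked are converted to factors first.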
traindata$Sex <- factor(traindata$Sex)
traindata$Embarked <- factor(traindata$Embarked)
# Drop Name (column 4), Ticket (9) and Cabin (11), which are free-text character columns
traintest <- missForest(traindata[, -c(4, 9, 11)])$ximp
## missForest iteration 1 in progress...done!
## missForest iteration 2 in progress...done!
## missForest iteration 3 in progress...done!
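Instead of converting columns one at a time, every character column can be turned into a factor in one step, and the missing values counted per column before imputing. A small sketch on a hypothetical toy data frame (`df` and its values are made up for illustration):

```r
# Hypothetical toy data standing in for the Titanic columns
df <- data.frame(
  Sex      = c("male", "female", "female", "male"),
  Age      = c(22, NA, 26, 35),
  Embarked = c("S", "C", "S", "Q"),
  stringsAsFactors = FALSE
)

# missForest accepts only numeric and factor columns:
# convert every character column to a factor
chr_cols <- vapply(df, is.character, logical(1))
df[chr_cols] <- lapply(df[chr_cols], factor)

# Count missing values per column before imputation
na_counts <- colSums(is.na(df))
str(df)
print(na_counts)
```

On the real data, free-text columns such as Name, Ticket and Cabin should still be dropped before calling `missForest()`, as done above: randomForest cannot handle categorical predictors with more than 53 categories.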
Running Boruta and inspecting the results
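# Survived is stored as an integer, so the random forests here are regression forests;
# wrap it in factor() beforehand to run Boruta as a classification problem instead.
boruta_train <- Boruta(Survived ~ . - PassengerId, data = traintest)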
# 7 variables are confirmed
print(boruta_train)
## Boruta performed 10 iterations in 4.891767 secs.
## 7 attributes confirmed important: Age, Embarked, Fare, Parch,
## Pclass and 2 more;
## No attributes deemed unimportant.
# Plot Boruta variable importance
plot(boruta_train, xlab = "", xaxt = "n")
# Collect the finite importance values of each attribute across iterations
# (ImpHistory can contain non-finite values, e.g. for attributes already rejected)
lz <- lapply(1:ncol(boruta_train$ImpHistory), function(i)
  boruta_train$ImpHistory[is.finite(boruta_train$ImpHistory[, i]), i])
names(lz) <- colnames(boruta_train$ImpHistory)
# The boxes are drawn in order of median importance, so sort the axis labels the same way
Labels <- sort(sapply(lz, median))
axis(side = 1, las = 2, labels = names(Labels),
     at = 1:ncol(boruta_train$ImpHistory), cex.axis = 0.7)
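Resolve tentative attributes with TentativeRoughFix(): each tentative attribute is classified as Confirmed or Rejected by comparing its median Z-score with the median Z-score of the best shadow attribute.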
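The rough-fix rule can be illustrated with a hypothetical importance history in which rows are Boruta iterations and columns are attributes, plus a `shadowMax` column holding the best shadow score of each iteration. This is a simplified sketch of the comparison described above, not the package source; the attribute names `attrA`/`attrB` and all numbers are invented.

```r
# Hypothetical importance history: rows = iterations, columns = attributes
imp_history <- cbind(
  attrA     = c(2.1, 2.8, 3.0, 2.5, 2.9),
  attrB     = c(1.2, 1.9, 1.4, 2.2, 1.6),
  shadowMax = c(2.0, 2.4, 1.8, 2.6, 2.2)
)
tentative <- c("attrA", "attrB")   # attributes Boruta left undecided (hypothetical)

# Median importance of each tentative attribute vs. median of the best shadow
median_imp <- apply(imp_history[, tentative, drop = FALSE], 2, median)
median_shadow_max <- median(imp_history[, "shadowMax"])

decision <- ifelse(median_imp > median_shadow_max, "Confirmed", "Rejected")
print(median_imp)
print(median_shadow_max)
print(decision)
```

Here `attrA` (median 2.8) beats the best shadow's median (2.2) and is confirmed, while `attrB` (median 1.6) is rejected.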
final_boruta <- TentativeRoughFix(boruta_train)
## Warning in TentativeRoughFix(boruta_train): There are no Tentative
## attributes! Returning original object.
print(final_boruta)
## Boruta performed 10 iterations in 4.891767 secs.
## 7 attributes confirmed important: Age, Embarked, Fare, Parch,
## Pclass and 2 more;
## No attributes deemed unimportant.
getSelectedAttributes(final_boruta, withTentative = FALSE)
## [1] "Pclass"   "Sex"      "Age"      "SibSp"    "Parch"    "Fare"
## [7] "Embarked"
boruta_df <- attStats(final_boruta)
print(boruta_df)
##           meanImp medianImp   minImp   maxImp normHits  decision
## Pclass   33.91001  34.06550 32.33488 35.51682        1 Confirmed
## Sex      76.18385  76.74179 73.12429 78.15511        1 Confirmed
## Age      30.85620  30.93894 27.39081 33.15044        1 Confirmed
## SibSp    18.52499  18.54277 16.96003 20.98996        1 Confirmed
## Parch    12.46269  12.59703 11.08369 14.48570        1 Confirmed
## Fare     30.27967  30.28331 28.69235 31.52378        1 Confirmed
## Embarked 12.58664  11.97207 10.65647 16.09177        1 Confirmed
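To rank the confirmed variables, the attStats table can be filtered by decision and sorted by median importance. The sketch below retypes the medianImp and decision values printed above into a data frame so it runs on its own; in a live session you would apply the same two subsetting lines to `boruta_df` directly.

```r
# medianImp and decision values copied from the attStats() output above
stats <- data.frame(
  medianImp = c(34.06550, 76.74179, 30.93894, 18.54277, 12.59703, 30.28331, 11.97207),
  decision  = rep("Confirmed", 7),
  row.names = c("Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"),
  stringsAsFactors = FALSE
)

# Keep confirmed attributes and order them from most to least important
confirmed <- stats[stats$decision == "Confirmed", , drop = FALSE]
ranked <- confirmed[order(confirmed$medianImp, decreasing = TRUE), , drop = FALSE]
print(ranked)
print(rownames(ranked))
```

The top five by median importance (Sex, Pclass, Age, Fare, SibSp) are the same variables, in the same order, that RFE reports as its top five further down.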
A traditional feature selection algorithm: recursive feature elimination (RFE) with the caret package
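# 10-fold cross-validated recursive feature elimination, ranking features with random forests
control <- rfeControl(functions = rfFuncs, method = "cv", number = 10)
# Columns 3:9 are the seven predictors and column 2 is Survived. Because Survived is an
# integer, rfe() fits regression forests (hence the warning and the RMSE metrics below);
# use factor(traintest[, 2]) to treat it as classification.
rfe_train <- rfe(traintest[, 3:9], traintest[, 2], sizes = 1:12, rfeControl = control)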
## Warning in randomForest.default(x, y, importance = (first | last), ...):
## The response has five or fewer unique values. Are you sure you want to do
## regression?
rfe_train
##
## Recursive feature selection
##
## Outer resampling method: Cross-Validated (10 fold)
##
## Resampling performance over subset size:
##
## Variables RMSE Rsquared MAE RMSESD RsquaredSD MAESD Selected
## 1 0.4086 0.2986 0.3342 0.02685 0.09779 0.02150
## 2 0.3922 0.3559 0.3243 0.02485 0.09674 0.02183
## 3 0.3719 0.4297 0.3084 0.02650 0.11046 0.02196
## 4 0.3627 0.4612 0.2998 0.02685 0.11105 0.02197
## 5 0.3635 0.4686 0.3069 0.02395 0.10605 0.01948
## 6 0.3452 0.4911 0.2488 0.02927 0.10422 0.02462 *
## 7 0.3471 0.4862 0.2549 0.02846 0.10176 0.02225
##
## The top 5 variables (out of 6):
## Sex, Pclass, Age, Fare, SibSp
plot(rfe_train, type = c("g", "o"), cex = 1.0, col = 1:11)
predictors(rfe_train)
## [1] "Sex"      "Pclass"   "Age"      "Fare"     "SibSp"    "Embarked"
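The idea behind recursive feature elimination can be sketched in base R: fit a model on the current predictor set, estimate its cross-validated RMSE, drop the least important predictor, and repeat down to a single variable; the subset size with the lowest RMSE is selected. This is a simplified sketch under stated assumptions, not caret's implementation: it uses a linear model with the absolute t statistic as the importance measure (instead of random-forest importance via `rfFuncs`), simulated data in which only `x1` and `x2` matter, and it ranks variables on the full data rather than repeating the elimination inside each resampling fold as caret does, so its error estimates are optimistic. `cv_rmse` and `dat` are hypothetical names.

```r
set.seed(1)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 3 * dat$x1 + 1.5 * dat$x2 + rnorm(n)   # x3 and x4 are pure noise

# k-fold cross-validated RMSE of a linear model on the given predictors
cv_rmse <- function(vars, data, k = 5) {
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  errs <- sapply(seq_len(k), function(f) {
    fit <- lm(reformulate(vars, "y"), data = data[folds != f, ])
    pred <- predict(fit, newdata = data[folds == f, ])
    sqrt(mean((data$y[folds == f] - pred)^2))
  })
  mean(errs)
}

vars <- c("x1", "x2", "x3", "x4")
results <- data.frame(size = integer(0), rmse = numeric(0), vars = character(0),
                      stringsAsFactors = FALSE)
repeat {
  results <- rbind(results, data.frame(size = length(vars), rmse = cv_rmse(vars, dat),
                                       vars = paste(vars, collapse = ","),
                                       stringsAsFactors = FALSE))
  if (length(vars) == 1) break
  # Rank predictors by |t value| and eliminate the weakest one
  tvals <- abs(summary(lm(reformulate(vars, "y"), data = dat))$coefficients[-1, "t value"])
  vars <- vars[-which.min(tvals)]
}
print(results)
print(results[which.min(results$rmse), ])
```

The two noise variables are eliminated first, and dropping `x2` at the last step clearly increases the RMSE, mirroring how the RMSE column in the caret output above guides the choice of subset size.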
Summary
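Compared with the traditional RFE approach, Boruta gives a more informative view of variable importance: every variable receives an explicit Confirmed/Tentative/Rejected decision based on repeated comparisons with shadow features, together with importance statistics from attStats(). On this data Boruta confirmed all seven candidate variables, whereas RFE kept six (Sex, Pclass, Age, Fare, SibSp, Embarked) and dropped Parch based on cross-validated RMSE.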