Recursive Feature Elimination (RFE)
RFE is a wrapper method for feature selection. It recursively removes the least important features based on their impact on the model's performance. The goal is to rank features by importance and select the best subset.
How it works
- Train a model on the full set of features.
- Rank features based on their importance:
  - Coefficients in linear models.
  - Feature importances in tree-based models.
- Remove the least important feature(s).
- Repeat the process until the desired number of features is reached.
- The remaining features are the selected subset (a minimal sketch follows below).
Pros of RFE
- It works well when you have a predefined idea of how many features you want.
Cons of RFE
- Prone to overfitting when the dataset is small, since the feature ranking is computed on the training data alone and never validated on held-out subsets.
What if we don't know the optimal number of features and want a model that is robust rather than prone to overfitting?
In this case, we can use RFECV.
Recursive Feature Elimination with Cross-Validation (RFECV) is an enhanced version of RFE. It extends RFE by incorporating cross-validation to automatically determine the optimal number of features, improving robustness.
How it works
- Similar to RFE, it starts with all features and iteratively removes the least important ones.
- At each step, it evaluates the model performance using cross-validation.
- Tracks the model performance for each subset of features.
- Selects the subset with the best cross-validated performance (see the sketch after this list).
Pros of RFECV
- Automatically selects the optimal number of features based on model performance.
- Reduces overfitting by validating on multiple folds of data.
- Saves time compared to manually experimenting with different feature counts.
Cons of RFECV
- More computationally expensive than RFE due to cross-validation at each iteration.
- May need more data to ensure stable cross-validation results.
Summary
Use RFE if you already know the exact number of features to retain, when computational resources are limited and speed is a priority, or for quick exploratory work. Keep in mind it can overfit, since it evaluates feature importance on the training data alone.
Otherwise, use RFECV when you haven't decided how many features to keep and you want a robust result over a fast one: the extra computational cost buys cross-validated evaluation on held-out folds, which yields more reliable feature subsets and reduces overfitting.