Pipelines & model selection
GAFeatureSelector is a fully compliant scikit-learn estimator (it passes
check_estimator), so it composes with the rest of the ecosystem.
Inside a Pipeline
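from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

from evo_gafs import GAConfig, GAFeatureSelector

# Example data (an assumption; substitute your own X and y).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    # The decision tree scores candidate feature subsets during the search.
    ("selector", GAFeatureSelector(
        estimator=DecisionTreeClassifier(random_state=42),
        config=GAConfig(population_size=20, n_generations=15, verbose=False),
    )),
    # The SVC is trained on the selected features only.
    ("clf", SVC()),
])
pipe.fit(X_train, y_train)
pipe.score(X_test, y_test)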
The selector exposes get_support() so downstream tooling that inspects
selected features keeps working.
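As a minimal sketch of what such inspection looks like, the snippet below reads the boolean support mask from a fitted pipeline step. It uses scikit-learn's SelectKBest as a stand-in for GAFeatureSelector (an assumption, made so the example runs without evo_gafs installed). The only thing relied on is the get_support() method, so the same pattern should carry over to the GA selector.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_iris()
X, y = data.data, data.target

# SelectKBest stands in for GAFeatureSelector here (assumption for runnability).
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("selector", SelectKBest(f_classif, k=2)),
    ("clf", SVC()),
])
pipe.fit(X, y)

# Boolean mask with one entry per input feature; True marks a kept feature.
mask = pipe.named_steps["selector"].get_support()
selected_names = [name for name, keep in zip(data.feature_names, mask) if keep]
print(selected_names)
```

Looking the step up through named_steps keeps the code independent of the step's position in the pipeline.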
Tuning with GridSearchCV
Because the selector implements get_params/set_params correctly and can be
copied with clone, you can search over configurations with GridSearchCV. Pass
alternative GAConfig objects as the grid values:
from sklearn.model_selection import GridSearchCV

param_grid = {
    "selector__config": [
        GAConfig(alpha=0.9, population_size=12, n_generations=6, verbose=False),
        GAConfig(alpha=0.6, population_size=12, n_generations=6, verbose=False),
    ],
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
print(search.best_params_["selector__config"].alpha)
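The selector__config key in the grid relies on scikit-learn's nested parameter naming, where "<step>__<param>" addresses a parameter of a named pipeline step. The sketch below illustrates that mechanism together with clone, which GridSearchCV uses to build a fresh estimator for each candidate and fold. SelectKBest and its k parameter are stand-ins chosen so the example runs without evo_gafs. They are an assumption for illustration, not part of the GA selector's API.

```python
from sklearn.base import clone
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# SelectKBest stands in for GAFeatureSelector (assumption for runnability).
pipe = Pipeline([
    ("selector", SelectKBest(f_classif, k=2)),
    ("clf", LogisticRegression()),
])

# "<step>__<param>" sets a parameter on the named step, which is how a grid
# search applies each candidate value.
pipe.set_params(selector__k=3)
current_k = pipe.get_params()["selector__k"]

# clone() returns an unfitted copy that carries the same parameters.
pipe_copy = clone(pipe)
copied_k = pipe_copy.get_params()["selector__k"]
is_new_object = pipe_copy.named_steps["selector"] is not pipe.named_steps["selector"]
print(current_k, copied_k, is_new_object)
```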
Note
Wrapper selection is computationally heavy: a grid search multiplies the cost by
the number of candidates and outer folds. Keep population_size, n_generations
and cv_folds modest while tuning, and increase them for the final fit.
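To make the multiplication concrete, here is a rough back-of-the-envelope estimate. It assumes that each selector fit scores every individual in every generation with cv_folds-fold cross-validation and no caching. That is a simplification for illustration, not a statement about how evo_gafs is implemented. The helper estimate_model_fits and the value cv_folds=3 are hypothetical.

```python
def estimate_model_fits(n_candidates, outer_folds, population_size,
                        n_generations, cv_folds, refit=True):
    """Rough upper bound on inner estimator fits during a grid search.

    Assumption: one selector fit evaluates population_size individuals per
    generation, each with cv_folds-fold cross-validation, without caching.
    """
    fits_per_selector = population_size * n_generations * cv_folds
    # One selector fit per (candidate, outer fold), plus the final refit.
    selector_fits = n_candidates * outer_folds + (1 if refit else 0)
    return selector_fits * fits_per_selector


# The grid above: 2 candidates, cv=3, population_size=12, n_generations=6.
# cv_folds=3 is an assumed value.
tuning_fits = estimate_model_fits(2, 3, 12, 6, cv_folds=3)
print(tuning_fits)
```

Under these assumptions, doubling any single factor doubles the total, which is why small values are worth using during the search.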