Python scikit-learn Guide
1.1. Generalized Linear Models: Logistic regression
(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, warm_start=False, n_jobs=1)
Methods:
decision_function (X) | Predict confidence scores for samples. |
densify () | Convert coefficient matrix to dense array format. |
fit (X, y[, sample_weight]) | Fit the model according to the given training data. |
fit_transform (X[, y]) | Fit to data, then transform it. |
get_params ([deep]) | Get parameters for this estimator. |
predict (X) | Predict class labels for samples in X. |
predict_log_proba (X) | Log of probability estimates. |
predict_proba (X) | Probability estimates. |
score (X, y[, sample_weight]) | Returns the mean accuracy on the given test data and labels. |
set_params (**params) | Set the parameters of this estimator. |
sparsify () | Convert coefficient matrix to sparse format. |
transform (*args, **kwargs) | DEPRECATED: Support for using estimators as feature selectors will be removed in version 0.19. |
Attributes:
coef_ : array, shape (n_classes, n_features)
intercept_ : array, shape (n_classes,)
n_iter_ : array, shape (n_classes,) or (1, )
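The following is a minimal usage sketch. It assumes a recent scikit-learn release and a hypothetical one-feature toy dataset. It passes solver='liblinear' explicitly because later releases changed the default solver to 'lbfgs', and it leaves multi_class unset. For a binary problem, coef_ has shape (1, n_features) rather than (n_classes, n_features).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, two classes.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Set the solver explicitly; newer releases default to 'lbfgs'.
clf = LogisticRegression(C=1.0, solver='liblinear')
clf.fit(X, y)

proba = clf.predict_proba(X)           # shape (n_samples, n_classes); rows sum to 1
log_proba = clf.predict_log_proba(X)   # natural log of predict_proba
labels = clf.predict(X)                # predicted class labels
scores = clf.decision_function(X)      # signed score; > 0 means classes_[1]
acc = clf.score(X, y)                  # mean accuracy on (X, y)

# Binary case: a single row of coefficients and a single intercept.
print(clf.coef_.shape, clf.intercept_.shape)
```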
(penalty=None, alpha=0.0001, fit_intercept=True, n_iter=5, shuffle=True, verbose=0, eta0=1.0, n_jobs=1, random_state=0, class_weight=None, warm_start=False)
Methods:
decision_function (X) | Predict confidence scores for samples. |
densify () | Convert coefficient matrix to dense array format. |
fit (X, y[, coef_init, intercept_init, ...]) | Fit linear model with Stochastic Gradient Descent. |
fit_transform (X[, y]) | Fit to data, then transform it. |
get_params ([deep]) | Get parameters for this estimator. |
partial_fit (X, y[, classes, sample_weight]) | Fit linear model with Stochastic Gradient Descent. |
predict (X) | Predict class labels for samples in X. |
score (X, y[, sample_weight]) | Returns the mean accuracy on the given test data and labels. |
set_params (*args, **kwargs) | Set the parameters of this estimator. |
sparsify () | Convert coefficient matrix to sparse format. |
transform (*args, **kwargs) | DEPRECATED: Support for using estimators as feature selectors will be removed in version 0.19. |
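The second parameter list above (penalty=None, eta0=1.0, n_iter=5) appears to match scikit-learn's Perceptron, which shares its underlying SGD implementation with SGDClassifier. That identification is an assumption of this sketch. The sketch shows what this block adds over the previous one: partial_fit for incremental training, here on hypothetical mini-batches. It relies on defaults only, because later releases replaced n_iter with max_iter and tol.

```python
import numpy as np
from sklearn.linear_model import Perceptron

# Hypothetical data arriving in two mini-batches.
X1 = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
y1 = np.array([0, 0, 0])
X2 = np.array([[3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y2 = np.array([1, 1, 1])

clf = Perceptron(random_state=0)

# The first partial_fit call must list every class, because a single
# batch (like X1 here) may not contain all of them.
clf.partial_fit(X1, y1, classes=np.array([0, 1]))
# Later calls update the same coefficients with one more pass over the new batch.
clf.partial_fit(X2, y2)

pred = clf.predict(np.array([[0.5, 0.5], [3.5, 3.5]]))
print(clf.classes_, clf.coef_.shape, pred)
```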
1.6. Nearest Neighbors
([n_neighbors, ...]) | Unsupervised learner for implementing neighbor searches. |
neighbors.KNeighborsClassifier ([...]) | Classifier implementing the k-nearest neighbors vote. |
neighbors.RadiusNeighborsClassifier ([...]) | Classifier implementing a vote among neighbors within a given radius |
neighbors.KNeighborsRegressor ([n_neighbors, ...]) | Regression based on k-nearest neighbors. |
neighbors.RadiusNeighborsRegressor ([radius, ...]) | Regression based on neighbors within a fixed radius. |
neighbors.NearestCentroid ([metric, ...]) | Nearest centroid classifier. |
neighbors.BallTree | BallTree for fast generalized N-point problems |
neighbors.KDTree | KDTree for fast generalized N-point problems |
neighbors.LSHForest ([n_estimators, radius, ...]) | Performs approximate nearest neighbor search using LSH forest. |
neighbors.DistanceMetric | DistanceMetric class |
neighbors.KernelDensity ([bandwidth, ...]) | Kernel Density Estimation |
neighbors.kneighbors_graph (X, n_neighbors[, ...]) | Computes the (weighted) graph of k-Neighbors for points in X |
neighbors.radius_neighbors_graph (X, radius) | Computes the (weighted) graph of Neighbors for points in X |
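This short sketch exercises two entries from the table on hypothetical 1-D data. The unsupervised learner in the first row (NearestNeighbors in current releases) performs a k-nearest search, and KNeighborsClassifier performs a majority vote. KNeighborsClassifier's defaults (weights='uniform', metric='minkowski' with p=2, i.e. Euclidean distance) are kept. Only n_neighbors is lowered, because the toy set has just six points.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

# Hypothetical 1-D points forming two well-separated groups.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Unsupervised search: the 2 training points closest to the query 1.4.
nn = NearestNeighbors(n_neighbors=2).fit(X)
distances, indices = nn.kneighbors(np.array([[1.4]]))
# indices -> [[1, 2]] (points 1.0 and 2.0), distances -> about [[0.4, 0.6]]

# Classification: uniform-weight vote among the 3 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
pred = knn.predict(np.array([[1.5], [10.5]]))   # -> [0, 1]
```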
(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=1, **kwargs)
1.13. Feature selection
This module implements feature selection algorithms. It currently includes univariate filter selection methods and the recursive feature elimination algorithm.
feature_selection.GenericUnivariateSelect ([...]) | Univariate feature selector with configurable strategy. |
feature_selection.SelectPercentile ([...]) | Select features according to a percentile of the highest scores. |
feature_selection.SelectKBest ([score_func, k]) | Select features according to the k highest scores. |
feature_selection.SelectFpr ([score_func, alpha]) | Filter: Select the p-values below alpha based on an FPR test. |
feature_selection.SelectFdr ([score_func, alpha]) | Filter: Select the p-values for an estimated false discovery rate. |
feature_selection.SelectFromModel (estimator) | Meta-transformer for selecting features based on importance weights. |
feature_selection.SelectFwe ([score_func, alpha]) | Filter: Select the p-values corresponding to the family-wise error rate. |
feature_selection.RFE (estimator[, ...]) | Feature ranking with recursive feature elimination. |
feature_selection.RFECV (estimator[, step, ...]) | Feature ranking with recursive feature elimination and cross-validated selection of the best number of features. |
feature_selection.VarianceThreshold ([threshold]) | Feature selector that removes all low-variance features. |
feature_selection.chi2 (X, y) | Compute chi-squared stats between each non-negative feature and class. |
feature_selection.f_classif (X, y) | Compute the ANOVA F-value for the provided sample. |
feature_selection.f_regression (X, y[, center]) | Univariate linear regression tests. |
feature_selection.mutual_info_classif (X, y) | Estimate mutual information for a discrete target variable. |
feature_selection.mutual_info_regression (X, y) | Estimate mutual information for a continuous target variable. |
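This sketch chains two of the selectors above on a hypothetical count matrix. VarianceThreshold first drops the constant column, and SelectKBest then keeps the two columns with the highest chi2 scores against y. chi2 is used here because the features are non-negative counts, which that test requires.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2

# Hypothetical non-negative count features; column 0 is constant.
X = np.array([
    [1, 0, 5, 3],
    [1, 1, 4, 0],
    [1, 0, 6, 2],
    [1, 9, 0, 1],
    [1, 8, 1, 0],
    [1, 7, 0, 2],
])
y = np.array([0, 0, 0, 1, 1, 1])

# Step 1: remove zero-variance (constant) features.
vt = VarianceThreshold(threshold=0.0)
X_vt = vt.fit_transform(X)
print(vt.get_support().tolist())    # -> [False, True, True, True]

# Step 2: keep the k=2 features with the largest chi-squared statistic.
skb = SelectKBest(score_func=chi2, k=2)
X_best = skb.fit_transform(X_vt, y)
print(skb.scores_)                  # roughly [21.16, 12.25, 0.5]
print(skb.get_support().tolist())   # -> [True, True, False]
```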
3.1. Cross-validation: evaluating estimator performance
sklearn.model_selection.train_test_split(*arrays, **options)
Split arrays or matrices into random train and test subsets. A quick utility that wraps input validation, next(ShuffleSplit().split(X, y)), and application to the input data into a single call for splitting (and optionally subsampling) data in one line.
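Here is a minimal sketch with hypothetical arrays. Passing X and y in the same call keeps their rows aligned across the split. test_size (a fraction or an absolute count) and random_state are standard keyword options of train_test_split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(16).reshape(8, 2)   # 8 samples, 2 features; row i is [2i, 2i+1]
y = np.arange(8)                  # label i belongs to row i

# 25% of 8 samples -> 2 test rows, 6 train rows; random_state makes it repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Rows stay paired: the first column of each X row is twice its label.
print((X_train[:, 0] == 2 * y_train).all())
```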
4.8.2. Label encoding
Encode labels with values between 0 and n_classes-1.
fit (y) | Fit label encoder |
fit_transform (y) | Fit label encoder and return encoded labels |
get_params ([deep]) | Get parameters for this estimator. |
inverse_transform (y) | Transform labels back to original encoding. |
set_params (**params) | Set the parameters of this estimator. |
transform (y) | Transform labels to normalized encoding. |
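This minimal sketch uses hypothetical string labels. LabelEncoder assigns codes in sorted order of the distinct labels (stored in classes_), and inverse_transform maps codes back to the original labels. It is intended for target labels y; for input features, the categorical encoders in the next section are the usual choice.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

# fit_transform learns the sorted label set, then encodes the input.
codes = le.fit_transform(["paris", "tokyo", "paris", "amsterdam"])
# le.classes_ -> ['amsterdam', 'paris', 'tokyo']; codes -> [1, 2, 1, 0]

# inverse_transform maps integer codes back to the original labels.
back = le.inverse_transform([2, 0])   # -> ['tokyo', 'amsterdam']
```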
4.3.4. Encoding categorical features
(n_values='auto', categorical_features='all', dtype=<type 'numpy.float64'>, sparse=True, handle_unknown='error')
Encode categorical integer features using a one-hot (aka one-of-K) scheme.
Scaling features to a range
(feature_range=(0, 1), copy=True)
Transforms features by scaling each feature to a given range.
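This estimator scales and translates each feature individually such that it is in the given range on the training set, e.g. between zero and one.
The transformation is given by:
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
where min, max = feature_range.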
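This minimal sketch checks MinMaxScaler numerically against the per-column min-max formula on a hypothetical 3-by-2 array. The sketch names the range bounds lo and hi. It also shows that new data is scaled with the training-set min and max, so transformed values can fall outside feature_range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, -1.0],
              [2.0, 0.0],
              [3.0, 1.0]])

lo, hi = 0.0, 1.0                          # the feature_range bounds
scaler = MinMaxScaler(feature_range=(lo, hi))
X_scaled = scaler.fit_transform(X)         # -> [[0, 0], [0.5, 0.5], [1, 1]]

# Same transformation by hand: X_std = (X - min) / (max - min),
# then X_scaled = X_std * (hi - lo) + lo, using per-column training min/max.
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_manual = X_std * (hi - lo) + lo

# New data reuses the training min/max, so it may land outside [lo, hi].
X_new = scaler.transform(np.array([[4.0, 0.0]]))   # -> [[1.5, 0.5]]
```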
sklearn.base: Base classes and utility functions
Base classes for all estimators.
Base classes
base.BaseEstimator | Base class for all estimators in scikit-learn |
base.ClassifierMixin | Mixin class for all classifiers in scikit-learn. |
base.ClusterMixin | Mixin class for all cluster estimators in scikit-learn. |
base.RegressorMixin | Mixin class for all regression estimators in scikit-learn. |
base.TransformerMixin | Mixin class for all transformers in scikit-learn. |
base.clone (estimator[, safe]) | Constructs a new estimator with the same parameters. |
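The following sketch shows how the base classes fit together, using a hypothetical ThresholdClassifier written only for illustration. BaseEstimator supplies get_params/set_params by reading the __init__ signature, so constructor arguments are stored unchanged. ClassifierMixin supplies score (mean accuracy), and clone builds a new, unfitted estimator with the same parameters. The mixin is listed before BaseEstimator, which is the order recent releases recommend.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone


class ThresholdClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical toy classifier: predicts 1 when feature 0 exceeds threshold."""

    def __init__(self, threshold=0.5):
        # Store constructor args as-is; BaseEstimator.get_params relies on this.
        self.threshold = threshold

    def fit(self, X, y):
        # Nothing to learn; record the classes seen, as fitted classifiers do.
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        X = np.asarray(X)
        return (X[:, 0] > self.threshold).astype(int)


X = np.array([[0.2], [0.4], [0.6], [0.9]])
y = np.array([0, 0, 1, 1])

clf = ThresholdClassifier(threshold=0.5).fit(X, y)
acc_default = clf.score(X, y)    # 1.0: every prediction matches y (score from ClassifierMixin)

clf.set_params(threshold=0.7)    # set_params from BaseEstimator
acc_strict = clf.score(X, y)     # 0.75: the 0.6 sample is now predicted as 0

fresh = clone(clf)               # same parameters, but no fitted attributes
```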