MLfromCash: 2016

Friday, December 9, 2016

Python_Note21

Multithreading

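A process is a running instance of an application in the operating system, and a thread is a unit of execution inside a process. When the system creates a process, it also creates a main thread, and each process can have multiple threads. Multithreading lets a program keep several tasks in progress at once. Note that in CPython the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads help mostly with I/O-bound work rather than with CPU-bound parallel processing.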

Python 3 supports thread programming through the threading module.

threading — Thread-based parallelism

https://docs.python.org/3/library/threading.html

Creating threads

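Create a new class by inheriting from the Thread class in the threading module and override the run method in it, then call the start method to start the thread. Once started, the thread executes the run method.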

start(): Start the thread’s activity.

It must be called at most once per thread object. It arranges for the object’s run() method to be invoked in a separate thread of control.

This method will raise a RuntimeError if called more than once on the same thread object.

run(): Method representing the thread’s activity.

You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.
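The subclass-and-override pattern described above can be sketched as follows. The Worker class and its value/result attributes are hypothetical names made up for this note, not part of the threading module.

```python
import threading


class Worker(threading.Thread):
    """Hypothetical example class: squares a number in its own thread."""

    def __init__(self, value):
        super().__init__()
        self.value = value
        self.result = None

    def run(self):
        # run() is the body of the thread; start() arranges for it to be called.
        self.result = self.value ** 2


worker = Worker(7)
worker.start()   # creates the thread, which then invokes run()
worker.join()    # wait for the thread to finish before reading the result
print(worker.result)

# Calling start() a second time on the same object raises RuntimeError.
second_start_error = None
try:
    worker.start()
except RuntimeError as exc:
    second_start_error = exc
print("second start() failed:", second_start_error)
```

Calling worker.run() directly would also compute the result, but in the calling thread; only start() creates a separate thread of control.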

If a thread or function starts another thread and must wait for that thread to finish before it continues, the calling thread can call the join method of the thread it started.

join(timeout=None)

Wait until the thread terminates. This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception –, or until the optional timeout occurs.
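A small sketch of join with and without a timeout. The Event named release is an assumption added here so that the thread stays alive until the script lets it go; join() always returns None, so is_alive() is what tells whether a timeout happened.

```python
import threading

release = threading.Event()   # hypothetical helper: keeps the thread blocked


def wait_for_release():
    release.wait()            # the thread sits here until release.set()


t = threading.Thread(target=wait_for_release)
t.start()

t.join(timeout=0.05)          # gives up after 0.05 s; the thread is still running
timed_out = t.is_alive()

release.set()                 # let the thread finish
t.join()                      # no timeout: block until the thread terminates
finished = not t.is_alive()

print(timed_out, finished)    # True True
```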

After a thread has been created, the is_alive method of the Thread object reports whether the thread is running.

is_alive()

Return whether the thread is alive.

This method returns True just before the run() method starts until just after the run() method terminates. The module function enumerate() returns a list of all alive threads.
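To see the three states (not started, running, finished), a sketch along these lines should do. The Event named gate is an assumption used only to hold the thread open while it is inspected.

```python
import threading

gate = threading.Event()                  # hypothetical helper to keep the thread alive
t = threading.Thread(target=gate.wait)

before = t.is_alive()                     # not started yet
t.start()
during = t.is_alive()                     # running (blocked in gate.wait)
listed = t in threading.enumerate()       # enumerate() lists all alive threads

gate.set()
t.join()
after = t.is_alive()                      # run() has returned

print(before, during, listed, after)      # False True True False
```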

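After a thread has been created, the setName method of the Thread object sets the thread's name, which makes it easier to tell threads apart and control them. The getName method returns the thread's name.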

getName()

setName()
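A sketch of naming threads. It uses the name attribute and the name= constructor argument, which do the same job as setName/getName; the getter and setter methods have been deprecated since Python 3.10. The report function and the worker-A / worker-B names are made up for this example.

```python
import threading

seen = []


def report():
    # current_thread() returns the Thread object that is running this code.
    seen.append(threading.current_thread().name)


t1 = threading.Thread(target=report, name="worker-A")   # name given at creation
t1.start()
t1.join()

t2 = threading.Thread(target=report)
t2.name = "worker-B"      # same effect as the older t2.setName("worker-B")
t2.start()
t2.join()

print(seen)               # ['worker-A', 'worker-B']
```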

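A running script has a main thread. If the main thread creates a child thread, then when the main thread exits it checks whether the child thread has finished. If the child thread has not finished, the main thread waits for it before exiting. To have child threads end along with the main thread whether or not they have finished, set the daemon attribute of the Thread object to True.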

daemon

A boolean value indicating whether this thread is a daemon thread (True) or not (False). This must be set before start() is called, otherwise RuntimeError is raised. Its initial value is inherited from the creating thread; the main thread is not a daemon thread and therefore all threads created in the main thread default to daemon = False.

The entire Python program exits when no alive non-daemon threads are left.
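A minimal sketch of the daemon flag, assuming the script runs in an ordinary (non-daemon) main thread. The Event named stop is a helper introduced here so the example can shut down cleanly.

```python
import threading

stop = threading.Event()      # hypothetical helper so the thread can be released

background = threading.Thread(target=stop.wait, daemon=True)
print(background.daemon)      # True
background.start()

# The flag must be set before start(); changing it afterwards raises RuntimeError.
error = None
try:
    background.daemon = False
except RuntimeError as exc:
    error = exc
print(type(error).__name__)   # RuntimeError

# Without daemon=..., the value is inherited from the creating thread.
inherited = threading.Thread(target=stop.wait)
print(inherited.daemon)

stop.set()
background.join()
```

If stop.set() were never called, the program could still exit, because only non-daemon threads keep the interpreter alive.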

Tuesday, November 8, 2016

Python_Note13

Python_pandas

Intro to Data Structures

http://pandas.pydata.org/pandas-docs/stable/dsintro.html#intro-to-data-structures

Series

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:

s = pd.Series(data, index=index)

Here, data can be many different things:

a Python dict

an ndarray

a scalar value (like 5)

In [3]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

In [4]: s
Out[4]: 
a   -2.7828
b    0.4264
c   -0.6505
d    1.1465
e   -0.6631
dtype: float64

In [5]: s.index
Out[5]: Index([u'a', u'b', u'c', u'd', u'e'], dtype='object')

In [6]: pd.Series(np.random.randn(5))
Out[6]: 
0    0.2939
1   -0.4049
2    1.1665
3    0.8420
4    0.5398
dtype: float64

In[87]: s = pd.Series([1,2,3], index=['a','b','c'])

In[88]: s
Out[88]:
a 1
b 2
c 3
dtype: int64

In[85]: s = pd.Series(['a','b','c'], index=[1,2,3])
In[86]: s
Out[86]:
1 a
2 b
3 c
dtype: object

In[82]: s = pd.Series([1,2,3], index=[1,2,3])
In[83]: s
Out[83]:
1 1
2 2
3 3
dtype: int64

In [16]: s['a']
Out[16]: -2.7827595933769937

In [17]: s['e'] = 12.

In [18]: s
Out[18]: 
a    -2.7828
b     0.4264
c    -0.6505
d     1.1465
e    12.0000
dtype: float64

In [19]: 'e' in s
Out[19]: True

In [20]: 'f' in s
Out[20]: False
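The sessions above build a Series from an ndarray and from lists. For completeness, here is a sketch of the other two kinds of data mentioned earlier (a dict and a scalar). The variable names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a dict: the keys become the index labels.
from_dict = pd.Series({"a": 1.0, "b": 2.0, "c": 3.0})

# From an ndarray: the index must have the same length as the data.
from_array = pd.Series(np.array([10, 20, 30]), index=["x", "y", "z"])

# From a scalar: the value is repeated for every index label.
from_scalar = pd.Series(5, index=["a", "b", "c"])

print(from_dict["b"])          # 2.0
print("x" in from_array)       # True
print(from_scalar.tolist())    # [5, 5, 5]
```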

pandas.read_csv

Read CSV (comma-separated) file into DataFrame

Also supports optionally iterating or breaking of the file into chunks.

Additional help can be found in the online docs for IO Tools.
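A sketch of read_csv, both in one go and in chunks. To stay self-contained it reads from an in-memory string through io.StringIO instead of a real file; the column names and values are invented.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk; bob's score is missing.
csv_text = "name,score\nann,90\nbob,\ncid,75\ndee,60\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)                        # (4, 2)
print(int(df["score"].isna().sum()))   # 1

# With chunksize, read_csv returns an iterator of DataFrames instead of one DataFrame.
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)
sizes = [len(chunk) for chunk in chunks]
print(sizes)                           # [2, 2]
```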

pandas.DataFrame.dropna

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Return object with labels on given axis omitted where alternately any or all of the data are missing
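How the axis, how and subset parameters interact is easier to see on a tiny frame. The data below is made up for this note.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [4.0, 5.0, np.nan, np.nan],
})

any_missing = df.dropna()               # how='any': drop rows with at least one NaN
all_missing = df.dropna(how="all")      # drop only rows where every value is NaN
by_column = df.dropna(axis=1)           # drop columns that contain any NaN
subset_a = df.dropna(subset=["a"])      # only look at column 'a' when deciding

print(list(any_missing.index))   # [0]
print(list(all_missing.index))   # [0, 1, 2]
print(by_column.shape)           # (4, 0)
print(list(subset_a.index))      # [0, 2]
```

With inplace=False (the default) each call returns a new object and df itself is unchanged.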

pandas.DataFrame.iloc

Purely integer-location based indexing for selection by position.

.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
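A short sketch of position-based selection with .iloc, including the boolean-array form. The frame is invented, and its string index is there to show that .iloc ignores labels.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [10, 20, 30, 40], "y": [1, 2, 3, 4]},
    index=["a", "b", "c", "d"],
)

first_row = df.iloc[0]                          # row at position 0 (label 'a')
cell = df.iloc[1, 0]                            # row 1, column 0
block = df.iloc[1:3, :]                         # positions 1 and 2; the end is exclusive
masked = df.iloc[[True, False, True, False]]    # boolean array, one flag per row

print(int(first_row["x"]))    # 10
print(int(cell))              # 20
print(list(block.index))      # ['b', 'c']
print(list(masked.index))     # ['a', 'c']
```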

Monday, October 31, 2016

Python_Note20

sklearn.linear_model.LogisticRegression

class sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, warm_start=False, n_jobs=1)

`decision_function`(X)	Predict confidence scores for samples.
`densify`()	Convert coefficient matrix to dense array format.
`fit`(X, y[, sample_weight])	Fit the model according to the given training data.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_params`([deep])	Get parameters for this estimator.
`predict`(X)	Predict class labels for samples in X.
`predict_log_proba`(X)	Log of probability estimates.
`predict_proba`(X)	Probability estimates.
`score`(X, y[, sample_weight])	Returns the mean accuracy on the given test data and labels.
`set_params`(**params)	Set the parameters of this estimator.
`sparsify`()	Convert coefficient matrix to sparse format.
`transform`(*args, **kwargs)	DEPRECATED: Support to use estimators as feature selectors will be removed in version 0.19.

Attributes:

coef_ : array, shape (n_classes, n_features)

Coefficient of the features in the decision function.

intercept_ : array, shape (n_classes,)

Intercept (a.k.a. bias) added to the decision function. If fit_intercept is set to False, the intercept is set to zero.

n_iter_ : array, shape (n_classes,) or (1, )

Actual number of iterations for all classes. If binary or multinomial, it returns only 1 element. For liblinear solver, only the maximum number of iteration across all classes is given.
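A sketch of fitting the classifier and reading the attributes listed above. It uses the bundled iris data (3 classes, 4 features), an assumption chosen for convenience. The signature copied above is from the release these notes were taken from; later releases changed some defaults (for example the solver), so the sketch passes only C and max_iter.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)       # 150 samples, 4 features, 3 classes

clf = LogisticRegression(C=1.0, max_iter=1000)
clf.fit(X, y)

print(clf.coef_.shape)                  # (3, 4): (n_classes, n_features)
print(clf.intercept_.shape)             # (3,):   (n_classes,)
print(clf.n_iter_)                      # iterations actually used

proba = clf.predict_proba(X[:2])        # one row of class probabilities per sample
print(proba.shape)                      # (2, 3)
print(clf.score(X, y))                  # mean accuracy on the training data
```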

sklearn.linear_model.Perceptron

class sklearn.linear_model.Perceptron(penalty=None, alpha=0.0001, fit_intercept=True, n_iter=5, shuffle=True, verbose=0, eta0=1.0, n_jobs=1, random_state=0, class_weight=None, warm_start=False)

`decision_function`(X)	Predict confidence scores for samples.
`densify`()	Convert coefficient matrix to dense array format.
`fit`(X, y[, coef_init, intercept_init, ...])	Fit linear model with Stochastic Gradient Descent.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_params`([deep])	Get parameters for this estimator.
`partial_fit`(X, y[, classes, sample_weight])	Fit linear model with Stochastic Gradient Descent.
`predict`(X)	Predict class labels for samples in X.
`score`(X, y[, sample_weight])	Returns the mean accuracy on the given test data and labels.
`set_params`(*args, **kwargs)
`sparsify`()	Convert coefficient matrix to sparse format.
`transform`(*args, **kwargs)	DEPRECATED: Support to use estimators as feature selectors will be removed in version 0.19.
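A sketch of incremental training with partial_fit from the table above, again assuming the iris data. The n_iter parameter in the signature above was replaced by max_iter in later scikit-learn releases, so the sketch sets only random_state. Splitting the data into two halves is just a stand-in for data that arrives in batches.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

X, y = load_iris(return_X_y=True)

clf = Perceptron(random_state=0)

# The first call must list every class, because a batch may not contain all of them.
clf.partial_fit(X[::2], y[::2], classes=np.unique(y))
clf.partial_fit(X[1::2], y[1::2])       # later calls can omit classes

pred = clf.predict(X[:5])
print(clf.coef_.shape)                  # (3, 4)
print(pred.shape)                       # (5,)
```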

1.6. Nearest Neighbors

http://scikit-learn.org/stable/modules/neighbors.html

`neighbors.NearestNeighbors`([n_neighbors, ...])	Unsupervised learner for implementing neighbor searches.
`neighbors.KNeighborsClassifier`([...])	Classifier implementing the k-nearest neighbors vote.
`neighbors.RadiusNeighborsClassifier`([...])	Classifier implementing a vote among neighbors within a given radius
`neighbors.KNeighborsRegressor`([n_neighbors, ...])	Regression based on k-nearest neighbors.
`neighbors.RadiusNeighborsRegressor`([radius, ...])	Regression based on neighbors within a fixed radius.
`neighbors.NearestCentroid`([metric, ...])	Nearest centroid classifier.
`neighbors.BallTree`	BallTree for fast generalized N-point problems
`neighbors.KDTree`	KDTree for fast generalized N-point problems
`neighbors.LSHForest`([n_estimators, radius, ...])	Performs approximate nearest neighbor search using LSH forest.
`neighbors.DistanceMetric`	DistanceMetric class
`neighbors.KernelDensity`([bandwidth, ...])	Kernel Density Estimation

`neighbors.kneighbors_graph`(X, n_neighbors[, ...])	Computes the (weighted) graph of k-Neighbors for points in X
`neighbors.radius_neighbors_graph`(X, radius)	Computes the (weighted) graph of Neighbors for points in X

sklearn.neighbors.KNeighborsClassifier

class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=1, **kwargs)
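A sketch of the k-nearest neighbors vote on a made-up one-dimensional dataset, small enough that the neighbors can be checked by hand.

```python
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated groups on a line (invented data).
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3, weights="uniform", p=2)
knn.fit(X, y)

pred = knn.predict([[1.5], [10.5]])
proba = knn.predict_proba([[1.5], [10.5]])

print(pred.tolist())     # [0, 1]
print(proba.tolist())    # [[1.0, 0.0], [0.0, 1.0]]
```

For the query 1.5 the three nearest training points are 1, 2 and 0, all of class 0, so the vote is unanimous; the same holds for 10.5 and class 1.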

1.13. Feature selection

http://scikit-learn.org/stable/modules/feature_selection.html
The sklearn.feature_selection module implements feature selection algorithms. It currently includes univariate filter selection methods and the recursive feature elimination algorithm.

`feature_selection.GenericUnivariateSelect`([...])	Univariate feature selector with configurable strategy.
`feature_selection.SelectPercentile`([...])	Select features according to a percentile of the highest scores.
`feature_selection.SelectKBest`([score_func, k])	Select features according to the k highest scores.
`feature_selection.SelectFpr`([score_func, alpha])	Filter: Select the pvalues below alpha based on a FPR test.
`feature_selection.SelectFdr`([score_func, alpha])	Filter: Select the p-values for an estimated false discovery rate
`feature_selection.SelectFromModel`(estimator)	Meta-transformer for selecting features based on importance weights.
`feature_selection.SelectFwe`([score_func, alpha])	Filter: Select the p-values corresponding to Family-wise error rate
`feature_selection.RFE`(estimator[, ...])	Feature ranking with recursive feature elimination.
`feature_selection.RFECV`(estimator[, step, ...])	Feature ranking with recursive feature elimination and cross-validated selection of the best number of features.
`feature_selection.VarianceThreshold`([threshold])	Feature selector that removes all low-variance features.

`feature_selection.chi2`(X, y)	Compute chi-squared stats between each non-negative feature and class.
`feature_selection.f_classif`(X, y)	Compute the ANOVA F-value for the provided sample.
`feature_selection.f_regression`(X, y[, center])	Univariate linear regression tests.
`feature_selection.mutual_info_classif`(X, y)	Estimate mutual information for a discrete target variable.
`feature_selection.mutual_info_regression`(X, y)	Estimate mutual information for a continuous target variable.
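A sketch of two selectors from the list above: SelectKBest with the chi2 score function on the iris data, and VarianceThreshold on a tiny invented matrix with one constant column.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest chi-squared score against the class labels.
selector = SelectKBest(score_func=chi2, k=2)
X_best = selector.fit_transform(X, y)
print(X.shape, X_best.shape)            # (150, 4) (150, 2)
print(selector.get_support())           # boolean mask of the kept columns

# VarianceThreshold with its default threshold drops zero-variance columns.
X_small = [[0, 1], [0, 2], [0, 3]]      # the first column is constant
X_varying = VarianceThreshold().fit_transform(X_small)
print(X_varying.tolist())               # [[1], [2], [3]]
```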

3.1. Cross-validation: evaluating estimator performance

http://scikit-learn.org/stable/modules/cross_validation.html

sklearn.model_selection.train_test_split(*arrays, **options)

Split arrays or matrices into random train and test subsets

Quick utility that wraps input validation and next(ShuffleSplit().split(X, y)) and application to input data into a single call for splitting (and optionally subsampling) data in a oneliner.
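A sketch of train_test_split on made-up arrays. Each row of X is built as [2*i, 2*i + 1] with target i, so it is easy to confirm that features and targets stay paired after shuffling.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)        # row i is [2*i, 2*i + 1]
y = np.arange(10)                       # target i belongs to row i

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X_train.shape, X_test.shape)                  # (7, 2) (3, 2)
print(bool((X_train[:, 0] == 2 * y_train).all()))   # True: rows and targets still match
```

Fixing random_state makes the split reproducible between runs.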

4.8.2. Label encoding

http://scikit-learn.org/stable/modules/preprocessing_targets.html#label-encoding

sklearn.preprocessing.LabelEncoder

class sklearn.preprocessing.LabelEncoder

Encode labels with value between 0 and n_classes-1.

`fit`(y)	Fit label encoder
`fit_transform`(y)	Fit label encoder and return encoded labels
`get_params`([deep])	Get parameters for this estimator.
`inverse_transform`(y)	Transform labels back to original encoding.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(y)	Transform labels to normalized encoding.
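A sketch of the encoder's round trip, with city names invented for the example. The classes are stored in sorted order, and each label is replaced by its position in that order.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["paris", "tokyo", "amsterdam", "tokyo"])

print(le.classes_.tolist())                       # ['amsterdam', 'paris', 'tokyo']
print(codes.tolist())                             # [1, 2, 0, 2]
print(le.inverse_transform([0, 2]).tolist())      # ['amsterdam', 'tokyo']
```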

4.3.4. Encoding categorical features

http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-categorical-features

sklearn.preprocessing.OneHotEncoder

class sklearn.preprocessing.OneHotEncoder(n_values='auto', categorical_features='all', dtype=<type 'numpy.float64'>, sparse=True, handle_unknown='error')

Encode categorical integer features using a one-hot aka one-of-K scheme.
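A sketch of one-hot encoding a single integer-coded feature. Several constructor parameters in the signature above (n_values, categorical_features, sparse) were renamed or removed in later scikit-learn releases, so the sketch relies only on the defaults and converts the sparse result with toarray().

```python
from sklearn.preprocessing import OneHotEncoder

# One categorical feature with integer codes 0, 1, 2 (invented data).
X = [[0], [1], [2], [1]]

enc = OneHotEncoder()
onehot = enc.fit_transform(X).toarray()   # default output is a sparse matrix

print(onehot.tolist())
# [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

Each row has exactly one 1, in the column of its category, which is where the "one-of-K" name comes from.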

4.3.1.1. Scaling features to a range

http://scikit-learn.org/stable/modules/preprocessing.html#scaling-features-to-a-range

sklearn.preprocessing.MinMaxScaler

class sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1), copy=True)

Transforms features by scaling each feature to a given range.

This estimator scales and translates each feature individually such that it is in the given range on the training set, i.e. between zero and one.

The transformation is given by:

X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min

sklearn.base: Base classes and utility functions

Base classes for all estimators.

Base classes

`base.BaseEstimator`	Base class for all estimators in scikit-learn
`base.ClassifierMixin`	Mixin class for all classifiers in scikit-learn.
`base.ClusterMixin`	Mixin class for all cluster estimators in scikit-learn.
`base.RegressorMixin`	Mixin class for all regression estimators in scikit-learn.
`base.TransformerMixin`	Mixin class for all transformers in scikit-learn.

Functions

`base.clone`(estimator[, safe])	Constructs a new estimator with the same parameters.
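A sketch of what clone copies and what it leaves behind, using LogisticRegression as an arbitrary example estimator and a tiny invented dataset.

```python
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

original = LogisticRegression(C=0.5)
original.fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])

copy = clone(original)                  # same parameters, but not fitted

print(copy.get_params()["C"])           # 0.5
print(hasattr(original, "coef_"))       # True: the original has been fitted
print(hasattr(copy, "coef_"))           # False: fitted attributes are not copied
```

This is handy when the same configuration has to be fitted several times on different data, for example once per cross-validation fold.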

Friday, December 9, 2016