Python_pandas
Intro to Data Structures
http://pandas.pydata.org/pandas-docs/stable/dsintro.html#intro-to-data-structures
Series
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:
s = pd.Series(data, index=index)
Here, data can be many different things:
- a Python dict
- an ndarray
- a scalar value (like 5)
In[87]: s = pd.Series([1,2,3], index=['a','b','c'])
In[88]: s
Out[88]:
a 1
b 2
c 3
dtype: int64
In[85]: s = pd.Series(['a','b','c'], index=[1,2,3])
In[86]: s
Out[86]:
1 a
2 b
3 c
dtype: object
In[82]: s = pd.Series([1,2,3], index=[1,2,3])
In[83]: s
Out[83]:
1 1
2 2
3 3
dtype: int64
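The transcripts above all build a Series from a plain list. As a minimal sketch of the other three kinds of data listed earlier (dict, ndarray, scalar), the following may help; the variable names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a dict: the keys become the index labels.
s_dict = pd.Series({'a': 1, 'b': 2, 'c': 3})

# From an ndarray: the index must have the same length as the array.
s_arr = pd.Series(np.array([1.5, 2.5, 3.5]), index=['x', 'y', 'z'])

# From a scalar: the value is repeated once per index label.
s_scalar = pd.Series(5, index=['a', 'b', 'c'])

print(s_dict)
print(s_arr)
print(s_scalar)
```

With a scalar, the index is what determines the length of the result, so it should be passed explicitly.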
pandas.read_csv
Read CSV (comma-separated) file into DataFrame
Also supports optionally iterating or breaking of the file into chunks.
Additional help can be found in the online docs for IO Tools.
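A small sketch of both modes, reading everything at once and reading in chunks. To keep it self-contained it reads from an in-memory string through io.StringIO instead of a file on disk; the column names and the chunk size of 2 are arbitrary choices for the example.

```python
import io
import pandas as pd

# Made-up CSV content standing in for a real file.
csv_text = "name,score\nann,1\nbob,2\ncat,3\ndan,4\neve,5\n"

# Read the whole input into a single DataFrame.
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# With chunksize, read_csv returns an iterator that yields DataFrames
# of at most that many rows each.
chunks = list(pd.read_csv(io.StringIO(csv_text), chunksize=2))
chunk_lengths = [len(chunk) for chunk in chunks]
print(chunk_lengths)
```

Chunked reading is mostly useful when the file is too large to hold in memory comfortably; each chunk can be processed and discarded in turn.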
pandas.DataFrame.dropna
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Return an object with the labels on the given axis dropped where any (how='any') or all (how='all') of the data are missing.
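The effect of how, subset and thresh is easier to see on a small frame. This is a sketch on made-up data; how and thresh are used in separate calls, since newer pandas versions reject passing both together.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1.0, np.nan, 3.0, np.nan],
    'b': [4.0, 5.0, np.nan, np.nan],
})

# how='any': drop a row if at least one value in it is missing.
any_missing = df.dropna(how='any')

# how='all': drop a row only if every value in it is missing.
all_missing = df.dropna(how='all')

# subset: only look at the listed columns when deciding what to drop.
by_subset = df.dropna(subset=['a'])

# thresh: keep rows that have at least this many non-missing values.
at_least_one = df.dropna(thresh=1)

print(any_missing)
print(all_missing)
print(by_subset)
print(at_least_one)
```

None of these calls modify df, because inplace defaults to False; each returns a new DataFrame.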
pandas.DataFrame.iloc
Purely integer-location based indexing for selection by position.
.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
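A short sketch of the common .iloc forms on a made-up frame. The string row labels are there only to show that .iloc ignores labels and works on positions.

```python
import pandas as pd

df = pd.DataFrame(
    {'x': [10, 20, 30, 40], 'y': ['a', 'b', 'c', 'd']},
    index=['r1', 'r2', 'r3', 'r4'],
)

# A single integer selects one row by position and returns it as a Series.
first_row = df.iloc[0]

# Slices follow Python rules: the end position is excluded.
middle_rows = df.iloc[1:3]

# A (row, column) pair of positions selects a single value.
cell = df.iloc[2, 0]

# A boolean list of the same length as the axis selects the True positions.
masked = df.iloc[[True, False, True, False]]

print(first_row)
print(middle_rows)
print(cell)
print(masked)
```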