Introduction to Data Visualization with Python
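For visualization, I have always felt that the R package ggplot2 and its extension packages are enough. If this part feels like a heavy workload, you can give it a lower priority and come back to it later.
Bryan Van de Ven | DataCamp. The instructor explains every argument in great detail.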
Move on!!!
Plotting multiple graphs | Python
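plt.axes([x_lo, y_lo, width, height]) places axes by hand, so one figure can show several plots, similar to the arrange(p1, p2) function used with ggplot2.
- x_lo: \(\min(x)\), the left edge of the axes
- y_lo: \(\min(y)\), the bottom edge of the axes
- width: \(\Delta x\)
- height: \(\Delta y\)
All four values are given as fractions of the figure size (0 to 1).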
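plt.subplot(nrows, ncols, nsubplot) takes three arguments:
- nrows: number of rows in the subplot grid
- ncols: number of columns in the subplot grid
- nsubplot: index of the active subplot, counted from 1, row by row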
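A minimal sketch contrasting the two layouts, assuming Matplotlib and NumPy are installed. The sine and cosine curves are placeholder data rather than the course data, and the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)  # placeholder data

# Manual layout: each rectangle is [x_lo, y_lo, width, height] in figure fractions
fig_manual = plt.figure()
plt.axes([0.05, 0.05, 0.425, 0.9])
plt.plot(x, np.sin(x))
plt.axes([0.525, 0.05, 0.425, 0.9])
plt.plot(x, np.cos(x))

# Automatic layout: 1 row, 2 columns, subplots numbered from 1
fig_grid = plt.figure()
plt.subplot(1, 2, 1)
plt.plot(x, np.sin(x))
plt.subplot(1, 2, 2)
plt.plot(x, np.cos(x))
plt.tight_layout()

n_manual = len(fig_manual.axes)  # number of axes created by hand
n_grid = len(fig_grid.axes)      # number of axes created by plt.subplot
```

Both figures end up with two axes; the difference is only who computes the rectangles.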
Using subplot() (1) | Python
The command plt.axes() requires a lot of effort to use well because the coordinates of the axes need to be set manually. A better alternative is to use plt.subplot() to determine the layout automatically.
plt.tight_layout() sets the padding between the figure edge and the subplot edges to 1.08 times the font size, which is the default.
In [1]: help(plt.tight_layout)
Help on function tight_layout in module matplotlib.pyplot:
tight_layout(pad=1.08, h_pad=None, w_pad=None, rect=None)
Automatically adjust subplot parameters to give specified padding.
Parameters:
pad : float
padding between the figure edge and the edges of subplots, as a fraction of the font-size.
h_pad, w_pad : float
padding (height/width) between edges of adjacent subplots.
Defaults to `pad_inches`.
rect : if rect is given, it is interpreted as a rectangle
(left, bottom, right, top) in the normalized figure
coordinate that the whole subplots area (including
labels) will fit into. Default is (0, 0, 1, 1).
Customizing axes | Python
xlim and ylim set the minimum and maximum of each axis.
plt.savefig('') is very similar to ggsave.
Using axis() | Python
plt.axis([x_min, x_max, y_min, y_max]) = plt.xlim(x_min, x_max) + plt.ylim(y_min, y_max)
axis('equal') changes limits of x or y axis so that equal increments of x and y have the same length; a circle is circular.
In other words, \(\Delta x\) and \(\Delta y\) are drawn at the same scale, so a shape that previously rendered as an ellipse now renders as a circle.
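A short sketch of the equivalence between plt.axis() and the xlim/ylim pair, assuming Matplotlib is installed. The plotted points are placeholders, and the Agg backend is used only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt

fig = plt.figure()
plt.plot([0, 1, 2], [0, 1, 4])  # placeholder data

# Set all four limits in one call: [x_min, x_max, y_min, y_max]
plt.axis([0, 2, 0, 5])
xlim_after_axis = plt.xlim()  # called without arguments, xlim returns the current limits
ylim_after_axis = plt.ylim()

# Set the same kind of limits one axis at a time
plt.xlim(0, 1)
plt.ylim(0, 1)
limits = plt.axis()  # called without arguments, axis returns (xmin, xmax, ymin, ymax)
```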
In [2]: help(plt.axis)
Help on function axis in module matplotlib.pyplot:
axis(*v, **kwargs)
Convenience method to get or set axis properties.
Calling with no arguments::
>>> axis()
returns the current axes limits ``[xmin, xmax, ymin, ymax]``.::
>>> axis(v)
sets the min and max of the x and y axes, with
``v = [xmin, xmax, ymin, ymax]``.::
>>> axis('off')
turns off the axis lines and labels.::
>>> axis('equal')
changes limits of *x* or *y* axis so that equal increments of *x*
and *y* have the same length; a circle is circular.::
>>> axis('scaled')
achieves the same result by changing the dimensions of the plot box instead
of the axis data limits.::
>>> axis('tight')
changes *x* and *y* axis limits such that all data is shown. If
all data is already shown, it will move it to the center of the
figure without modifying (*xmax* - *xmin*) or (*ymax* -
*ymin*). Note this is slightly different than in MATLAB.::
>>> axis('image')
is 'scaled' with the axis limits equal to the data limits.::
>>> axis('auto')
and::
>>> axis('normal')
are deprecated. They restore default behavior; axis limits are automatically
scaled to make the data fit comfortably within the plot box.
if ``len(*v)==0``, you can pass in *xmin*, *xmax*, *ymin*, *ymax*
as kwargs selectively to alter just those limits without changing
the others.
>>> axis('square')
changes the limit ranges (*xmax*-*xmin*) and (*ymax*-*ymin*) of
the *x* and *y* axes to be the same, and have the same scaling,
resulting in a square plot.
The xmin, xmax, ymin, ymax tuple is returned
.. seealso::
:func:`xlim`, :func:`ylim`
For setting the x- and y-limits individually.
Using legend() | Python
When plt.plot is given label=, the subsequent plt.legend() call uses that text to label the line in the figure.
# Specify the label 'Computer Science'
plt.plot(year, computer_science, color='red', label='Computer Science')
# Specify the label 'Physical Sciences'
plt.plot(year, physical_sciences, color='blue', label='Physical Sciences')
# Add a legend at the lower center
plt.legend(loc='lower center')
# Add axis labels and title
plt.xlabel('Year')
plt.ylabel('Enrollment (%)')
plt.title('Undergraduate enrollment of women')
plt.show()
Using annotate() | Python
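Suppose
\[y=f(t)\]
\(\max f(t)\) is the largest output y among all values of the function f(t).
\(\arg\max f(t)\) is the \(t\) at which f(t) attains \(\max f(t)\).
Source: "The difference between max and argmax" (考研数学笔记, Sina Blog)
the arguments of the maxima (abbreviated arg max or argmax).
.argmax() comes from the numpy package, and it is used a lot in places such as gradient descent.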
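To make the distinction concrete, here is a small sketch with made-up numbers (the year and share arrays are hypothetical, not the enrollment data used in the course): max gives the largest value, argmax gives the position where it occurs.

```python
import numpy as np

# Hypothetical data, for illustration only
year = np.array([1970, 1980, 1990, 2000, 2010])
share = np.array([13.6, 32.5, 29.4, 27.7, 17.6])

peak_value = share.max()      # the largest value of f(t)
peak_index = share.argmax()   # the position at which that value occurs
peak_year = year[peak_index]  # the t that produces the maximum
```

This is the same pattern the annotate() exercise uses to look up the year of the maximum.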
In [5]: import numpy as np
In [6]: help(np.argmax)
Help on function argmax in module numpy.core.fromnumeric:
argmax(a, axis=None, out=None)
Returns the indices of the maximum values along an axis.
Parameters
----------
a : array_like
Input array.
axis : int, optional
By default, the index is into the flattened array, otherwise
along the specified axis.
out : array, optional
If provided, the result will be inserted into this array. It should
be of the appropriate shape and dtype.
Returns
-------
index_array : ndarray of ints
Array of indices into the array. It has the same shape as `a.shape`
with the dimension along `axis` removed.
See Also
--------
ndarray.argmax, argmin
amax : The maximum value along a given axis.
unravel_index : Convert a flat index into an index tuple.
Notes
-----
In case of multiple occurrences of the maximum values, the indices
corresponding to the first occurrence are returned.
Examples
--------
>>> a = np.arange(6).reshape(2,3)
>>> a
array([[0, 1, 2],
[3, 4, 5]])
>>> np.argmax(a)
5
>>> np.argmax(a, axis=0)
array([1, 1, 1])
>>> np.argmax(a, axis=1)
array([2, 2])
>>> b = np.arange(6)
>>> b[1] = 5
>>> b
array([0, 5, 2, 3, 4, 5])
>>> np.argmax(b) # Only the first occurrence is returned.
1
# Plot with legend as before
plt.plot(year, computer_science, color='red', label='Computer Science')
plt.plot(year, physical_sciences, color='blue', label='Physical Sciences')
plt.legend(loc='lower right')
# Compute the maximum enrollment of women in Computer Science: cs_max
cs_max = computer_science.max()
# Calculate the year in which there was maximum enrollment of women in Computer Science: yr_max
yr_max = year[computer_science.argmax()]
# Add a black arrow annotation
plt.annotate('Maximum',
xy = (yr_max, cs_max),
xytext = (yr_max+5, cs_max+5),
arrowprops=dict(facecolor='black'))
# Add axis labels and title
plt.xlabel('Year')
plt.ylabel('Enrollment (%)')
plt.title('Undergraduate enrollment of women')
plt.show()
The single letter shortcut for 'black' is 'k'.
Modifying styles | Python
plt.style.use('ggplot') gives plots a look very close to ggplot2 in R.
Working with 2D arrays | Python
Numpy is mainly used for matrix computation.
This lesson explains slicing very clearly.
Slicing:
- 1D arrays: A[slice],
- 2D arrays: A[slice0, slice1]
Slicing:
slice = start:stop:stride
- Indexes from start to stop-1 in steps of stride
- Missing start: implicitly at beginning of array
- Missing stop: implicitly at end of array
- Missing stride: implicitly stride 1
Negative indexes/slices: count from end of array
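The slicing rules above can be checked on small arrays. This is a sketch with arbitrary example arrays A and B, not data from the course.

```python
import numpy as np

A = np.arange(10)                 # 1-D array: 0, 1, ..., 9
every_other = A[2:8:2]            # start 2, stop before 8, stride 2
first_three = A[:3]               # missing start: begin at index 0
last_three = A[-3:]               # negative index counts from the end

B = np.arange(12).reshape(3, 4)   # 2-D array with 3 rows and 4 columns
block = B[0:2, ::2]               # rows 0 and 1, every second column
```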
Generating meshes | Python
In [6]: help(np.linspace)
Help on function linspace in module numpy.core.function_base:
linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
Return evenly spaced numbers over a specified interval.
Returns `num` evenly spaced samples, calculated over the
interval [`start`, `stop`].
The endpoint of the interval can optionally be excluded.
Parameters
----------
start : scalar
The starting value of the sequence.
stop : scalar
The end value of the sequence, unless `endpoint` is set to False.
In that case, the sequence consists of all but the last of ``num + 1``
evenly spaced samples, so that `stop` is excluded. Note that the step
size changes when `endpoint` is False.
num : int, optional
Number of samples to generate. Default is 50. Must be non-negative.
endpoint : bool, optional
If True, `stop` is the last sample. Otherwise, it is not included.
Default is True.
retstep : bool, optional
If True, return (`samples`, `step`), where `step` is the spacing
between samples.
dtype : dtype, optional
The type of the output array. If `dtype` is not given, infer the data
type from the other input arguments.
.. versionadded:: 1.9.0
Returns
-------
samples : ndarray
There are `num` equally spaced samples in the closed interval
``[start, stop]`` or the half-open interval ``[start, stop)``
(depending on whether `endpoint` is True or False).
step : float, optional
Only returned if `retstep` is True
Size of spacing between samples.
See Also
--------
arange : Similar to `linspace`, but uses a step size (instead of the
number of samples).
logspace : Samples uniformly distributed in log space.
Examples
--------
>>> np.linspace(2.0, 3.0, num=5)
array([ 2. , 2.25, 2.5 , 2.75, 3. ])
>>> np.linspace(2.0, 3.0, num=5, endpoint=False)
array([ 2. , 2.2, 2.4, 2.6, 2.8])
>>> np.linspace(2.0, 3.0, num=5, retstep=True)
(array([ 2. , 2.25, 2.5 , 2.75, 3. ]), 0.25)
Graphical illustration:
>>> import matplotlib.pyplot as plt
>>> N = 8
>>> y = np.zeros(N)
>>> x1 = np.linspace(0, 10, N, endpoint=True)
>>> x2 = np.linspace(0, 10, N, endpoint=False)
>>> plt.plot(x1, y, 'o')
[<matplotlib.lines.Line2D object at 0x...>]
>>> plt.plot(x2, y + 0.5, 'o')
[<matplotlib.lines.Line2D object at 0x...>]
>>> plt.ylim([-0.5, 1])
(-0.5, 1)
>>> plt.show()
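np.linspace(first value of the sequence, last value of the sequence, number of samples)
meshgrid is the MATLAB function for generating grid sample points, and it is widely used for 3-D plotting in MATLAB.
numpy presumably borrowed it from there.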
In [8]: help(np.meshgrid)
Help on function meshgrid in module numpy.lib.function_base:
meshgrid(*xi, **kwargs)
Return coordinate matrices from coordinate vectors.
Make N-D coordinate arrays for vectorized evaluations of
N-D scalar/vector fields over N-D grids, given
one-dimensional coordinate arrays x1, x2,..., xn.
.. versionchanged:: 1.9
1-D and 0-D cases are allowed.
Parameters
----------
x1, x2,..., xn : array_like
1-D arrays representing the coordinates of a grid.
indexing : {'xy', 'ij'}, optional
Cartesian ('xy', default) or matrix ('ij') indexing of output.
See Notes for more details.
.. versionadded:: 1.7.0
sparse : bool, optional
If True a sparse grid is returned in order to conserve memory.
Default is False.
.. versionadded:: 1.7.0
copy : bool, optional
If False, a view into the original arrays are returned in order to
conserve memory. Default is True. Please note that
``sparse=False, copy=False`` will likely return non-contiguous
arrays. Furthermore, more than one element of a broadcast array
may refer to a single memory location. If you need to write to the
arrays, make copies first.
.. versionadded:: 1.7.0
Returns
-------
X1, X2,..., XN : ndarray
For vectors `x1`, `x2`,..., 'xn' with lengths ``Ni=len(xi)`` ,
return ``(N1, N2, N3,...Nn)`` shaped arrays if indexing='ij'
or ``(N2, N1, N3,...Nn)`` shaped arrays if indexing='xy'
with the elements of `xi` repeated to fill the matrix along
the first dimension for `x1`, the second for `x2` and so on.
Notes
-----
This function supports both indexing conventions through the indexing
keyword argument. Giving the string 'ij' returns a meshgrid with
matrix indexing, while 'xy' returns a meshgrid with Cartesian indexing.
In the 2-D case with inputs of length M and N, the outputs are of shape
(N, M) for 'xy' indexing and (M, N) for 'ij' indexing. In the 3-D case
with inputs of length M, N and P, outputs are of shape (N, M, P) for
'xy' indexing and (M, N, P) for 'ij' indexing. The difference is
illustrated by the following code snippet::
xv, yv = np.meshgrid(x, y, sparse=False, indexing='ij')
for i in range(nx):
for j in range(ny):
# treat xv[i,j], yv[i,j]
xv, yv = np.meshgrid(x, y, sparse=False, indexing='xy')
for i in range(nx):
for j in range(ny):
# treat xv[j,i], yv[j,i]
In the 1-D and 0-D case, the indexing and sparse keywords have no effect.
See Also
--------
index_tricks.mgrid : Construct a multi-dimensional "meshgrid"
using indexing notation.
index_tricks.ogrid : Construct an open multi-dimensional "meshgrid"
using indexing notation.
Examples
--------
>>> nx, ny = (3, 2)
>>> x = np.linspace(0, 1, nx)
>>> y = np.linspace(0, 1, ny)
>>> xv, yv = np.meshgrid(x, y)
>>> xv
array([[ 0. , 0.5, 1. ],
[ 0. , 0.5, 1. ]])
>>> yv
array([[ 0., 0., 0.],
[ 1., 1., 1.]])
>>> xv, yv = np.meshgrid(x, y, sparse=True) # make sparse output arrays
>>> xv
array([[ 0. , 0.5, 1. ]])
>>> yv
array([[ 0.],
[ 1.]])
`meshgrid` is very useful to evaluate functions on a grid.
>>> x = np.arange(-5, 5, 0.1)
>>> y = np.arange(-5, 5, 0.1)
>>> xx, yy = np.meshgrid(x, y, sparse=True)
>>> z = np.sin(xx**2 + yy**2) / (xx**2 + yy**2)
>>> h = plt.contourf(x,y,z)
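Source: "Understanding meshgrid in MATLAB" (CSDN blog)
Suppose we want to draw a 3-D surface over the region 3<=x<=5, 6<=y<=9, with z unrestricted. The grid points are:
(3,9),(4,9),(5,9);
(3,8),(4,8),(5,8);
(3,7),(4,7),(5,7);
(3,6),(4,6),(5,6);
Taking the x coordinate of every point gives:
3,4,5;
3,4,5;
3,4,5;
3,4,5;
Taking the y coordinate of every point gives:
9,9,9;
8,8,8;
7,7,7;
6,6,6;
So x corresponds to the row vector \([3,4,5]\), and y corresponds to the column vector \(\begin{bmatrix}9 \\ 8 \\ 7 \\ 6\end{bmatrix}\).
These are exactly the kind of 1-D coordinate vectors that np.linspace produces.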
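The same grid can be reproduced with np.meshgrid. This is a sketch using the coordinate values from the example above; the variable names are chosen here for illustration.

```python
import numpy as np

x = np.array([3, 4, 5])      # x coordinates, left to right
y = np.array([9, 8, 7, 6])   # y coordinates, top to bottom as listed above

X, Y = np.meshgrid(x, y)     # default 'xy' indexing: shape is (len(y), len(x))

grid_shape = X.shape
x_rows = X.tolist()          # every row repeats x
y_rows = Y.tolist()          # every column repeats y
```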
# Generate two 1-D arrays: u, v
u = np.linspace(-2, +2, 41)
v = np.linspace(-1,+1,21)
# Generate 2-D arrays from u and v: X, Y
X,Y = np.meshgrid(u,v)
# Compute Z based on X and Y
Z = np.sin(3*np.sqrt(X**2 + Y**2))
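Correspondingly, u and v here play the roles of the row vector and the column vector, and X and Y are matrices.
Every row of X is identical (a copy of u), and the number of rows is the length of v.
Every column of Y is identical (a copy of v), and the number of columns is the length of u.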
In [16]: X.shape
Out[16]: (21, 41)
In [17]: Y.shape
Out[17]: (21, 41)
X**2 is an elementwise square, \((X^{2})_{ij} = X_{ij}^{2}\), not the matrix product \(X X^T\), so it keeps the shape of X.
Therefore X**2 + Y**2 also has shape (21, 41).
Therefore np.sqrt(X**2 + Y**2).shape is also (21, 41).
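For reference, a quick check on a small, arbitrary matrix shows how NumPy treats the two operations: ** works element by element and keeps the shape, while the matrix product with the transpose (written with @) changes it.

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6]])   # arbitrary 2 x 3 example

squared = X ** 2            # elementwise: each entry is squared, shape stays (2, 3)
product = X @ X.T           # matrix product: (2, 3) times (3, 2) gives (2, 2)
```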
The sine is one of the fundamental functions of trigonometry (the mathematical study of triangles).
np.sin is the sine function.
In [25]: np.sin(np.pi/2)
Out[25]: 1.0
In [26]: np.sin(np.array((0., 30., 45., 60., 90.)) * np.pi / 180. )
Out[26]: array([ 0. , 0.5 , 0.70710678, 0.8660254 , 1. ])
# Import numpy and matplotlib.pyplot
import numpy as np
import matplotlib.pyplot as plt
# Generate two 1-D arrays: u, v
u = np.linspace(-2, +2, 41)
v = np.linspace(-1,+1,21)
# Generate 2-D arrays from u and v: X, Y
X,Y = np.meshgrid(u,v)
# Compute Z based on X and Y
Z = np.sin(3*np.sqrt(X**2 + Y**2))
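# Display the resulting image with pcolor()
plt.pcolor(Z)
# Save the figure to 'sine_mesh.png' before plt.show(); saving after the window is closed can write an empty figure
plt.savefig('sine_mesh.png')
plt.show()
The resulting figure is impressive; try it and see.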
Array orientation | Python
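The matrix is created as
In [6]: np.array([[1, 0, -1], [2, 0, 1], [1, 1, 1]])
Out[6]:
array([[ 1, 0, -1],
[ 2, 0, 1],
[ 1, 1, 1]])
but in the plot the matrix appears as
\[\begin{bmatrix} 1 & 1 & 1 \\ 2 & 0 & 1 \\ 1 & 0 & -1 \\ \end{bmatrix}\]
Clearly the rows are drawn in reverse order: the first row vector \([ 1, 0, -1]\) is placed along the x-axis, and each later row is stacked on top of it.
This matches the way we normally build a plot.
Take a matrix
\[\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ \end{bmatrix}\]
When plotting, we start from the origin \(O(0,0)\) of the xy-plane and work upward, which gives exactly the layout seen in the heat map.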
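The row reversal can be mimicked without plotting. The sketch below uses np.flipud as a stand-in for "reading the picture from top to bottom"; it is an illustration of the orientation, not what plt.pcolor does internally.

```python
import numpy as np

A = np.array([[1, 0, -1],
              [2, 0, 1],
              [1, 1, 1]])

# Row 0 is drawn at the bottom of the plot, so reading the picture
# from top to bottom gives the rows in reverse order.
as_drawn = np.flipud(A)
```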
Visualizing bivariate functions | Python
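Pseudo-color (false color) means using colors that differ from an object's true colors to depict it in an image.
This lets a flat plot express a third dimension.
plt.contour() draws contour lines, like the lines on a topographic map; it is a variation on plt.pcolor().
In plt.contour(Z, 30), Z holds the values of the third dimension and 30 is the number of contour levels.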
Contour & filled contour plots | Python
# Generate a default contour map of the array Z
plt.subplot(2,2,1)
plt.contour(Z)
# Generate a contour map with 20 contours
plt.subplot(2,2,2)
plt.contour(Z,20)
# Generate a default filled contour map of the array Z
plt.subplot(2,2,3)
plt.contourf(Z)
# Generate a default filled contour map with 20 contours
plt.subplot(2,2,4)
plt.contourf(Z,20)
# Improve the spacing between subplots
plt.tight_layout()
# Display the figure
plt.show()
plt.contourf is simply plt.contour with the regions filled.
Modifying colormaps | Python
# Create a filled contour plot with a color map of 'viridis'
plt.subplot(2,2,1)
plt.contourf(X,Y,Z,20, cmap='viridis')
plt.colorbar()
plt.title('Viridis')
# Create a filled contour plot with a color map of 'gray'
plt.subplot(2,2,2)
plt.contourf(X,Y,Z,20, cmap='gray')
plt.colorbar()
plt.title('Gray')
# Create a filled contour plot with a color map of 'autumn'
plt.subplot(2,2,3)
plt.contourf(X,Y,Z,20, cmap='autumn')
plt.colorbar()
plt.title('Autumn')
# Create a filled contour plot with a color map of 'winter'
plt.subplot(2,2,4)
plt.contourf(X,Y,Z,20, cmap='winter')
plt.colorbar()
plt.title('Winter')
# Improve the spacing between subplots and display them
plt.tight_layout()
plt.show()
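Posts about plotting are heavy on code because every figure has to be described, so the post is long even though there is not much content. None of the four colormaps looks especially attractive to me; they just look scientific.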
Using hist2d() | Python
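Here the histogram is two-dimensional; the plot describes three dimensions, \((x,y,count(x,y))\).
The main arguments are:
- plt.hist2d(x, y): the two variables
- bins=(nx, ny): number of bins along x and y
- range=((xmin, xmax), (ymin, ymax)): the region to bin
The meaning of each argument is easy to follow.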
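To see what bins and range do to the counts, the sketch below swaps in np.histogram2d, which takes the same bins and range arguments as plt.hist2d but returns the count matrix without drawing anything. The hp and mpg values are made up for illustration.

```python
import numpy as np

# Hypothetical horsepower / mileage pairs
hp = np.array([45, 50, 100, 100, 230])
mpg = np.array([40, 42, 20, 21, 10])

counts, hp_edges, mpg_edges = np.histogram2d(
    hp, mpg,
    bins=(2, 2),                    # 2 bins along hp, 2 bins along mpg
    range=((40, 235), (8, 48)),     # ((xmin, xmax), (ymin, ymax))
)
# counts[i, j] is the number of points in hp bin i and mpg bin j
```

With these numbers, four cars fall in the low-hp bin (two with low mpg, two with high mpg) and one falls in the high-hp, low-mpg bin.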
# Generate a 2-D histogram
plt.hist2d(
hp, mpg,
bins = (20,20),
range=((40,235),(8,48))
)
# Add a color bar to the histogram
plt.colorbar()
# Add labels, title, and display the plot
plt.xlabel('Horse power [hp]')
plt.ylabel('Miles per gallon [mpg]')
plt.title('hist2d() plot')
plt.show()
Using hexbin() | Python
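gridsize defaults to 100 and sets the number of hexagons along the x direction; a tuple such as (nx, ny) sets the counts along x and y.
extent=(xmin, xmax, ymin, ymax) needs no explanation.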
# Generate a 2d histogram with hexagonal bins
plt.hexbin(
hp,mpg,
gridsize = (15,12),
extent = (40,235,8,48)
)
# Add a color bar to the histogram
plt.colorbar()
# Add labels, title, and display the plot
plt.xlabel('Horse power [hp]')
plt.ylabel('Miles per gallon [mpg]')
plt.title('hexbin() plot')
plt.show()
Working with images | Python
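We are now at the point of modifying images. It is fun, but not much use in my daily work, so thumbs down.
The following exercises can be skipped, because image manipulation has little to do with presenting data.
Of course, if you find it fun, keep reading.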
Loading, examining images | Python
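Red, green, and blue are the three additive primary colors, the ones used for light on a screen rather than for pigments.
Any point in an image can therefore be described by \([red\%,green\%,blue\%]\).
Once the image is converted to an np.array, it is an \(M \times N\) grid in which each cell holds \([red\%,green\%,blue\%]\), i.e. an array of shape \((M, N, 3)\).
plt.axis('off') hides the x and y axes.
plt.imread('480px-Astronaut-EVA.jpg') converts the .jpg into an array.
plt.imshow(img) renders the array as an image.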
# Load the image into an array: img
img = plt.imread('480px-Astronaut-EVA.jpg')
# Print the shape of the image
print(img.shape)
# Display the image
plt.imshow(img)
# Hide the axes
plt.axis('off')
plt.show()
Pseudocolor plot from image data | Python
Here .sum(axis=2) adds up the three values in each cell, \([red\%,green\%,blue\%]\).
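A tiny synthetic image makes the shapes easy to follow. This is a sketch; the 2 x 3 array below is invented and stands in for the array that plt.imread would return.

```python
import numpy as np

# Hypothetical 2 x 3 RGB image: shape (rows, columns, channels)
img = np.zeros((2, 3, 3), dtype=np.uint8)
img[0, 0] = [255, 0, 0]     # a pure red pixel
img[1, 2] = [10, 20, 30]    # a dark pixel

# Summing over axis 2 collapses the three channels into one number per pixel
intensity = img.sum(axis=2)
```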
# Load the image into an array: img
img = plt.imread('480px-Astronaut-EVA.jpg')
# Print the shape of the image
print(img.shape)
# Compute the sum of the red, green and blue channels: intensity
intensity = img.sum(axis=2)
# Print the shape of the intensity
print(intensity.shape)
# Display the intensity with a colormap of 'gray'
plt.imshow(intensity, cmap='gray')
# Add a colorbar
plt.colorbar()
# Hide the axes and show the figure
plt.axis('off')
plt.show()
Extent and aspect | Python
\(aspect = \frac{height}{width}\)
aspect is the aspect ratio, \(\frac{\text{vertical}}{\text{horizontal}}\).
extent = (xmin, xmax, ymin, ymax) needs no explanation.
# Load the image into an array: img
img = plt.imread('480px-Astronaut-EVA.jpg')
# Specify the extent and aspect ratio of the top left subplot
plt.subplot(2,2,1)
plt.title('extent=(-1,1,-1,1),\naspect=0.5')
plt.xticks([-1,0,1])
plt.yticks([-1,0,1])
plt.imshow(img, extent =(-1,1,-1,1), aspect=0.5)
# Specify the extent and aspect ratio of the top right subplot
plt.subplot(2,2,2)
plt.title('extent=(-1,1,-1,1),\naspect=1')
plt.xticks([-1,0,1])
plt.yticks([-1,0,1])
plt.imshow(img, extent =(-1,1,-1,1), aspect=1)
# Specify the extent and aspect ratio of the bottom left subplot
plt.subplot(2,2,3)
plt.title('extent=(-1,1,-1,1),\naspect=2')
plt.xticks([-1,0,1])
plt.yticks([-1,0,1])
plt.imshow(img, extent =(-1,1,-1,1), aspect=2)
# Specify the extent and aspect ratio of the bottom right subplot
plt.subplot(2,2,4)
plt.title('extent=(-2,2,-1,1),\naspect=2')
plt.xticks([-2,-1,0,1,2])
plt.yticks([-1,0,1])
plt.imshow(img, extent =(-2,2,-1,1), aspect=2)
# Improve spacing and display the figure
plt.tight_layout()
plt.show()
Rescaling pixel intensities | Python
\(intensities = [red\%,green\%,blue\%]\), as defined earlier.
# Load the image into an array: image
image = plt.imread('640px-Unequalized_Hawkes_Bay_NZ.jpg')
# Extract minimum and maximum values from the image: pmin, pmax
pmin, pmax = image.min(), image.max()
print("The smallest & largest pixel intensities are %d & %d." % (pmin, pmax))
# Rescale the pixels: rescaled_image
rescaled_image = 256*(image - pmin) / (pmax - pmin)
print("The rescaled smallest & largest pixel intensities are %.1f & %.1f." %
(rescaled_image.min(), rescaled_image.max()))
# Display the original image in the top subplot
plt.subplot(2,1,1)
plt.title('original image')
plt.axis('off')
plt.imshow(image)
# Display the rescaled image in the bottom subplot
plt.subplot(2,1,2)
plt.title('rescaled image')
plt.axis('off')
plt.imshow(rescaled_image)
plt.show()
The pixels go through a min-max rescaling:
\[\tilde x = 256 \cdot \frac{x-\min(x)}{\max(x)-\min(x)}\]
It feels like a fancy trick, but I don't yet see what the benefit is.
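A small numeric sketch of the rescaling as the code above computes it, using an invented 2 x 2 grayscale array. Converting the minimum and maximum to float first is a choice made here to keep the arithmetic out of 8-bit integers.

```python
import numpy as np

# Hypothetical low-contrast grayscale image
image = np.array([[50, 100],
                  [150, 200]], dtype=np.uint8)

pmin, pmax = float(image.min()), float(image.max())

# Stretch the values so the darkest pixel maps to 0 and the brightest to 256
rescaled = 256 * (image - pmin) / (pmax - pmin)
```

The pixel values originally span 50 to 200; after rescaling they span the whole range, which is where the gain in contrast comes from.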
Visualizing regressions | Python
hue='sex' groups by a factor variable.
palette='Set1' sets the color palette.
col='sex' splits into subplots.
sns.residplot() plots the residuals.
Higher-order regressions | Python
order=1 is the default, so it does not need to be specified.
In [4]: help(sns.regplot)
Help on function regplot in module seaborn.linearmodels:
regplot(x, y, data=None, x_estimator=None, x_bins=None, x_ci='ci', scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None, order=1, logistic=False, lowess=False, robust=False, logx=False, x_partial=None, y_partial=None, truncate=False, dropna=True, x_jitter=None, y_jitter=None, label=None, color=None, marker='o', scatter_kws=None, line_kws=None, ax=None)
In plt.legend(loc = 'upper right'), don't forget the loc = keyword.
loc : int or string or pair of floats, default: 'upper right'
So 'upper right' is the default.
Grouping linear regressions by row or column | Python
Rather than overlaying linear regressions of grouped data in the same plot, we may want to use a grid of subplots.
Use hue to overlay the fits, and use row and col to lay them out on a grid.
This is very similar to R.
In R's ggplot2, the grid is defined by a and b in facet_grid(a ~ b).
Visualizing univariate distributions | Python
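jitter spreads the points of a strip plot sideways so that it looks more like a swarm plot; in effect it adds a random horizontal offset. The ggplot2 counterpart is geom_jitter().
To change the orientation use orient = 'h' or orient = 'v', which corresponds to coord_flip in R's ggplot2.
A violin plot clearly feels better than a box plot.
However, a violin plot does not show outliers.
This can be solved by overlaying a jittered strip plot on the violin plot: .violinplot(inner = None) and .stripplot(jitter = True).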
Constructing violin plots | Python
# Generate a violin plot of 'hp' grouped horizontally by 'cyl'
plt.subplot(2,1,1)
sns.violinplot(y='hp', x='cyl', data=auto)
# Generate the same violin plot again with a color of 'lightgray' and without inner annotations
plt.subplot(2,1,2)
sns.violinplot(y='hp', x='cyl', data=auto, inner=None, color = 'lightgray')
# Overlay a strip plot on the violin plot
sns.stripplot(y='hp', x='cyl', data=auto, size = 1.5, jitter = True)
# Display the plot
plt.show()
Visualizing multivariate distributions | Python
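I rarely use joint plots in R's ggplot2 either.
A joint plot shows the distributions of two continuous variables, their scatter plot, the correlation coefficient, and the significance level of that coefficient against 0.
kind = 'kde' makes the plot smooth.
A pair plot is an \(N \times N\) grid of plots: the diagonal holds the histogram of each variable, and every other cell is a scatter plot.
hue can be added to compare groups.
sns.pairplot(auto) is an example.
kind = 'scatter' is the default; kind = 'reg' can be specified instead.
A heat map is mainly for looking at correlations.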
Plotting joint distributions (2) | Python
In sns.jointplot(), kind has several important options:
- kind='scatter' uses a scatter plot of the data points
- kind='reg' uses a regression plot (default order 1)
- kind='resid' uses a residual plot
- kind='kde' uses a kernel density estimate of the joint distribution
- kind='hex' uses a hexbin plot of the joint distribution
Visualizing correlations with a heatmap | Python
# Print the covariance matrix
print(cov_matrix)
# Visualize the covariance matrix using a heatmap
sns.heatmap(cov_matrix)
# Display the heatmap
plt.show()
.heatmap() takes a matrix, not a table.
Visualizing time series | Python
plt.xticks(rotation = 60) is similar to theme(axis.text.x = element_text(angle = 70, hjust = 1)) in R.
.index[::96]: why does this correspond to every four days?
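One plausible reading, assuming the index holds hourly timestamps (an assumption, since the notes do not say what the sampling interval is): 96 hourly rows are 4 x 24 hours, so taking every 96th entry gives one tick every four days. The dates below are hypothetical.

```python
import numpy as np

# Hypothetical hourly timestamps covering 12 days
hours = np.arange('2010-01-01T00', '2010-01-13T00', dtype='datetime64[h]')

ticks = hours[::96]  # every 96th hourly timestamp

# Spacing between consecutive ticks, expressed in days
gap_in_days = (ticks[1] - ticks[0]) / np.timedelta64(1, 'D')
```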
Multiple time series on common axes | Python
Four colors are used.
# Import matplotlib.pyplot
import matplotlib.pyplot as plt
# Plot the aapl time series in blue
plt.plot(aapl, color='blue', label='AAPL')
# Plot the ibm time series in green
plt.plot(ibm, color='green', label='IBM')
# Plot the csco time series in red
plt.plot(csco, color='red', label='CSCO')
# Plot the msft time series in magenta
plt.plot(msft, color='magenta', label='MSFT')
# Add a legend in the top left corner of the plot
plt.legend(loc='upper left')
# Specify the orientation of the xticks
plt.xticks(rotation = 60)
# Display the plot
plt.show()
Time series with moving windows | Python
There are several kinds of moving-window statistics:
- Averages
- Medians
- Standard deviations
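The three window statistics can be sketched with NumPy alone. The prices array is made up, the window length of 3 is arbitrary, and sliding_window_view requires NumPy 1.20 or later; how mean_30 and the other series in the exercise were actually computed is not shown in these notes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

prices = np.array([1.0, 2.0, 3.0, 4.0, 10.0])  # hypothetical series

# Each row is one window of 3 consecutive values
windows = sliding_window_view(prices, 3)

moving_mean = windows.mean(axis=1)
moving_median = np.median(windows, axis=1)
moving_std = windows.std(axis=1)
```

The last window contains the jump to 10, which pulls the mean up much more than the median.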
# Plot the 30-day moving average in the top left subplot in green
plt.subplot(2,2,1)
plt.plot(mean_30, color = 'green')
plt.plot(aapl, 'k-.')
plt.xticks(rotation=60)
plt.title('30d averages')
# Plot the 75-day moving average in the top right subplot in red
plt.subplot(2,2,2)
plt.plot(mean_75, 'red')
plt.plot(aapl, 'k-.')
plt.xticks(rotation=60)
plt.title('75d averages')
# Plot the 125-day moving average in the bottom left subplot in magenta
plt.subplot(2, 2, 3)
plt.plot(mean_125, 'magenta')
plt.plot(aapl, 'k-.')
plt.xticks(rotation=60)
plt.title('125d averages')
# Plot the 250-day moving average in the bottom right subplot in cyan
plt.subplot(2,2,4)
plt.plot(mean_250, 'cyan')
plt.plot(aapl, 'k-.')
plt.xticks(rotation=60)
plt.title('250d averages')
# Display the plot
plt.show()
Histogram equalization in images | Python
.flatten() is a method of numpy.ndarray. From the official documentation:
ndarray.flatten(order='C')
Return a copy of the array collapsed into one dimension.
The method only works on numpy objects, i.e. array or mat; a plain Python list does not have it. Source: "Usage of flatten() in the Python numpy library" (taotiezhengfeng's blog, CSDN)
Why can .flatten() be used here?
An image histogram, then, is computed by counting the occurrences of distinct pixel intensities over all the pixels in the image.
A histogram analyzes one-dimensional data, so the matrix of \(intensities = [red\%,green\%,blue\%]\) values has to be collapsed into a single row vector, i.e. a one-dimensional array.
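A minimal sketch of the flatten-then-count idea on an invented 2 x 2 grayscale array; np.bincount is used here as a simple way to count pixel values and is not part of the course code.

```python
import numpy as np

# Hypothetical 2 x 2 grayscale image
image = np.array([[0, 50],
                  [50, 255]], dtype=np.uint8)

pixels = image.flatten()                     # 1-D copy of all pixel values
counts = np.bincount(pixels, minlength=256)  # counts[v] = number of pixels with value v

# A plain Python list has no flatten method
list_has_flatten = hasattr([1, 2, 3], 'flatten')
```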
Extracting a histogram from a grayscale image | Python
# Load the image into an array: image
image = plt.imread('640px-Unequalized_Hawkes_Bay_NZ.jpg')
# Display image in top subplot using color map 'gray'
plt.subplot(2,1,1)
plt.title('Original image')
plt.axis('off')
plt.imshow(image, cmap = 'gray')
# Flatten the image into 1 dimension: pixels
pixels = image.flatten()
# Display a histogram of the pixels in the bottom subplot
plt.subplot(2,1,2)
plt.xlim((0,255))
plt.title('Normalized histogram')
plt.hist(
pixels,
bins=64,
color='red',
alpha=0.4,
range=(0,256),
density=True)  # density= replaces the normed= keyword, which newer Matplotlib releases no longer accept
# Display the plot
plt.show()
Cumulative Distribution Function from an image histogram | Python
The command plt.twinx()
allows two plots to be overlayed sharing the x-axis but with different scales on the y-axis.
# Load the image into an array: image
image = plt.imread('640px-Unequalized_Hawkes_Bay_NZ.jpg')
# Display image in top subplot using color map 'gray'
plt.subplot(2,1,1)
plt.imshow(image, cmap='gray')
plt.title('Original image')
plt.axis('off')
# Flatten the image into 1 dimension: pixels
pixels = image.flatten()
# Display a histogram of the pixels in the bottom subplot
plt.subplot(2,1,2)
pdf = plt.hist(pixels, bins=64, range=(0,256), density=False,
color='red', alpha=0.4)
plt.grid(False)
# Use plt.twinx() to overlay the CDF in the bottom subplot
plt.twinx()
# Display a cumulative histogram of the pixels
cdf = plt.hist(pixels, bins=64, range=(0,256),
density=True, cumulative=True,
color='blue', alpha=0.4)
# Specify x-axis range, hide axes, add title and display plot
plt.xlim((0,256))
plt.grid(False)
plt.title('PDF & CDF (original image)')
plt.show()
Equalizing an image histogram | Python
The basic idea is to use interpolation to map the original CDF of pixel intensities to a CDF that is almost a straight line. In essence, the pixel intensities are spread out and this has the practical effect of making a sharper, contrast-enhanced image. This is particularly useful in astronomy and medical imaging to help us see more features.
Equalization turns the cumulative distribution function into a nearly straight line, and the image shows sharper contrast.
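The interpolation step can be tried without any plotting. The sketch below swaps plt.hist for np.histogram plus a cumulative sum and runs on a synthetic low-contrast image; the random image and the seed are assumptions made for illustration, not the Hawkes Bay photo.

```python
import numpy as np

# Synthetic low-contrast image: every pixel value lies between 80 and 119
rng = np.random.default_rng(0)
image = rng.integers(80, 120, size=(32, 32)).astype(np.uint8)
pixels = image.flatten()

# Normalized histogram with one bin per intensity, then its running sum (the CDF)
hist, bins = np.histogram(pixels, bins=256, range=(0, 256), density=True)
cdf = hist.cumsum()  # bin width is 1, so the last value is 1.0

# Map each pixel through the CDF, scaled to the 0..255 range
new_pixels = np.interp(pixels, bins[:-1], cdf * 255)
new_image = new_pixels.reshape(image.shape)

old_spread = int(pixels.max()) - int(pixels.min())
new_spread = new_pixels.max() - new_pixels.min()
```

The original values occupy a band of at most 39 intensity levels; after the mapping they are spread over most of the 0 to 255 range.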
# Load the image into an array: image
image = plt.imread('640px-Unequalized_Hawkes_Bay_NZ.jpg')
# Flatten the image into 1 dimension: pixels
pixels = image.flatten()
# Generate a cumulative histogram
cdf, bins, patches = plt.hist(pixels, bins=256, range=(0,256), density=True, cumulative=True)
new_pixels = np.interp(pixels, bins[:-1], cdf*255)
# Reshape new_pixels as a 2-D array: new_image
new_image = new_pixels.reshape(image.shape)
# Display the new image with 'gray' color map
plt.subplot(2,1,1)
plt.title('Equalized image')
plt.axis('off')
plt.imshow(new_image, cmap='gray')
# Generate a histogram of the new pixels
plt.subplot(2,1,2)
pdf = plt.hist(new_pixels, bins=64, range=(0,256), density=False,
color='red', alpha=0.4)
plt.grid(False)
# Use plt.twinx() to overlay the CDF in the bottom subplot
plt.twinx()
plt.xlim((0,256))
plt.grid(False)
# Add title
plt.title('PDF & CDF (equalized image)')
# Generate a cumulative histogram of the new pixels
cdf = plt.hist(new_pixels, bins=64, range=(0,256),
cumulative=True, density=True,
color='blue', alpha=0.4)
plt.show()
Extracting histograms from a color image | Python
# Load the image into an array: image
image = plt.imread('hs-2004-32-b-small_web.jpg')
# Display image in top subplot
plt.subplot(2,1,1)
plt.title('Original image')
plt.axis('off')
plt.imshow(image)
# Extract 2-D arrays of the RGB channels: red, green, blue
red, green, blue = image[:,:,0], image[:,:,1], image[:,:,2]
# Flatten the 2-D arrays of the RGB channels into 1-D
red_pixels = red.flatten()
blue_pixels = blue.flatten()
green_pixels = green.flatten()
# Overlay histograms of the pixels of each color in the bottom subplot
plt.subplot(2,1,2)
plt.title('Histograms from color image')
plt.xlim((0,256))
plt.hist(red_pixels, bins=64, density=True, color='red', alpha=0.2)
plt.hist(blue_pixels, bins=64, density=True, color='blue', alpha=0.2)
plt.hist(green_pixels, bins=64, density=True, color='green', alpha=0.2)
# Display the plot
plt.show()
There is so much material; it is a bit overwhelming.
Extracting bivariate histograms from a color image | Python
# Load the image into an array: image
image = plt.imread('hs-2004-32-b-small_web.jpg')
# Extract RGB channels and flatten into 1-D array
red, green, blue = image[:,:,0], image[:,:,1], image[:,:,2]
red_pixels = red.flatten()
blue_pixels = blue.flatten()
green_pixels = green.flatten()
# Generate a 2-D histogram of the red and green pixels
plt.subplot(2,2,1)
plt.grid(False)
plt.xticks(rotation=60)
plt.xlabel('red')
plt.ylabel('green')
plt.hist2d(red_pixels,green_pixels,bins=(32,32))
# Generate a 2-D histogram of the green and blue pixels
plt.subplot(2,2,2)
plt.grid(False)
plt.xticks(rotation=60)
plt.xlabel('green')
plt.ylabel('blue')
plt.hist2d(green_pixels,blue_pixels,bins=(32,32))
# Generate a 2-D histogram of the blue and red pixels
plt.subplot(2,2,3)
plt.grid(False)
plt.xticks(rotation=60)
plt.xlabel('blue')
plt.ylabel('red')
plt.hist2d(blue_pixels,red_pixels,bins=(32,32))
# Display the plot
plt.show()