
pandas debugging study notes

pd.concat

In pd.concat(..., axis=1), the objects to concatenate are passed as a list, not as a single object.

import pandas as pd
pd.concat?
objs : a sequence or mapping of Series, DataFrame, or Panel objects
    If a dict is passed, the sorted keys will be used as the `keys`
    argument, unless it is passed, in which case the values will be
    selected (see below). Any None objects will be dropped silently unless
    they are all None in which case a ValueError will be raised

Note the wording here: "a sequence or mapping". The objs argument is therefore a list of objects, not a single object.
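A minimal sketch of the point above. The Series names a and b are made up for illustration. It shows that the objects go inside a list, and that passing a bare Series raises a TypeError.

```python
import pandas as pd

# Two hypothetical Series used only for illustration
a = pd.Series([1, 2, 3], name="a")
b = pd.Series([4, 5, 6], name="b")

# objs is a list of objects, even when there are only two of them
wide = pd.concat([a, b], axis=1)

# Passing a single Series instead of a list is rejected
try:
    pd.concat(a, axis=1)
    raised = False
except TypeError:
    raised = True
```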

pickle files

To work with pickle files, read with df = pd.read_pickle(...) and write with df.to_pickle(...).
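A small round-trip sketch, assuming a throwaway temporary directory and a hypothetical file name df.pkl. It writes a frame with to_pickle and reads it back with read_pickle.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "df.pkl")  # hypothetical file name
    df.to_pickle(path)                  # write
    restored = pd.read_pickle(path)     # read back

# True when values, dtypes and index all survived the round trip
same = restored.equals(df)
```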

References and copies

  • copy

    z[~z.x.isin([2])].y = 1

  • indexing view

    z.loc[~z.x.isin([2]), "y"] = 1

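The first form is not recommended, because the two forms correspond to the following calls, respectively:

z.__getitem__(~z.x.isin([2])).__setitem__("y", value)

z.loc.__setitem__((~z.x.isin([2]), "y"), value)

  • The first form works on a copy: __getitem__ with a boolean mask returns a new object, the assignment lands on that temporary object, and z itself is not modified.
  • The second form works on z directly through .loc, so z is modified and every later result that reads z sees the change.

The reasoning follows the pandas 0.23.0 documentation.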

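For more, see "pandas的引用与复制 - CSDN博客" (a CSDN blog post on references and copies in pandas), which has 7 examples to deepen understanding.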
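The difference can be checked with a small experiment. This is a sketch on a made-up three-row frame. The warning filter is there only because pandas may warn about the chained assignment, depending on the version.

```python
import warnings

import pandas as pd

# Hypothetical frame; the names z, x, y follow the snippets above
z = pd.DataFrame({"x": [1, 2, 3], "y": [0, 0, 0]})
mask = ~z.x.isin([2])

# Chained form: the assignment targets the temporary copy returned by z[mask]
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas may warn about chained assignment
    z[mask].y = 1
after_chained = z.y.tolist()  # z is unchanged

# .loc form: a single call that writes into z itself
z.loc[mask, "y"] = 1
after_loc = z.y.tolist()  # rows where x != 2 are now 1
```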

.tz_localize error

pd.to_datetime('now').tz_localize('Asia/Shanghai')

This raises:

TypeError: ufunc subtract cannot use operands with types dtype('O') and dtype('<M8[ns]')

Workaround:

pd.to_datetime('now') +  pd.Timedelta(hours=8)
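The fixed eight-hour shift gives a value that carries no timezone information. If a timezone-aware timestamp is needed, one possible alternative (not from the original notes) is to build the timestamp with a timezone from the start, using pd.Timestamp.now(tz=...) or tz_convert from UTC. A sketch:

```python
import pandas as pd

# Timezone-aware "now", constructed directly in the target zone
now_sh = pd.Timestamp.now(tz="Asia/Shanghai")

# Another route: take the current time in UTC, then convert
now_sh_from_utc = pd.Timestamp.now(tz="UTC").tz_convert("Asia/Shanghai")

# Shanghai has no daylight saving time, so the offset is a constant +08:00
offset_seconds = now_sh.utcoffset().total_seconds()
```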

"is not in" filtering [@曹骥2017]

z[~z.x.isin([2])]
z[z.x.isin([2]) == False]
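Both spellings select the same rows. A quick check on a made-up frame:

```python
import pandas as pd

# Hypothetical frame for illustration
z = pd.DataFrame({"x": [1, 2, 3, 2], "y": list("abcd")})

not_two_a = z[~z.x.isin([2])]          # negate the boolean mask with ~
not_two_b = z[z.x.isin([2]) == False]  # compare the mask with False

kept_x = not_two_a.x.tolist()
```

The ~ form is the more common idiom. The comparison form does the same thing element by element.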

A neat way to pivot in pandas while avoiding a MultiIndex

pivot(index='userid', columns='delta_day_group',values='lgcnt')


ods_tbloginlogby2018.columns = ['lgcnt_7_0','lgcnt_30_7','lgcnt_90_30']
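A sketch of the same idea on made-up data. The frame long_df and its values are hypothetical. Passing a single column name as values keeps the resulting columns flat. pivot sorts the column labels, so a positional rename like the one above relies on knowing that sorted order. This sketch selects the columns in an explicit order before renaming.

```python
import pandas as pd

# Hypothetical long-format login counts; column names follow the snippet above
long_df = pd.DataFrame({
    "userid": [1, 1, 1, 2, 2, 2],
    "delta_day_group": ["7_0", "30_7", "90_30"] * 2,
    "lgcnt": [5, 9, 20, 1, 0, 3],
})

# A scalar `values` gives single-level columns instead of a MultiIndex
wide = long_df.pivot(index="userid", columns="delta_day_group", values="lgcnt")

# Fix the column order explicitly, then rename
order = ["7_0", "30_7", "90_30"]
wide = wide[order]
wide.columns = ["lgcnt_" + c for c in order]
```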