Commit 9ebfcdf


Update 07-常用函数.md

1 parent c5f159c commit 9ebfcdf

Lines changed: 34 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -14,6 +14,8 @@`
`14`	`14`	`- [JSON parsing](#json解析)`
`15`	`15`	`- [Date format cleaning](#日期格式清洗)`
`16`	`16`	`- [Share of the mode](#众数占比)`
	`17`	`+ - [Count non-null unique values](#计算非空唯一值)`
	`18`	`+ - [Stratified sampling](#分层抽样)`
`17`	`19`
`18`	`20`	`<br/>`
`19`	`21`
`@@ -314,6 +316,38 @@ def uni_cnt(list_x):`
`314`	`316`	`a = pd.pivot_table(df, index=['Month'], values=['id'], aggfunc=uni_cnt)`
`315`	`317`	```
`316`	`318`
	`319`	`+<br/>`
	`320`	`+`
	`321`	`+------`
	`322`	`+`
	`323`	`+### Stratified sampling`
	`324`	`+`
	`325`	+```python
	`326`	`+def get_sample(df, k, stratified_col):`
	`327`	`+    import math`
	`328`	`+    import random`
	`329`	`+    random.seed(10)  # fixed seed so the sample is reproducible`
	`330`	`+    grouped = df.groupby(by=stratified_col)[stratified_col[0]].count()`
	`331`	`+    group_k = grouped.map(lambda x: math.ceil(x * k))  # rows to draw per stratum`
	`332`	`+`
	`333`	`+    parts = []  # DataFrame.append was removed in pandas 2.0; collect, then concat`
	`334`	`+    for df_idx in group_k.index:`
	`335`	`+        df1 = df`
	`336`	`+        if len(stratified_col) == 1:`
	`337`	`+            df1 = df1[df1[stratified_col[0]] == df_idx]`
	`338`	`+        else:`
	`339`	`+            for i in range(len(df_idx)):`
	`340`	`+                df1 = df1[df1[stratified_col[i]] == df_idx[i]]`
	`341`	`+        idx = random.sample(range(len(df1)), int(group_k[df_idx]))`
	`342`	`+        group_df = df1.iloc[idx, :].copy()`
	`343`	`+        parts.append(group_df)`
	`344`	`+    return pd.concat(parts) if parts else df.iloc[0:0]`
	`345`	`+`
	`346`	`+df_stratified = get_sample(df, k=0.1, stratified_col=['month'])`
	`347`	+```
	`348`	`+`
	`349`	`+`
	`350`	`+`
`317`	`351`
`318`	`352`
`319`	`353`	`\| [<< Contents](./README.md) \| [Back to top ↑](#07-常用函数) \|`
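
The `get_sample` function added in this commit filters the frame once per stratum by hand. As a point of comparison, here is a minimal sketch of the same idea (draw `ceil(k * group size)` rows from each group) built on pandas' own `groupby` iteration and `DataFrame.sample`. The name `stratified_sample`, its `frac` and `seed` parameters, and the toy data are assumptions made for illustration; they are not part of the committed file.

```python
import math

import pandas as pd


def stratified_sample(df, frac, cols, seed=10):
    """Draw ceil(frac * group size) rows from every group defined by `cols`.

    Hypothetical helper, not the committed `get_sample`.
    """
    parts = [
        # Each group is sampled without replacement, using the same seed.
        group.sample(n=math.ceil(len(group) * frac), random_state=seed)
        for _, group in df.groupby(cols)
    ]
    # pd.concat raises on an empty list, so return an empty frame instead.
    return pd.concat(parts) if parts else df.iloc[0:0]


# Made-up toy data: 20 rows in month 1, 10 rows in month 2.
toy = pd.DataFrame({"month": [1] * 20 + [2] * 10, "id": range(30)})
picked = stratified_sample(toy, frac=0.5, cols=["month"])
print(picked["month"].value_counts().to_dict())
```

Rounding up with `math.ceil` mirrors the committed code and guarantees at least one row per non-empty stratum. Note that floating-point products can land just above a whole number (for example `30 * 0.1`), in which case `ceil` draws one row more than expected.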
