
Commit f7d08a7
DOC: Use TemporaryDirectory in scale.rst (#62416)
1 parent a072875

1 file changed: doc/source/user_guide/scale.rst (+23 −18 lines)

Lines changed: 23 additions & 18 deletions

@@ -164,35 +164,35 @@ files. Each file in the directory represents a different year of the entire data
 .. ipython:: python
     :okwarning:
 
-    import pathlib
+    import glob
+    import tempfile
 
     N = 12
     starts = [f"20{i:>02d}-01-01" for i in range(N)]
     ends = [f"20{i:>02d}-12-13" for i in range(N)]
 
-    pathlib.Path("data/timeseries").mkdir(exist_ok=True)
+    tmpdir = tempfile.TemporaryDirectory(ignore_cleanup_errors=True)
 
     for i, (start, end) in enumerate(zip(starts, ends)):
         ts = make_timeseries(start=start, end=end, freq="1min", seed=i)
-        ts.to_parquet(f"data/timeseries/ts-{i:0>2d}.parquet")
+        ts.to_parquet(f"{tmpdir.name}/ts-{i:0>2d}.parquet")
 
 
 ::
 
-    data
-    └── timeseries
-        ├── ts-00.parquet
-        ├── ts-01.parquet
-        ├── ts-02.parquet
-        ├── ts-03.parquet
-        ├── ts-04.parquet
-        ├── ts-05.parquet
-        ├── ts-06.parquet
-        ├── ts-07.parquet
-        ├── ts-08.parquet
-        ├── ts-09.parquet
-        ├── ts-10.parquet
-        └── ts-11.parquet
+    tmpdir
+    ├── ts-00.parquet
+    ├── ts-01.parquet
+    ├── ts-02.parquet
+    ├── ts-03.parquet
+    ├── ts-04.parquet
+    ├── ts-05.parquet
+    ├── ts-06.parquet
+    ├── ts-07.parquet
+    ├── ts-08.parquet
+    ├── ts-09.parquet
+    ├── ts-10.parquet
+    └── ts-11.parquet
 
 Now we'll implement an out-of-core :meth:`pandas.Series.value_counts`. The peak memory usage of this
 workflow is the single largest chunk, plus a small series storing the unique value
@@ -202,13 +202,18 @@ work for arbitrary-sized datasets.
 .. ipython:: python
 
     %%time
-    files = pathlib.Path("data/timeseries/").glob("ts*.parquet")
+    files = glob.iglob(f"{tmpdir.name}/ts*.parquet")
     counts = pd.Series(dtype=int)
     for path in files:
        df = pd.read_parquet(path)
        counts = counts.add(df["name"].value_counts(), fill_value=0)
    counts.astype(int)
 
+.. ipython:: python
+    :suppress:
+
+    tmpdir.cleanup()
+
 Some readers, like :meth:`pandas.read_csv`, offer parameters to control the
 ``chunksize`` when reading a single file.

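The diff's closing context mentions that readers like ``pandas.read_csv`` accept a ``chunksize`` parameter for reading a single file in pieces. As a hedged stdlib analogue (not the pandas API itself), the same fixed-rows-at-a-time idea can be sketched with ``itertools.islice``:

```python
import csv
import io
from itertools import islice

# Pull a fixed number of rows per iteration instead of materializing the
# whole file, analogous in spirit to read_csv(..., chunksize=2).
data = io.StringIO("a\nb\nc\nd\ne\n")
reader = csv.reader(data)

chunks = []
while True:
    chunk = list(islice(reader, 2))  # at most 2 rows per chunk
    if not chunk:
        break
    chunks.append(chunk)

print(len(chunks))  # 5 rows in chunks of 2 -> 3 chunks
```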