Commit acd4042

Update readme.md

1 parent e334c85

1 file changed: readme.md

Lines changed: 91 additions & 127 deletions

Original file line number	Diff line number	Diff line change
`@@ -1,34 +1,36 @@`
`1`		`-# Python Cache: How to Speed Up Your Code with Effective Caching`
	`1`	`+# Python Cache: How to Speed Up Your Code with Effective Caching`
`2`	`2`
`3`	`3`	`This article will show you how to use caching in Python with your web`
`4`	`4`	`scraping tasks. You can read the [<u>full`
`5`	`5`	`article</u>](https://oxylabs.io/blog/python-cache-how-to-use-effectively)`
`6`	`6`	`on our blog, where we delve deeper into the different caching`
`7`	`7`	`strategies.`
`8`	`8`
`9`		`-## How to implement a cache in Python`
	`9`	`+## How to implement a cache in Python`
`10`	`10`
`11`	`11`	`There are different ways to implement caching in Python for different`
`12`	`12`	`caching strategies. Here we’ll see two methods of Python caching for a`
`13`	`13`	`simple web scraping example. If you’re new to web scraping, take a look`
`14`	`14`	`at our [<u>step-by-step Python web scraping`
`15`	`15`	`guide</u>](https://oxylabs.io/blog/python-web-scraping).`
`16`	`16`
`17`		`-### Install the required libraries`
	`17`	`+### Install the required libraries`
`18`	`18`
`19`	`19`	`We’ll use the [<u>requests`
`20`	`20`	`library</u>](https://pypi.org/project/requests/) to make HTTP requests`
`21`	`21`	`to a website. Install it with`
`22`	`22`	`[<u>pip</u>](https://pypi.org/project/pip/) by entering the following`
`23`	`23`	`command in your terminal:`
`24`	`24`
	`25`	+```bash
`25`	`26`	`python -m pip install requests`
	`27`	+```
`26`	`28`
`27`		`-Other libraries we’ll use in this project, specifically time and`
`28`		`-functools, come natively with Python 3.11.2, so you don’t have to`
	`29`	+Other libraries we’ll use in this project, specifically `time` and
	`30`	+`functools`, are part of Python’s standard library (we used Python 3.11.2), so you don’t have to
`29`	`31`	`install them.`
`30`	`32`
`31`		`-### Method 1: Python caching using a manual decorator`
	`33`	`+### Method 1: Python caching using a manual decorator`
`32`	`34`
`33`	`35`	`A [<u>decorator</u>](https://peps.python.org/pep-0318/) in Python is a`
`34`	`36`	`function that accepts another function as an argument and outputs a new`
`@@ -42,143 +44,116 @@ saving them in the cache for future use.`
`42`	`44`	`Let’s start by creating a simple function that takes a URL as a function`
`43`	`45`	`argument, requests that URL, and returns the response text:`
`44`	`46`
	`47`	+```python
`45`	`48`	`def get_html_data(url):`
`46`		`-`
`47`		`-response = requests.get(url)`
`48`		`-`
`49`		`-return response.text`
	`49`	`+    response = requests.get(url)`
	`50`	`+    return response.text`
	`51`	+```
`50`	`52`
`51`	`53`	`Now, let's move toward creating a memoized version of this function:`
`52`	`54`
	`55`	+```python
`53`	`56`	`def memoize(func):`
	`57`	`+    cache = {}`
`54`	`58`
`55`		`-cache = {}`
`56`		`-`
`57`		`-def wrapper(\*args):`
`58`		`-`
`59`		`-if args in cache:`
`60`		`-`
`61`		`-return cache\[args\]`
`62`		`-`
`63`		`-else:`
`64`		`-`
`65`		`-result = func(\*args)`
	`59`	`+    def wrapper(*args):`
	`60`	`+        if args in cache:`
	`61`	`+            return cache[args]`
	`62`	`+        else:`
	`63`	`+            result = func(*args)`
	`64`	`+            cache[args] = result`
	`65`	`+            return result`
`66`	`66`
`67`		`-cache\[args\] = result`
	`67`	`+    return wrapper`
`68`	`68`
`69`		`-return result`
`70`		`-`
`71`		`-return wrapper`
`72`	`69`
`73`	`70`	`@memoize`
`74`		`-`
`75`	`71`	`def get_html_data_cached(url):`
	`72`	`+    response = requests.get(url)`
	`73`	`+    return response.text`
	`74`	+```
`76`	`75`
`77`		`-response = requests.get(url)`
`78`		`-`
`79`		`-return response.text`
`80`		`-`
`81`		`-The wrapper function determines whether the current input arguments have`
	`76`	+The `wrapper` function determines whether the current input arguments have
`82`	`77`	`been previously cached and, if so, returns the previously cached result.`
`83`	`78`	`If not, the code calls the original function and caches the result`
`84`		`-before being returned. In this case, we define a memoize decorator that`
`85`		`-generates a cache dictionary to hold the results of previous function`
	`79`	+before returning it. In this case, we define a `memoize` decorator that
	`80`	+generates a `cache` dictionary to hold the results of previous function
`86`	`81`	`calls.`
`87`	`82`
`88`		`-By adding @memoize above the function definition, we can use the memoize`
`89`		`-decorator to enhance the get_html_data function. This generates a new`
`90`		`-memoized function that we’ve called get_html_data_cached. It only makes`
	`83`	+By adding `@memoize` above the function definition, we can use the memoize
	`84`	+decorator to enhance the `get_html_data` function. This generates a new
	`85`	+memoized function that we’ve called `get_html_data_cached`. It only makes
`91`	`86`	`a single network request for a URL and then stores the response in the`
`92`	`87`	`cache for further requests.`
`93`	`88`
`94`		`-Let’s use the time module to compare the execution speeds of the`
`95`		`-get_html_data function and the memoized get_html_data_cached function:`
	`89`	+Let’s use the `time` module to compare the execution speeds of the
	`90`	+`get_html_data` function and the memoized `get_html_data_cached` function:
`96`	`91`
	`92`	+```python
`97`	`93`	`import time`
`98`	`94`
`99`		`-start_time = time.time()`
`100`	`95`
	`96`	`+start_time = time.time()`
`101`	`97`	`get_html_data('https://books.toscrape.com/')`
`102`		`-`
`103`	`98`	`print('Time taken (normal function):', time.time() - start_time)`
`104`	`99`
`105`		`-start_time = time.time()`
`106`	`100`
	`101`	`+start_time = time.time()`
`107`	`102`	`get_html_data_cached('https://books.toscrape.com/')`
`108`		`-`
`109`		`-print('Time taken (memoized function using manual decorator):',`
`110`		`-time.time() - start_time)`
	`103`	`+print('Time taken (memoized function using manual decorator):', time.time() - start_time)`
	`104`	+```
`111`	`105`
`112`	`106`	`Here’s what the complete code looks like:`
`113`	`107`
`114`		`-\# Import the required modules`
`115`		`-`
	`108`	+```python
	`109`	`+# Import the required modules`
`116`	`110`	`from functools import lru_cache`
`117`		`-`
`118`	`111`	`import time`
`119`		`-`
`120`	`112`	`import requests`
`121`	`113`
`122`		`-\# Function to get the HTML Content`
`123`	`114`
	`115`	`+# Function to get the HTML Content`
`124`	`116`	`def get_html_data(url):`
	`117`	`+    response = requests.get(url)`
	`118`	`+    return response.text`
`125`	`119`
`126`		`-response = requests.get(url)`
`127`		`-`
`128`		`-return response.text`
`129`		`-`
`130`		`-\# Memoize function to cache the data`
`131`	`120`
	`121`	`+# Memoize function to cache the data`
`132`	`122`	`def memoize(func):`
	`123`	`+    cache = {}`
`133`	`124`
`134`		`-cache = {}`
`135`		`-`
`136`		`-\# Inner wrapper function to store the data in the cache`
`137`		`-`
`138`		`-def wrapper(\*args):`
`139`		`-`
`140`		`-if args in cache:`
`141`		`-`
`142`		`-return cache\[args\]`
`143`		`-`
`144`		`-else:`
`145`		`-`
`146`		`-result = func(\*args)`
	`125`	`+    # Inner wrapper function to store the data in the cache`
	`126`	`+    def wrapper(*args):`
	`127`	`+        if args in cache:`
	`128`	`+            return cache[args]`
	`129`	`+        else:`
	`130`	`+            result = func(*args)`
	`131`	`+            cache[args] = result`
	`132`	`+            return result`
`147`	`133`
`148`		`-cache\[args\] = result`
	`134`	`+    return wrapper`
`149`	`135`
`150`		`-return result`
`151`		`-`
`152`		`-return wrapper`
`153`		`-`
`154`		`-\# Memoized function to get the HTML Content`
`155`	`136`
	`137`	`+# Memoized function to get the HTML Content`
`156`	`138`	`@memoize`
`157`		`-`
`158`	`139`	`def get_html_data_cached(url):`
	`140`	`+    response = requests.get(url)`
	`141`	`+    return response.text`
`159`	`142`
`160`		`-response = requests.get(url)`
`161`		`-`
`162`		`-return response.text`
`163`		`-`
`164`		`-\# Get the time it took for a normal function`
`165`	`143`
	`144`	`+# Get the time it took for a normal function`
`166`	`145`	`start_time = time.time()`
`167`		`-`
`168`	`146`	`get_html_data('https://books.toscrape.com/')`
`169`		`-`
`170`	`147`	`print('Time taken (normal function):', time.time() - start_time)`
`171`	`148`
`172`		`-\# Get the time it took for a memoized function (manual decorator)`
`173`		`-`
	`149`	`+# Get the time it took for a memoized function (manual decorator)`
`174`	`150`	`start_time = time.time()`
`175`		`-`
`176`	`151`	`get_html_data_cached('https://books.toscrape.com/')`
	`152`	`+print('Time taken (memoized function using manual decorator):', time.time() - start_time)`
	`153`	+```
`177`	`154`
`178`		`-print('Time taken (memoized function using manual decorator):',`
`179`		`-time.time() - start_time)`
`180`		`-`
`181`		`-Here’s the output:`
	`155`	`+And here’s the output:`
	`156`	`+![](images/output_normal_memoized.png)`
`182`	`157`
`183`	`158`	`Notice the time difference between the two functions. Both take almost`
`184`	`159`	`the same time; the advantage of caching only appears when the same URL is accessed again.`
`@@ -190,82 +165,70 @@ increase the number of calls to these functions, the time difference`
`190`	`165`	`will significantly increase (see [<u>Performance`
`191`	`166`	`Comparison</u>](#performance-comparison)).`
`192`	`167`
`193`		`-### Method 2: Python caching using LRU cache decorator`
	`168`	`+### Method 2: Python caching using LRU cache decorator`
`194`	`169`
`195`	`170`	`Another method to implement caching in Python is to use the built-in`
`196`		`-@lru_cache decorator from functools. This decorator implements cache`
	`171`	+`@lru_cache` decorator from `functools`. This decorator implements a cache
`197`	`172`	`using the least recently used (LRU) caching strategy. This LRU cache is`
`198`	`173`	`a fixed-size cache, which means it’ll discard the data from the cache`
`199`	`174`	`that hasn’t been used recently.`
`200`	`175`
`201`		`-To use the @lru_cache decorator, we can create a new function for`
	`176`	+To use the `@lru_cache` decorator, we can create a new function for
`202`	`177`	`extracting HTML content and place the decorator name at the top. Make`
`203`		`-sure to import the functools module before using the decorator:`
	`178`	+sure to import the `functools` module before using the decorator:
`204`	`179`
	`180`	+```python
`205`	`181`	`from functools import lru_cache`
`206`	`182`
`207`		`-@lru_cache(maxsize=None)`
`208`	`183`
	`184`	`+@lru_cache(maxsize=None)`
`209`	`185`	`def get_html_data_lru(url):`
	`186`	`+    response = requests.get(url)`
	`187`	`+    return response.text`
	`188`	+```
`210`	`189`
`211`		`-response = requests.get(url)`
`212`		`-`
`213`		`-return response.text`
`214`		`-`
`215`		`-In the above example, the get_html_data_lru method is memoized using the`
`216`		`-@lru_cache decorator. The cache can grow indefinitely when the maxsize`
`217`		`-option is set to None.`
	`190`	+In the above example, the `get_html_data_lru` function is memoized using the
	`191`	+`@lru_cache` decorator. The cache can grow indefinitely when the `maxsize`
	`192`	+option is set to `None`.
`218`	`193`
`219`		`-To use the @lru_cache decorator, just add it above the get_html_data_lru`
	`194`	+To use the `@lru_cache` decorator, just add it above the `get_html_data_lru`
`220`	`195`	`function. Here’s the complete code sample:`
`221`	`196`
`222`		`-\# Import the required modules`
`223`		`-`
	`197`	+```python
	`198`	`+# Import the required modules`
`224`	`199`	`from functools import lru_cache`
`225`		`-`
`226`	`200`	`import time`
`227`		`-`
`228`	`201`	`import requests`
`229`	`202`
`230`		`-\# Function for getting HTML Content`
`231`	`203`
	`204`	`+# Function to get the HTML Content`
`232`	`205`	`def get_html_data(url):`
	`206`	`+    response = requests.get(url)`
	`207`	`+    return response.text`
`233`	`208`
`234`		`-response = requests.get(url)`
`235`		`-`
`236`		`-return response.text`
`237`		`-`
`238`		`-\# Memoized using LRU Cache`
`239`	`209`
	`210`	`+# Memoized using LRU Cache`
`240`	`211`	`@lru_cache(maxsize=None)`
`241`		`-`
`242`	`212`	`def get_html_data_lru(url):`
	`213`	`+    response = requests.get(url)`
	`214`	`+    return response.text`
`243`	`215`
`244`		`-response = requests.get(url)`
`245`		`-`
`246`		`-return response.text`
`247`		`-`
`248`		`-\# Getting time for Normal function to extract HTML content`
`249`	`216`
	`217`	`+# Get the time it took for a normal function`
`250`	`218`	`start_time = time.time()`
`251`		`-`
`252`	`219`	`get_html_data('https://books.toscrape.com/')`
`253`		`-`
`254`	`220`	`print('Time taken (normal function):', time.time() - start_time)`
`255`	`221`
`256`		`-\# Getting time for Memoized function (LRU cache) to extract HTML`
`257`		`-content`
`258`		`-`
	`222`	`+# Get the time it took for a memoized function (LRU cache)`
`259`	`223`	`start_time = time.time()`
`260`		`-`
`261`	`224`	`get_html_data_lru('https://books.toscrape.com/')`
`262`		`-`
`263`		`-print('Time taken (memoized function with LRU cache):', time.time() -`
`264`		`-start_time)`
	`225`	`+print('Time taken (memoized function with LRU cache):', time.time() - start_time)`
	`226`	+```
`265`	`227`
`266`	`228`	`This produced the following output:`
	`229`	`+![](images/output_normal_lru.png)`
`267`	`230`
`268`		`-### Performance comparison`
	`231`	`+### Performance comparison`
`269`	`232`
`270`	`233`	`In the following table, we’ve determined the execution times of all`
`271`	`234`	`three functions for different numbers of requests to these functions:`
`@@ -280,6 +243,7 @@ three functions for different numbers of requests to these functions:`
`280`	`243`	`As the number of requests to the functions increases, you can see a`
`281`	`244`	`significant reduction in execution times using the caching strategy. The`
`282`	`245`	`following comparison chart depicts these results:`
	`246`	`+![](images/comparison-chart.png)`
`283`	`247`
`284`	`248`	`The comparison results clearly show that using a caching strategy in`
`285`	`249`	`your code can significantly improve overall performance and speed.`
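The README's `memoize` decorator keys its cache on positional arguments only. Below is a slightly extended sketch. It assumes you also want keyword arguments supported and the wrapped function's metadata preserved through `functools.wraps`. The `fetch_page` function is a hypothetical offline stand-in for `requests.get(url).text`, so you can check the caching behavior without a network call.

```python
import functools


def memoize(func):
    """Cache func's results, keyed on positional and keyword arguments."""
    cache = {}

    @functools.wraps(func)  # keep the wrapped function's __name__ and docstring
    def wrapper(*args, **kwargs):
        # Sort keyword arguments so f(a=1, b=2) and f(b=2, a=1) share one entry
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    wrapper.cache = cache  # expose the dict for inspection (illustrative addition)
    return wrapper


call_count = 0  # how many times the undecorated body actually ran


@memoize
def fetch_page(url):
    # Hypothetical offline stand-in for requests.get(url).text
    global call_count
    call_count += 1
    return f"<html>{url}</html>"


first = fetch_page("https://books.toscrape.com/")
second = fetch_page("https://books.toscrape.com/")
print(call_count)  # 1: the second call was served from the cache
```

As in the README's version, every argument must be hashable, because it becomes part of a dictionary key. Passing a list, for example, raises `TypeError`.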
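The README uses `maxsize=None`, so its cache never evicts anything. To see the least-recently-used policy in action, here is a small sketch with an assumed `maxsize=2` and a hypothetical offline function, `fetch_page_lru`. The `cache_info()` method is part of the `functools.lru_cache` API and reports hits, misses, and the current size.

```python
from functools import lru_cache

calls = []  # records every key that actually reached the (simulated) network


@lru_cache(maxsize=2)
def fetch_page_lru(url):
    # Hypothetical offline stand-in for requests.get(url).text
    calls.append(url)
    return f"<html>{url}</html>"


fetch_page_lru("page-1")  # miss
fetch_page_lru("page-2")  # miss
fetch_page_lru("page-1")  # hit; page-1 becomes the most recently used entry
fetch_page_lru("page-3")  # miss; evicts page-2, the least recently used entry
fetch_page_lru("page-2")  # miss again, because page-2 was evicted

print(fetch_page_lru.cache_info())
# CacheInfo(hits=1, misses=4, maxsize=2, currsize=2)
```

A bounded `maxsize` trades some repeated requests for predictable memory use. Call `fetch_page_lru.cache_clear()` when you need to drop stale pages entirely.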
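The README's timing snippets measure a single call, which is why the normal and memoized functions take about the same time. Timing repeated calls makes the effect of re-access visible. This sketch assumes a hypothetical `slow_fetch` that sleeps 50 ms instead of hitting the network. It uses `time.perf_counter()`, which is intended for measuring short intervals, rather than `time.time()`.

```python
import time
from functools import lru_cache


def slow_fetch(url):
    # Hypothetical stand-in for a network request: sleep 50 ms per call
    time.sleep(0.05)
    return f"<html>{url}</html>"


@lru_cache(maxsize=None)
def slow_fetch_cached(url):
    return slow_fetch(url)


def time_calls(func, url, n):
    """Return the total seconds taken by n calls to func(url)."""
    start = time.perf_counter()
    for _ in range(n):
        func(url)
    return time.perf_counter() - start


url = "https://books.toscrape.com/"
uncached = time_calls(slow_fetch, url, 10)  # pays the delay 10 times
cached = time_calls(slow_fetch_cached, url, 10)  # pays it once, then 9 cache hits
print(f"uncached: {uncached:.3f}s, cached: {cached:.3f}s")
```

Absolute numbers will vary by machine. The gap grows with the number of repeated calls, which matches the trend described in the performance comparison.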
