- # **Python Cache: How to Speed Up Your Code with Effective Caching**
+ # Python Cache: How to Speed Up Your Code with Effective Caching

This article will show you how to use caching in Python with your web
scraping tasks. You can read the [<u>full
article</u>](https://oxylabs.io/blog/python-cache-how-to-use-effectively)
on our blog, where we delve deeper into the different caching
strategies.

- ## **How to implement a cache in Python**
+ ## How to implement a cache in Python

There are different ways to implement caching in Python for different
caching strategies. Here we’ll see two methods of Python caching for a
simple web scraping example. If you’re new to web scraping, take a look
at our [<u>step-by-step Python web scraping
guide</u>](https://oxylabs.io/blog/python-web-scraping).

- ### **Install the required libraries**
+ ### Install the required libraries

We’ll use the [<u>requests
library</u>](https://pypi.org/project/requests/) to make HTTP requests
to a website. Install it with
[<u>pip</u>](https://pypi.org/project/pip/) by entering the following
command in your terminal:

+ ```bash
python -m pip install requests
+ ```

- Other libraries we’ll use in this project, specifically time and
- functools, come natively with Python 3.11.2, so you don’t have to
+ Other libraries we’ll use in this project, specifically `time` and
+ `functools`, come natively with Python 3.11.2, so you don’t have to
install them.

- ### **Method 1: Python caching using a manual decorator**
+ ### Method 1: Python caching using a manual decorator

A [<u>decorator</u>](https://peps.python.org/pep-0318/) in Python is a
function that accepts another function as an argument and outputs a new
@@ -42,143 +44,116 @@ saving them in the cache for future use.
Let’s start by creating a simple function that takes a URL as a function
argument, requests that URL, and returns the response text:

+ ```python
def get_html_data(url):
-
- response = requests.get(url)
-
- return response.text
+     response = requests.get(url)
+     return response.text
+ ```

Now, let's move toward creating a memoized version of this function:

+ ```python
def memoize(func):
+     cache = {}

- cache = {}
-
- def wrapper(\*args):
-
- if args in cache:
-
- return cache\[args\]
-
- else:
-
- result = func(\*args)
+     def wrapper(*args):
+         if args in cache:
+             return cache[args]
+         else:
+             result = func(*args)
+             cache[args] = result
+             return result

- cache\[args\] = result
+     return wrapper

- return result
-
- return wrapper

@memoize
-
def get_html_data_cached(url):
+     response = requests.get(url)
+     return response.text
+ ```

- response = requests.get(url)
-
- return response.text
-
- The wrapper function determines whether the current input arguments have
+ The `wrapper` function determines whether the current input arguments have
been previously cached and, if so, returns the previously cached result.
If not, the code calls the original function and caches the result
- before being returned. In this case, we define a memoize decorator that
- generates a cache dictionary to hold the results of previous function
+ before returning it. In this case, we define a `memoize` decorator that
+ generates a `cache` dictionary to hold the results of previous function
calls.
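
The `memoize` decorator above keys its cache on positional arguments only. As a minimal sketch (not part of the original example), a variant could also accept keyword arguments and keep the wrapped function's name by using `functools.wraps`. The names `memoize_kw` and `build_url` are hypothetical and serve only as an illustration:

```python
from functools import wraps


def memoize_kw(func):
    """Hypothetical variant of memoize that also accepts keyword arguments."""
    cache = {}

    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        # Build a hashable key from the positional and keyword arguments.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


@memoize_kw
def build_url(path, scheme="https"):
    """Toy function standing in for an expensive call (assumption)."""
    return f"{scheme}://books.toscrape.com/{path}"


print(build_url("index.html"))                 # https://books.toscrape.com/index.html
print(build_url("index.html", scheme="http"))  # http://books.toscrape.com/index.html
print(build_url.__name__)                      # build_url
```

Note that this sketch treats `build_url("a", "http")` and `build_url("a", scheme="http")` as two different cache keys, and every argument must be hashable.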

- By adding @memoize above the function definition, we can use the memoize
- decorator to enhance the get_html_data function. This generates a new
- memoized function that we’ve called get_html_data_cached. It only makes
+ By adding `@memoize` above the function definition, we can use the memoize
+ decorator to enhance the `get_html_data` function. This generates a new
+ memoized function that we’ve called `get_html_data_cached`. It only makes
a single network request for a URL and then stores the response in the
cache for further requests.

- Let’s use the time module to compare the execution speeds of the
- get_html_data function and the memoized get_html_data_cached function:
+ Let’s use the `time` module to compare the execution speeds of the
+ `get_html_data` function and the memoized `get_html_data_cached` function:

+ ```python
import time

- start_time = time.time()

+ start_time = time.time()
get_html_data('https://books.toscrape.com/')
-
print('Time taken (normal function):', time.time() - start_time)

- start_time = time.time()

+ start_time = time.time()
get_html_data_cached('https://books.toscrape.com/')
-
- print('Time taken (memoized function using manual decorator):',
- time.time() - start_time)
+ print('Time taken (memoized function using manual decorator):', time.time() - start_time)
+ ```

Here’s what the complete code looks like:

- \# Import the required modules
-
+ ```python
+ # Import the required modules
from functools import lru_cache
-
import time
-
import requests

- \# Function to get the HTML Content

+ # Function to get the HTML Content
def get_html_data(url):
+     response = requests.get(url)
+     return response.text

- response = requests.get(url)
-
- return response.text
-
- \# Memoize function to cache the data

+ # Memoize function to cache the data
def memoize(func):
+     cache = {}

- cache = {}
-
- \# Inner wrapper function to store the data in the cache
-
- def wrapper(\*args):
-
- if args in cache:
-
- return cache\[args\]
-
- else:
-
- result = func(\*args)
+     # Inner wrapper function to store the data in the cache
+     def wrapper(*args):
+         if args in cache:
+             return cache[args]
+         else:
+             result = func(*args)
+             cache[args] = result
+             return result

- cache\[args\] = result
+     return wrapper

- return result
-
- return wrapper
-
- \# Memoized function to get the HTML Content

+ # Memoized function to get the HTML Content
@memoize
-
def get_html_data_cached(url):
+     response = requests.get(url)
+     return response.text

- response = requests.get(url)
-
- return response.text
-
- \# Get the time it took for a normal function

+ # Get the time it took for a normal function
start_time = time.time()
-
get_html_data('https://books.toscrape.com/')
-
print('Time taken (normal function):', time.time() - start_time)

- \# Get the time it took for a memoized function (manual decorator)
-
+ # Get the time it took for a memoized function (manual decorator)
start_time = time.time()
-
get_html_data_cached('https://books.toscrape.com/')
+ print('Time taken (memoized function using manual decorator):', time.time() - start_time)
+ ```

- print('Time taken (memoized function using manual decorator):',
- time.time() - start_time)
-
- Here’s the output:
+ And here’s the output:
+ ![](images/output_normal_memoized.png)

Notice the time difference between the two functions. Both take almost
the same time, but the real benefit of caching shows up on repeated access.
@@ -190,82 +165,70 @@ increase the number of calls to these functions, the time difference
will significantly increase (see [<u>Performance
Comparison</u>](#performance-comparison)).

- ### **Method 2: Python caching using LRU cache decorator**
+ ### Method 2: Python caching using LRU cache decorator

Another method to implement caching in Python is to use the built-in
- @lru_cache decorator from functools. This decorator implements cache
+ `@lru_cache` decorator from `functools`. This decorator implements cache
using the least recently used (LRU) caching strategy. This LRU cache is
a fixed-size cache, which means it’ll discard the data from the cache
that hasn’t been used recently.
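
The eviction behavior can be observed on a small scale. The following sketch assumes a toy function, `square`, and a deliberately tiny `maxsize=2` (both chosen only for illustration). `cache_info()` is the statistics helper that `functools.lru_cache` attaches to the decorated function:

```python
from functools import lru_cache

calls = []  # records every argument that reaches the real function body


@lru_cache(maxsize=2)
def square(n):
    calls.append(n)
    return n * n


square(1)  # miss: the cache now holds 1
square(2)  # miss: the cache now holds 1 and 2
square(1)  # hit: 1 becomes the most recently used entry
square(3)  # miss: the cache is full, so 2 (least recently used) is evicted
square(2)  # miss again, because 2 was evicted

print(calls)  # [1, 2, 3, 2]
print(square.cache_info())
```

Only the third call is served from the cache. Every other call runs the function body, including the final `square(2)`, whose earlier result had already been discarded.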

- To use the @lru_cache decorator, we can create a new function for
+ To use the `@lru_cache` decorator, we can create a new function for
extracting HTML content and place the decorator name at the top. Make
- sure to import the functools module before using the decorator:
+ sure to import the `functools` module before using the decorator:

+ ```python
from functools import lru_cache

- @lru_cache(maxsize=None)

+ @lru_cache(maxsize=None)
def get_html_data_lru(url):
+     response = requests.get(url)
+     return response.text
+ ```

- response = requests.get(url)
-
- return response.text
-
- In the above example, the get_html_data_lru method is memoized using the
- @lru_cache decorator. The cache can grow indefinitely when the maxsize
- option is set to None.
+ In the above example, the `get_html_data_lru` function is memoized using the
+ `@lru_cache` decorator. The cache can grow indefinitely when the `maxsize`
+ option is set to `None`.
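
Because an unbounded cache keeps every distinct result, it can help to inspect or reset it from time to time. A small sketch, assuming a hypothetical `page_length` function in place of `get_html_data_lru` and illustrative page URLs so that no network access is needed:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def page_length(url):
    """Hypothetical stand-in for a cached fetch; it only measures the URL."""
    return len(url)


# Five distinct arguments produce five cache entries.
for page in range(1, 6):
    page_length(f"https://books.toscrape.com/catalogue/page-{page}.html")

info = page_length.cache_info()
print(info.currsize, info.maxsize)  # 5 None

# cache_clear() empties the cache and resets the statistics.
page_length.cache_clear()
print(page_length.cache_info().currsize)  # 0
```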

- To use the @lru_cache decorator, just add it above the get_html_data_lru
+ To use the `@lru_cache` decorator, just add it above the `get_html_data_lru`
function. Here’s the complete code sample:

- \# Import the required modules
-
+ ```python
+ # Import the required modules
from functools import lru_cache
-
import time
-
import requests

- \# Function for getting HTML Content

+ # Function to get the HTML Content
def get_html_data(url):
+     response = requests.get(url)
+     return response.text

- response = requests.get(url)
-
- return response.text
-
- \# Memoized using LRU Cache

+ # Memoized using LRU Cache
@lru_cache(maxsize=None)
-
def get_html_data_lru(url):
+     response = requests.get(url)
+     return response.text

- response = requests.get(url)
-
- return response.text
-
- \# Getting time for Normal function to extract HTML content

+ # Get the time it took for a normal function
start_time = time.time()
-
get_html_data('https://books.toscrape.com/')
-
print('Time taken (normal function):', time.time() - start_time)

- \# Getting time for Memoized function (LRU cache) to extract HTML
- content
-
+ # Get the time it took for a memoized function (LRU cache)
start_time = time.time()
-
get_html_data_lru('https://books.toscrape.com/')
-
- print('Time taken (memoized function with LRU cache):', time.time() -
- start_time)
+ print('Time taken (memoized function with LRU cache):', time.time() - start_time)
+ ```

This produced the following output:
+ ![](images/output_normal_lru.png)

- ### **Performance comparison**
+ ### Performance comparison

In the following table, we’ve determined the execution times of all
three functions for different numbers of requests to these functions:
@@ -280,6 +243,7 @@ three functions for different numbers of requests to these functions:
As the number of requests to the functions increases, you can see a
significant reduction in execution times using the caching strategy. The
following comparison chart depicts these results:
+ ![](images/comparison-chart.png)

The comparison results clearly show that using a caching strategy in
your code can significantly improve overall performance and speed.
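
To try this kind of comparison without depending on network conditions, one option is to simulate the request latency. The sketch below is an assumption-laden illustration rather than the benchmark used for the results above: `fetch`, `fetch_cached`, `time_calls`, and the `SIMULATED_LATENCY` value are all hypothetical.

```python
import time
from functools import lru_cache

SIMULATED_LATENCY = 0.01  # seconds; assumed stand-in for a network round trip


def fetch(url):
    """Uncached stand-in for an HTTP request."""
    time.sleep(SIMULATED_LATENCY)
    return f"<html>{url}</html>"


@lru_cache(maxsize=None)
def fetch_cached(url):
    """Same work as fetch, but repeated calls are served from the cache."""
    time.sleep(SIMULATED_LATENCY)
    return f"<html>{url}</html>"


def time_calls(func, url, n):
    """Return the seconds needed to call func(url) n times."""
    start = time.perf_counter()
    for _ in range(n):
        func(url)
    return time.perf_counter() - start


url = "https://books.toscrape.com/"
uncached = time_calls(fetch, url, 20)
cached = time_calls(fetch_cached, url, 20)
print(f"uncached: {uncached:.3f}s, cached: {cached:.3f}s")
```

With 20 calls, the uncached function pays the simulated latency every time, while the cached one pays it only on the first call. The exact numbers depend on the machine.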