I have been coding in Python for a number of years now. I've always felt that Matplotlib code takes up a lot more lines of code than it should. I could be wrong.
I have the following function that plots a simple scatter plot graph with two additional solid lines. Is there any way for me to reduce the number of lines to achieve exactly the same outcome? I feel that my code is a little 'chunky'.
- `dates`: an array of DateTime values in the yyyy-mm-dd H-M-S format
- `return_values`: array of floats
- `main_label`: string
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
%matplotlib inline
plt.style.use('ggplot')
def plot_returns(dates, return_values, ewma_values, main_label):
    plt.figure(figsize=(15, 10))
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
    plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=31))
    plt.scatter(dates, return_values, linestyle = '-', color='blue', s = 3, label = "Daily Returns")
    plt.plot(dates, ewma_values, linestyle = '-', color='green', label = "EWMA")
    plt.gcf().autofmt_xdate()
    plt.xlabel('Day', fontsize = 14)
    plt.ylabel('Return', fontsize = 14)
    plt.title(main_label, fontsize=14)
    plt.legend(loc='upper right', facecolor='white', edgecolor='black', framealpha=1, ncol=1, fontsize=12)
    plt.xlim(left = min(dates))
    plt.show()

dates = pd.date_range(start = '1/1/2018', end = '10/10/2018')
return_values = np.random.random(len(dates))
ewma_values = 0.5 + np.random.random(len(dates))*0.1
plot_returns(dates, return_values, ewma_values, "Example")
- It might be a good idea to include a little more code so people can run an analysis in their IDE. They should be able to reproduce what you're seeing. (C. Harley, Sep 12, 2021 at 17:32)
- Thanks. I've added more code. (Ruan, Sep 13, 2021 at 19:32)
- Thanks, that runs. So, each of your statements does something specific to get the graph to draw. I don't find it too verbose compared to the example page matplotlib.org/2.0.2/users/screenshots.html given that you're drawing two graphs (scatter and line) on one plot. Is there something specific you want to highlight? (C. Harley, Sep 13, 2021 at 22:55)
1 Answer
"Is there any way for me to reduce the number of lines to achieve exactly the same outcome?"
Line count should, in isolation, not be your overriding concern, and your code is about as minimally chunky as matplotlib will allow. Your current push, rather than shedding a line or two, should be to increase static testability, maintainability and structure. Said another way, this is not code golf, and not all short code is good code.
To that end:
- Do not enforce a style in the global namespace - only call that from a routine in the application. What if someone else wants to import and reuse parts of your code?
- Add PEP 484 type hints.
- Avoid calling `gca` and `gcf`. It's easy, and preferable, to have local references to your actual figure and axes upon creation, and to use methods bound to those specific objects instead of `plt`. Function calls via `plt` have more visual ambiguity, and need to infer the current figure and axes; being explicit is a better idea. On top of that, calls to `plt` are just wrappers to the bound instance methods anyway.
- Choose a consistent quote style. `black` prefers double quotes but I have a vendetta against `black` and personally prefer single quotes. It's up to you.
- Do not force a `show()` in the call to `plot_returns`, and return the generated `Figure` instead of `None`. This will improve reusability and testability.
- Do not use strings for internal date logic. Even if you had to use strings, prefer an unambiguous `YYYY-mm-dd` ISO format instead of yours.
- `np.random.random` is part of NumPy's legacy random API; new code should use `default_rng()` instead.
Suggested
from datetime import date
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.figure import Figure


def plot_returns(
    dates: pd.DatetimeIndex,
    return_values: np.ndarray,
    ewma_values: np.ndarray,
    main_label: str,
) -> Figure:
    fig, ax = plt.subplots(figsize=(15, 10))
    ax.scatter(dates, return_values, linestyle='-', color='blue', s=3, label='Daily Returns')
    ax.plot(dates, ewma_values, linestyle='-', color='green', label='EWMA')

    fig.autofmt_xdate()
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
    ax.xaxis.set_major_locator(mdates.DayLocator(interval=31))

    ax.set_xlabel('Day', fontsize=14)
    ax.set_ylabel('Return', fontsize=14)
    ax.set_title(main_label, fontsize=14)
    ax.legend(loc='upper right', facecolor='white', edgecolor='black', framealpha=1, ncol=1, fontsize=12)
    ax.set_xlim(left=min(dates))

    return fig


def main() -> None:
    dates = pd.date_range(start=date(2018, 1, 1), end=date(2018, 10, 10))
    rand = np.random.default_rng()
    return_values = rand.random(len(dates))
    ewma_values = rand.uniform(low=0.5, high=0.6, size=len(dates))

    plt.style.use('ggplot')
    plot_returns(dates, return_values, ewma_values, 'Example')
    plt.show()


if __name__ == '__main__':
    main()
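
As a rough illustration of the testability gained by returning the `Figure`, the sketch below inspects a figure without ever opening a window. It is self-contained, so it defines a trimmed-down stand-in for `plot_returns` rather than the full version above; the `Agg` backend, the fixed seed and the one-month date range are assumptions chosen to keep the example headless and repeatable.

```python
from datetime import date

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is required

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.figure import Figure


def plot_returns(dates, return_values, ewma_values, main_label) -> Figure:
    # Trimmed-down stand-in: same structure as the suggested function, fewer styling calls.
    fig, ax = plt.subplots()
    ax.scatter(dates, return_values, s=3, label='Daily Returns')
    ax.plot(dates, ewma_values, label='EWMA')
    ax.set_title(main_label)
    ax.legend()
    return fig


dates = pd.date_range(start=date(2018, 1, 1), end=date(2018, 1, 31))
rand = np.random.default_rng(seed=0)  # fixed seed (an assumption) for repeatable data
fig = plot_returns(
    dates,
    rand.random(len(dates)),
    rand.uniform(low=0.5, high=0.6, size=len(dates)),
    'Example',
)

# Because the figure is returned rather than shown, it can be inspected directly.
ax = fig.axes[0]
title = ax.get_title()
n_lines = len(ax.get_lines())        # line artists created by ax.plot
n_scatters = len(ax.collections)     # collections created by ax.scatter
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
plt.close(fig)  # release the figure once the checks are done
```

The same returned object could just as well be passed to `fig.savefig(...)` in a batch job, which a function ending in `plt.show()` would not allow.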