A program $P$ consists of two methods: one sequential, which takes $ n + 15050 $ seconds to execute, and another that can be parallelized with full efficiency, taking $\frac{n^2}{100p}$ seconds, where $p$ is the number of processors. What is the maximum speedup that can be achieved when the input is $ n = 1000$ ?
- Derive the general formula for any number of processors p.
- Calculate the speedup for exactly 2 processors.
- Determine the maximum speedup for an unbounded number of processors.
my attempt is the following:
Let's define $T_S(n)$ as the execution time of the first method (which runs sequentially) and $T_P(n, p)$ as the execution time of the second method (which runs in parallel). Note that the total execution time $T(n, p)$ of the program, given the two methods and $n = 1000$, can be expressed as:
$$ T(1000, p) = T_S(1000) + T_P(1000, p) $$
$$ = (1000 + 15050) + \frac{1000^2}{100p} $$
$$ = 16050 + \frac{10000}{p} $$
To obtain the maximum speedup, it is necessary to calculate the percentage of code that is executed in parallel. Observe that:
$$ \text{%}P = \frac{T_P}{T} = \frac{\frac{10000}{p}}{16050 + \frac{10000}{p}} = \frac{10000}{16050p + 10000} $$
Thus, we have:
$$ \text{%}S = 1 - \text{%}P = 1 - \frac{10000}{16050p + 10000} = \frac{16050p}{16050p + 10000} $$
On the other hand, we know that the maximum speedup is obtained with the formula:
$$ S_{\text{max}}(p) = \frac{1}{(1 - \text{%}P) + \frac{\text{%}P}{p}} $$
So, by substituting the previous results, we obtain that the maximum speedup for the program, given $n = 1000$ and $p$ processors, is:
$$ S_{\text{max}}(p) = \frac{1}{\frac{16050p}{16050p + 10000} + \frac{16050p^2 + 10000p}{(16050p + 10000)^2}} $$
However, I believe this formula might not be correct because it doesn't make sense for the limit to approach 1 in the last part. I would appreciate help in calculating the correct formula and understanding where my reasoning might have gone wrong.
2 Answers 2
By definition, the "speedup" with $p$ processors is how much faster the computation runs compared to just one: $$ S = \frac{T(1000, 1)}{T(1000, p)} = \frac{26050}{16050 + \frac{10000}{p}} $$ This is clearly an increasing function of $p$. Now, assume that the number of processors grows to $+\infty$. Then you get $$ S \leq \lim_{p \rightarrow +\infty} \frac{T(1000, 1)}{T(1000, p)} = \frac{26050}{16050} \leq 1.6231. $$
-
$\begingroup$ I think the answer would be more helpful is it showed where 10000ドル$ comes from and, if possible, 26050ドル$. (I'm afraid I don't understand the denominator at all: Whence 16050ドル$ , and why add it to "the parallel time"?) $\endgroup$greybeard– greybeard2024年08月19日 06:10:30 +00:00Commented Aug 19, 2024 at 6:10
-
$\begingroup$ It is written in the original question that the running time with $p$ processors is 16050ドル + 10000/p$. This is where these numbers come from. With $p=1,ドル this makes 26050. $\endgroup$Charles Bouillaguet– Charles Bouillaguet2024年09月02日 09:22:51 +00:00Commented Sep 2, 2024 at 9:22
-
$\begingroup$ I think it would be helpful to start at n²/100p. I thought the procedures were alternatives, mistakenly, I now think. $\endgroup$greybeard– greybeard2024年09月02日 12:01:03 +00:00Commented Sep 2, 2024 at 12:01
You take at least n + 15,050 seconds. During the first n + 15,050 seconds, you have p-1 processors for parallel work, after that you have p processors.
With one processor the time is n + 15,050 + n^2 / 100. The speed up is maximal when adding another processor doesnt help, and that is when p-1 processors do the job in n + 15,050 seconds, or n^2 / 100 / (p-1) <= n + 15,050 or p-1 >= n^2 / 100 / (n + 15,050) or p = ceil(1 + n^2 / 100 / (n + 15,050)).
With n = 1,000 this is ceil(1 + 10,000 / 16,050) = 2. With n = 10^6 it would be a bit less than 10,000. We get p = 3 around n = 1,280.
So with n = 1,000 and p = 2 you do 16,050 seconds of serial work, and the second processor spends 10,000 seconds on parallel work.