I implement sum element of array using multi threading but my question is when my program work with one thread the time is better than using 6 thread . I am using a CPU with 8 threads.What am I wrong
public static int threadss = 7;
public const int count = 1000000;
public static List<int> a = new List<int>();
public static int numberofthreads;
public static long[] sum=new long[threadss];
public static long total_sum=0;
public static long part = 0;
static void Main(string[] args)
{
Random r = new Random();
for (int i = 1; i <= count; i++)
{
a.Add(i);
}
numberofthreads = threadss;
var watch = new System.Diagnostics.Stopwatch();
watch.Start();
Thread[] threadsarray = new Thread[numberofthreads];
List<Thread> threads = new List<Thread>();
for (int i = 0; i < numberofthreads; i++)
{
threadsarray[i] = new Thread(new ParameterizedThreadStart(myThreadMethod));
threadsarray[i].Start(i);
}
for (int i = 0; i < numberofthreads; i++)
{
threadsarray[i].Join();
}
for(int i= 0; i < numberofthreads; i++)
{
total_sum += sum[i];
}
watch.Stop();
Console.WriteLine("Time is " + watch.ElapsedMilliseconds.ToString());
Console.WriteLine("sum is "+ total_sum);
}
static void myThreadMethod(object threadid)
{
int thid = (int)threadid;
for (int i = (thid * (a.Count / numberofthreads));i < (thid + 1) * a.Count / numberofthreads; i++)
{
sum[thid] += a[i];
}
}
2 Answers 2
Running your code in Release build after a tweak like i < count
instead of i <= count
and increased array size by 10 times, and assign 1 for each array element (needed for further comparison):
Time is 177
sum is 10000000
Then I've made some changes
public const int numberofthreads = 7;
public const int count = 10000000;
static void Main(string[] args)
{
int[] a = new int[count];
for (int i = 0; i < count; i++)
{
a[i] = 1;
}
var watch = new Stopwatch();
watch.Start();
long sum1 = ThreadSum(a);
watch.Stop();
Console.WriteLine("Time is " + watch.ElapsedMilliseconds);
Console.WriteLine("sum is " + sum1);
Console.ReadKey();
}
static long ThreadSum(int[] array)
{
long total_sum = 0;
long[] sum = new long[numberofthreads];
Thread[] threadsarray = new Thread[numberofthreads];
for (int i = 0; i < numberofthreads; i++)
{
int tmp = i;
threadsarray[i] = new Thread(() => MyThreadMethod(tmp, array, sum));
threadsarray[i].Start();
}
for (int i = 0; i < numberofthreads; i++)
{
threadsarray[i].Join();
}
for (int i = 0; i < numberofthreads; i++)
{
total_sum += sum[i];
}
return total_sum;
}
static void MyThreadMethod(int threadid, int[] array, long[] sum)
{
int thid = threadid;
for (int i = thid * (array.Length / numberofthreads); i < (thid + 1) * array.Length / numberofthreads; i++)
{
sum[thid] += array[i];
}
}
Time is 6
sum is 10000000
Interesting. That's mostly faster because I'm using Array
instead of List
for the source data. Let's make some more optimisations like using pooled threads.
static long ThreadSum(int[] array)
{
long total_sum = 0;
long[] sum = new long[numberofthreads];
Task[] tasks = new Task[numberofthreads];
for (int i = 0; i < numberofthreads; i++)
{
int tmp = i;
tasks[i] = Task.Run(() => MyThreadMethod(tmp, array, sum));
}
Task.WaitAll(tasks);
for (int i = 0; i < numberofthreads; i++)
{
total_sum += sum[i];
}
return total_sum;
}
No progress
Time is 6
sum is 10000000
...because internal threading overhead here is the same. But in wide thread use i recommend Task.Run
instead of new Thread
.
Ok, let's do that in the old way
static long Sum(int[] array)
{
long sum = 0;
for (int i = 1; i < count; i++)
{
sum += array[i];
}
return sum;
}
Time is 5
sum is 10000000
One thread is almost faster than 7 threads. :) That's because of the same threading overhead. Creating thread is expensive operation.
But what about SIMD?
static long SimdSum(int[] array)
{
Vector<int> sumVector = Vector<int>.Zero;
ReadOnlySpan<Vector<int>> vectors = MemoryMarshal.Cast<int, Vector<int>>(array);
for (int i = 0; i < vectors.Length; i++)
{
sumVector += vectors[i];
}
long sum = Vector.Dot(sumVector, Vector<int>.One);
for (int i = array.Length - array.Length % Vector<int>.Count; i < array.Length; i++)
{
sum += array[i];
}
return sum;
}
My Vector<int>
length is 8, then it works like 8 threads but on one thread because SIMD instruction calculates 8 int
s at once.
Time is 2
sum is 10000000
Looks like it the fastest for now. Probably there's more ways to optimize that but I think, that's enough. The main issue that threading here isn't effective because computation time for the each Thread is too small.
Also, I suggest to say hello to Benchmark.NET, it can measure methods performance more accurately.
The computations you are doing are absolutely trivial. Summing integers is pretty much one of the fastest operations a CPU can do.
In particular, the computation is much faster than the time it takes to even spin up a single thread, let alone spin up six of them – and that's not even talking about merging back the results and destroying the threads.
The overhead for parallelism absolutely dwarfs the computation in your case.
sum
, to do that. This is an operation you’d normally want to vectorize. \$\endgroup\$