0

(Edited)

I'm trying to parallelize a function to solve a LCS, via the usage of a Parallel.For loop. This function proceeds by diagonals, obtaining the value of the current diagonal cell based on the previous ones.

The sequential function is as follows:

let public lcs_seq_1d_diags (x:char[]) (y:char[]) = 
 let m = x.Length
 let n = y.Length
 let mutable dk2 = Array.create (1+m) 0
 let mutable dk1 = Array.create (1+m) 0
 let mutable dk = Array.create (1+m) 0
 for k = 2 to m+n do
 let low = max 1 (k-m)
 let high = min (k-1) n
 for j = low to high do
 let i = k - j
 if x.[i-1] = y.[j-1] then
 dk.[i] <- dk2.[i-1] + 1
 else 
 dk.[i] <- max dk1.[i] dk1.[i-1]
 let mutable temp = dk2
 dk2 <- dk1
 dk1 <- dk
 dk <- temp
 dk1.[m]

My attempt at parallelization:

let public lcs_par_1d_seq (x:char[]) (y:char[]) = 
 let m = x.Length
 let n = y.Length
 let dk2 = Array.create (1+m) 0
 let cell2 = ref dk2
 printfn "\r\n0: %A" dk2
 let dk1 = Array.create (1+m) 0
 let cell1 = ref dk1
 printfn "1: %A" dk1
 let dk = Array.create (1+m) 0
 let cell = ref dk
 for k = 2 to m+n do
 let low = max 1 (k-m)
 let high = min (k-1) n
 //for each cell in current diagonal
 Parallel.For(low, high, (fun j ->
 let i = k - j
 if x.[i-1] = y.[j-1] then
 dk.[i] <- dk2.[i-1] + 1
 else 
 dk.[i] <- max dk1.[i] dk1.[i-1])) |> ignore
 if trace then 
 let ilow = k - high
 let ihigh = k - low
 printfn "k=%d i=%d..%d dk=%A" k ilow ihigh dk.[ilow..ihigh]
 let mutable temp = !cell2
 cell2 := !cell1
 cell1 := !cell
 cell := temp
 dk1.[m]

Trace results for the sequential loop:

0: [|0; 0; 0; 0; 0; 0; 0|]
1: [|0; 0; 0; 0; 0; 0; 0|]
k=2 i=1..1 dk=[|0|]
k=3 i=1..2 dk=[|0; 0|]
k=4 i=1..3 dk=[|1; 1; 1|]
k=5 i=1..4 dk=[|1; 1; 1; 1|]
k=6 i=1..5 dk=[|1; 1; 1; 1; 1|]
k=7 i=1..6 dk=[|1; 2; 2; 1; 2; 1|]
k=8 i=1..6 dk=[|1; 2; 2; 2; 2; 2|]
k=9 i=1..6 dk=[|1; 2; 2; 2; 2; 2|]
k=10 i=1..6 dk=[|1; 2; 2; 2; 3; 2|]
k=11 i=2..6 dk=[|2; 2; 2; 3; 3|]
k=12 i=3..6 dk=[|2; 2; 3; 3|]
k=13 i=4..6 dk=[|2; 3; 3|]
k=14 i=5..6 dk=[|3; 4|]
k=15 i=6..6 dk=[|4|]
... duration: 19 ms
res_seq_1d_diags = 4
Press any key to continue . . .

Trace results for the parallel loop:

0: [|0; 0; 0; 0; 0; 0; 0|]
1: [|0; 0; 0; 0; 0; 0; 0|]
k=2 i=1..1 dk=[|0|]
k=3 i=1..2 dk=[|0; 0|]
k=4 i=1..3 dk=[|0; 1; 1|]
k=5 i=1..4 dk=[|0; 0; 0; 0|]
k=6 i=1..5 dk=[|0; 0; 0; 0; 0|]
k=7 i=1..6 dk=[|0; 1; 1; 0; 1; 0|]
k=8 i=1..6 dk=[|0; 0; 0; 0; 0; 1|]
k=9 i=1..6 dk=[|0; 0; 0; 0; 0; 0|]
k=10 i=1..6 dk=[|0; 1; 0; 0; 1; 0|]
k=11 i=2..6 dk=[|1; 0; 0; 0; 1|]
k=12 i=3..6 dk=[|0; 0; 0; 0|]
k=13 i=4..6 dk=[|0; 1; 0|]
k=14 i=5..6 dk=[|1; 1|]
k=15 i=6..6 dk=[|1|]
... duration: 23 ms
res_par_1d_seq = 0
Press any key to continue . . .

The code that produces the trace is:

 if trace then 
 let ilow = k - high
 let ihigh = k - low
 printfn "k=%d i=%d..%d dk=%A" k ilow ihigh dk.[ilow..ihigh]

'dk' here refers to the values of the cells in each matrix diagonal (ie the first diagonal has 1 cell with the values 0, the 2nd diagonal has 2 cells with the values 0 and 0, etc).

Basically, in the parallel version, some of the threads seem to be overriding each others' values. Any suggestions how I can avoid this and get the values to save properly in the parallel version?

asked Apr 24, 2013 at 10:37
4
  • 2
    I think you need some significant rewriting of the loop to make it work with parallel. It appears to have data dependencies on the previous iteration so cannot be parallelised at all. Commented Apr 24, 2013 at 10:42
  • Thanks, John. Which data dependencies do you see? Principally this algorithm is intended to work along the 'diagonals' of the LCS matrix, so each diagonal should not affect the other as it is in a different row and column. Commented Apr 24, 2013 at 10:49
  • 1
    I think it is possible to reach the same value of i multiple times in the loop which will result in entries in dk being overwritten in multiple threads. There may be others as well but that is the most obvious. Commented Apr 24, 2013 at 11:01
  • Ack, thanks, I realized that I was parallelizing the wrong loop too. I should be parallelizing the inner loop instead, which produces a much less nonsensical trace, although the entries still seem to be rewritten. Should I edit my opening post and title to reflect this, or start a whole new thread? Commented Apr 24, 2013 at 12:47

3 Answers 3

1

First, since your code doesn't match your output (the length of dk doesn't ever change in your code), I'm not going to try to understand what your code actually does.

Now, your issues has nothing to do with ref cells. The problem is that the value of dk in the current iteration depends on the value of dk1 and dk2 of the previous iteration and on proper rotation of those variables in each iteration. This means your code is not parallelizable, at least not directly.

answered Apr 24, 2013 at 11:47
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks for the help, both of you, I see that now. I had intended to use the dk1 and dk2 arrays to store the values of the previous iteration because it was needed to keep a tally of the LCS 'score' Could you suggest some other ways of circumventing this problem in a parallelizable way, please? I have tried reading up everything that I can find online about parallelization in F# and the TPL but I have not found any in-depth explanation about how to refactor code to make it parallelizable. I will also edit my question with the code and explanation for the trace, thanks for pointing that out.
There are some excellent books on parallelism in .Net. Everything of relevance to in C# also applies to F#. Try this one if you haven't already: amazon.com/gp/product/0735651590/…
0

For parallelizable code, you need to think first of a way to write the code so that each loop is independent of the other. i.e., if your value at i depends on the value at i-1, then the loop is not parallelizable, as you don't know whether i or i-1 will be calculated first.

In your case, probably do something like calculate the LCS for each diagonal i and store that in lcs[i]. That result should be independent of anything located in other diagonals, and thus parallelizable. Then after that loop, you can then return lcs.Max().

answered Apr 24, 2013 at 16:08

Comments

0

I've found the issue - it was that I was not using my refs inside the Parallel loop. It works as intended now. I still much appreciate the help, and have also picked up the books suggested.

Thanks guys.

answered Apr 25, 2013 at 21:48

Comments

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.