Q: Most performant way to update an entire column or row to a scalar value? #145

Unanswered
polarathene asked this question in Q&A

TL;DR: How can I efficiently update a large column or row to a single value, many times over, without creating similarly sized arrays that use up GPU memory?


Example size and two approaches I can think of

If I have an array with 300 million rows and 5 columns, and I want to update all elements in a column to a shared value multiple times, what is the best way to go about this?

  • set_col() with a constant array whose row count matches the column, one for each value I want to write into the column. This sounds like it would waste vRAM if I have a lot of these.

  • replace_scalar(), which would work but requires a condition array with dims [300mil, 5, 1, 1] where one column is 0 and the rest are 1? That means either keeping 5 of these (one per target column) or updating a single cond array with set_col() and two [300mil, 1, 1, 1] constant arrays holding 0 and 1.
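
To put rough numbers on the vRAM concern for the two approaches above, here is a small back-of-the-envelope sketch. It assumes u8 elements (one byte per character), a 1-byte-per-element condition array, and that the helper arrays are fully materialized in memory; `dense_bytes` and `helper_bytes` are hypothetical names, not ArrayFire functions.

```rust
/// Bytes needed for a dense array of `rows * cols` elements.
fn dense_bytes(rows: u64, cols: u64, bytes_per_elem: u64) -> u64 {
    rows * cols * bytes_per_elem
}

/// Extra bytes each approach would hold if its helper array were materialized.
/// Returns (set_col helper, replace_scalar helper).
fn helper_bytes(rows: u64, cols: u64, bytes_per_elem: u64) -> (u64, u64) {
    // set_col(): one constant column of `rows` elements per replacement value.
    let set_col_helper = dense_bytes(rows, 1, bytes_per_elem);
    // replace_scalar(): one condition array as large as the data array,
    // assumed here to take 1 byte per element.
    let replace_helper = dense_bytes(rows, cols, 1);
    (set_col_helper, replace_helper)
}
```

Under these assumptions the [300mil, 5] data array is 1.5 GB, each constant column is 300 MB, and each full-size condition array is another 1.5 GB.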


How are you trying to apply this?

I am building up string permutations with tile() and flat() up to the size that GPU memory allows. This has proven much faster than on the CPU now that I understand how to create it effectively on the GPU with ArrayFire :) When tiling has to stop, I have all permutations for that string length. To handle longer lengths, I then batch the permutations, processing one batch at a time.

All permutations of "aaa" to "zzz": 26^3 (17,576) rows, each with dims of [1,3,1,1] (I'm not sure if I should prefer columns or rows for this).
Array contains all values ranging:

[97, 97, 97] // [b'a', b'a', b'a'] 
to
[122, 122, 122] // [b'z', b'z', b'z']

If the batching were to start at a length of 4, I would add a column with 26^3 rows all set to "a" (ASCII byte 97, 0x61). I would then update all values in this column to get "b" and so forth, processing each array "batch" with the rest of the ArrayFire logic I have.

Array contains all values ranging:

[97, 97, 97, 97] // "aaaa"
to
[97, 122, 122, 122] // "azzz"
// How can I change the 1st column to another value (98 or 107) without wasting lots of GPU memory?

Why not just add the column value by x?

In this example you could say that the column just needs 1 added to it, which makes more sense and should be more performant than the two approaches above. This might work best by extracting the col from the array, adding a constant array of the same size with value 1, then using set_col to write it back to the original, and repeating the last two steps until reaching "z" (122, 0x7A). When the values do not increment by 1, though, this does not work as well.

I want to support custom charsets (which might apply only to specific columns, not all), not just "a" to "z". I've not seen a way with ArrayFire to set all values of an array's column to a single value, or to the value at an index of another array (where I could just loop through the indices of another AF array or a CPU array to get the values).
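
As a CPU-side illustration of the behavior being asked for (not ArrayFire code), here is a minimal sketch that overwrites one column in place and cycles it through an arbitrary charset. It assumes a column-major `rows x cols` buffer of u8, so each column is one contiguous run; `set_col_scalar` and `for_each_symbol` are hypothetical helper names.

```rust
/// Overwrite column `col` of a column-major `rows x cols` buffer with `value`.
/// In column-major layout the column is one contiguous slice, so no
/// helper array of matching size is needed.
fn set_col_scalar(data: &mut [u8], rows: usize, col: usize, value: u8) {
    let start = col * rows;
    data[start..start + rows].fill(value);
}

/// Set column `col` to each symbol of `charset` in turn, calling `process`
/// on the whole buffer after every update.
fn for_each_symbol<F: FnMut(&[u8])>(
    data: &mut [u8],
    rows: usize,
    col: usize,
    charset: &[u8],
    mut process: F,
) {
    for &symbol in charset {
        set_col_scalar(data, rows, col, symbol);
        process(&*data);
    }
}
```

Because the charset is just a slice, a non-contiguous set such as "qa7" works the same way as "a" to "z".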


Should I raise feature request on main repo?

If there is not a good way to approach this currently with AF, I could go to the main repo and raise a feature request? I will be contributing this part of my project as an example to the repo in future once I have it in a good working state :)


Replies: 8 comments


Can you provide a simple input and output? Instead of 300 million rows, maybe choose 10? I am trying to understand whether there are existing functions to do this efficiently.


If I understand this correctly, there may be a really fast way to do this. I don't know how you would do this in Rust, but the fastest way (and the one using the least memory) would be to do the following in C++.

array a; // original [300E6, 5] array
array x; // values to be replaced, size [1, 5]
array y; // values replacing x, size [1, 5]
array x_tiled = tile(x, 300E6); // does not actually allocate memory
array y_tiled = tile(y, 300E6); // does not actually allocate memory
array cond = (a == x_tiled); // does not allocate memory
array out = a * (1 - cond) + cond * y_tiled;
out.eval(); // creates memory for output

If the following issue goes through, you can achieve the same behavior using out = select(a, a != x_tiled, y_tiled); see arrayfire/arrayfire#1345

But if you don't want to use an additional output buffer, raise an issue upstream to add this behavior to replace.
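
For readers following along in Rust, the arithmetic blend `a * (1 - cond) + cond * y_tiled` from the C++ snippet above can be mirrored on the CPU as follows. This is only an element-wise sketch of the idea, assuming a column-major u8 buffer with one `x` and one `y` value per column; it does not use ArrayFire, and `blend_replace` is a hypothetical name.

```rust
/// For each element of a column-major buffer with `rows` rows, replace the
/// value with `y[col]` when it equals `x[col]`, otherwise keep it.
/// Written as the same mask arithmetic as the C++ reply.
fn blend_replace(a: &[u8], rows: usize, x: &[u8], y: &[u8]) -> Vec<u8> {
    a.iter()
        .enumerate()
        .map(|(i, &v)| {
            let col = i / rows; // column-major: column index of element i
            let cond = (v == x[col]) as u8; // 1 where the element matches, else 0
            // Exactly one of the two terms is non-zero, so u8 cannot overflow.
            v * (1 - cond) + cond * y[col]
        })
        .collect()
}
```

Note that this replaces by value within each column, so it only overwrites an entire column when every element of that column currently equals `x[col]`.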


@pavanky Here is an example based on my existing code, sorry about the wait. Given the result at the end, I would add a column with the same number of rows via join. But to batch, instead of creating all 3 tiles of permutations for 3^4, I'd only update that new column (the 1st column once merged) to a (97), process, then change it to b (98), and so forth. If the characters to be permuted were not "abc" but "qa7", this isn't as simple as adding 1.

// Imports assumed by this snippet (the original post omitted them).
use arrayfire as af;
use arrayfire::{af_print, flat, join, tile, Dim4};

fn get_permutations_col(input_cols: &af::Array, new_col: &af::Array) -> af::Array {
    let rows_a = input_cols.dims().get()[0];
    let rows_b = new_col.elements() as u64;
    // Repeat the existing cols' rows, and repeat the new col's row to match
    // the existing cols (then flatten into a single col).
    // e.g. with charset "abc" (97, 98, 99): if input_cols has 2 cols (3^2 rows),
    // adding 1 more col gives 3^3 rows in the final dims.
    let right_cols = &tile(input_cols, Dim4::new(&[rows_b, 1, 1, 1]));
    let left_col = &flat(&tile(new_col, Dim4::new(&[rows_a, 1, 1, 1])));
    // Merge the new col in as the first col on the left.
    join(1, left_col, right_cols)
}

fn generate_permutations_abc() {
    let range_a: Vec<u8> = (b'a'..=b'c').collect();
    let range_b: Vec<u8> = (b'a'..=b'c').collect();
    let dims_1 = Dim4::new(&[3, 1, 1, 1]); // first col
    let dims_n = Dim4::new(&[1, 3, 1, 1]); // additional cols
    let mut range_a_af = af::Array::new(&range_a, dims_1);
    let range_b_af = af::Array::new(&range_b, dims_n);
    af_print!("range_a_af:", range_a_af);
    // [3 1 1 1]
    // 97
    // 98
    // 99

    af_print!("range_b_af:", range_b_af);
    // [1 3 1 1]
    // 97 98 99
    range_a_af = get_permutations_col(&range_a_af, &range_b_af);
    range_a_af = get_permutations_col(&range_a_af, &range_b_af);
    af_print!("3^3 == 27 permutations aaa(97,97,97) -> ccc(99,99,99): ", range_a_af);
    // [27 3 1 1]
    // 97 97 97
    // 97 97 98
    // 97 97 99
    // 97 98 97
    // 97 98 98
    // 97 98 99
    // 97 99 97
    // 97 99 98
    // 97 99 99
    // 98 97 97
    // 98 97 98
    // 98 97 99
    // 98 98 97
    // 98 98 98
    // 98 98 99
    // 98 99 97
    // 98 99 98
    // 98 99 99
    // 99 97 97
    // 99 97 98
    // 99 97 99
    // 99 98 97
    // 99 98 98
    // 99 98 99
    // 99 99 97
    // 99 99 98
    // 99 99 99
}
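
For anyone without a GPU at hand, the tile/flat/join construction above can be mirrored with plain vectors. This is a CPU-only sketch of the same row ordering, not the ArrayFire implementation; `prepend_column` and `permutations` are hypothetical names, and `len` is assumed to be at least 1.

```rust
/// Prepend one column: each symbol of `charset` is paired with every existing
/// row, which is what tiling the rows and flattening the new column achieves.
fn prepend_column(rows: &[Vec<u8>], charset: &[u8]) -> Vec<Vec<u8>> {
    let mut out = Vec::with_capacity(rows.len() * charset.len());
    for &symbol in charset {
        for row in rows {
            let mut new_row = Vec::with_capacity(row.len() + 1);
            new_row.push(symbol);
            new_row.extend_from_slice(row);
            out.push(new_row);
        }
    }
    out
}

/// All strings of length `len` (assumed >= 1) over `charset`, one per row.
fn permutations(charset: &[u8], len: usize) -> Vec<Vec<u8>> {
    // Start with a single column holding each symbol once.
    let mut rows: Vec<Vec<u8>> = charset.iter().map(|&c| vec![c]).collect();
    for _ in 1..len {
        rows = prepend_column(&rows, charset);
    }
    rows
}
```

Calling `permutations(b"abc", 3)` yields 27 rows in the same order as the printed output above, from [97, 97, 97] to [99, 99, 99].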

@pavanky I am using pretty much the same code as above to generate strings, and then my other ArrayFire logic uses this large array for computation. I've tiled it to increase the rows to 300 mil from 26^5 ("a" to "z" with 5 columns gives about 11.9 mil rows).

Around the 300 mil mark I've noticed GPU memory usage is about 6-7GB. Some of that may be due to other ArrayFire logic being affected by the size. I also have an array generated on the host that holds a range of index values, which I multiply against the eq() result. Removing the 0's then gives me the indices I'm interested in (though this process is slow on such a large array with current AF methods). I will replace the host-created index array with the AF range() array method, which should do the same, and see if that helps.
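
The index-extraction step described above (a range multiplied by the eq() result, then dropping zeros) can be sketched on the CPU like this. It is an illustration of the arithmetic only, with hypothetical function names. One assumption is made explicit: the sketch uses a 1-based range, because with a 0-based range a match at index 0 would multiply to 0 and be dropped along with the misses.

```rust
/// Indices where `values[i] == target`, via the multiply-by-range approach.
fn match_indices_by_multiply(values: &[u8], target: u8) -> Vec<usize> {
    values
        .iter()
        .enumerate()
        // (i + 1) is the 1-based range; the comparison plays the role of eq().
        .map(|(i, &v)| (i + 1) * (v == target) as usize)
        .filter(|&p| p != 0) // drop the zeros left by non-matches
        .map(|p| p - 1) // back to 0-based indices
        .collect()
}

/// The same result by filtering on the comparison directly, for reference.
fn match_indices(values: &[u8], target: u8) -> Vec<usize> {
    values
        .iter()
        .enumerate()
        .filter(|&(_, &v)| v == target)
        .map(|(i, _)| i)
        .collect()
}
```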


@pavanky When you refer to tile not allocating memory, is this a better option than using constant, at least in this case, for creating the boolean cond array that targets a single column?


I want an example that shows exactly where you are replacing the values. The example doesn't show that.


@pavanky With the example set, replace any column with a new value? The middle column to 100, for example. In my case it will be the leftmost column (0). If 3 columns of permutations (27x3) were the most my vRAM could support, and I required a length of 5 (columns), then I would:

  • join() 2 new columns (all value "a") to the left side, then process this array.
  • cycle through the given alphabet/charset ("abc"/97,98,99), updating col(1) and processing the array each time
  • increment col(0) from "a"/97 to "b"/98 without processing yet; processing waits until col(1) is reset to "a"
  • col(1) then cycles through "abc" again, processing the array with each column update
  • col(0) is set to "c"/99, and col(1) repeats another cycle, processing on each col(1) update

All permutations for this 3^5 keyspace have then been completed/batched. If processing an array batch finds all matches before completing, the full set of permutations does not need to finish.
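
The batching order in the steps above behaves like an odometer over the prefix columns, with the rightmost prefix column changing fastest. Here is a minimal CPU sketch of that iteration, assuming a non-empty charset; `for_each_prefix` is a hypothetical helper, and the closure stands in for "update the prefix columns, then process the array".

```rust
/// Visit every combination of `prefix_len` prefix symbols in odometer order,
/// calling `process` once per batch. `process` returns true to stop early
/// (e.g. all matches found). Returns the number of batches processed.
fn for_each_prefix<F: FnMut(&[u8]) -> bool>(
    charset: &[u8],
    prefix_len: usize,
    mut process: F,
) -> usize {
    let mut digits = vec![0usize; prefix_len]; // charset index per prefix column
    let mut prefix = vec![charset[0]; prefix_len]; // current prefix symbols
    let mut batches = 0;
    loop {
        batches += 1;
        if process(&prefix) {
            return batches; // early exit requested by the caller
        }
        // Advance the odometer, starting from the rightmost prefix column.
        let mut pos = prefix_len;
        loop {
            if pos == 0 {
                return batches; // every column wrapped: keyspace exhausted
            }
            pos -= 1;
            digits[pos] += 1;
            if digits[pos] < charset.len() {
                prefix[pos] = charset[digits[pos]];
                break;
            }
            // This column wrapped: reset it and carry into the column to its left.
            digits[pos] = 0;
            prefix[pos] = charset[0];
        }
    }
}
```

With charset "abc" and 2 prefix columns this produces 9 batches ("aa", "ab", "ac", "ba", ...), and 9 batches of 27 rows cover the 3^5 = 243 permutations.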


Would you like code example of this with set_col() or replace()?


Closing due to inactivity. Please reopen if the question is still pertinent.

This discussion was converted from issue #145 on December 09, 2020 05:14.