Finding maximum distribution between multiple columns

Question 1

I have an attribute table that shows how many buildings were built in each decade. For every polygon, the columns give the percentage for each 10-year step (1939, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010).

This is what it looks like:

[Image: example of the attribute table]

Now I need to find out which of the columns holds the maximum value for each polygon. In the end, I'm supposed to have a choropleth map that shows the main decade in which each polygon's buildings were built. Does that make any sense?

I've already tried working with the styles, but the categorized settings only let you choose one column and I need 9. Any tips?

Question 2

Have you had a look at the max function in the field calculator (which can also be used as a base for classification)?

Question 3

Another method is to add a new column, "Year Constructed," using a conditional expression like this:

Case 
 when "P1939" = max("P1939", "P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010") then 1939
 when "P1940" = max("P1939", "P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010") then 1940
 when "P1950" = max("P1939", "P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010") then 1950
 when "P1960" = max("P1939", "P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010") then 1960
 when "P1970" = max("P1939", "P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010") then 1970
 when "P1980" = max("P1939", "P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010") then 1980
 when "P1990" = max("P1939", "P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010") then 1990
 when "P2000" = max("P1939", "P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010") then 2000
 when "P2010" = max("P1939", "P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010") then 2010
 else 0
end

Apply a categorized style using the new "Year Constructed" column.
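
For readers who want to check the logic outside QGIS, here is a minimal Python sketch of what the CASE expression computes. The function name `year_constructed` and the sample row are made up for illustration; only the column names come from the question. Like the expression, it returns the first decade whose share equals the row maximum, so ties go to the earlier decade. Skipping missing values is an assumption of this sketch, not a statement about how QGIS treats NULLs.

```python
# Illustrative sketch only: mirrors the CASE expression's logic in plain Python.
COLUMNS = ["P1939", "P1940", "P1950", "P1960", "P1970",
           "P1980", "P1990", "P2000", "P2010"]


def year_constructed(row):
    """Return the decade whose column holds the row maximum, or 0 if no data."""
    values = [row.get(col) for col in COLUMNS]
    # Assumption of this sketch: missing values (None) are simply skipped.
    present = [v for v in values if v is not None]
    if not present:
        return 0  # plays the role of the "else 0" branch
    highest = max(present)
    # Walk the columns in order, so the earliest decade wins a tie.
    for col, value in zip(COLUMNS, values):
        if value == highest:
            return int(col[1:])  # "P1970" -> 1970
    return 0


# Hypothetical attribute row (percentages per decade).
sample = {"P1939": 5, "P1940": 0, "P1950": 10, "P1960": 20, "P1970": 40,
          "P1980": 10, "P1990": 15, "P2000": 0, "P2010": 0}
print(year_constructed(sample))  # 1970
```

If two decades share the maximum, it may be worth deciding explicitly which one should win before styling the map.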

Question 4

Here's an alternative method using rule-based styling.

1. Set up a graduated style with 9 classes and your desired color ramp. Choose the color ramp carefully, because you won't be able to change it later. It doesn't matter what field or expression you use for "column", because you're going to change that in step 3. The entire point of this step is to establish the color gradient.
2. Change the style from "graduated" to "rule-based." The 9 graduated classes will be automatically converted to rules.
3. Change the label of the first rule to "Constructed in 1939", and change the filter expression to this:
```
"P1939" = max("P1939", "P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010")
```
4. Repeat for all the rules, changing "P1939" to "P1940," "P1950," and so on.


Note: There's no way to apply a color ramp to an existing set of rules. If you want to change the symbol colors later, you have to change the color for each rule, one at a time.
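
Typing nine near-identical filter expressions by hand invites typos, so one option is to generate the labels and filters and paste them into the rules. This is only a convenience sketch in plain Python; the helper name `rule_filters` is hypothetical and the column names are the ones used in this thread.

```python
# Illustrative helper: builds the label and filter text for each rule.
COLUMNS = ["P1939", "P1940", "P1950", "P1960", "P1970",
           "P1980", "P1990", "P2000", "P2010"]


def rule_filters(columns):
    """Return one (label, filter expression) pair per column."""
    # Double-quote every column name, as QGIS expressions do for fields.
    quoted = ", ".join(f'"{col}"' for col in columns)
    return [
        (f"Constructed in {col[1:]}", f'"{col}" = max({quoted})')
        for col in columns
    ]


for label, expression in rule_filters(COLUMNS):
    print(label)
    print("  " + expression)
```

Each printed expression can then be copied into the filter of the rule with the matching label.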

Question 5

Thanks a lot! This was an easy and quick solution to my problem!

Question 6

This is possible, but the expression needed to compute it is a bit convoluted. I will explain below.

Your task is to find, for every feature, the name of the column that has the highest value. If the highest value is in the P1940 column, your answer is 1940; if it is in P1980, your answer is 1980, and so on.

You can do this using array functions in QGIS. We use the array_sort() function here, which is only available from QGIS 3.6 onwards.

Here's the algorithm:

1. Create an array of all the values from the relevant columns.

array("P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010")

This gives us an array like [0, 10, 20, 40, 10, 20, 0, 0].

2. Sort the array and take the last value. This is the highest value.

array_last(array_sort(array("P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010")))

This will be the value 40.

3. Look up the index of this highest value in the original array.

array_find(array("P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010"), array_last(array_sort(array("P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010"))))

This will be 3: index counting starts from 0 and the highest value is in the 4th place.

4. Use that index to look up the year in a list of years that is in the same order as the original array.

array_get(array(1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010), array_find(array("P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010"), array_last(array_sort(array("P1940", "P1950", "P1960", "P1970", "P1980", "P1990", "P2000", "P2010")))))

This will be the value 1970.

The expression looks complex mainly because the same array is repeated several times rather than stored in a variable.

In your case, just create a new virtual field with the expression from step 4 and you should see the correct value for each row, which you can then use for styling. The column array and the year array must list the same decades in the same order; to take the "P1939" column into account as well, add "P1939" and 1939 at the front of the two arrays.
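
The four steps can be mirrored in plain Python to see what each array function contributes. This is a sketch under assumptions: the function name `main_decade` and the sample shares are invented for illustration, and the comments map each line to the QGIS function it stands in for. The length check is an addition of this sketch; it guards against the value list and the year list falling out of step, which would silently shift every result.

```python
# Illustrative sketch of the sort / find / get lookup, outside QGIS.
YEARS = [1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010]


def main_decade(values, years=YEARS):
    """Return the year whose position matches the largest value."""
    if len(values) != len(years):
        raise ValueError("values and years must have the same length")
    highest = sorted(values)[-1]    # array_last(array_sort(...))
    index = values.index(highest)   # array_find(...): first match, 0-based
    return years[index]             # array_get(...)


# Hypothetical shares for one polygon, in the same order as YEARS.
print(main_decade([0, 10, 20, 40, 10, 20, 0, 0]))  # 1970
```

Because the index lookup returns the first match, a tie between two decades resolves to the earlier one here.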

Question 7

Hi, you could use a tool like Miller (https://github.com/johnkerl/miller). Starting from this example input file

p1940,p1950,p1960,p1970
436,490,446,195
526,320,963,780
220,888,705,831

and running

mlr --csv merge-fields -a max -r "^[a-z]" -o value -k then put '
  for (key, value in $*) {
    if (value == $value_max && key != "value_max") {
      $fieldName = key;
      $valueField = gsub(key, "p", "");
    }
  }
' input.csv

The output contains, for each row, the maximum value (value_max), the name of the field that holds it (fieldName), and the year taken from that field name (valueField):

p1940,p1950,p1960,p1970,value_max,fieldName,valueField
436,490,446,195,490,p1950,1950
526,320,963,780,963,p1960,1960
220,888,705,831,888,p1950,1950

This is a pretty-printed version:

+-------+-------+-------+-------+-----------+-----------+------------+
| p1940 | p1950 | p1960 | p1970 | value_max | fieldName | valueField |
+-------+-------+-------+-------+-----------+-----------+------------+
| 436   | 490   | 446   | 195   | 490       | p1950     | 1950       |
| 526   | 320   | 963   | 780   | 963       | p1960     | 1960       |
| 220   | 888   | 705   | 831   | 888       | p1950     | 1950       |
+-------+-------+-------+-------+-----------+-----------+------------+

Could this be useful for your goal?
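
If Miller is not at hand, the same per-row result can be sketched with Python's standard csv module. This is not a port of the Miller command, just an illustration that produces the same three extra fields for the sample data; the function name `add_row_max` is hypothetical. One difference to be aware of: on a tie this sketch keeps the first field holding the maximum, whereas the loop above reassigns on every match.

```python
import csv
import io

# Same sample data as the input file above, inlined so the sketch is self-contained.
SAMPLE = """p1940,p1950,p1960,p1970
436,490,446,195
526,320,963,780
220,888,705,831
"""


def add_row_max(rows):
    """Add value_max, fieldName and valueField to each row."""
    result = []
    for row in rows:
        numeric = {name: float(text) for name, text in row.items()}
        field = max(numeric, key=numeric.get)  # first field with the max
        result.append({
            **row,
            "value_max": row[field],
            "fieldName": field,
            "valueField": field.lstrip("p"),   # "p1950" -> "1950"
        })
    return result


rows = add_row_max(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row)
```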

Accepted answer by csk (2019-07-12 18:19:40Z): the "Year Constructed" conditional expression shown above.
