I am trying to get a boxplot with 3 different tools in each dataset size like the one below:

enter image description here

ggplot(data1, aes(x = dataset, y = time, color = tool)) + geom_boxplot() + 
 labs(x = 'Datasets', y = 'Seconds', title = 'Time') + 
 scale_y_log10() + theme_bw()

But I need to put the x-axis on a log scale, so I converted each dataset label to its numeric value. Even before applying the log transformation, the numeric version plots like this:

enter image description here

ggplot(data2, aes(x = dataset, y = time, color = tool)) + geom_boxplot() + 
 labs(x = 'Datasets', y = 'Seconds', title = 'Time') + 
 scale_y_log10() + theme_bw()

I checked the geom_boxplot parameters and the grouping parameters of aes, but could not resolve my problem. At first I thought the log scaling caused it, but removing the scale calls did not fix the plot.

What am I missing exactly? Thanks...

Files are in this link. "data2" is the numericized version of "data1".

TobiO
asked Dec 11, 2019 at 12:17
  • try ggplot(data2, aes(x = factor(dataset), ... Commented Dec 11, 2019 at 12:20
  • @PoGibas the problem is, I want to represent my data with their numeric value on x axis. I can't make it log-scale if I factorize the axis. Commented Dec 11, 2019 at 12:51
  • First log2 and then factorize Commented Dec 11, 2019 at 12:52
  • @PoGibas If I plot with factorized x-axis, they will have equal space between each other. That is why I want to leave x-axis as numeric, and already log2+factorize did plot in a wrong way, just like I experienced before. Commented Dec 11, 2019 at 12:57
  • @Batu Have you tried setting position="dodge" in the geom_boxplot function ? This should group your outputs. Commented Dec 11, 2019 at 13:00

1 Answer

Your question was a tough cookie, but I learned something new from it!

Just using group = dataset is not sufficient because you also have the tool variable to look out for. After digging around a bit, I found this post which made use of the interaction() function.

This is the trick that was missing. You need group because the x values are numeric rather than a factor, and the grouping has to include tool as well as dataset. interaction() does exactly that: it builds one group for every combination of the two variables.
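To see what interaction() contributes on its own, here is a minimal base R sketch. The toy data frame and its tool names are made up for illustration and are not taken from the linked files.

```r
# Hypothetical toy data: two dataset sizes crossed with two tools
toy <- data.frame(
  dataset = rep(c(1000, 10000), each = 4),
  tool    = rep(c("A", "B"), times = 4),
  time    = c(1.2, 2.5, 1.4, 2.3, 10.1, 21.0, 11.3, 19.8)
)

# Grouping by dataset alone gives one group per size, with the tools mixed together
by_dataset <- factor(toy$dataset)
nlevels(by_dataset)

# interaction() gives one group per (dataset, tool) combination
by_both <- interaction(toy$dataset, toy$tool)
nlevels(by_both)
levels(by_both)
```

With two sizes and two tools, the first grouping has 2 levels and the second has 4, which is why each box in the plot ends up with its own group.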

library(ggplot2)

# Pretty-print the axis labels, e.g. 1000 -> "1k"
my_labs <- function(x) {
  paste0(x / 1000, "k")
}

# Use the distinct dataset sizes as the x-axis breaks
levs <- unique(data2$dataset)

ggplot(data2, aes(x = dataset, y = time, color = tool,
                  group = interaction(dataset, tool))) +
  geom_boxplot() +
  labs(x = 'Datasets', y = 'Seconds', title = 'Time') +
  scale_x_log10(breaks = levs, labels = my_labs) +  # log scale with ticks at the dataset sizes
  scale_y_log10() +
  theme_bw()

This plots

enter image description here
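If you do not have the linked files at hand, the same approach can be tried on synthetic data. This is only a sketch: the sizes, tool names, and timings below are invented stand-ins for data2, and layer_data() is used just to count how many boxes ggplot2 computes.

```r
library(ggplot2)

set.seed(1)

# Synthetic stand-in for data2 (hypothetical sizes, tools, and timings)
sizes <- c(1000, 10000, 100000)
tools <- c("toolA", "toolB", "toolC")
demo <- expand.grid(dataset = sizes, tool = tools, rep = 1:10)
demo$time <- demo$dataset / 1000 * as.integer(factor(demo$tool)) *
  exp(rnorm(nrow(demo), sd = 0.2))

p <- ggplot(demo, aes(x = dataset, y = time, color = tool,
                      group = interaction(dataset, tool))) +
  geom_boxplot() +
  labs(x = 'Datasets', y = 'Seconds', title = 'Time') +
  scale_x_log10(breaks = sizes, labels = function(x) paste0(x / 1000, "k")) +
  scale_y_log10() +
  theme_bw()

# Count the computed boxes: one row per (dataset, tool) group
n_boxes <- nrow(layer_data(p))
n_boxes
```

Calling print(p) draws the plot. With three sizes and three tools, n_boxes should come out as 9, one box per combination.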

answered Dec 13, 2019 at 9:50