I have two variables (V1, V2) which I need to plot against each other in a simple scatter plot. Some rows are missing either V1 or V2, so they will not appear on the plot, but the remaining information in these rows is still of interest.
I tried substituting the NAs with a value outside the data range and adding an 'NA' label on the axes, but because 'breaks' and 'labels' must be the same length, the extra label brings additional grid lines with it.
Is it possible to have an axis label without a break? Any advice gratefully received!
Apologies that I can't post an image to illustrate my issue as I'm new to stackoverflow. Hopefully the code and link below will be enough.
# Simulated example data
library(ggplot2)
set.seed(112)
DF<-data.frame(V1=rnorm(20,10,4))
DF$V2<-DF$V1+rnorm(20,0,1)
DF[sample(1:dim(DF)[1],2),]$V1<-NA
DF[sample(1:dim(DF)[1],2),]$V2<-NA
# plot with NA rows removed
ggplot(DF,aes(x=V1,y=V2))+geom_point()+theme_bw()
# substitute NAs with value outside data range
DF$WasNA<-apply(DF,1,function(x)any(is.na(x)))
DF[is.na(DF$V1),]$V1<- -1
DF[is.na(DF$V2),]$V2<- -1
(p<-ggplot(DF,aes(x=V1,y=V2,colour=WasNA))+
geom_point()+
scale_colour_manual(values=c("black","grey70"))+
theme_bw())
p+
scale_x_continuous(breaks=c(-1,ggplot_build(p)$layout$panel_params[[1]]$x.major_source),labels=c("NA",ggplot_build(p)$layout$panel_params[[1]]$x.labels))+
scale_y_continuous(breaks=c(-1,ggplot_build(p)$layout$panel_params[[1]]$y.major_source),labels=c("NA",ggplot_build(p)$layout$panel_params[[1]]$y.labels))
ExamplePlot
(As an additional point of interest, I'm not certain why the extra break I add in is mirrored at the upper end of the scales too?)
2 Answers
It looks like you want a plotting method to help display missing values in ggplot? There's a geom in the naniar package that does this - geom_miss_point()
# Simulated example data
library(ggplot2)
set.seed(112)
DF<-data.frame(V1=rnorm(20,10,4))
DF$V2<-DF$V1+rnorm(20,0,1)
DF[sample(1:dim(DF)[1],2),]$V1<-NA
DF[sample(1:dim(DF)[1],2),]$V2<-NA
# plot with NA rows removed
ggplot(DF,aes(x=V1,y=V2))+geom_point()+theme_bw()
#> Warning: Removed 4 rows containing missing values (geom_point).
# plot with naniar - using shadow_shift
library(naniar)
ggplot(DF,
aes(x = V1,
y = V2)) +
geom_miss_point() +
theme_bw()
naniar does this by shifting the missing values to just below the range of the observed data, much as you have done, and then plotting them. It also has other helpers for exploring missing data.
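As a rough illustration of that idea (not naniar's actual code), here is a minimal base R sketch that moves NAs to a value a fixed fraction of the data range below the observed minimum. The helper name shift_na_below and its prop_below argument are made up for this example, and naniar's own implementation may differ in details such as jittering.

```r
# Hypothetical helper (not part of naniar): replace NAs with a value
# placed a fixed fraction of the data range below the observed minimum.
shift_na_below <- function(x, prop_below = 0.1) {
  rng <- range(x, na.rm = TRUE)                # observed min and max
  shifted <- rng[1] - prop_below * diff(rng)   # position just below the min
  x[is.na(x)] <- shifted
  x
}

v <- c(2, NA, 6, 12, NA)
shift_na_below(v)   # the NAs become 1, i.e. 2 - 0.1 * (12 - 2)
```

Applying something like this to V1 and V2 before plotting avoids hard-coding a placeholder such as -1, since the position adapts to the data.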
Let me know if you have any questions!
If you're using a plot design with a background grid, then I think there needs to be a grid line at the NA position. Otherwise the plot would look weird.
So my recommendation would be to get rid of the minor grid lines. That removes the problem of the weird additional lines that shouldn't be there.
p + scale_x_continuous(breaks=c(-1, ggplot_build(p)$layout$panel_params[[1]]$x.major_source),
labels=c("NA", ggplot_build(p)$layout$panel_params[[1]]$x.labels)) +
scale_y_continuous(breaks=c(-1, ggplot_build(p)$layout$panel_params[[1]]$y.major_source),
labels=c("NA", ggplot_build(p)$layout$panel_params[[1]]$y.labels)) +
theme(panel.grid.minor = element_blank())
If you want more grid lines, you could always define additional breaks (say at positions 2.5, 7.5, 12.5) and give them an empty label. This will simulate minor grid lines but at exactly the locations you want.
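A minimal sketch of that suggestion, assuming ggplot2 is installed. The small data frame and the break positions are invented for illustration, with -1 standing in for a missing value as in the question.

```r
library(ggplot2)

# Made-up data; -1 marks a value that was originally NA.
d <- data.frame(V1 = c(-1, 3, 6, 9, 14), V2 = c(2, -1, 7, 10, 13))

# Labelled breaks, including the "NA" position at -1.
major <- c(-1, 0, 5, 10, 15)
major_labels <- c("NA", "0", "5", "10", "15")

# Extra unlabelled breaks that act as hand-placed minor grid lines.
extra <- c(2.5, 7.5, 12.5)

brks <- c(major, extra)
labs <- c(major_labels, rep("", length(extra)))  # same length as brks

p2 <- ggplot(d, aes(x = V1, y = V2)) +
  geom_point() +
  scale_x_continuous(breaks = brks, labels = labs) +
  scale_y_continuous(breaks = brks, labels = labs) +
  theme_bw() +
  theme(panel.grid.minor = element_blank())
p2
```

Note that the unlabelled breaks are still major breaks as far as ggplot2 is concerned, so they get tick marks and are drawn with the major grid line style.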
You could use geom_vline and geom_hline to create grid lines only where you want them. Another option would be to use geom_miss_point from the naniar package to include missing data in the plot in a more automated way. Also see the naniar vignette.
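The geom_vline / geom_hline idea could look roughly like the sketch below, assuming ggplot2 is installed. The data and the grid positions in grid_at are invented for illustration. The theme's own grid is switched off, so the "NA" break at -1 keeps its axis label and tick mark but has no grid line.

```r
library(ggplot2)

# Made-up data; -1 marks a value that was originally NA.
d <- data.frame(V1 = c(-1, 3, 6, 9, 14), V2 = c(2, -1, 7, 10, 13))

# Positions where grid lines are wanted (illustrative values).
grid_at <- c(0, 5, 10, 15)

p3 <- ggplot(d, aes(x = V1, y = V2)) +
  # Draw the hand-made grid first so the points sit on top of it.
  geom_vline(xintercept = grid_at, colour = "grey90") +
  geom_hline(yintercept = grid_at, colour = "grey90") +
  geom_point() +
  # The break at -1 only produces a label and a tick, not a grid line.
  scale_x_continuous(breaks = c(-1, grid_at), labels = c("NA", grid_at)) +
  scale_y_continuous(breaks = c(-1, grid_at), labels = c("NA", grid_at)) +
  theme_bw() +
  theme(panel.grid = element_blank())
p3
```

This is the closest thing to "a label without a break": the break still exists, but because the grid is drawn by separate layers, it no longer brings a line with it.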