This function takes a numeric vector and returns "con" if it's continuous and "bin" if it's binary.
It does not take the multinomial case into account, i.e. if a variable y has three possible values 0, 1, 2, it's treated like a continuous variable.
Code:
checkBinaryTrait = function(v, naVal="NA") {
if(!is.numeric(v)) stop("Only numeric vectors are accepted.")
vSet = unique(v)
if(!missing(naVal)) vSet[which(vSet == naVal)] = NA
vSet = vSet[which(!is.na(vSet))]
if(any(as.integer(vSet) != vSet)) return("con")
if(length(vSet) > 2) return("con")
"bin"
}
Tests:
v = c(1, 1.1, 1, 1.1, NA)
checkBinaryTrait(v)
v = c(1, 2, 1, 2, NA)
checkBinaryTrait(v)
v = c(-9, 2.3, 4.1, -9, -9)
checkBinaryTrait(v, -9)
v = c(-9, 2, 4, -9, -9)
checkBinaryTrait(v, -9)
3 Answers
Using the return statement is not recommended. You can get the same effect by rewriting with else if and else, like this:
checkBinaryTrait = function(v, naVal="NA") {
if (!is.numeric(v)) stop("Only numeric vectors are accepted.")
vSet = unique(v)
if (!missing(naVal)) vSet[vSet == naVal] = NA
vSet = vSet[!is.na(vSet)]
if (any(as.integer(vSet) != vSet)) "con"
else if (length(vSet) > 2) "con"
else "bin"
}
I also removed the unnecessary which calls.
This code still passes all your tests.
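As a quick check of the which() removal, the sketch below compares both indexing styles on a made-up vector x (not from the question). The two forms agree for is.na(), which never returns NA. They can differ for a comparison such as x == 2, where the logical index may itself contain NA. In the function above that comparison is only used to assign a single NA, and R ignores NA positions in the index for a length-one replacement, so dropping which() should not change the result there.

```r
x <- c(1, 2, NA, 2)  # illustrative data only

# is.na() never yields NA, so both forms select the same elements
with_which    <- x[which(!is.na(x))]
without_which <- x[!is.na(x)]
identical(with_which, without_which)

# A comparison can yield NA; when subsetting, the two forms differ
x[which(x == 2)]  # only the positions where the comparison is TRUE
x[x == 2]         # additionally returns an NA for the NA position
```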
Actually the last statement can be further simplified to:
if (any(as.integer(vSet) != vSet) || length(vSet) > 2) "con"
else "bin"
You might also want to change the return type to TRUE or FALSE, in which case the last statement would become simply:
!(any(as.integer(vSet) != vSet) || length(vSet) > 2)
And then, how about renaming checkBinaryTrait to is.binary?
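Putting the last two suggestions together, a minimal sketch of the logical variant might look like this. The body is the rewritten function from above with only the final expression and the name changed; the sample calls reuse vectors from the question.

```r
# Sketch of the suggested logical variant (name as proposed above)
is.binary <- function(v, naVal = "NA") {
  if (!is.numeric(v)) stop("Only numeric vectors are accepted.")
  vSet <- unique(v)
  # Treat the user-supplied sentinel as missing
  if (!missing(naVal)) vSet[vSet == naVal] <- NA
  vSet <- vSet[!is.na(vSet)]
  # Binary means: at most two distinct values, all of them whole numbers
  !(any(as.integer(vSet) != vSet) || length(vSet) > 2)
}

is.binary(c(1, 2, 1, 2, NA))        # TRUE
is.binary(c(1, 1.1, 1, 1.1, NA))    # FALSE
is.binary(c(-9, 2, 4, -9, -9), -9)  # TRUE
```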
Finally, the <- operator is more common than = for assignment. For example, Google's R style guide explicitly forbids using = for assignment.
Let me address some portions of your code before providing an alternative implementation.
missing(naVal)
I would prefer not to use this approach but rather an appropriate neutral default value for naVal. We can use NULL for this purpose.
vSet[which(vSet == naVal)] = NA
Replacing values with NA before removing them is an unnecessary step. Furthermore, replacing values with NA is easier with the is.na<- function, for example, is.na(vSet) <- vSet == naVal.
vSet[which(!is.na(vSet))]
You can omit NA values with the na.omit function.
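To illustrate the two helpers just mentioned, here is a small sketch on made-up data (the values of vSet and naVal are assumptions for the example). Note that na.omit() attaches an na.action attribute to its result, which as.vector() strips again.

```r
vSet  <- c(-9, 2, 4, NA)  # illustrative data only
naVal <- -9

# Mark the sentinel value as missing via the replacement form of is.na()
is.na(vSet) <- vSet == naVal
vSet

# Drop all missing values; as.vector() removes the na.action attribute
cleaned <- as.vector(na.omit(vSet))
cleaned
```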
Here's an alternative implementation. For details, have a look at the comments.
checkBinaryTrait <- function(v, naVal = NULL) {
if( !is.numeric(v) ) stop("Only numeric vectors are accepted.")
# remove NA's
v2 <- na.omit(v)
# get unique values
v_unique <- unique(v2)
# remove 'naVal's
v_unique2 <- v_unique[! v_unique %in% naVal]
# count number of unique values and check whether all values are integers
if ( length(unique(v_unique2)) > 2L ||
any(as.integer(v_unique2) != v_unique2) ) "con" else "bin"
}
Some tests:
> checkBinaryTrait(v, -9)
[1] "bin"
> checkBinaryTrait(c(1, 1.1, 1, 1.1, NA))
[1] "con"
> checkBinaryTrait(c(1, 2, 1, 2, NA))
[1] "bin"
> checkBinaryTrait(c(-9, 2.3, 4.1, -9, -9), -9)
[1] "con"
> checkBinaryTrait(c(-9, 2, 4, -9, -9), -9)
[1] "bin"
This implementation also allows multiple naVal values:
> checkBinaryTrait(c(1, 2, 2, 1, -9, -9.9), c(-9, -9.9))
[1] "bin"
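Multiple sentinel values work because the implementation filters with %in% rather than ==. The sketch below, on a made-up vector, shows the difference: == recycles naVal element by element, whereas %in% tests membership. It also shows that the NULL default matches nothing, so no values are removed when naVal is not supplied.

```r
v     <- c(1, 2, -9.9, -9)  # illustrative data only
naVal <- c(-9, -9.9)

# Membership test: flags every element that equals any sentinel
v %in% naVal

# Element-wise comparison with recycling: pairs 1 with -9, 2 with -9.9,
# -9.9 with -9 and -9 with -9.9, so nothing is flagged here
v == naVal

# With the NULL default nothing is flagged at all
v %in% NULL
```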
I felt factor() was appropriate here. Assuming decimal values are also to be considered (the question says nothing about this):
checkBinaryTrait = function(v,naVal = "NA"){
if (!is.numeric(v)) stop("Only numeric vectors are accepted.")
# logical indexing; v[-which(...)] would drop every element when nothing equals naVal
if(length(levels(factor(v[!(v %in% naVal)]))) < 3) "bin" else "con"
}
If only integers are to be considered, coercing into integers:
checkBinaryTrait = function(v,naVal = "NA"){
if (!is.numeric(v)) stop("Only numeric vectors are accepted.")
# logical indexing; v[-which(...)] would drop every element when nothing equals naVal
if(length(levels(factor(as.integer(v[!(v %in% naVal)])))) < 3) "bin" else "con"
}
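One consequence of the integer variant is worth spelling out: as.integer() truncates toward zero rather than rounding, so distinct decimal values can collapse into the same level. A short sketch on made-up data:

```r
v <- c(1, 1.1, 1.9, 2)  # illustrative data only

as.integer(v)                          # 1 1 1 2
length(levels(factor(v)))              # 4 distinct values
length(levels(factor(as.integer(v))))  # 2 levels after truncation
```

Under that variant such a vector would therefore be reported as "bin", whereas the first variant reports "con".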