\$\begingroup\$

I wrote a short function that expands bracket expressions ([]) within a regular expression. Given a regular expression, the function expands the brackets and returns a vector of strings that explicitly spell out each match.

I tried to account for two cases: 1) a regular expression with a single range (e.g. ^405[0-3L-O]$), and 2) a regular expression with multiple ranges, where the patterns containing the ranges are separated by | (e.g. ^W3812$|^405[0-3L-O]$|^N17[04][9FK]Z$). I also added a feature: if show_expanded is set to TRUE, the result is a named vector in which each name shows the values substituted for the expanded range.

Below is the code.

#' Regular Expression Bracket Expander
#'
#' Given a regular expression with brackets, expands the expression with explicit matches.
#' Returns a vector of explicit matches.
#'
#' @param rex a regular expression
#' @param show_expanded if set to TRUE, the resulting vector will show each value in the expanded range as names.
#' @examples
#' r <- "^W3812$|^405[0-3L-O]$|^N17[04][9FK][0-3]Z$"
#' regex_expander(r)
regex_expander <- function(rex, show_expanded=TRUE){
  alpha_nums <- c(0:9, letters, LETTERS)
  rex_split <- strsplit(rex, split="\\|")[[1]]

  # extract range patterns
  range_pattern <- stringr::str_extract_all(rex_split, "\\[.*?\\]")

  # expand range
  expanded_patterns <- lapply(range_pattern, function(rng){
    if(length(rng) == 1){
      grep(rng, alpha_nums, value=TRUE)
    } else if(length(rng) > 1){
      # if more than 1 range, get every possible combination
      expanded <- lapply(rng, function(v) grep(v, alpha_nums, value=TRUE))
      apply(expand.grid(expanded), 1, function(x) paste0(x, collapse=""))
    } else{
      # no range in the pattern
      NULL
    }
  })

  # replace ranges with explicit nums/alphabets
  res <- mapply(function(rex, rng, expt){
    if(length(rng) == 1){
      sapply(expt, function(p) gsub(pattern=rng, replacement=p, x=rex, fixed=TRUE),
             USE.NAMES=show_expanded)
    } else if(length(rng) > 1){
      rng <- paste0(rng, collapse="")
      sapply(expt, function(p) gsub(pattern=rng, replacement=p, x=rex, fixed=TRUE),
             USE.NAMES=show_expanded)
    } else{
      warning("The expression ", rex, " does not contain any ranges.")
      rex
    }
  },
  rex_split, range_pattern, expanded_patterns,
  USE.NAMES=FALSE)

  # regex with no "|" separator
  if(is.matrix(res)) {
    rnms <- rownames(res)
    res <- as.vector(res)
    names(res) <- rnms
    # return
    res
  }

  # otherwise, return
  else unlist(res)
}

I tested with two examples.

  1. single range
r <- "^405[0-3L-O]$"
regex_expander(r, show_expanded=FALSE)
# output
# [1] "^4050$" "^4051$" "^4052$" "^4053$" "^405L$" "^405M$" "^405N$" "^405O$"
regex_expander(r, show_expanded=TRUE)
# output
# 0 1 2 3 L M N O 
# "^4050$" "^4051$" "^4052$" "^4053$" "^405L$" "^405M$" "^405N$" "^405O$" 
  2. multiple ranges separated by |
r <- "^W3812$|^405[0-3L-O]$|^N17[04][9FK][0-3]Z$"
regex_expander(r, show_expanded=FALSE)
# output
# [1] "^W3812$" "^4050$" "^4051$" "^4052$" "^4053$" "^405L$" 
# [7] "^405M$" "^405N$" "^405O$" "^N17090Z$" "^N17490Z$" "^N170F0Z$"
# [13] "^N174F0Z$" "^N170K0Z$" "^N174K0Z$" "^N17091Z$" "^N17491Z$" "^N170F1Z$"
# [19] "^N174F1Z$" "^N170K1Z$" "^N174K1Z$" "^N17092Z$" "^N17492Z$" "^N170F2Z$"
# [25] "^N174F2Z$" "^N170K2Z$" "^N174K2Z$" "^N17093Z$" "^N17493Z$" "^N170F3Z$"
# [31] "^N174F3Z$" "^N170K3Z$" "^N174K3Z$"
regex_expander(r, show_expanded=TRUE)
# output
# 0 1 2 3 L M 
# "^W3812$" "^4050$" "^4051$" "^4052$" "^4053$" "^405L$" "^405M$" 
# N O 090 490 0F0 4F0 0K0 
# "^405N$" "^405O$" "^N17090Z$" "^N17490Z$" "^N170F0Z$" "^N174F0Z$" "^N170K0Z$" 
# 4K0 091 491 0F1 4F1 0K1 4K1 
# "^N174K0Z$" "^N17091Z$" "^N17491Z$" "^N170F1Z$" "^N174F1Z$" "^N170K1Z$" "^N174K1Z$" 
# 092 492 0F2 4F2 0K2 4K2 093 
# "^N17092Z$" "^N17492Z$" "^N170F2Z$" "^N174F2Z$" "^N170K2Z$" "^N174K2Z$" "^N17093Z$" 
# 493 0F3 4F3 0K3 4K3 
# "^N17493Z$" "^N170F3Z$" "^N174F3Z$" "^N170K3Z$" "^N174K3Z$" 

I feel like there would be a better way to handle checking for one or multiple ranges. Any suggestions?
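
For reference, here is a rough base-R sketch of one direction that might avoid branching on the number of ranges: split each alternative into its literal pieces and its bracket pieces, expand every bracket, and stitch the pieces back together. The names expand_one and expand_regex are hypothetical, the show_expanded naming feature is left out, and I am assuming the same universe of candidate characters (0-9, a-z, A-Z) as in my function above. Treat it as an illustration, not a vetted replacement.

```r
# Hypothetical helper: expand a single alternative (a pattern without "|").
expand_one <- function(pattern, universe = c(0:9, letters, LETTERS)) {
  m <- gregexpr("\\[.*?\\]", pattern, perl = TRUE)
  classes  <- regmatches(pattern, m)[[1]]                 # bracket pieces, e.g. "[04]"
  literals <- regmatches(pattern, m, invert = TRUE)[[1]]  # text around the brackets

  # No brackets: the pattern is already explicit.
  if (length(classes) == 0) return(pattern)

  # Characters from the assumed universe that each bracket matches.
  choices <- lapply(classes, function(cl) grep(cl, universe, value = TRUE))
  grid <- expand.grid(choices, stringsAsFactors = FALSE)
  if (nrow(grid) == 0) return(character(0))

  # Interleave: literal, choice, literal, choice, ..., literal.
  out <- rep(literals[1], nrow(grid))
  for (i in seq_along(classes)) {
    out <- paste0(out, grid[[i]], literals[i + 1])
  }
  out
}

# Hypothetical wrapper: split on "|" and expand each alternative.
expand_regex <- function(rex) {
  alternatives <- strsplit(rex, "|", fixed = TRUE)[[1]]
  unlist(lapply(alternatives, expand_one))
}

expand_regex("^W3812$|^405[0-3L-O]$|^N17[04][9FK][0-3]Z$")
```

Because the literal text between brackets is kept separately, this version should also cope with ranges that are not adjacent (e.g. A[01]B[xy]C), which my paste0(rng, collapse="") approach does not.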

Thank you for reviewing and I would appreciate any feedback!

toolic
asked Sep 7, 2024 at 3:21
\$\endgroup\$
