I have a data frame like this:
library(tibble)
dat <- tibble::tribble(
~level1,~level2,~level3,~level4,
"Beverages","Water","","",
"Beverages","Coffee","","",
"Beverages","Tea","Black tea","",
"Beverages","Tea","White tea","",
"Beverages","Tea","Green tea","Sencha",
"Beverages","Tea","Green tea","Gyokuro",
"Beverages","Tea","Green tea","Matcha",
"Beverages","Tea","Green tea","Pi Lo Chun"
)
From it, I want to create the following folders, subfolders, and files (an entry with no children is a file):
.
|
`- Beverages
   |
   +- Water
   |
   +- Tea
   |  |
   |  +- White tea
   |  |
   |  +- Green tea
   |  |  |
   |  |  +- Sencha
   |  |  |
   |  |  +- Pi Lo Chun
   |  |  |
   |  |  +- Matcha
   |  |  |
   |  |  `- Gyokuro
   |  |
   |  `- Black tea
   |
   `- Coffee
This is what I currently do:
paths <- apply(dat, 1L, paste0, collapse = "/")
paths <- gsub("/*$", "", paths)
paths <- stringi::stri_replace_last(paths, "//", fixed = "/")
paths <- strsplit(paths, "//")
folders <- lapply(paths, head, 1L)
files <- lapply(paths, function(path) do.call(file.path, as.list(path)))
lapply(folders, dir.create, recursive = TRUE, showWarnings = FALSE)
lapply(files, file.create, showWarnings = FALSE)
Do you have a better idea?
Maybe this view is better:
│
└───Beverages
    │   Coffee
    │   Water
    │
    └───Tea
        │   Black tea
        │   White tea
        │
        └───Green tea
                Gyokuro
                Matcha
                Pi Lo Chun
                Sencha
1 Answer
I would suggest:
paths <- sub("/+$", "", do.call(file.path, dat))
dirs <- sub("(.*)/.*", "\\1", paths)
for (dir in unique(dirs)) dir.create(dir, showWarnings = FALSE, recursive = TRUE)
for (file in paths) file.create(file)
It's a little simpler in that it only uses base R functions (no dependency on the stringi package), and it uses fewer operations and functions.
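To try the suggestion without touching the working directory, here is a minimal sketch that runs the same four steps inside a scratch folder. The folder name `tree_demo`, the shortened three-row table, and the use of a base data.frame instead of a tibble are assumptions made for illustration only.

```r
# Shortened version of the table (base data.frame, so no packages are needed)
dat <- data.frame(
  level1 = "Beverages",
  level2 = c("Water", "Tea", "Tea"),
  level3 = c("", "Black tea", "Green tea"),
  level4 = c("", "", "Sencha"),
  stringsAsFactors = FALSE
)

# Hypothetical scratch location under the session's temporary directory
root <- file.path(tempdir(), "tree_demo")

# Same steps as above, with every path prefixed by the scratch root
paths <- sub("/+$", "", file.path(root, do.call(file.path, dat)))
dirs  <- sub("(.*)/.*", "\\1", paths)
for (dir in unique(dirs)) dir.create(dir, showWarnings = FALSE, recursive = TRUE)
for (file in paths) file.create(file)

# List the files that now exist below the scratch root
created <- list.files(root, recursive = TRUE)
print(created)
```

Calling `unlink(root, recursive = TRUE)` afterwards removes the scratch folder again.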
The only possible complexity is in the use of regular expressions, which you already seem familiar with.
"/+$" matches one or more slashes at the end of a string; we replace these with "".
"(.*)/.*" matches the whole string but stores the directory path (everything up to the last slash) in the first capture group, which the replacement "\\1" refers to.
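As a quick illustration of the two patterns, the sketch below applies them to two hand-written sample paths (the vector name `raw` is made up for this example; the strings mimic what file.path() produces for rows with and without empty levels).

```r
# Two sample paths: one with empty trailing levels, one fully populated
raw <- c("Beverages/Water//", "Beverages/Tea/Green tea/Sencha")

# "/+$" removes the run of slashes left behind by the empty levels
paths <- sub("/+$", "", raw)
print(paths)
# "Beverages/Water"  "Beverages/Tea/Green tea/Sencha"

# "(.*)/.*" is greedy, so group 1 captures everything before the LAST slash
dirs <- sub("(.*)/.*", "\\1", paths)
print(dirs)
# "Beverages"  "Beverages/Tea/Green tea"
```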
Using unique() can make execution faster if many files share the same directory, because each directory is then created only once.
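For the table in the question, the saving can be counted directly. The sketch below writes out by hand the parent directory of each of the eight rows (an illustrative vector, not computed from `dat`) and compares the number of dir.create() calls with and without deduplication.

```r
# Parent directory of each of the eight rows in the question's table
dirs <- c(
  "Beverages", "Beverages",             # Water, Coffee
  "Beverages/Tea", "Beverages/Tea",     # Black tea, White tea
  rep("Beverages/Tea/Green tea", 4)     # Sencha, Gyokuro, Matcha, Pi Lo Chun
)

n_all    <- length(dirs)          # calls without unique(): 8
n_unique <- length(unique(dirs))  # calls with unique(): 3
print(c(n_all, n_unique))
```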
Lastly, using a for loop rather than lapply() is mostly a matter of preference.
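One practical difference is worth seeing: lapply() returns a list of the TRUE/FALSE results, which gets printed at the console unless it is assigned or wrapped in invisible(), whereas a for loop discards them. A minimal sketch, assuming a scratch folder named `loop_demo` (a made-up name) under tempdir():

```r
# Hypothetical scratch folder so nothing is written to the working directory
root <- file.path(tempdir(), "loop_demo")
dir.create(root, showWarnings = FALSE)
files <- file.path(root, c("Water", "Coffee"))

# for loop: the return value of each file.create() call is dropped
for (f in files) file.create(f)

# lapply: same side effect, but the results come back as a list
res <- lapply(files, file.create)
str(res)

# Either way, the files exist afterwards
print(all(file.exists(files)))
```

Keeping the list can be handy for checking which creations failed, for example with `files[!unlist(res)]`.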