I have one machine, which produces parts. In machine_failure_rate
% it produces faulty parts which need to be produced again. Thus, we end up with a simple queuing problem. Can the following code be futher functionalized? I have the feeling, I can get rid of time_parts
, but all I have in mind deteriorates the code as I need further lookups in the production_df
data frame to look for "what was produced / what needs to be produced now?". The following script is running:
input_rate <- 1/60 # input rate [1/min, 1/input_rate corresponds to interarrival time in min]
n <- 1000 # number of parts
dt <- 1 # timestep = time to transfer faulty parts back to production. [min]
machine_production_rate <- 1/40 # production rate [1/min]
machine_failure_rate <- 0.2 # machine failure rate
# Sum all interarrival times
set.seed(123456)
t_event <- cumsum(rpois(n, 1/input_rate))
# Create initial list of tasks. Produces parts will be cut off.
time_parts <- data.frame(id = c(1:n),
t = t_event,
stringsAsFactors = FALSE)
# ========= Functions ==========================================================
create_machine <- function(failure_rate, production_rate) {
machine <- list()
machine$failure_rate <- failure_rate
machine$production_rate <- production_rate
machine$is_occupied <- FALSE
return(machine);
}
update_machine <- function(ind_production_df, machine, production_df) {
if (machine$is_occupied) {
if (production_df$po_start[ind_production_df] + 1/machine$production_rate <= t) {
machine$is_occupied <- FALSE
}
}
return(machine)
}
production_summary <- function(production_df, machine, input_rate) {
no_of_failures <- sum(production_df$no_failures)
total_production_time <- max(production_df$po_start) + 1/machine$production_rate
uptime <- (no_of_failures + n)/machine$production_rate
print(paste0("Estimated machine$failure_rate ",
round(no_of_failures/(no_of_failures + n), 2),
" [theory ", round(machine$failure_rate, 2), "]"))
print(paste0("Up-time ", uptime,
", of total time ", total_production_time, ". Auslastung ",
round(uptime/total_production_time, 2),
" [theory ", round(input_rate/machine$production_rate*1/(1 - machine$failure_rate), 2), "]"))
}
# ========= DE simulation ======================================================
machine <- create_machine(machine_failure_rate, machine_production_rate)
production_df <- data.frame(id = time_parts$id,
time = time_parts$t,
production_start = rep(0, nrow(time_parts)),
no_failures = rep(0, nrow(time_parts)),
stringsAsFactors = FALSE)
t <- 0
while (length(time_parts$t) > 0) {
ind_production_df <- which(production_df$id == time_parts$id[1])
machine <- update_machine(ind_production_df, machine, production_df)
if (!machine$is_occupied & time_parts$t[1] <= t) {
# A machine is available and a part needs to be produced
machine$is_occupied <- TRUE
production_df$po_start[ind_production_df] <- t
if (runif(1) < machine$failure_rate) {
# bad part
time_parts$t[1] <- time_parts$t[1] + dt
time_parts <- time_parts[sort(time_parts$t, index.return = TRUE)$ix, ]
production_df$no_failures[ind_production_df] <-
production_df$no_failures[ind_production_df] + 1
t <- t + min(time_parts$t[1], dt)
} else {
# good part
if (production_df$po_start[ind_production_df] + 1/machine$production_rate >= t &&
nrow(time_parts) >= 2) {
time_parts <- time_parts[2:(nrow(time_parts)), ]
} else {
time_parts <- time_parts[FALSE, ]
}
t <- t + 1/machine$production_rate
machine$is_occupied <- FALSE
}
} else {
# machine is occupied or no part needs to be produced
t <- t + min(time_parts$t[1], dt)
}
}
# ========= Results ============================================================
production_summary(production_df, machine, input_rate)
Backround: I think about a generalisation (more machines, more input-sources, more complex rules how/when/... parts a produces). I fear that I will end up with tons of unreadable and unmaintainable code-lines if I proceed like this.
Edit: I think t <- t + min(time_parts$t[1], dt)
is a bug and the correct version is t <- min(time_parts$t[1], t + dt)
. It only worked because the time difference dt
was always the minimum. In the last case you could speed up using t <- max(time_parts$t[1], t + dt)
as there is nothing to do in the time inbetween.
1 Answer 1
This was a pretty difficult challenge - principally because R doesn't have a built-in priority queue data-structure, but also because the priority-queue-like data-frame (time_parts
) was wrapped around the results-storing data-frame (production_df
) and the main while
loop contains code at a few different levels of abstraction.
Idiomatic R
I did some simple stuff first: pulled all your functions to the start of the script, reformatted some code/comments.
There was a couple of things I changed for idiomatic reasons:
which(production_df$id == time_parts$id[1])
# -->
match(time_parts$id[1], production_df$id)
# time_parts[2:(nrow(time_parts)), ] # and
# time_parts[FALSE, ] # when time_parts has only one row
# can both be replaced with
time_parts[-1, ]
# (which is the idiomatic way to drop the first row) so this allowed us to remove an if-else clause
# You don't need to do rep(some_value, n) when you're adding a
# constant column to a data-frame at construction:
production_df <- data.frame(id = time_parts$id,
time = time_parts$t,
production_start = rep(0, nrow(time_parts)),
no_failures = rep(0, nrow(time_parts)),
stringsAsFactors = FALSE)
# -->
production_df <- data.frame(id = time_parts$id,
time = time_parts$t,
production_start = 0,
no_failures = 0,
stringsAsFactors = FALSE)
# `order(...)` does the same thing as `sort(..., index.return)$ix`
sort(time_parts$t, index.return = TRUE)$ix
# -->
order(time_parts$t)
# `nrow(x)` is more idiomatic than `length(x$some_column)`
while(length(time_parts$t) > 0){
# -->
while(nrow(time_parts) > 0) {
# but I subsequently replaced this newer line as well
Explicit data-classes
I converted your create_machine
function so that it returns an object of class "Machine"; this wasn't really necessary.
create_machine <- function(failure_rate, production_rate) {
structure(
list(
failure_rate = failure_rate,
production_rate = production_rate,
is_occupied = FALSE
),
class = "Machine"
)
}
I added a create_part
function that similarly returns a Part
object. There was a lot of repeats of 1 / machine$production_rate
in your code; I replaced these with a call to part$production_duration. Also I thought your test to see whether a produced part was a failure should be associated with the produced part object (part$is_failure
); with this, the while-loop logic becomes more explicit:
create_part <- function(machine) {
structure(
list(
is_failure = runif(1) < machine$failure_rate,
production_duration = 1 / machine$production_rate
),
class = "Part"
)
}
# then we can use this in the while-loop
part <- create_part(machine)
if (part$is_failure) {
# bad part logic
...
} else {
# good part logic
...
}
Restructuring the while
loop
I wanted to push that while-loop into a function - the less work you do in the global environment, the better.
Since you want to extract data from production_df
for your report, the function should return the production_df
. During the while-loop, you access production_df
, time_parts
, t
, dt
(which I renamed dt_recovery
based on your comments), n
and machine
.
So we might want to pass all of those into that function. But we can compute some of those from the others:
n
is the nrow ofproduction_df
,t
isn't needed outside of the while loop andthe data that initialises
time_parts
also initialisesproduction_df
.
The only thing we need to initialise both time_parts
and production_df
is the arrival-times or times at which the parts were ordered (which I renamed t_ordered
).
So, we can put that while-loop into a function that takes arguments t_ordered
, dt_recovery
, machine
.
run_event_simulation <- function(t_ordered, machine, dt_recovery) {
n_parts <- length(t_ordered)
# results data-frame
production_df <- data.frame(
id = seq(n_parts),
t_ordered = t_ordered,
t_started = 0,
t_completed = 0,
no_failures = 0,
stringsAsFactors = FALSE
)
time_parts <- ... # define in terms of production_df
# while-loop logic
# return the updated production_df
I added the column t_completed
into production_df
so that you can more easily compute total_production_time
from production_df
in your report (this allows you to generalise the production rates)
# in `production_summary`
...
total_production_time <- max(production_df$t_completed)
...
A functional priority queue
The really big step:
R doesn't have a native priority-queue, and it would be pretty hard to encode using the S3 or S4 classes since you can't update by reference in those classes. There is a priority-queue defined in the package liqueueR
, but I've no experience of that. So I just wrote a simpler version of the priority queue (as an S3 class): this allows you to
peek
: extract the element in the queue with the lowest priority value (without mutating the queue)delete_min
: remove that element with the lowest priority value from the queue and return the resulting queueadd
: add a new element to the queue according to it's priority, returning the resulting queue- and provides a couple of helper methods (
is_empty
,nrow
)
However, this doesn't provide a pop_element(queue)
: typically, pop_element
removes the leading element from the queue and returns that element. That is, it returns the leading element and updates the queue through a side-effect. This side-effect is problematic in R, so I didn't include a pop_element
function. To achieve pop_element
you have to peek
and then delete_min
.
# Priority Queue class
create_priority_queue <- function(x, priority_column) {
structure(
list(
# note that we only `order` once - see `add` for how this is possible
queue = x[order(x[[priority_column]]), ]
),
class = "PriorityQueue",
priority_column = priority_column
)
}
# generic methods for Priority Queue
is_empty <- function(x, ...) UseMethod("is_empty")
peek <- function(x, ...) UseMethod("peek")
delete_min <- function(x, ...) UseMethod("delete_min")
add <- function(x, ...) UseMethod("add")
nrow <- function(x, ...) UseMethod("nrow")
nrow.default <- function(x, ...) {
base::nrow(x)
}
# implemented methods for Priority Queue
nrow.PriorityQueue <- function(x, ...) {
nrow(x$queue)
}
is_empty.PriorityQueue <- function(x, ...) {
nrow(x) == 0
}
peek.PriorityQueue <- function(x, ...) {
x$queue[1, ]
}
delete_min.PriorityQueue <- function(x, ...) {
x$queue <- x$queue[-1, ]
x
}
add.PriorityQueue <- function(x, new_element, ...) {
priority_column <- attr(x, "priority_column")
# split the existing values by comparison of their priorities to
# those of the new-element
lhs <- which(x$queue[[priority_column]] <= new_element[[priority_column]])
rhs <- setdiff(seq(nrow(x)), lhs)
x$queue <- rbind(x$queue[lhs, ], new_element, x$queue[rhs, ])
x
}
Then I replaced your time_parts
data-frame with a PriorityQueue:
# inside run_event_simulation
...
# Create initial list of tasks. Once produced, a part will be removed from the
# queue.
product_queue <- create_priority_queue(
data.frame(
id = production_df$id,
t = production_df$t_ordered
),
"t"
)
...
I added a few other helpers. The final code looks like this:
# ---- classes
# Priority Queue class
create_priority_queue <- function(x, priority_column) {
structure(
list(
queue = x[order(x[[priority_column]]), ]
),
class = "PriorityQueue",
priority_column = priority_column
)
}
# A machine for producing `Part`s
create_machine <- function(failure_rate, production_rate) {
structure(
list(
failure_rate = failure_rate,
production_rate = production_rate,
is_occupied = FALSE
),
class = "Machine"
)
}
# A manufactured part
create_part <- function(machine) {
structure(
list(
is_failure = runif(1) < machine$failure_rate,
production_duration = 1 / machine$production_rate
),
class = "Part"
)
}
# methods for Priority Queue
is_empty <- function(x, ...) UseMethod("is_empty")
peek <- function(x, ...) UseMethod("peek")
delete_min <- function(x, ...) UseMethod("delete_min")
add <- function(x, ...) UseMethod("add")
nrow <- function(x, ...) UseMethod("nrow")
nrow.default <- function(x, ...) {
base::nrow(x)
}
nrow.PriorityQueue <- function(x, ...) {
nrow(x$queue)
}
is_empty.PriorityQueue <- function(x, ...) {
nrow(x) == 0
}
peek.PriorityQueue <- function(x, ...) {
x$queue[1, ]
}
delete_min.PriorityQueue <- function(x, ...) {
x$queue <- x$queue[-1, ]
x
}
add.PriorityQueue <- function(x, new_element, ...) {
priority_column <- attr(x, "priority_column")
lhs <- which(x$queue[[priority_column]] <= new_element[[priority_column]])
rhs <- setdiff(seq(nrow(x)), lhs)
x$queue <- rbind(x$queue[lhs, ], new_element, x$queue[rhs, ])
x
}
# ---- functions
update_machine <- function(machine,
ind_production_df,
production_df,
current_time) {
if (machine$is_occupied) {
if (
production_df$t_started[ind_production_df]
+ 1 / machine$production_rate <= current_time
) {
machine$is_occupied <- FALSE
}
}
return(machine)
}
should_produce_part <- function(machine,
earliest_production_time,
current_time) {
!machine$is_occupied &&
earliest_production_time <= current_time
}
increment_failures <- function(df, i) {
df[i, "no_failures"] <- 1 + df[i, "no_failures"]
df
}
# ---- format results
production_summary <- function(production_df, machine, input_rate) {
n_parts <- nrow(production_df)
no_of_failures <- sum(production_df$no_failures)
total_production_time <- max(production_df$t_completed)
uptime <- (no_of_failures + n_parts) / machine$production_rate
print(paste0(
"Estimated machine$failure_rate ",
round(no_of_failures / (no_of_failures + n_parts), 2),
" [theory ", round(machine$failure_rate, 2), "]"
))
print(paste0(
"Up-time ", uptime,
", of total time ", total_production_time, ". Auslastung ",
round(uptime / total_production_time, 2),
" [theory ",
round(
input_rate / machine$production_rate * 1 / (1 - machine$failure_rate), 2
),
"]"
))
}
# ---- discrete-event simulation
#
run_event_simulation <- function(t_ordered, machine, dt_recovery) {
n_parts <- length(t_ordered)
# results data-frame
production_df <- data.frame(
id = seq(n_parts),
t_ordered = t_ordered,
t_started = 0,
t_completed = 0,
no_failures = 0,
stringsAsFactors = FALSE
)
# Create initial list of tasks. Once produced, a part will be removed from the
# queue.
product_queue <- create_priority_queue(
data.frame(
id = production_df$id,
t = production_df$t_ordered
),
"t"
)
t <- 0
while (!is_empty(product_queue)) {
queued_part <- peek(product_queue)
ind_production_df <- match(
queued_part$id, production_df$id
)
machine <- update_machine(machine, ind_production_df, production_df, t)
if (
should_produce_part(machine,
earliest_production_time = queued_part$t,
current_time = t)
) {
# A machine is available and a part needs to be produced
# - pop the scheduled part from the queue; add it back if it's production
# fails
product_queue <- delete_min(product_queue)
machine$is_occupied <- TRUE
production_df$t_started[ind_production_df] <- t
part <- create_part(machine)
if (part$is_failure) {
# bad part - add it back to the schedule
queued_part$t <- queued_part$t + dt_recovery
product_queue <- add(product_queue, queued_part)
production_df <- increment_failures(production_df, ind_production_df)
t <- t + min(peek(product_queue)$t, dt_recovery)
} else {
# good part
t <- t + part$production_duration
production_df$t_completed[ind_production_df] <- t
machine$is_occupied <- FALSE
}
} else {
# machine is occupied or no part needs to be produced
t <- t + min(peek(product_queue)$t, dt_recovery)
}
}
production_df
}
# ---- script
set.seed(123456)
# Input rate [1/min, 1/input_rate corresponds to interarrival time in min]
input_rate <- 1 / 60
# Number of parts
n_parts <- 1000
# timestep = time to transfer faulty parts back to production. [min]
dt_recovery <- 1
# Production rate [1/min]
machine_production_rate <- 1 / 40
# Machine failure rate
machine_failure_rate <- 0.2
# Sum all interarrival times
t_ordered <- cumsum(rpois(n_parts, 1 / input_rate))
machine <- create_machine(machine_failure_rate, machine_production_rate)
# ---- results
production_df <- run_event_simulation(
t_ordered, machine, dt_recovery
)
production_summary(production_df, machine, input_rate)
Why aren't S3 queues easy?
(This is actually quite hard to explain). Well, the pop
method on a priority-queue returns an element from the queue and moves the queue on by one step. (In R) Updating the queue might look like new_queue <- old_queue[-1]
and obtaining the returned element might look like returned_element <- old_queue[1]
. So a pop function might look like
pop <- function(q) {
# extract the head
el <- q[1]
# In a reference-based language you could update the queue
# using a side-effect like `q.drop()`
# But in R, this creates a new queue: and if it isn't returned
# explicitly, it is thrown away at the end of the `pop` function
new_q <- q[-1]
# return the element that's at the head of the original queue
el
}
# calling_env
my_q <- create_queue(...)
my_head <- pop(my_q)
But the queue has not been altered by that pop
. Now we could rewrite that function to do something dangerous like q <<- q[-1]
and that would update the q
in the calling environment. I consider this dangerous because q
might not exist in the calling environment and that introduces side-effects, which are much harder to reason about.
-
\$\begingroup\$ @Christoph there was a couple of things I didn't feel comfortable restructuring. I couldn't work out why
update_machine
works the way it does: it seems to look into the future before it decides what to do now. It makes more sense to me foris_occupied
to be set to FALSE at the end of each while-loop iteration. \$\endgroup\$Russ Hyde– Russ Hyde2019年02月12日 13:08:03 +00:00Commented Feb 12, 2019 at 13:08 -
\$\begingroup\$ @Christoph could you give some indication of whether you'd prefer to generalise the production-time (eg, replace the fixed production time with a Poisson-sampled time) or the number of machines \$\endgroup\$Russ Hyde– Russ Hyde2019年02月12日 13:10:17 +00:00Commented Feb 12, 2019 at 13:10
-
\$\begingroup\$ Updated. But it's pretty difficult to explain. I'm not particularly interested in stochastic simulation at present. I did have a look at your
simmer
documentation. Can you confirm that when you mclapply() over different simulation chains, different seeds are used for each chain? \$\endgroup\$Russ Hyde– Russ Hyde2019年02月18日 09:53:08 +00:00Commented Feb 18, 2019 at 9:53 -
\$\begingroup\$ Cool, thanks a lot. Unfortunately, simmer is not my package - so I can't answer your question. :-( \$\endgroup\$Christoph– Christoph2019年02月18日 10:26:21 +00:00Commented Feb 18, 2019 at 10:26
-
\$\begingroup\$ To your function
pop <- function(q)
: why don't you justreturn(list(el=el, new_q=new_q)
? Then you could work within one liner <- pop(q); el <- r$el; q <- r$new_q; rm(r);
? But I still understand your comment that calling by reference would be smart... \$\endgroup\$Christoph– Christoph2019年08月07日 14:35:44 +00:00Commented Aug 7, 2019 at 14:35
part
? \$\endgroup\$t_event
. I also made a tiny edit to the post... \$\endgroup\$t <- t + min(time_parts$t[1], dt)
. The contents oftime_parts
are the earliest-time a given part can be made and the current time must increase at each iteration, shouldn't you update current time tomax(t + dt, time_parts$t[1])
? \$\endgroup\$t
, I should take the earliest point in time where something can change. This is save. It might be, that sometimes amax
would speed up... I am curious about your results :-) \$\endgroup\$