Discrete event simulation for production defects

Question 1

I have one machine, which produces parts. In machine_failure_rate% it produces faulty parts which need to be produced again. Thus, we end up with a simple queuing problem. Can the following code be futher functionalized? I have the feeling, I can get rid of time_parts, but all I have in mind deteriorates the code as I need further lookups in the production_df data frame to look for "what was produced / what needs to be produced now?". The following script is running:

input_rate <- 1/60 # input rate [1/min, 1/input_rate corresponds to interarrival time in min]
n <- 1000 # number of parts
dt <- 1 # timestep = time to transfer faulty parts back to production. [min]
machine_production_rate <- 1/40 # production rate [1/min]
machine_failure_rate <- 0.2 # machine failure rate
# Sum all interarrival times
set.seed(123456)
t_event <- cumsum(rpois(n, 1/input_rate))
# Create initial list of tasks. Produces parts will be cut off.
time_parts <- data.frame(id = c(1:n),
 t = t_event,
 stringsAsFactors = FALSE)
# ========= Functions ==========================================================
create_machine <- function(failure_rate, production_rate) {
 machine <- list()
 machine$failure_rate <- failure_rate
 machine$production_rate <- production_rate
 machine$is_occupied <- FALSE
 return(machine);
}
update_machine <- function(ind_production_df, machine, production_df) {
 if (machine$is_occupied) {
 if (production_df$po_start[ind_production_df] + 1/machine$production_rate <= t) {
 machine$is_occupied <- FALSE
 }
 }
 return(machine)
}
production_summary <- function(production_df, machine, input_rate) {
 no_of_failures <- sum(production_df$no_failures)
 total_production_time <- max(production_df$po_start) + 1/machine$production_rate
 uptime <- (no_of_failures + n)/machine$production_rate
 print(paste0("Estimated machine$failure_rate ", 
 round(no_of_failures/(no_of_failures + n), 2),
 " [theory ", round(machine$failure_rate, 2), "]"))
 print(paste0("Up-time ", uptime, 
 ", of total time ", total_production_time, ". Auslastung ",
 round(uptime/total_production_time, 2),
 " [theory ", round(input_rate/machine$production_rate*1/(1 - machine$failure_rate), 2), "]"))
}
# ========= DE simulation ======================================================
machine <- create_machine(machine_failure_rate, machine_production_rate)
production_df <- data.frame(id = time_parts$id,
 time = time_parts$t,
 production_start = rep(0, nrow(time_parts)),
 no_failures = rep(0, nrow(time_parts)),
 stringsAsFactors = FALSE)
t <- 0
while (length(time_parts$t) > 0) {
 ind_production_df <- which(production_df$id == time_parts$id[1])
 machine <- update_machine(ind_production_df, machine, production_df)
 if (!machine$is_occupied & time_parts$t[1] <= t) {
 # A machine is available and a part needs to be produced
 machine$is_occupied <- TRUE
 production_df$po_start[ind_production_df] <- t
 if (runif(1) < machine$failure_rate) {
 # bad part
 time_parts$t[1] <- time_parts$t[1] + dt
 time_parts <- time_parts[sort(time_parts$t, index.return = TRUE)$ix, ]
 production_df$no_failures[ind_production_df] <- 
 production_df$no_failures[ind_production_df] + 1
 t <- t + min(time_parts$t[1], dt)
 } else {
 # good part
 if (production_df$po_start[ind_production_df] + 1/machine$production_rate >= t &&
 nrow(time_parts) >= 2) {
 time_parts <- time_parts[2:(nrow(time_parts)), ]
 } else {
 time_parts <- time_parts[FALSE, ]
 }
 t <- t + 1/machine$production_rate
 machine$is_occupied <- FALSE
 }
 } else {
 # machine is occupied or no part needs to be produced
 t <- t + min(time_parts$t[1], dt)
 }
}
# ========= Results ============================================================
production_summary(production_df, machine, input_rate)

Backround: I think about a generalisation (more machines, more input-sources, more complex rules how/when/... parts a produces). I fear that I will end up with tons of unreadable and unmaintainable code-lines if I proceed like this.

Edit: I think t <- t + min(time_parts$t[1], dt) is a bug and the correct version is t <- min(time_parts$t[1], t + dt). It only worked because the time difference dt was always the minimum. In the last case you could speed up using t <- max(time_parts$t[1], t + dt) as there is nothing to do in the time inbetween.

Question 2

Could you explain what "po_start" represents - is this the time at which production starts for a given part?

Question 3

Yes, it means, when the production really starts - due to a queue, this can be delayed in contrast to the original t_event. I also made a tiny edit to the post...

Question 4

I've been working on this, but can't post my code from work. I can restructure the code and get the same result but I was wondering whether there might be an error. Within your code t <- t + min(time_parts$t[1], dt). The contents of time_parts are the earliest-time a given part can be made and the current time must increase at each iteration, shouldn't you update current time to max(t + dt, time_parts$t[1])?

Question 5

@RussHyde No, I don't think so: if I increase t, I should take the earliest point in time where something can change. This is save. It might be, that sometimes a max would speed up... I am curious about your results :-)

Question 6

This was a pretty difficult challenge - principally because R doesn't have a built-in priority queue data-structure, but also because the priority-queue-like data-frame (time_parts) was wrapped around the results-storing data-frame (production_df) and the main while loop contains code at a few different levels of abstraction.

Idiomatic `R`

I did some simple stuff first: pulled all your functions to the start of the script, reformatted some code/comments.

There was a couple of things I changed for idiomatic reasons:

which(production_df$id == time_parts$id[1])
# -->
match(time_parts$id[1], production_df$id)
# time_parts[2:(nrow(time_parts)), ] # and
# time_parts[FALSE, ] # when time_parts has only one row
# can both be replaced with
time_parts[-1, ]
# (which is the idiomatic way to drop the first row) so this allowed us to remove an if-else clause
# You don't need to do rep(some_value, n) when you're adding a
# constant column to a data-frame at construction:
production_df <- data.frame(id = time_parts$id,
 time = time_parts$t,
 production_start = rep(0, nrow(time_parts)),
 no_failures = rep(0, nrow(time_parts)),
 stringsAsFactors = FALSE)
# -->
production_df <- data.frame(id = time_parts$id,
 time = time_parts$t,
 production_start = 0,
 no_failures = 0,
 stringsAsFactors = FALSE)
# `order(...)` does the same thing as `sort(..., index.return)$ix`
sort(time_parts$t, index.return = TRUE)$ix
# -->
order(time_parts$t)
# `nrow(x)` is more idiomatic than `length(x$some_column)`
while(length(time_parts$t) > 0){ 
# -->
while(nrow(time_parts) > 0) {
# but I subsequently replaced this newer line as well

Explicit data-classes

I converted your create_machine function so that it returns an object of class "Machine"; this wasn't really necessary.

create_machine <- function(failure_rate, production_rate) {
 structure(
 list(
 failure_rate = failure_rate,
 production_rate = production_rate,
 is_occupied = FALSE
 ),
 class = "Machine"
 )
}

I added a create_part function that similarly returns a Part object. There was a lot of repeats of 1 / machine$production_rate in your code; I replaced these with a call to part$production_duration. Also I thought your test to see whether a produced part was a failure should be associated with the produced part object (part$is_failure); with this, the while-loop logic becomes more explicit:

create_part <- function(machine) {
 structure(
 list(
 is_failure = runif(1) < machine$failure_rate,
 production_duration = 1 / machine$production_rate
 ),
 class = "Part"
 )
}
# then we can use this in the while-loop
part <- create_part(machine)
if (part$is_failure) {
 # bad part logic
 ...
} else {
 # good part logic
 ...
}

Restructuring the `while` loop

I wanted to push that while-loop into a function - the less work you do in the global environment, the better.

Since you want to extract data from production_df for your report, the function should return the production_df. During the while-loop, you access production_df, time_parts, t, dt (which I renamed dt_recovery based on your comments), n and machine. So we might want to pass all of those into that function. But we can compute some of those from the others:

n is the nrow of production_df,
t isn't needed outside of the while loop and
the data that initialises time_parts also initialises production_df.

The only thing we need to initialise both time_parts and production_df is the arrival-times or times at which the parts were ordered (which I renamed t_ordered).

So, we can put that while-loop into a function that takes arguments t_ordered, dt_recovery, machine.

run_event_simulation <- function(t_ordered, machine, dt_recovery) {
 n_parts <- length(t_ordered)
 # results data-frame
 production_df <- data.frame(
 id = seq(n_parts),
 t_ordered = t_ordered,
 t_started = 0,
 t_completed = 0,
 no_failures = 0,
 stringsAsFactors = FALSE
 )
 time_parts <- ... # define in terms of production_df
 # while-loop logic
 # return the updated production_df

I added the column t_completed into production_df so that you can more easily compute total_production_time from production_df in your report (this allows you to generalise the production rates)

# in `production_summary`
...
total_production_time <- max(production_df$t_completed)
...

A functional priority queue

The really big step:

R doesn't have a native priority-queue, and it would be pretty hard to encode using the S3 or S4 classes since you can't update by reference in those classes. There is a priority-queue defined in the package liqueueR, but I've no experience of that. So I just wrote a simpler version of the priority queue (as an S3 class): this allows you to

peek: extract the element in the queue with the lowest priority value (without mutating the queue)
delete_min: remove that element with the lowest priority value from the queue and return the resulting queue
add: add a new element to the queue according to it's priority, returning the resulting queue
and provides a couple of helper methods (is_empty, nrow)

However, this doesn't provide a pop_element(queue): typically, pop_element removes the leading element from the queue and returns that element. That is, it returns the leading element and updates the queue through a side-effect. This side-effect is problematic in R, so I didn't include a pop_element function. To achieve pop_element you have to peek and then delete_min.


# Priority Queue class
create_priority_queue <- function(x, priority_column) {
 structure(
 list(
 # note that we only `order` once - see `add` for how this is possible
 queue = x[order(x[[priority_column]]), ]
 ),
 class = "PriorityQueue",
 priority_column = priority_column
 )
}
# generic methods for Priority Queue
is_empty <- function(x, ...) UseMethod("is_empty")
peek <- function(x, ...) UseMethod("peek")
delete_min <- function(x, ...) UseMethod("delete_min")
add <- function(x, ...) UseMethod("add")
nrow <- function(x, ...) UseMethod("nrow")
nrow.default <- function(x, ...) {
 base::nrow(x)
}
# implemented methods for Priority Queue
nrow.PriorityQueue <- function(x, ...) {
 nrow(x$queue)
}
is_empty.PriorityQueue <- function(x, ...) {
 nrow(x) == 0
}
peek.PriorityQueue <- function(x, ...) {
 x$queue[1, ]
}
delete_min.PriorityQueue <- function(x, ...) {
 x$queue <- x$queue[-1, ]
 x
}
add.PriorityQueue <- function(x, new_element, ...) {
 priority_column <- attr(x, "priority_column")
 # split the existing values by comparison of their priorities to
 # those of the new-element
 lhs <- which(x$queue[[priority_column]] <= new_element[[priority_column]])
 rhs <- setdiff(seq(nrow(x)), lhs)
 x$queue <- rbind(x$queue[lhs, ], new_element, x$queue[rhs, ])
 x
}

Then I replaced your time_parts data-frame with a PriorityQueue:

# inside run_event_simulation
...
 # Create initial list of tasks. Once produced, a part will be removed from the
 # queue.
 product_queue <- create_priority_queue(
 data.frame(
 id = production_df$id,
 t = production_df$t_ordered
 ),
 "t"
 )
...

I added a few other helpers. The final code looks like this:

# ---- classes
# Priority Queue class
create_priority_queue <- function(x, priority_column) {
 structure(
 list(
 queue = x[order(x[[priority_column]]), ]
 ),
 class = "PriorityQueue",
 priority_column = priority_column
 )
}
# A machine for producing `Part`s
create_machine <- function(failure_rate, production_rate) {
 structure(
 list(
 failure_rate = failure_rate,
 production_rate = production_rate,
 is_occupied = FALSE
 ),
 class = "Machine"
 )
}
# A manufactured part
create_part <- function(machine) {
 structure(
 list(
 is_failure = runif(1) < machine$failure_rate,
 production_duration = 1 / machine$production_rate
 ),
 class = "Part"
 )
}
# methods for Priority Queue
is_empty <- function(x, ...) UseMethod("is_empty")
peek <- function(x, ...) UseMethod("peek")
delete_min <- function(x, ...) UseMethod("delete_min")
add <- function(x, ...) UseMethod("add")
nrow <- function(x, ...) UseMethod("nrow")
nrow.default <- function(x, ...) {
 base::nrow(x)
}
nrow.PriorityQueue <- function(x, ...) {
 nrow(x$queue)
}
is_empty.PriorityQueue <- function(x, ...) {
 nrow(x) == 0
}
peek.PriorityQueue <- function(x, ...) {
 x$queue[1, ]
}
delete_min.PriorityQueue <- function(x, ...) {
 x$queue <- x$queue[-1, ]
 x
}
add.PriorityQueue <- function(x, new_element, ...) {
 priority_column <- attr(x, "priority_column")
 lhs <- which(x$queue[[priority_column]] <= new_element[[priority_column]])
 rhs <- setdiff(seq(nrow(x)), lhs)
 x$queue <- rbind(x$queue[lhs, ], new_element, x$queue[rhs, ])
 x
}
# ---- functions
update_machine <- function(machine,
 ind_production_df,
 production_df,
 current_time) {
 if (machine$is_occupied) {
 if (
 production_df$t_started[ind_production_df]
 + 1 / machine$production_rate <= current_time
 ) {
 machine$is_occupied <- FALSE
 }
 }
 return(machine)
}
should_produce_part <- function(machine,
 earliest_production_time,
 current_time) {
 !machine$is_occupied &&
 earliest_production_time <= current_time
}
increment_failures <- function(df, i) {
 df[i, "no_failures"] <- 1 + df[i, "no_failures"]
 df
}
# ---- format results
production_summary <- function(production_df, machine, input_rate) {
 n_parts <- nrow(production_df)
 no_of_failures <- sum(production_df$no_failures)
 total_production_time <- max(production_df$t_completed)
 uptime <- (no_of_failures + n_parts) / machine$production_rate
 print(paste0(
 "Estimated machine$failure_rate ",
 round(no_of_failures / (no_of_failures + n_parts), 2),
 " [theory ", round(machine$failure_rate, 2), "]"
 ))
 print(paste0(
 "Up-time ", uptime,
 ", of total time ", total_production_time, ". Auslastung ",
 round(uptime / total_production_time, 2),
 " [theory ",
 round(
 input_rate / machine$production_rate * 1 / (1 - machine$failure_rate), 2
 ),
 "]"
 ))
}
# ---- discrete-event simulation
#
run_event_simulation <- function(t_ordered, machine, dt_recovery) {
 n_parts <- length(t_ordered)
 # results data-frame
 production_df <- data.frame(
 id = seq(n_parts),
 t_ordered = t_ordered,
 t_started = 0,
 t_completed = 0,
 no_failures = 0,
 stringsAsFactors = FALSE
 )
 # Create initial list of tasks. Once produced, a part will be removed from the
 # queue.
 product_queue <- create_priority_queue(
 data.frame(
 id = production_df$id,
 t = production_df$t_ordered
 ),
 "t"
 )
 t <- 0
 while (!is_empty(product_queue)) {
 queued_part <- peek(product_queue)
 ind_production_df <- match(
 queued_part$id, production_df$id
 )
 machine <- update_machine(machine, ind_production_df, production_df, t)
 if (
 should_produce_part(machine,
 earliest_production_time = queued_part$t,
 current_time = t)
 ) {
 # A machine is available and a part needs to be produced
 # - pop the scheduled part from the queue; add it back if it's production
 # fails
 product_queue <- delete_min(product_queue)
 machine$is_occupied <- TRUE
 production_df$t_started[ind_production_df] <- t
 part <- create_part(machine)
 if (part$is_failure) {
 # bad part - add it back to the schedule
 queued_part$t <- queued_part$t + dt_recovery
 product_queue <- add(product_queue, queued_part)
 production_df <- increment_failures(production_df, ind_production_df)
 t <- t + min(peek(product_queue)$t, dt_recovery)
 } else {
 # good part
 t <- t + part$production_duration
 production_df$t_completed[ind_production_df] <- t
 machine$is_occupied <- FALSE
 }
 } else {
 # machine is occupied or no part needs to be produced
 t <- t + min(peek(product_queue)$t, dt_recovery)
 }
 }
 production_df
}
# ---- script
set.seed(123456)
# Input rate [1/min, 1/input_rate corresponds to interarrival time in min]
input_rate <- 1 / 60
# Number of parts
n_parts <- 1000
# timestep = time to transfer faulty parts back to production. [min]
dt_recovery <- 1
# Production rate [1/min]
machine_production_rate <- 1 / 40
# Machine failure rate
machine_failure_rate <- 0.2
# Sum all interarrival times
t_ordered <- cumsum(rpois(n_parts, 1 / input_rate))
machine <- create_machine(machine_failure_rate, machine_production_rate)
# ---- results
production_df <- run_event_simulation(
 t_ordered, machine, dt_recovery
)
production_summary(production_df, machine, input_rate)

Why aren't S3 queues easy?

(This is actually quite hard to explain). Well, the pop method on a priority-queue returns an element from the queue and moves the queue on by one step. (In R) Updating the queue might look like new_queue <- old_queue[-1] and obtaining the returned element might look like returned_element <- old_queue[1]. So a pop function might look like

pop <- function(q) {
 # extract the head
 el <- q[1]
 # In a reference-based language you could update the queue
 # using a side-effect like `q.drop()`
 # But in R, this creates a new queue: and if it isn't returned 
 # explicitly, it is thrown away at the end of the `pop` function
 new_q <- q[-1]
 # return the element that's at the head of the original queue
 el
}
# calling_env
my_q <- create_queue(...)
my_head <- pop(my_q)

But the queue has not been altered by that pop. Now we could rewrite that function to do something dangerous like q <<- q[-1] and that would update the q in the calling environment. I consider this dangerous because q might not exist in the calling environment and that introduces side-effects, which are much harder to reason about.

Question 7

@Christoph there was a couple of things I didn't feel comfortable restructuring. I couldn't work out why update_machine works the way it does: it seems to look into the future before it decides what to do now. It makes more sense to me for is_occupied to be set to FALSE at the end of each while-loop iteration.

Question 8

@Christoph could you give some indication of whether you'd prefer to generalise the production-time (eg, replace the fixed production time with a Poisson-sampled time) or the number of machines

Question 9

Updated. But it's pretty difficult to explain. I'm not particularly interested in stochastic simulation at present. I did have a look at your simmer documentation. Can you confirm that when you mclapply() over different simulation chains, different seeds are used for each chain?

Question 10

Cool, thanks a lot. Unfortunately, simmer is not my package - so I can't answer your question. :-(

Question 11

To your function pop <- function(q): why don't you just return(list(el=el, new_q=new_q)? Then you could work within one line r <- pop(q); el <- r$el; q <- r$new_q; rm(r);? But I still understand your comment that calling by reference would be smart...

Russ Hyde Russ Hyde 3002 silver badges8 bronze badges · Accepted Answer · 2019-02-10 19:10:40Z

This was a pretty difficult challenge - principally because R doesn't have a built-in priority queue data-structure, but also because the priority-queue-like data-frame (time_parts) was wrapped around the results-storing data-frame (production_df) and the main while loop contains code at a few different levels of abstraction.

Idiomatic `R`

I did some simple stuff first: pulled all your functions to the start of the script, reformatted some code/comments.

There was a couple of things I changed for idiomatic reasons:

which(production_df$id == time_parts$id[1])
# -->
match(time_parts$id[1], production_df$id)
# time_parts[2:(nrow(time_parts)), ] # and
# time_parts[FALSE, ] # when time_parts has only one row
# can both be replaced with
time_parts[-1, ]
# (which is the idiomatic way to drop the first row) so this allowed us to remove an if-else clause
# You don't need to do rep(some_value, n) when you're adding a
# constant column to a data-frame at construction:
production_df <- data.frame(id = time_parts$id,
 time = time_parts$t,
 production_start = rep(0, nrow(time_parts)),
 no_failures = rep(0, nrow(time_parts)),
 stringsAsFactors = FALSE)
# -->
production_df <- data.frame(id = time_parts$id,
 time = time_parts$t,
 production_start = 0,
 no_failures = 0,
 stringsAsFactors = FALSE)
# `order(...)` does the same thing as `sort(..., index.return)$ix`
sort(time_parts$t, index.return = TRUE)$ix
# -->
order(time_parts$t)
# `nrow(x)` is more idiomatic than `length(x$some_column)`
while(length(time_parts$t) > 0){ 
# -->
while(nrow(time_parts) > 0) {
# but I subsequently replaced this newer line as well

Explicit data-classes

I converted your create_machine function so that it returns an object of class "Machine"; this wasn't really necessary.

create_machine <- function(failure_rate, production_rate) {
 structure(
 list(
 failure_rate = failure_rate,
 production_rate = production_rate,
 is_occupied = FALSE
 ),
 class = "Machine"
 )
}

I added a create_part function that similarly returns a Part object. There was a lot of repeats of 1 / machine$production_rate in your code; I replaced these with a call to part$production_duration. Also I thought your test to see whether a produced part was a failure should be associated with the produced part object (part$is_failure); with this, the while-loop logic becomes more explicit:

create_part <- function(machine) {
 structure(
 list(
 is_failure = runif(1) < machine$failure_rate,
 production_duration = 1 / machine$production_rate
 ),
 class = "Part"
 )
}
# then we can use this in the while-loop
part <- create_part(machine)
if (part$is_failure) {
 # bad part logic
 ...
} else {
 # good part logic
 ...
}

Restructuring the `while` loop

I wanted to push that while-loop into a function - the less work you do in the global environment, the better.

Since you want to extract data from production_df for your report, the function should return the production_df. During the while-loop, you access production_df, time_parts, t, dt (which I renamed dt_recovery based on your comments), n and machine. So we might want to pass all of those into that function. But we can compute some of those from the others:

n is the nrow of production_df,
t isn't needed outside of the while loop and
the data that initialises time_parts also initialises production_df.

The only thing we need to initialise both time_parts and production_df is the arrival-times or times at which the parts were ordered (which I renamed t_ordered).

So, we can put that while-loop into a function that takes arguments t_ordered, dt_recovery, machine.

run_event_simulation <- function(t_ordered, machine, dt_recovery) {
 n_parts <- length(t_ordered)
 # results data-frame
 production_df <- data.frame(
 id = seq(n_parts),
 t_ordered = t_ordered,
 t_started = 0,
 t_completed = 0,
 no_failures = 0,
 stringsAsFactors = FALSE
 )
 time_parts <- ... # define in terms of production_df
 # while-loop logic
 # return the updated production_df

I added the column t_completed into production_df so that you can more easily compute total_production_time from production_df in your report (this allows you to generalise the production rates)

# in `production_summary`
...
total_production_time <- max(production_df$t_completed)
...

A functional priority queue

The really big step:

R doesn't have a native priority-queue, and it would be pretty hard to encode using the S3 or S4 classes since you can't update by reference in those classes. There is a priority-queue defined in the package liqueueR, but I've no experience of that. So I just wrote a simpler version of the priority queue (as an S3 class): this allows you to

peek: extract the element in the queue with the lowest priority value (without mutating the queue)
delete_min: remove that element with the lowest priority value from the queue and return the resulting queue
add: add a new element to the queue according to it's priority, returning the resulting queue
and provides a couple of helper methods (is_empty, nrow)

However, this doesn't provide a pop_element(queue): typically, pop_element removes the leading element from the queue and returns that element. That is, it returns the leading element and updates the queue through a side-effect. This side-effect is problematic in R, so I didn't include a pop_element function. To achieve pop_element you have to peek and then delete_min.


# Priority Queue class
create_priority_queue <- function(x, priority_column) {
 structure(
 list(
 # note that we only `order` once - see `add` for how this is possible
 queue = x[order(x[[priority_column]]), ]
 ),
 class = "PriorityQueue",
 priority_column = priority_column
 )
}
# generic methods for Priority Queue
is_empty <- function(x, ...) UseMethod("is_empty")
peek <- function(x, ...) UseMethod("peek")
delete_min <- function(x, ...) UseMethod("delete_min")
add <- function(x, ...) UseMethod("add")
nrow <- function(x, ...) UseMethod("nrow")
nrow.default <- function(x, ...) {
 base::nrow(x)
}
# implemented methods for Priority Queue
nrow.PriorityQueue <- function(x, ...) {
 nrow(x$queue)
}
is_empty.PriorityQueue <- function(x, ...) {
 nrow(x) == 0
}
peek.PriorityQueue <- function(x, ...) {
 x$queue[1, ]
}
delete_min.PriorityQueue <- function(x, ...) {
 x$queue <- x$queue[-1, ]
 x
}
add.PriorityQueue <- function(x, new_element, ...) {
 priority_column <- attr(x, "priority_column")
 # split the existing values by comparison of their priorities to
 # those of the new-element
 lhs <- which(x$queue[[priority_column]] <= new_element[[priority_column]])
 rhs <- setdiff(seq(nrow(x)), lhs)
 x$queue <- rbind(x$queue[lhs, ], new_element, x$queue[rhs, ])
 x
}

Then I replaced your time_parts data-frame with a PriorityQueue:

# inside run_event_simulation
...
 # Create initial list of tasks. Once produced, a part will be removed from the
 # queue.
 product_queue <- create_priority_queue(
 data.frame(
 id = production_df$id,
 t = production_df$t_ordered
 ),
 "t"
 )
...

I added a few other helpers. The final code looks like this:

# ---- classes
# Priority Queue class
create_priority_queue <- function(x, priority_column) {
 structure(
 list(
 queue = x[order(x[[priority_column]]), ]
 ),
 class = "PriorityQueue",
 priority_column = priority_column
 )
}
# A machine for producing `Part`s
create_machine <- function(failure_rate, production_rate) {
 structure(
 list(
 failure_rate = failure_rate,
 production_rate = production_rate,
 is_occupied = FALSE
 ),
 class = "Machine"
 )
}
# A manufactured part
create_part <- function(machine) {
 structure(
 list(
 is_failure = runif(1) < machine$failure_rate,
 production_duration = 1 / machine$production_rate
 ),
 class = "Part"
 )
}
# methods for Priority Queue
is_empty <- function(x, ...) UseMethod("is_empty")
peek <- function(x, ...) UseMethod("peek")
delete_min <- function(x, ...) UseMethod("delete_min")
add <- function(x, ...) UseMethod("add")
nrow <- function(x, ...) UseMethod("nrow")
nrow.default <- function(x, ...) {
 base::nrow(x)
}
nrow.PriorityQueue <- function(x, ...) {
 nrow(x$queue)
}
is_empty.PriorityQueue <- function(x, ...) {
 nrow(x) == 0
}
peek.PriorityQueue <- function(x, ...) {
 x$queue[1, ]
}
delete_min.PriorityQueue <- function(x, ...) {
 x$queue <- x$queue[-1, ]
 x
}
add.PriorityQueue <- function(x, new_element, ...) {
 priority_column <- attr(x, "priority_column")
 lhs <- which(x$queue[[priority_column]] <= new_element[[priority_column]])
 rhs <- setdiff(seq(nrow(x)), lhs)
 x$queue <- rbind(x$queue[lhs, ], new_element, x$queue[rhs, ])
 x
}
# ---- functions
update_machine <- function(machine,
 ind_production_df,
 production_df,
 current_time) {
 if (machine$is_occupied) {
 if (
 production_df$t_started[ind_production_df]
 + 1 / machine$production_rate <= current_time
 ) {
 machine$is_occupied <- FALSE
 }
 }
 return(machine)
}
should_produce_part <- function(machine,
 earliest_production_time,
 current_time) {
 !machine$is_occupied &&
 earliest_production_time <= current_time
}
increment_failures <- function(df, i) {
 df[i, "no_failures"] <- 1 + df[i, "no_failures"]
 df
}
# ---- format results
production_summary <- function(production_df, machine, input_rate) {
 n_parts <- nrow(production_df)
 no_of_failures <- sum(production_df$no_failures)
 total_production_time <- max(production_df$t_completed)
 uptime <- (no_of_failures + n_parts) / machine$production_rate
 print(paste0(
 "Estimated machine$failure_rate ",
 round(no_of_failures / (no_of_failures + n_parts), 2),
 " [theory ", round(machine$failure_rate, 2), "]"
 ))
 print(paste0(
 "Up-time ", uptime,
 ", of total time ", total_production_time, ". Auslastung ",
 round(uptime / total_production_time, 2),
 " [theory ",
 round(
 input_rate / machine$production_rate * 1 / (1 - machine$failure_rate), 2
 ),
 "]"
 ))
}
# ---- discrete-event simulation
#
run_event_simulation <- function(t_ordered, machine, dt_recovery) {
 n_parts <- length(t_ordered)
 # results data-frame
 production_df <- data.frame(
 id = seq(n_parts),
 t_ordered = t_ordered,
 t_started = 0,
 t_completed = 0,
 no_failures = 0,
 stringsAsFactors = FALSE
 )
 # Create initial list of tasks. Once produced, a part will be removed from the
 # queue.
 product_queue <- create_priority_queue(
 data.frame(
 id = production_df$id,
 t = production_df$t_ordered
 ),
 "t"
 )
 t <- 0
 while (!is_empty(product_queue)) {
 queued_part <- peek(product_queue)
 ind_production_df <- match(
 queued_part$id, production_df$id
 )
 machine <- update_machine(machine, ind_production_df, production_df, t)
 if (
 should_produce_part(machine,
 earliest_production_time = queued_part$t,
 current_time = t)
 ) {
 # A machine is available and a part needs to be produced
 # - pop the scheduled part from the queue; add it back if it's production
 # fails
 product_queue <- delete_min(product_queue)
 machine$is_occupied <- TRUE
 production_df$t_started[ind_production_df] <- t
 part <- create_part(machine)
 if (part$is_failure) {
 # bad part - add it back to the schedule
 queued_part$t <- queued_part$t + dt_recovery
 product_queue <- add(product_queue, queued_part)
 production_df <- increment_failures(production_df, ind_production_df)
 t <- t + min(peek(product_queue)$t, dt_recovery)
 } else {
 # good part
 t <- t + part$production_duration
 production_df$t_completed[ind_production_df] <- t
 machine$is_occupied <- FALSE
 }
 } else {
 # machine is occupied or no part needs to be produced
 t <- t + min(peek(product_queue)$t, dt_recovery)
 }
 }
 production_df
}
# ---- script
set.seed(123456)
# Input rate [1/min, 1/input_rate corresponds to interarrival time in min]
input_rate <- 1 / 60
# Number of parts
n_parts <- 1000
# timestep = time to transfer faulty parts back to production. [min]
dt_recovery <- 1
# Production rate [1/min]
machine_production_rate <- 1 / 40
# Machine failure rate
machine_failure_rate <- 0.2
# Sum all interarrival times
t_ordered <- cumsum(rpois(n_parts, 1 / input_rate))
machine <- create_machine(machine_failure_rate, machine_production_rate)
# ---- results
production_df <- run_event_simulation(
 t_ordered, machine, dt_recovery
)
production_summary(production_df, machine, input_rate)

Why aren't S3 queues easy?

(This is actually quite hard to explain). Well, the pop method on a priority-queue returns an element from the queue and moves the queue on by one step. (In R) Updating the queue might look like new_queue <- old_queue[-1] and obtaining the returned element might look like returned_element <- old_queue[1]. So a pop function might look like

pop <- function(q) {
 # extract the head
 el <- q[1]
 # In a reference-based language you could update the queue
 # using a side-effect like `q.drop()`
 # But in R, this creates a new queue: and if it isn't returned 
 # explicitly, it is thrown away at the end of the `pop` function
 new_q <- q[-1]
 # return the element that's at the head of the original queue
 el
}
# calling_env
my_q <- create_queue(...)
my_head <- pop(my_q)

But the queue has not been altered by that pop. Now we could rewrite that function to do something dangerous like q <<- q[-1] and that would update the q in the calling environment. I consider this dangerous because q might not exist in the calling environment and that introduces side-effects, which are much harder to reason about.

@Christoph there was a couple of things I didn't feel comfortable restructuring. I couldn't work out why update_machine works the way it does: it seems to look into the future before it decides what to do now. It makes more sense to me for is_occupied to be set to FALSE at the end of each while-loop iteration.
@Christoph could you give some indication of whether you'd prefer to generalise the production-time (eg, replace the fixed production time with a Poisson-sampled time) or the number of machines
Updated. But it's pretty difficult to explain. I'm not particularly interested in stochastic simulation at present. I did have a look at your simmer documentation. Can you confirm that when you mclapply() over different simulation chains, different seeds are used for each chain?
Cool, thanks a lot. Unfortunately, simmer is not my package - so I can't answer your question. :-(
To your function pop <- function(q): why don't you just return(list(el=el, new_q=new_q)? Then you could work within one line r <- pop(q); el <- r$el; q <- r$new_q; rm(r);? But I still understand your comment that calling by reference would be smart...

Stack Exchange Network

Discrete event simulation for production defects

1 Answer 1

Idiomatic `R`

Explicit data-classes

Restructuring the `while` loop

A functional priority queue

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Discrete event simulation for production defects

1 Answer 1

Idiomatic R

Explicit data-classes

Restructuring the while loop

A functional priority queue

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Idiomatic `R`

Restructuring the `while` loop