Adds support for ALTREP contiguous views, i.e. `vec_view(x, start, size)`, for the major R types.

Here are important notes on the internal structure:
/*
 Structure of a `view`:
 - `data1` is:
   - the original vector, before materialization
   - the materialized vector, after materialization
 - `data2` is a RAWSXP holding a `r_view_metadata`
*/
struct r_view_metadata {
  // Whether or not the ALTREP view has been materialized.
  bool materialized;

  // The offset into the original data to start at.
  r_ssize start;

  // The size of the view.
  r_ssize size;

  // A read only pointer into the data, to save indirection costs. We typically
  // set this upon view creation, unless `x` is ALTREP, in which case we delay
  // setting it until the first `DATAPTR_RO()` request, to be friendly to
  // ALTREP types that we wrap. After materialization, it is always set.
  const void* v_data_read;

  // A write pointer into the data, to save indirection costs.
  // Always `NULL` before materialization, and set at materialization time.
  // Never set for lists or character vectors, as write pointers are unsafe.
  void* v_data_write;
};
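To make the fast path concrete, here is a minimal sketch (not the actual rlang implementation) of how an integer view's `Elt` method could consult this metadata. The helper names `view_metadata()` and `int_view_Elt()` are illustrative, and it assumes the cached `v_data_read` pointer already points at the view's first element.

#include <stdbool.h>
#include <Rinternals.h>
#include <R_ext/Altrep.h>

// Stand-in for rlang's internal size typedef, plus the metadata struct
// repeated from above so the sketch is self-contained.
typedef R_xlen_t r_ssize;

struct r_view_metadata {
  bool materialized;
  r_ssize start;
  r_ssize size;
  const void* v_data_read;
  void* v_data_write;
};

// `data2` is the RAWSXP that owns the metadata.
static struct r_view_metadata* view_metadata(SEXP x) {
  return (struct r_view_metadata*) RAW(R_altrep_data2(x));
}

static int int_view_Elt(SEXP x, R_xlen_t i) {
  struct r_view_metadata* metadata = view_metadata(x);

  // Fast path: use the cached read-only pointer, set at view creation or at
  // materialization time.
  const int* v_data_read = (const int*) metadata->v_data_read;
  if (v_data_read != NULL) {
    return v_data_read[i];
  }

  // Slow path: the wrapped vector is itself ALTREP and we haven't asked it
  // for a pointer yet, so defer to its own `Elt` method, offset by `start`.
  return INTEGER_ELT(R_altrep_data1(x), metadata->start + i);
}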
Benchmarking - The most notable result is that extracting a large contiguous slice can be up to 2x slower than with a native vector, because `ExtractSubset()` in base R uses the `*_Elt()` ALTREP method to pull out each element. I think it could be much more efficient: use `Dataptr_or_null()` to first check whether a cheap read-only data pointer to the ALTREP data is available (it is here, since it's a view), and if so use it directly. I plan to submit that patch or talk to Luke about it; with that change, this difference should go away entirely.

Also note that the difference really only shows up when a contiguous slice is made with `ExtractSubset()`, where native vectors likely have close to zero cache misses while we go through an ALTREP method for every element. When you extract a random slice of the same size, the timings tend to converge, since cache misses start to dominate.
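For reference, here is a hedged sketch of the kind of change I have in mind, using the integer case. It is not the actual `ExtractSubset()` code, and it skips the `NA` index and bounds handling the real function needs.

#include <Rinternals.h>

// Sketch: grab a cheap read-only pointer once (free for a view), and only
// fall back to the per-element ALTREP Elt method when that isn't available.
static SEXP extract_subset_int_sketch(SEXP x, SEXP indx) {
  R_xlen_t n = XLENGTH(indx);
  SEXP out = PROTECT(Rf_allocVector(INTSXP, n));
  int* v_out = INTEGER(out);

  const int* v_x = (const int*) DATAPTR_OR_NULL(x);

  for (R_xlen_t i = 0; i < n; ++i) {
    R_xlen_t loc = (R_xlen_t) INTEGER_ELT(indx, i) - 1;
    v_out[i] = (v_x != NULL) ? v_x[loc] : INTEGER_ELT(x, loc);
  }

  UNPROTECT(1);
  return out;
}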
# R-devel at the time
getRversion()
#> [1] '4.5.0'
# load rlang
devtools::load_all(path = "~/files/r/packages/rlang/")
# Identical test
# Currently `identical()` forces materialization due to usage of `INTEGER()`
# rather than `INTEGER_RO()`, booooo.
x <- c(1, 2, 3, 4)
y <- vec_view(x, start = 1, size = 4)
identical(x, y)
#> [1] TRUE
view_is_materialized(y)
#> [1] TRUE
# Object size test
# I think lobstr is more accurate here; there is a small fixed amount of
# overhead from the metadata. Nicely, neither call materializes the object.
x <- c(1, 2, 3, 4)
y <- vec_view(x, start = 1, size = 4)
object.size(x)
#> 80 bytes
object.size(y)
#> 80 bytes
lobstr::obj_size(x)
#> 80 B
lobstr::obj_size(y)
#> 776 B
view_is_materialized(y)
#> [1] FALSE
# Small view, 100 elements
base <- sample(1e6)
x <- base[1:100]
y <- vec_view(base, start = 1L, size = 100L)
is_altrep(y)
#> [1] TRUE
view_is_materialized(y)
#> [1] FALSE
# One element
bench::mark(
x[[25L]],
y[[25L]],
iterations = 1000000
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 x[[25L]] 0 41ns 23280892. 0B 69.8
#> 2 y[[25L]] 0 41ns 18932329. 0B 56.8
# A slice
bench::mark(
x[10:60],
y[10:60],
iterations = 1000000
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 x[10:60] 123ns 246ns 3228295. 512B 25.8
#> 2 y[10:60] 287ns 369ns 2216498. 512B 15.5
# Still not materialized!
view_is_materialized(y)
#> [1] FALSE
# Duplication currently forces materialization; I think this is something
# we can fix in base R. They use `INTEGER()` where I think they should use
# `INTEGER_RO()`. The returned object is not ALTREP.
z <- duplicate(y)
view_is_materialized(y)
#> [1] TRUE
is_altrep(z)
#> [1] FALSE
# A slice after materialization: still takes the same amount of time as before
# materialization (in both cases we use a cached read-only pointer to the data)
bench::mark(
x[10:60],
y[10:60],
iterations = 1000000
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 x[10:60] 123ns 246ns 3180695. 512B 22.3
#> 2 y[10:60] 287ns 369ns 2222617. 512B 13.3
# Large view, 1,000,000 elements
base <- sample(1e7)
x <- base[seq(from = 1000, length.out = 1000000)]
y <- vec_view(base, start = 1000, size = 1000000)
# Large slice (still roughly 2x slower, same as with small slice)
bench::mark(
x[1:500000],
y[1:500000],
iterations = 1000
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 x[1:5e+05] 909.54μs 1.17ms 822. 3.81MB 24.5
#> 2 y[1:5e+05] 2.19ms 2.42ms 413. 3.81MB 10.2
# Large slice, random index
# The difference goes away as cache misses tend to dominate instead
idx <- sample(1e7, 1e6)
bench::mark(
x[idx],
y[idx],
iterations = 1000
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 x[idx] 2.19ms 2.4ms 408. 3.81MB 10.5
#> 2 y[idx] 2.47ms 2.69ms 357. 3.81MB 8.78
# Still not materialized!
view_is_materialized(y)
#> [1] FALSE
Remaining TODOs:

- `DATAPTR()` usage in character implementation
- `Extract_subset` implementation? Or is the fallback good enough? (New approach is much faster even without `Extract_subset` due to faster `Elt` access with cached pointers, we really want base R to handle all the intricacies of this.)
- `#define`s to section off code that doesn't work in older R versions, possibly bump minimal rlang R version (see the sketch after this list)
- `r_init_library_with_dll()` helper with Lionel

Then:

- `r_vec_resize()` to `r_vec_grow()` and remove support for shrinking
- `r_dyn_unwrap()` in this PR
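As a hedged illustration of the `#define` item above, a guard along these lines could work; the `RLANG_HAS_ALTREP_VIEWS` name and the version cutoff are purely illustrative, not decided.

#include <Rversion.h>

// Hypothetical feature macro; the real name and the minimal R version it
// gates on would be settled in this PR.
#if defined(R_VERSION) && R_VERSION >= R_Version(3, 6, 0)
# define RLANG_HAS_ALTREP_VIEWS 1
#else
# define RLANG_HAS_ALTREP_VIEWS 0
#endif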
A few notes:
- `x` gets marked as not mutable on the way in
- `x` right now
- `r_init_library_with_dll()` helper that goes in your `R_init_<pkg>()` function (good to know if you utilize the rlang C API). The ALTREP classes are registered with the `package` name, so if vctrs uses the rlang C API then it will register its own ALTREP view classes under the `"vctrs"` package name. I think this is a good thing. (See the sketch below.)
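A rough sketch of where that helper gets called from. The signature of `r_init_library_with_dll()` is an assumption on my part here; the real declaration lives in the rlang C API.

#include <R_ext/Rdynload.h>

// Assumed signature, for illustration only.
extern void r_init_library_with_dll(DllInfo* dll, const char* package);

void R_init_vctrs(DllInfo* dll) {
  // Registers the rlang C library for this DLL, including the ALTREP view
  // classes, under the "vctrs" package name.
  r_init_library_with_dll(dll, "vctrs");
}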
Otherwise I think it's pretty darn good!