Helper function to convert 'data.frame' to sf
Description
Helper function to convert 'data.frame' to sf
Usage
arrow_to_sf(tbl, metadata)
Arguments
tbl
data.frame from reading an Arrow dataset
metadata
list of validated geo metadata
Value
object of class sf with CRS and geometry columns
Create standardised geo metadata for Parquet files
Description
Create standardised geo metadata for Parquet files
Usage
create_metadata(df)
Arguments
df
object of class sf
Details
Reference for metadata standard:
https://github.com/geopandas/geo-arrow-spec. This is compatible with
GeoPandas Parquet files.
Value
JSON-formatted list of geo metadata
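A minimal sketch of generating the metadata from an sf object. Note that create_metadata is internal to sfarrow, so the ::: accessor is assumed here:

```r
library(sf)

# read the example shapefile shipped with sf
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# build geo metadata describing the geometry column(s) and CRS,
# following the geo-arrow-spec layout used by GeoPandas
meta <- sfarrow:::create_metadata(nc)
cat(meta)
```

The returned JSON is what sfarrow embeds in the Parquet/Feather file so that GeoPandas can reconstruct the geometry and CRS on the Python side.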
Convert sfc geometry columns into a WKB binary format
Description
Convert sfc geometry columns into a WKB binary format
Usage
encode_wkb(df)
Arguments
df
sf object
Details
Allows for more than one geometry column in sfc format
Value
data.frame with binary geometry column(s)
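A sketch of the conversion, assuming access to the internal function via :::. For a single column, sf::st_as_binary performs an equivalent sfc-to-WKB conversion:

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# convert every sfc geometry column to a list column of raw WKB vectors
# (encode_wkb is internal to sfarrow, hence the ::: accessor)
df <- sfarrow:::encode_wkb(nc)

class(df)          # a plain data.frame, no longer sf
class(df$geometry) # a list of raw vectors holding WKB
```

This is the representation written to disk; on reading, arrow_to_sf reverses it using the stored metadata.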
Read an Arrow multi-file dataset and create sf object
Description
Read an Arrow multi-file dataset and create sf object
Usage
read_sf_dataset(dataset, find_geom = FALSE)
Arguments
dataset
a Dataset object created by arrow::open_dataset
or an arrow_dplyr_query
find_geom
logical. Only needed when returning a subset of columns.
Should all available geometry columns be selected and added to the
dataset query without being named? Default is FALSE to require
geometry column(s) to be selected specifically.
Details
This function is primarily for use after opening a dataset with
arrow::open_dataset. Users can then query the arrow Dataset
using dplyr methods such as filter or
select. Passing the resulting query to this function
will parse the dataset and create an sf object. The function
expects consistent geographic metadata to be stored with the dataset in
order to create sf objects.
Value
object of class sf
See Also
open_dataset, st_read, st_read_parquet
Examples
# read spatial object
nc <- sf::st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)
# create random grouping
nc$group <- sample(1:3, nrow(nc), replace = TRUE)
# use dplyr to group the dataset. %>% also allowed
nc_g <- dplyr::group_by(nc, group)
# write out to parquet datasets
tf <- tempfile() # create temporary location
on.exit(unlink(tf))
# partitioning determined by dplyr 'group_vars'
write_sf_dataset(nc_g, path = tf)
list.files(tf, recursive = TRUE)
# open parquet files from dataset
ds <- arrow::open_dataset(tf)
# create a query. %>% also allowed
q <- dplyr::filter(ds, group == 1)
# read the dataset (piping syntax also works)
nc_d <- read_sf_dataset(dataset = q)
nc_d
plot(sf::st_geometry(nc_d))
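Building on the example above, when a query selects only a subset of columns the geometry is dropped unless it is named; find_geom = TRUE asks read_sf_dataset to add the stored geometry column(s) back to the query (a sketch reusing the ds object from the example; the NAME column is assumed from the nc shapefile):

```r
# select attribute columns only; the geometry column is not named
q2 <- dplyr::select(ds, NAME, group)

# find_geom = TRUE appends the available geometry column(s) to the query
nc_sub <- read_sf_dataset(q2, find_geom = TRUE)
nc_sub
```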
sfarrow: An R package for reading/writing simple feature (sf)
objects from/to Arrow parquet/feather files with arrow
Description
Simple features are a popular, standardised way to create spatial vector data
with a list-type geometry column. Parquet files are standard column-oriented
files designed by Apache Arrow (https://parquet.apache.org/) for fast
reads and writes. sfarrow is designed to support the reading and writing of
simple features in sf objects from/to Parquet files (.parquet) and
Feather files (.feather) within R. A key goal of sfarrow is to
support interoperability of spatial data in files between R and
Python through the use of standardised metadata.
Metadata
Coordinate reference and geometry field information for sf objects are
stored in standard metadata tables within the files. The metadata are based
on a standard representation (Version 0.1.0, reference:
https://github.com/geopandas/geo-arrow-spec). This is compatible with
the format used by the Python library GeoPandas for reading/writing
Parquet/Feather files. Note to users: this metadata format is not yet stable
for production uses and may change in the future.
Credits
This work was undertaken by Chris Jochem, a member of the WorldPop Research Group at the University of Southampton (https://www.worldpop.org/).
Read a Feather file to sf object
Description
Read a Feather file. Uses standard metadata information to identify geometry columns and coordinate reference system information.
Usage
st_read_feather(dsn, col_select = NULL, ...)
Arguments
dsn
character file path to a data source
col_select
A character vector of column names to keep. Default is
NULL, which returns all columns
...
additional parameters to pass to
FeatherReader
Details
Reference for the metadata used:
https://github.com/geopandas/geo-arrow-spec. These are standard with
the Python GeoPandas library.
Value
object of class sf
Examples
# load Natural Earth low-res dataset.
# Created in Python with GeoPandas.to_feather()
path <- system.file("extdata", package = "sfarrow")
world <- st_read_feather(file.path(path, "world.feather"))
world
plot(sf::st_geometry(world))
Read a Parquet file to sf object
Description
Read a Parquet file. Uses standard metadata information to identify geometry columns and coordinate reference system information.
Usage
st_read_parquet(dsn, col_select = NULL, props = NULL, ...)
Arguments
dsn
character file path to a data source
col_select
A character vector of column names to keep. Default is
NULL, which returns all columns
props
Now deprecated in read_parquet.
...
additional parameters to pass to
ParquetFileReader
Details
Reference for the metadata used:
https://github.com/geopandas/geo-arrow-spec. These are standard with
the Python GeoPandas library.
Value
object of class sf
Examples
# load Natural Earth low-res dataset.
# Created in Python with GeoPandas.to_parquet()
path <- system.file("extdata", package = "sfarrow")
world <- st_read_parquet(file.path(path, "world.parquet"))
world
plot(sf::st_geometry(world))
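col_select can limit the columns read from disk; the geometry column must be included for an sf object to be returned. A sketch continuing the example above (the attribute column names are assumptions about the example file):

```r
# read only selected columns; keep the geometry column so an
# sf object can still be constructed
world_sub <- st_read_parquet(
  file.path(path, "world.parquet"),
  col_select = c("name", "continent", "geometry")
)
world_sub
```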
Write sf object to Feather file
Description
Convert a simple features spatial object from sf and
write to a Feather file using write_feather . Geometry
columns (type sfc) are converted to well-known binary (WKB) format.
Usage
st_write_feather(obj, dsn, ...)
Arguments
obj
object of class sf
dsn
data source name. A path and file name with .feather extension
...
additional options to pass to write_feather
Value
obj invisibly
Examples
# read spatial object
nc <- sf::st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)
# create temp file
tf <- tempfile(fileext = '.feather')
on.exit(unlink(tf))
# write out object
st_write_feather(obj = nc, dsn = tf)
# In Python, read the new file with geopandas.read_feather(...)
# read back into R
nc_f <- st_read_feather(tf)
Write sf object to Parquet file
Description
Convert a simple features spatial object from sf and
write to a Parquet file using write_parquet . Geometry
columns (type sfc) are converted to well-known binary (WKB) format.
Usage
st_write_parquet(obj, dsn, ...)
Arguments
obj
object of class sf
dsn
data source name. A path and file name with .parquet extension
...
additional options to pass to write_parquet
Value
obj invisibly
Examples
# read spatial object
nc <- sf::st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)
# create temp file
tf <- tempfile(fileext = '.parquet')
on.exit(unlink(tf))
# write out object
st_write_parquet(obj = nc, dsn = tf)
# In Python, read the new file with geopandas.read_parquet(...)
# read back into R
nc_p <- st_read_parquet(tf)
Basic checking of key geo metadata columns
Description
Basic checking of key geo metadata columns
Usage
validate_metadata(metadata)
Arguments
metadata
list for geo metadata
Value
None. Throws an error and stops execution if the metadata are invalid
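A sketch of the check in use. validate_metadata is internal (::: accessor assumed), and the list fields shown are assumptions based on the geo-arrow-spec layout referenced elsewhere in this manual:

```r
# a minimal geo metadata list in the assumed geo-arrow-spec layout
meta <- list(
  primary_column = "geometry",
  columns = list(
    geometry = list(crs = "EPSG:4326", encoding = "WKB")
  )
)

# returns silently when the required fields are present;
# otherwise throws an error and stops execution
sfarrow:::validate_metadata(meta)
```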
Write sf object to an Arrow multi-file dataset
Description
Write sf object to an Arrow multi-file dataset
Usage
write_sf_dataset(
obj,
path,
format = "parquet",
partitioning = dplyr::group_vars(obj),
...
)
Arguments
obj
object of class sf
path
string path referencing a directory for the output
format
output file format ("parquet" or "feather")
partitioning
character vector of columns in obj to use for grouping, or
the result of dplyr::group_vars (the default)
...
additional arguments and options passed to
arrow::write_dataset
Details
Translate an sf spatial object to data.frame with WKB
geometry columns and then write to an arrow dataset with
partitioning. Allows for dplyr grouped datasets (using
group_by) and uses those variables to define
partitions.
Value
obj invisibly
See Also
write_dataset, st_read_parquet
Examples
# read spatial object
nc <- sf::st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)
# create random grouping
nc$group <- sample(1:3, nrow(nc), replace = TRUE)
# use dplyr to group the dataset. %>% also allowed
nc_g <- dplyr::group_by(nc, group)
# write out to parquet datasets
tf <- tempfile() # create temporary location
on.exit(unlink(tf))
# partitioning determined by dplyr 'group_vars'
write_sf_dataset(nc_g, path = tf)
list.files(tf, recursive = TRUE)
# open parquet files from dataset
ds <- arrow::open_dataset(tf)
# create a query. %>% also allowed
q <- dplyr::filter(ds, group == 1)
# read the dataset (piping syntax also works)
nc_d <- read_sf_dataset(dataset = q)
nc_d
plot(sf::st_geometry(nc_d))