This function maps values from a dataset to bit flags that can be encoded into a bitfield.

bf_map(protocol, data, registry, ..., name = NULL, pos = NULL, na.val = NULL)

Arguments

protocol

character(1)
the protocol based on which the flag should be determined, see Details.

data

the object to build bit flags for.

registry

registry(1)
an already defined bitfield registry.

...

the protocol-specific arguments for building a bit flag, see Details.

name

character(1)
optional flag-name.

pos

integerish(.)
optional position(s) in the bitfield that should be set.

na.val

value, of the same encoding type as the flag, that is used wherever the test for this flag results in NA.

Value

an (updated) object of class 'registry' with the additional flag defined here.

Details

protocol can either be the name of an internal item (see bf_pcl), a newly built local protocol (bf_protocol) or one that has been imported from the bitfield community standards repo on GitHub (bf_standards). Every protocol has its own arguments, typically at least the name of the column containing the values to test (x). To keep this function as general as possible, all of these arguments are passed via the ... argument of bf_map. Internal protocols are:

  • na (x): test whether a variable contains NA-values (boolean).

  • nan (x): test whether a variable contains NaN-values (boolean).

  • inf (x): test whether a variable contains Inf-values (boolean).

  • identical (x, y): element-wise test whether values are identical across two variables (boolean).

  • range (x, min, max): test whether the values are within a given range (boolean).

  • matches (x, set): test whether the values match a given set (boolean).

  • grepl (x, pattern): test whether the values match a given pattern (boolean).

  • category (x): test whether the values are part of a set of given categories (enumeration).

  • case (...): test whether values are part of given cases (enumeration).

  • nChar (x): count the number of characters of the values (unsigned integer).

  • nInt (x): count the number of integer digits of the values (unsigned integer).

  • nDec (x): count the decimal digits of the variable values (unsigned integer).

  • integer (x, ...): encode the integer values as bit-sequence (signed integer).

  • numeric (x, ...): encode the numeric value as floating-point bit-sequence (with an adapted precision) (floating-point).
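
For orientation, a few of the boolean protocols correspond to element-wise tests that could be written in base R roughly as shown here. This is only a sketch of the idea on a made-up vector and not the package's implementation; the inclusive range bounds, the NA replacement step and the bit order used for packing are all assumptions.

```r
# Made-up values for illustration only (not the bf_tbl data).
yield <- c(11.2, 9.8, NA, 10.6, Inf)

is_na    <- is.na(yield)                  # cf. protocol "na"
is_inf   <- is.infinite(yield)            # cf. protocol "inf"
in_range <- yield >= 10.4 & yield <= 11   # cf. protocol "range" (bounds assumed inclusive)

# cf. na.val: a test on NA yields NA, so a fixed replacement value is needed
in_range[is.na(in_range)] <- FALSE

# Pack the three one-bit flags into one integer per observation
# (assumed order: first flag is the most significant bit).
packed <- is_na * 4L + is_inf * 2L + in_range * 1L
packed
```

Each element of packed then carries all three test results for one observation, which is the general idea behind combining several flags into a single bitfield.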

Notes

The console output of various classes (such as tibble) shows decimals that are not present or rounds decimals that are present, even for ordinary numeric vectors. R stores numeric values internally as double-precision floating-point values (with 64 bits, where 52 bits encode the significand), which corresponds to a decimal precision of ~16 digits (log10(2^52)). Hence, if a bit flag doesn't seem to coincide with the values you see in the console, double-check the values with sprintf("%.16f", values). If you use a larger value than 16 for the precision, you'll see more digits, but those are not meaningful, as they result merely from the binary-to-decimal conversion (see .makeEncoding for additional details).
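
The difference between printed and stored values can be checked with base R alone. The following sketch uses an arbitrary example value (0.1 + 0.2, not taken from bf_tbl) and only standard functions.

```r
x <- 0.1 + 0.2

print(x)               # default printing rounds to 7 significant digits
x == 0.3               # FALSE: the stored double is not exactly 0.3

s16 <- sprintf("%.16f", x)   # 16 decimal places, roughly the meaningful precision
s20 <- sprintf("%.20f", x)   # extra digits come from the binary-to-decimal conversion
s16
s20

log10(2^52)            # about 15.65 decimal digits
```

Note that the precision is given after the dot in the format string; "%16f" without the dot would set the field width instead and keep the default of 6 decimal places.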

Examples

# first, set up the registry
reg <- bf_registry(name = "testBF", description = "test bitfield")

# then, put the test for NA values together
reg <- bf_map(protocol = "na", data = bf_tbl, registry = reg,
              x = year)

# all the other protocols...
# boolean encoding
reg <- bf_map(protocol = "nan", data = bf_tbl, registry = reg,
              x = y)
reg <- bf_map(protocol = "inf", data = bf_tbl, registry = reg,
              x = y)
reg <- bf_map(protocol = "identical", data = bf_tbl, registry = reg,
              x = x, y = y, na.val = FALSE)
reg <- bf_map(protocol = "range", data = bf_tbl, registry = reg,
              x = yield, min = 10.4, max = 11)
reg <- bf_map(protocol = "matches", data = bf_tbl, registry = reg,
              x = commodity, set = c("soybean", "honey"))
reg <- bf_map(protocol = "grepl", data = bf_tbl, registry = reg,
              x = year, pattern = "*r")

# enumeration encoding
reg <- bf_map(protocol = "category", data = bf_tbl, registry = reg,
              x = commodity, na.val = 0)
reg <- bf_map(protocol = "case", data = bf_tbl, registry = reg, na.val = 0,
              yield >= 11, yield < 11 & yield > 9, yield < 9 & commodity == "maize")
#> Loading required package: dplyr
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
#> Loading required package: purrr

# integer encoding
reg <- bf_map(protocol = "nChar", data = bf_tbl, registry = reg,
              x = commodity, na.val = 0)
reg <- bf_map(protocol = "nInt", data = bf_tbl, registry = reg,
              x = yield)
reg <- bf_map(protocol = "nDec", data = bf_tbl, registry = reg,
              x = yield)
reg <- bf_map(protocol = "integer", data = bf_tbl, registry = reg,
              x = as.integer(year), na.val = 0L)
#> Warning: NAs introduced by coercion

# floating-point encoding
reg <- bf_map(protocol = "numeric", data = bf_tbl, registry = reg,
              x = yield, decimals = 2)

# finally, take a look at the registry
reg
#>   width 44
#>   flags 14  -|-|-|-|-|-|-|--|--|----|---|---|-----------|------------
#> 
#>   pos encoding type      col
#>   1   0.0.1/0  na        year
#>   2   0.0.1/0  nan       y
#>   3   0.0.1/0  inf       y
#>   4   0.0.1/0  identical x-y
#>   5   0.0.1/0  range     yield
#>   6   0.0.1/0  matches   commodity
#>   7   0.0.1/0  grepl     year
#>   8   0.0.2/0  category  commodity
#>   10  0.0.2/0  case      yield-commodity
#>   12  0.0.4/0  nChar     commodity
#>   16  0.0.3/0  nInt      yield
#>   19  0.0.3/0  nDec      yield
#>   22  0.0.11/0 integer   year
#>   33  0.1.11/0 numeric   yield

# alternatively, a raster
library(terra)
#> terra 1.8.60
#> 
#> Attaching package: ‘terra’
#> The following object is masked from ‘package:bitfield’:
#> 
#>     project
bf_rst <- rast(nrows = 3, ncols = 3, vals = bf_tbl$commodity, names = "commodity")
bf_rst$yield <- rast(nrows = 3, ncols = 3, vals = bf_tbl$yield)

reg <- bf_registry(name = "testBF", description = "raster bitfield")

reg <- bf_map(protocol = "na", data = bf_rst, registry = reg,
              x = commodity)

reg <- bf_map(protocol = "range", data = bf_rst, registry = reg,
              x = yield, min = 5, max = 11)

reg <- bf_map(protocol = "category", data = bf_rst, registry = reg,
              x = commodity, na.val = 0)
reg
#>   width 4
#>   flags 3  -|-|--
#> 
#>   pos encoding type     col
#>   1   0.0.1/0  na       commodity
#>   2   0.0.1/0  range    yield
#>   3   0.0.2/0  category commodity
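
The pos column of the first registry printed above can be cross-checked against the encoding column. The sketch below assumes that the three dot-separated numbers before the "/" are bit counts whose sum is the width of the flag; this reading is inferred from the printed output, not taken from the package documentation. The values are copied by hand from the printout.

```r
# Encoding strings and positions copied from the first registry printout.
enc <- c(rep("0.0.1/0", 7), "0.0.2/0", "0.0.2/0", "0.0.4/0",
         "0.0.3/0", "0.0.3/0", "0.0.11/0", "0.1.11/0")
pos <- c(1:8, 10, 12, 16, 19, 22, 33)

# Assumption: the part before "/" lists bit counts; their sum is the flag width.
bits  <- strsplit(sub("/.*$", "", enc), ".", fixed = TRUE)
width <- vapply(bits, function(b) sum(as.integer(b)), integer(1))

# Each flag starts right after all earlier flags.
start <- cumsum(c(1L, head(width, -1)))

identical(as.integer(start), as.integer(pos))   # TRUE
sum(width)                                      # 44, the printed width
```

Under this reading, the 14 flags occupy 44 bits in total, which agrees with the width reported by the registry.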