This function maps values from a dataset into bit flags that can be encoded into a bitfield.

bf_map(
  protocol,
  data,
  ...,
  name = NULL,
  pos = NULL,
  na.val = NULL,
  description = NULL,
  registry = NULL
)

Arguments

protocol

character(1)
the protocol used to determine the flag, see Details.

data

the object to build bit flags for.

...

the protocol-specific arguments for building a bit flag, see Details.

name

character(1)
optional flag name.

pos

integerish(.)
the position(s) in the bitfield that should be set.

na.val

value, of the same encoding type as the flag, that must be given if the test for this flag results in NAs.

description

character(.)
optional description to use instead of the default protocol-specific description. This description is used in the registry legend, so it should have as many entries as there will be flags: two for a binary flag, one per case for an enumeration flag, and one for integer or numeric flags.

registry

registry(1)
a bitfield registry that has been defined with bf_registry; if it is not given, an empty registry is created on the fly.

Value

an (updated) object of class 'registry' with the additional flag defined here.

Details

protocol can be the name of an item in the internal list bf_pcl, a newly built local protocol, or a protocol imported from the bitfield community standards repository on GitHub. Each protocol has specific arguments, typically at least the name of the column containing the variable values (x). To keep this function as general as possible, all of these arguments are passed via the ... argument of bf_map. The internal protocols are:

  • na (x): test whether a variable contains NA-values (boolean).

  • nan (x): test whether a variable contains NaN-values (boolean).

  • inf (x): test whether a variable contains Inf-values (boolean).

  • identical (x, y): element-wise test whether values are identical across two variables (boolean).

  • range (x, min, max): test whether the values are within a given range (boolean).

  • matches (x, set): test whether the values match a given set (boolean).

  • grepl (x, pattern): test whether the values match a given pattern (boolean).

  • case (...): test whether values are part of given cases (enumeration).

  • nChar (x): count the number of characters of the values (unsigned integer).

  • nInt (x): count the number of integer digits of the values (unsigned integer).

  • nDec (x): count the decimal digits of the variable values (unsigned integer).

  • integer (x, ...): encode the integer values as a bit sequence (signed integer).

  • numeric (x, ...): encode the numeric values as a floating-point bit sequence with an adapted precision (floating-point).
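The boolean protocols above correspond closely to familiar base R predicates. The sketch below is not the package's implementation: it uses a small, made-up data frame as a stand-in for bf_tbl and plain base R functions to show the kind of row-wise logical vector each boolean test yields, and how a range test can produce NAs that a replacement such as na.val = FALSE would then fill.

```r
# Hypothetical stand-in for bf_tbl; the values are invented for illustration
tbl <- data.frame(
  x = c(1.5, NA, NaN, Inf, 10.6),
  y = c(1.5, NA, 2, Inf, 10.7),
  commodity = c("soybean", "maize", "honey", "maize", "soybean")
)

# One logical value per row, analogous to the boolean protocols
flag_na    <- is.na(tbl$x)                        # note: is.na() is also TRUE for NaN
flag_nan   <- is.nan(tbl$x)
flag_inf   <- is.infinite(tbl$x)
flag_ident <- mapply(identical, tbl$x, tbl$y)     # element-wise identical()
flag_match <- tbl$commodity %in% c("soybean", "honey")
flag_grepl <- grepl("bean", tbl$commodity)

# A range test propagates NA and NaN as NA ...
flag_range_raw <- tbl$x >= 10.4 & tbl$x <= 11
#> [1] FALSE    NA    NA FALSE  TRUE

# ... which is where a replacement value such as na.val = FALSE comes in
flag_range <- ifelse(is.na(flag_range_raw), FALSE, flag_range_raw)
#> [1] FALSE FALSE FALSE FALSE  TRUE
```

Because is.na() also returns TRUE for NaN, an NA test and a NaN test on the same column can overlap in base R; keep this in mind when combining such flags.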

Notes

The console output of various classes (such as tibble) may show decimals that are not present or round decimals that are present, even for ordinary numeric vectors. R stores numeric values internally as double-precision floating-point values (64 bits, of which 52 encode the mantissa), which corresponds to a decimal precision of about 16 digits (log10(2^52) ≈ 15.65). Hence, if a bit flag doesn't seem to match the values you see in the console, double-check the values with sprintf("%.16f", values). If you use a higher decimal precision, you'll see more digits, but those are not meaningful, as they result merely from the binary-to-decimal conversion (see .makeEncoding for additional information).
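A short base R illustration of the point above: the default print() rounds to 7 significant digits, while sprintf() with an explicit precision shows more of the stored value, and two numbers that print identically can still differ in their binary representation. The numbers are arbitrary examples, not taken from bf_tbl.

```r
# Arbitrary example values, not taken from bf_tbl
x <- 1.23456789
print(x)               # default printing rounds to 7 significant digits
#> [1] 1.234568
sprintf("%.8f", x)     # an explicit precision shows the stored decimals
#> [1] "1.23456789"

# Printing can also hide tiny representation errors
y <- 0.1 + 0.2
print(y)
#> [1] 0.3
y == 0.3                     # FALSE: the stored binary values differ
isTRUE(all.equal(y, 0.3))    # TRUE: comparison with a numerical tolerance

# Decimal digits carried by 52 mantissa bits
52 * log10(2)                # about 15.65
```

A value like y would, for example, fail an exact match against 0.3 even though the console shows 0.3, which is the kind of mismatch this note warns about.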

When testing for cases, they are evaluated in the order in which they have been defined. If an observation is part of two cases, it will thus have the value of the last case it matches. The encoding type of cases is enumeration, which means that the values can be either integer or factor. Both are handled as integers internally, so even though an enumeration data type could in principle also be a character, this is not possible within the scope of this package. Bit flag protocols that extend the case protocol must thus always result in integer values.
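To make the ordering rule concrete, here is a minimal base R sketch of last-match-wins case evaluation. The helper eval_cases() is hypothetical: it only assumes the behaviour described above and is not how the package implements the case protocol. It also shows that factors reduce to their integer codes.

```r
# Hypothetical helper (not part of the package): evaluates logical case
# conditions in order, so a later match overwrites an earlier one
eval_cases <- function(...) {
  conds <- list(...)
  out <- rep(NA_integer_, length(conds[[1]]))
  for (i in seq_along(conds)) {
    hit <- conds[[i]] & !is.na(conds[[i]])   # treat NA conditions as no match
    out[hit] <- i
  }
  out
}

yield <- c(12, 10, 8, 11)
# 12 and 11 satisfy both case 1 and case 2, so they end up as case 2
eval_cases(yield >= 11, yield > 9, yield < 9)
#> [1] 2 2 3 2

# Factors are handled like their integer codes
as.integer(factor(c("low", "high", "low"), levels = c("low", "high")))
#> [1] 1 2 1
```

Ordering the conditions from most general to most specific is one way to make sure the specific case takes precedence under this rule.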

Examples

opr <- "identical"

# identify which arguments need to be given to call a test ...
formalArgs(bf_pcl[[opr]]$test)
#> [1] "x" "y"

# put the test together
bf_map(protocol = opr, data = bf_tbl, x = x, y = y, na.val = FALSE)
#>   width 1
#>   flags 1  -
#> 
#>   pos encoding type      col
#>   1   0.0.1/0  identical x-y

# some other examples of ...
# boolean encoding
bf_map(protocol = "matches", data = bf_tbl, x = commodity, set = c("soybean", "honey"))
#>   width 1
#>   flags 1  -
#> 
#>   pos encoding type    col
#>   1   0.0.1/0  matches commodity-c-soybean-honey
bf_map(protocol = "range", data = bf_tbl, x = yield, min = 10.4, max = 11)
#>   width 1
#>   flags 1  -
#> 
#>   pos encoding type  col
#>   1   0.0.1/0  range yield-10.4-11

# enumeration encoding
bf_map(protocol = "case", data = bf_tbl,
        yield >= 11, yield < 11 & yield > 9, yield < 9 & commodity == "maize")
#>   width 2
#>   flags 1  --
#> 
#>   pos encoding type  col
#>   1   0.0.2/0  case1 yield-commodity

# integer encoding
bf_map(protocol = "integer", data = bf_tbl, x = as.integer(year), na.val = 0L)
#> Warning: NAs introduced by coercion
#>   width 11
#>   flags 1  -----------
#> 
#>   pos encoding type    col
#>   1   0.0.11/0 integer as.integer-year

# floating-point encoding
bf_map(protocol = "numeric", data = bf_tbl, x = yield, decimals = 2)
#>   width 12
#>   flags 1  ------------
#> 
#>   pos encoding type    col
#>   1   0.1.11/0 numeric yield