Overview
The package provides the CategoricalArray type designed to hold categorical data (either unordered/nominal or ordered/ordinal) efficiently and conveniently. CategoricalArray{T} holds values of type T. The CategoricalArray{Union{T, Missing}} variant can also contain missing values (represented as missing, of the Missing type). When indexed, CategoricalArray{T} returns special CategoricalValue{T} objects rather than the original values of type T. CategoricalValue is a simple wrapper around the categorical levels; it allows very efficient retrieval and comparison of actual values. See the PooledArrays.jl and IndirectArrays.jl packages for simpler array types storing data with a small number of values without wrapping them.
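As a minimal sketch of the behaviour described above, the following builds a categorical vector and inspects what indexing returns. It assumes a recent release of the package in which the categorical constructor function and the unwrap function are available; the variable names and sample values are purely illustrative.

```julia
using CategoricalArrays

# Build a categorical vector from plain strings (illustrative data).
x = categorical(["low", "high", "low", "medium"])

# Indexing returns a CategoricalValue wrapper rather than a String.
v = x[1]
@assert v isa CategoricalValue
@assert !(v isa String)

# The wrapper compares equal to the raw value it stands for,
# and unwrap (assumed available in recent versions) retrieves that value.
@assert v == "low"
@assert unwrap(v) isa String

# With missing values in the input, the element type becomes Union{String, Missing}.
y = categorical(["low", missing, "high"])
@assert y isa CategoricalArray{Union{String, Missing}}
@assert ismissing(y[2])
```

The assertions pass silently when the behaviour matches the description; replacing them with @show prints the intermediate values instead.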
The main feature of CategoricalArray is that it maintains a pool of the levels which can appear in the data. These levels are stored in a specific order: for unordered arrays, this order is only used for pretty printing (e.g. in cross tables or plots); for ordered arrays, it also allows comparing values using the < and > operators: the comparison is then based on the ordering of levels stored in the array. An ordered CategoricalValue can also be compared with a value that, when converted, is equal to one of the levels of this CategoricalValue. Whether an array is ordered can be defined either on construction via the ordered argument, or at any time via the ordered! function. The levels function returns all the levels of a CategoricalArray, and the levels! function can be used to set the levels and their order. Levels are also automatically extended when setting an array element to a level not encountered before. However, they are never removed without manual intervention: use the droplevels! function for this.
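The level handling described above can be sketched as follows. This is an illustration under stated assumptions rather than a definitive recipe: it assumes a version of the package in which categorical accepts the levels and ordered keyword arguments, and the sample data and variable names are made up for the example.

```julia
using CategoricalArrays

# Ordered array: levels and ordering are fixed at construction.
sizes = categorical(["small", "large", "medium", "small"];
                    levels=["small", "medium", "large"], ordered=true)
@assert isordered(sizes)
@assert levels(sizes) == ["small", "medium", "large"]

# Comparisons follow the order of the levels, not alphabetical order.
@assert sizes[1] < sizes[2]     # "small" < "large"
@assert sizes[3] < "large"      # raw value that matches one of the levels

# levels! resets the order of the levels, which changes comparison results.
levels!(sizes, ["large", "medium", "small"])
@assert sizes[2] < sizes[1]     # "large" now sorts before "small"

# Unordered array: assigning an unseen value extends the levels.
colors = categorical(["red", "blue", "red"])
@assert !isordered(colors)
colors[2] = "green"
@assert "green" in levels(colors)

# "blue" is no longer used but stays in the pool until dropped manually.
@assert "blue" in levels(colors)
droplevels!(colors)
@assert !("blue" in levels(colors))

# Ordering can also be switched on after construction.
ordered!(colors, true)
@assert isordered(colors)
```

The automatic extension of levels is shown on an unordered array on purpose, since adding a level to an ordered array raises the question of where it belongs in the order; calling levels! first makes that choice explicit.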