API Index
CategoricalArrays.CategoricalArray
— Type
CategoricalArray{T}(undef, dims::Dims; levels=nothing, ordered=false)
CategoricalArray{T}(undef, dims::Int...; levels=nothing, ordered=false)
Construct an uninitialized CategoricalArray
with levels of type T <: Union{AbstractChar, AbstractString, Number}
and dimensions dims
.
The levels
keyword argument can be a vector specifying possible values for the data (this is equivalent to but more efficient than calling levels!
on the resulting array). The ordered
keyword argument determines whether the array values can be compared according to the ordering of levels or not (see isordered
).
CategoricalArray{T, N, R}(undef, dims::Dims; levels=nothing, ordered=false)
CategoricalArray{T, N, R}(undef, dims::Int...; levels=nothing, ordered=false)
Similar to the definition above, but uses reference type R
instead of the default type (UInt32
).
CategoricalArray(A::AbstractArray; levels=nothing, ordered=false)
Construct a new CategoricalArray
with the values from A
and the same element type.
The levels
keyword argument can be a vector specifying possible values for the data (this is equivalent to but more efficient than calling levels!
on the resulting array). If levels
is omitted and the element type supports it, levels are sorted in ascending order; else, they are kept in their order of appearance in A
. The ordered
keyword argument determines whether the array values can be compared according to the ordering of levels or not (see isordered
).
If A
is already a CategoricalArray
, its levels, orderedness and reference type are preserved unless explicitly overridden.
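The constructors described above can be sketched as follows. This is a minimal illustration, assuming a recent CategoricalArrays release; the variable names `x`, `y` and `z` are only for the example.

```julia
using CategoricalArrays

# Without `levels`, string levels are sorted in ascending order.
x = CategoricalArray(["b", "a", "b"])
levels(x)        # levels in sorted order: "a", then "b"

# Explicit levels fix the order; ordered=true enables < and > on elements.
y = CategoricalArray(["b", "a", "b"]; levels=["b", "a"], ordered=true)
y[1] < y[2]      # "b" precedes "a" in the level order of y

# Uninitialized 2×3 array with String levels and a UInt8 reference type.
# Its elements are undefined until they are assigned.
z = CategoricalArray{String, 2, UInt8}(undef, 2, 3; levels=["a", "b"])
size(z)
```

Passing `levels` at construction avoids a later call to `levels!`, which the text above notes is less efficient.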
CategoricalArrays.CategoricalMatrix
— Type
CategoricalMatrix{T}(undef, m::Int, n::Int; levels=nothing, ordered=false)
Construct an uninitialized CategoricalMatrix
with levels of type T <: Union{AbstractChar, AbstractString, Number}
and dimensions (m, n)
. The ordered
keyword argument determines whether the array values can be compared according to the ordering of levels or not (see isordered
).
CategoricalMatrix{T, R}(undef, m::Int, n::Int; levels=nothing, ordered=false)
Similar to the definition above, but uses reference type R
instead of the default type (UInt32
).
CategoricalMatrix(A::AbstractMatrix; levels=nothing, ordered=false)
Construct a CategoricalMatrix
with the values from A
and the same element type.
The levels
keyword argument can be a vector specifying possible values for the data (this is equivalent to but more efficient than calling levels!
on the resulting array). If levels
is omitted and the element type supports it, levels are sorted in ascending order; else, they are kept in their order of appearance in A
. The ordered
keyword argument determines whether the array values can be compared according to the ordering of levels or not (see isordered
).
If A
is already a CategoricalMatrix
, its levels, orderedness and reference type are preserved unless explicitly overridden.
CategoricalArrays.CategoricalValue
— Type
CategoricalValue{T <: Union{AbstractChar, AbstractString, Number}, R <: Integer}
A wrapper around a value of type T
corresponding to a level in a CategoricalPool
.
CategoricalValue
objects are considered as equal to the value of type T
they wrap by ==
and isequal
. However, order comparisons like <
and isless
are only possible if isordered
is true
for the value's pool, and in that case the order of the pool's levels
is used rather than the standard ordering of values of type T
.
CategoricalArrays.CategoricalValue
— Method
CategoricalValue(value, source::Union{CategoricalValue, CategoricalArray})
Return a CategoricalValue
object wrapping value
and attached to the CategoricalPool
of source
.
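A short sketch of how CategoricalValue objects compare, assuming an ordered array whose level order deliberately differs from the alphabetical order of the strings. The names `x`, `v` and `w` are illustrative.

```julia
using CategoricalArrays

x = categorical(["low", "high", "low"]; levels=["low", "high"], ordered=true)
v = x[2]                      # a CategoricalValue wrapping "high"

v == "high"                   # equality compares against the wrapped value
x[1] < v                      # true: "low" precedes "high" in the level order
"low" < "high"                # false: plain strings compare alphabetically

# Wrap a value and attach it to the pool of an existing array.
w = CategoricalValue("low", x)
w == x[1]
```

The contrast between `x[1] < v` and `"low" < "high"` shows that the order of the pool's levels is used, not the standard ordering of the wrapped type.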
CategoricalArrays.CategoricalVector
— Type
CategoricalVector{T}(undef, m::Int; levels=nothing, ordered=false)
Construct an uninitialized CategoricalVector
with levels of type T <: Union{AbstractChar, AbstractString, Number}
and length m
.
The levels
keyword argument can be a vector specifying possible values for the data (this is equivalent to but more efficient than calling levels!
on the resulting array). The ordered
keyword argument determines whether the array values can be compared according to the ordering of levels or not (see isordered
).
CategoricalVector{T, R}(undef, m::Int; levels=nothing, ordered=false)
Similar to the definition above, but uses reference type R
instead of the default type (UInt32
).
CategoricalVector(A::AbstractVector; levels=nothing, ordered=false)
Construct a CategoricalVector
with the values from A
and the same element type.
The levels
keyword argument can be a vector specifying possible values for the data (this is equivalent to but more efficient than calling levels!
on the resulting array). If levels
is omitted and the element type supports it, levels are sorted in ascending order; else, they are kept in their order of appearance in A
. The ordered
keyword argument determines whether the array values can be compared according to the ordering of levels or not (see isordered
).
If A
is already a CategoricalVector
, its levels, orderedness and reference type are preserved unless explicitly overridden.
CategoricalArrays.categorical
— Method
categorical(A::AbstractArray; levels=nothing, ordered=false, compress=false)
Construct a categorical array with the values from A
.
The levels
keyword argument can be a vector specifying possible values for the data (this is equivalent to but more efficient than calling levels!
on the resulting array). If levels
is omitted and the element type supports it, levels are sorted in ascending order; else, they are kept in their order of appearance in A
. The ordered
keyword argument determines whether the array values can be compared according to the ordering of levels or not (see isordered
).
If compress
is true
, the smallest reference type able to hold the number of unique values in A
will be used. While this will reduce memory use, passing this parameter will also introduce a type instability which can affect performance inside the function where the call is made. Therefore, use this option with caution (the one-argument version does not suffer from this problem).
categorical(A::CategoricalArray; compress=false, levels=nothing, ordered=false)
If A
is already a CategoricalArray
, its levels, orderedness and reference type are preserved unless explicitly overridden.
CategoricalArrays.compress
— Method
compress(A::CategoricalArray)
Return a copy of categorical array A
using the smallest reference type able to hold the number of levels
of A
.
While this will reduce memory use, this function is type-unstable, which can affect performance inside the function where the call is made. Therefore, use it with caution.
CategoricalArrays.cut
— Method
cut(x::AbstractArray, breaks::AbstractVector;
labels::Union{AbstractVector,Function},
extend::Union{Bool,Missing}=false, allowempty::Bool=false)
Cut a numeric array into intervals at values breaks
and return an ordered CategoricalArray
indicating the interval into which each entry falls. Intervals are of the form [lower, upper)
, i.e. the lower bound is included and the upper bound is excluded. The one exception is the last interval when extend=true, which is then closed on both ends, i.e. [lower, upper]
.
If x
accepts missing values (i.e. eltype(x) >: Missing
) the returned array will also accept them.
Keyword arguments
extend::Union{Bool, Missing}=false: when false, an error is raised if some values in x fall outside of the breaks; when true, breaks are automatically added to include all values in x, and the upper bound is included in the last interval; when missing, values outside of the breaks generate missing entries.
labels::Union{AbstractVector, Function}: a vector of strings, characters or numbers giving the names to use for the intervals; or a function f(from, to, i; leftclosed, rightclosed) that generates the labels from the left and right interval boundaries and the group index. Defaults to "[from, to)" (or "[from, to]" for the rightmost interval if extend == true).
allowempty::Bool=false: when false, an error is raised if some breaks appear multiple times, generating empty intervals; when true, duplicate breaks are allowed and the intervals they generate are kept as unused levels (but duplicate labels are not allowed).
Examples
julia> using CategoricalArrays
julia> cut(-1:0.5:1, [0, 1], extend=true)
5-element CategoricalArray{String,1,UInt32}:
"[-1.0, 0.0)"
"[-1.0, 0.0)"
"[0.0, 1.0]"
"[0.0, 1.0]"
"[0.0, 1.0]"
julia> cut(-1:0.5:1, 2)
5-element CategoricalArray{String,1,UInt32}:
"Q1: [-1.0, 0.0)"
"Q1: [-1.0, 0.0)"
"Q2: [0.0, 1.0]"
"Q2: [0.0, 1.0]"
"Q2: [0.0, 1.0]"
julia> cut(-1:0.5:1, 2, labels=["A", "B"])
5-element CategoricalArray{String,1,UInt32}:
"A"
"A"
"B"
"B"
"B"
julia> cut(-1:0.5:1, 2, labels=[-0.5, +0.5])
5-element CategoricalArray{Float64,1,UInt32}:
-0.5
-0.5
0.5
0.5
0.5
julia> fmt(from, to, i; leftclosed, rightclosed) = "grp $i ($from//$to)"
fmt (generic function with 1 method)
julia> cut(-1:0.5:1, 3, labels=fmt)
5-element CategoricalArray{String,1,UInt32}:
"grp 1 (-1.0//-0.3333333333333335)"
"grp 1 (-1.0//-0.3333333333333335)"
"grp 2 (-0.3333333333333335//0.33333333333333326)"
"grp 3 (0.33333333333333326//1.0)"
"grp 3 (0.33333333333333326//1.0)"
CategoricalArrays.cut
— Method
cut(x::AbstractArray, ngroups::Integer;
labels::Union{AbstractVector{<:AbstractString},Function},
allowempty::Bool=false)
Cut a numeric array into ngroups
quantiles, determined using quantile
.
If x
contains missing
values, they are automatically skipped when computing quantiles.
Keyword arguments
labels::Union{AbstractVector, Function}: a vector of strings, characters or numbers giving the names to use for the intervals; or a function f(from, to, i; leftclosed, rightclosed) that generates the labels from the left and right interval boundaries and the group index. Defaults to "Qi: [from, to)" (or "Qi: [from, to]" for the rightmost interval).
allowempty::Bool=false: when false, an error is raised if some quantile breakpoints are equal, generating empty intervals; when true, duplicate breaks are allowed and the intervals they generate are kept as unused levels (but duplicate labels are not allowed).
CategoricalArrays.decompress
— Method
decompress(A::CategoricalArray)
Return a copy of categorical array A
using the default reference type (UInt32). If A
is using a small reference type (such as UInt8
or UInt16
) the decompressed array will have room for more levels.
To avoid the need to call decompress, ensure compress
is not called when creating the categorical array.
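The round trip between compress and decompress might look like the sketch below. It assumes that two levels fit in UInt8, which is the smallest reference type mentioned above; the names `x`, `small` and `big` are illustrative.

```julia
using CategoricalArrays

x = categorical(["a", "b", "a"])   # uses the default reference type, UInt32
small = compress(x)                # copy using a smaller reference type
big = decompress(small)            # copy using UInt32 again

x isa CategoricalArray{String, 1, UInt32}
small isa CategoricalArray{String, 1, UInt8}
big isa CategoricalArray{String, 1, UInt32}
```

Because the return type of `compress` depends on the number of levels at run time, it may be worth keeping such calls outside performance-critical inner functions.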
CategoricalArrays.droplevels!
— Method
droplevels!(A::CategoricalArray)
Drop levels which do not appear in categorical array A
(so that they will no longer be returned by levels
).
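As a small illustration, the sketch below declares a level that never appears in the data and then drops it. The array `x` is only an example.

```julia
using CategoricalArrays

# "c" is declared as a level but does not appear in the data.
x = categorical(["a", "b"]; levels=["a", "b", "c"])
levels(x)        # includes the unused level "c"

droplevels!(x)
levels(x)        # only "a" and "b" remain
```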
CategoricalArrays.isordered
— Method
isordered(A::CategoricalArray)
Test whether entries in A
can be compared using <
, >
and similar operators, using the ordering of levels.
CategoricalArrays.levelcode
— Method
levelcode(x::CategoricalValue)
Get the code of categorical value x
, i.e. its index in the set of possible values returned by levels(x)
.
CategoricalArrays.levelcode
— Method
levelcode(x::Missing)
Return missing
.
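Both levelcode methods can be seen together in the following sketch, which assumes default (sorted) levels. The array `x` and the variable `codes` are illustrative.

```julia
using CategoricalArrays

x = categorical(["b", "a", missing])
levels(x)                 # sorted levels: "a", then "b"

levelcode(x[1])           # 2, because "b" is the second level
levelcode(x[2])           # 1, because "a" is the first level
levelcode(x[3])           # missing, since x[3] is missing

codes = levelcode.(x)     # broadcast over the whole array
```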
CategoricalArrays.levels!
— Method
levels!(A::CategoricalArray, newlevels::Vector; allowmissing::Bool=false)
Set the levels of categorical array A
. The order of appearance of levels will be respected by levels
, which may affect display of results in some operations; if A
is ordered (see isordered
), it will also be used for order comparisons using <
, >
and similar operators. Reordering levels will never affect the values of entries in the array.
If A
accepts missing values (i.e. eltype(A) >: Missing
) and allowmissing=true
, entries corresponding to omitted levels will be set to missing
. Else, newlevels
must include all levels which appear in the data.
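The two behaviours described above, reordering levels and dropping levels with allowmissing=true, can be sketched as follows. The arrays `x` and `y` are illustrative, and the keyword name follows the signature shown in this entry.

```julia
using CategoricalArrays

# Reordering levels leaves the stored values untouched.
x = categorical(["a", "b", "a"])
levels!(x, ["b", "a"])
levels(x)                 # "b" now comes first
x == ["a", "b", "a"]      # the entries themselves are unchanged

# Omitting a level that is in use requires an array accepting missing values.
y = categorical(["a", "b", missing])
levels!(y, ["a"]; allowmissing=true)
ismissing(y[2])           # the former "b" entry has become missing
```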
CategoricalArrays.ordered!
— Method
ordered!(A::CategoricalArray, ordered::Bool)
Set whether entries in A
can be compared using <
, >
and similar operators, using the ordering of levels. Return the modified A
.
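A minimal sketch of switching an array from unordered to ordered, using isordered to check the state before and after. The array `x` and its levels are illustrative.

```julia
using CategoricalArrays

x = categorical(["small", "large"]; levels=["small", "large"])
isordered(x)              # false: arrays are unordered by default
# At this point x[1] < x[2] is not allowed, since x is unordered.

ordered!(x, true)         # returns the modified array
isordered(x)              # true
x[1] < x[2]               # true: "small" precedes "large" in the levels
```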
CategoricalArrays.recode
— Function
recode(a::AbstractArray[, default::Any], pairs::Pair...)
Return a copy of a
, replacing elements matching a key of pairs
with the corresponding value. The type of the array is chosen so that it can hold all recoded elements (but not necessarily original elements from a
).
For each Pair
in pairs
, if the element is equal to (according to isequal
) or in
the key (first item of the pair), then the corresponding value (second item) is used. If the element matches no key and default
is not provided or nothing
, it is copied as-is; if default
is specified, it is used in place of the original element. If an element matches more than one key, the first match is used.
recode(a::CategoricalArray[, default::Any], pairs::Pair...)
If a
is a CategoricalArray
then the ordering of resulting levels is determined by the order of passed pairs
and default
will be the last level if provided.
Examples
julia> using CategoricalArrays
julia> recode(1:10, 1=>100, 2:4=>0, [5; 9:10]=>-1)
10-element Vector{Int64}:
100
0
0
0
-1
6
7
8
-1
-1
recode(a::AbstractArray{>:Missing}[, default::Any], pairs::Pair...)
If a
contains missing values, they are never replaced with default
: use missing
in a pair to recode them. If no pair recodes them, the returned array will accept missing values.
Examples
julia> using CategoricalArrays
julia> recode(1:10, 1=>100, 2:4=>0, [5; 9:10]=>-1, 6=>missing)
10-element Vector{Union{Missing, Int64}}:
100
0
0
0
-1
missing
7
8
-1
-1
CategoricalArrays.recode!
— Function
recode!(dest::AbstractArray, src::AbstractArray[, default::Any], pairs::Pair...)
Fill dest
with elements from src
, replacing those matching a key of pairs
with the corresponding value.
For each Pair
in pairs
, if the element is equal to (according to isequal
) the key (first item of the pair) or to one of its entries if it is a collection, then the corresponding value (second item) is copied to dest
. If the element matches no key and default
is not provided or nothing
, it is copied as-is; if default
is specified, it is used in place of the original element. dest
and src
must be of the same length, but not necessarily of the same type. Elements of src
as well as values from pairs
will be converted (using convert) when possible on assignment. If an element matches more than one key, the first match is used.
recode!(dest::CategoricalArray, src::AbstractArray[, default::Any], pairs::Pair...)
If dest
is a CategoricalArray
then the ordering of resulting levels is determined by the order of passed pairs
and default
will be the last level if provided.
recode!(dest::AbstractArray, src::AbstractArray{>:Missing}[, default::Any], pairs::Pair...)
If src
contains missing values, they are never replaced with default
: use missing
in a pair to recode them.
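The separate-destination form can be sketched as below, recoding integers into a String vector with a default for unmatched elements. The names `src` and `dest` and the labels are illustrative.

```julia
using CategoricalArrays

src = [1, 2, 3, 4]
dest = Vector{String}(undef, length(src))   # same length, different eltype

# 1 matches a scalar key, 2 and 3 match a range key, 4 falls back to the default.
recode!(dest, src, "other", 1 => "one", 2:3 => "few")
dest
```

Supplying a default here matters: without one, unmatched elements of `src` would be copied as-is and would have to be convertible to the element type of `dest`.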
CategoricalArrays.recode!
— Method
recode!(a::AbstractArray[, default::Any], pairs::Pair...)
Convenience function for in-place recoding, equivalent to recode!(a, a, ...)
.
Examples
julia> using CategoricalArrays
julia> x = collect(1:10);
julia> recode!(x, 1=>100, 2:4=>0, [5; 9:10]=>-1);
julia> x
10-element Vector{Int64}:
100
0
0
0
-1
6
7
8
-1
-1
DataAPI.levels
— Method
levels(x::CategoricalArray; skipmissing=true)
levels(x::CategoricalValue)
Return the levels of categorical array or value x
. This may include levels which do not actually appear in the data (see droplevels!
). missing
will be included only if it appears in the data and skipmissing=false
is passed.
The returned vector is an internal field of x
which must not be mutated as doing so would corrupt it.
DataAPI.unwrap
— Method
unwrap(x::CategoricalValue)
unwrap(x::Missing)
Get the value wrapped by categorical value x
. If x is missing, return missing
.
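A brief sketch of unwrap, assuming it is available after `using CategoricalArrays` (it is defined in DataAPI, as the entry heading indicates). The names `x`, `v` and `s` are illustrative.

```julia
using CategoricalArrays

x = categorical(["a", "b"])
v = x[1]                  # a CategoricalValue, not a String

s = unwrap(v)             # the plain String "a"
s isa String

unwrap(missing)           # missing passes through unchanged
```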