histcounts

Histogram bin counts


Syntax

[N,edges] = histcounts(X)

[N,edges] = histcounts(X,nbins)

[N,edges] = histcounts(X,edges)

[N,edges,bin] = histcounts(___)

N = histcounts(C)

N = histcounts(C,Categories)

[N,Categories] = histcounts(___)

[___] = histcounts(___,Name,Value)

Description


`[N,edges] = histcounts(X)` partitions the `X` values into bins and returns the count in each bin, as well as the bin edges. The `histcounts` function uses an automatic binning algorithm that returns bins with a uniform width, chosen to cover the range of elements in `X` and reveal the underlying shape of the distribution.


`[N,edges] = histcounts(X,nbins)` uses the number of bins specified by the scalar `nbins`.


`[N,edges] = histcounts(X,edges)` sorts `X` into bins with the bin edges specified by the vector `edges`. The value `X(i)` is in the `k`th bin if `edges(k)` ≤ `X(i)` < `edges(k+1)`. The last bin also includes the right bin edge, so it contains `X(i)` if `edges(end-1)` ≤ `X(i)` ≤ `edges(end)`.
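A small check of this edge rule, using hypothetical data placed exactly on the edges: a value equal to an interior edge lands in the bin to its right, while a value equal to the final edge is kept in the last bin.

```matlab
% Hypothetical data: every value sits exactly on a bin edge
edges = [0 1 2 3];
X = [0 1 2 3];

[N,~,bin] = histcounts(X,edges);

% 1 and 2 go to the bin on their right (bins are closed on the left);
% 3 equals edges(end), so the closed last bin [2,3] keeps it
disp(N)     % 1 1 2
disp(bin)   % 1 2 3 3
```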


`[N,edges,bin] = histcounts(___)` also returns an index array, `bin`, using any of the previous syntaxes. `bin` is an array of the same size as `X` whose elements are the bin indices for the corresponding elements in `X`. The number of elements in the `k`th bin is `nnz(bin==k)`, which is the same as `N(k)`.
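The relationship between `bin` and `N` can be checked for every bin at once. This sketch uses hypothetical data and applies `nnz(bin==k)` to each `k` with `arrayfun`.

```matlab
% Hypothetical sample data and edges
X = [0.3 1.7 1.2 2.9 0.8 2.1 1.5];
[N,~,bin] = histcounts(X,[0 1 2 3]);

% Rebuild each count N(k) as nnz(bin==k), for k = 1..numel(N)
Nfrombin = arrayfun(@(k) nnz(bin==k), 1:numel(N));

isequal(N,Nfrombin)   % true
```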


`N = histcounts(C)`, where `C` is a categorical array, returns a vector, `N`, that indicates the number of elements in `C` whose value is equal to each of `C`’s categories. `N` has one element for each category in `C`.

`N = histcounts(C,Categories)` counts only the elements in `C` whose value is equal to one of the categories in the subset specified by `Categories`.


`[N,Categories] = histcounts(___)` also returns the categories that correspond to each count in `N`, using either of the previous syntaxes for categorical arrays.
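A short sketch contrasting the full count with a subset count, on hypothetical color data. The counts are looked up by category name rather than by position, so the check does not depend on the order in which the subset is returned.

```matlab
% Hypothetical categorical data; categories are sorted: blue, green, red
C = categorical({'red','blue','red','green','red','blue'});

% One count per category in C
Nall = histcounts(C)   % 2 1 3

% Count only two categories and return which categories were counted
[N,Categories] = histcounts(C,{'red','green'});

% 'blue' is excluded, so only 4 of the 6 elements are counted
nRed   = N(strcmp(Categories,'red'))     % 3
nGreen = N(strcmp(Categories,'green'))   % 1
```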


`[___] = histcounts(___,Name,Value)` uses additional options specified by one or more `Name,Value` pair arguments, using any of the input or output argument combinations in previous syntaxes. For example, you can specify `'BinWidth'` and a scalar to adjust the width of the bins for numeric data. For categorical data, you can specify `'Normalization'` and either `'count'`, `'countdensity'`, `'probability'`, `'pdf'`, `'cumcount'`, or `'cdf'`.

Examples


Bin Counts and Bin Edges


Distribute 100 random values into bins. `histcounts` automatically chooses an appropriate bin width to reveal the underlying distribution of the data.

```matlab
X = randn(100,1);
[N,edges] = histcounts(X)
```

```
N = 1×7

     2    17    28    32    16     3     2

edges = 1×8

    -3    -2    -1     0     1     2     3     4
```

Specify Number of Bins


Distribute 10 numbers into 6 equally spaced bins.

```matlab
X = [2 3 5 7 11 13 17 19 23 29];
[N,edges] = histcounts(X,6)
```

```
N = 1×6

     2     2     2     2     1     1

edges = 1×7

         0    4.9000    9.8000   14.7000   19.6000   24.5000   29.4000
```

Specify Bin Edges


Distribute 1,000 random numbers into bins. Define the bin edges with a vector, where the first element is the left edge of the first bin, and the last element is the right edge of the last bin.

```matlab
X = randn(1000,1);
edges = [-5 -4 -2 -1 -0.5 0 0.5 1 2 4 5];
N = histcounts(X,edges)
```

```
N = 1×10

     0    24   149   142   195   200   154   111    25     0
```

Normalized Bin Counts


Distribute all of the prime numbers less than 100 into bins. Specify `'Normalization'` as `'probability'` to normalize the bin counts so that `sum(N)` is `1`. That is, each bin count represents the probability that an observation falls within that bin.

```matlab
X = primes(100);
[N,edges] = histcounts(X,'Normalization','probability')
```

```
N = 1×4

    0.4000    0.2800    0.2800    0.0400

edges = 1×5

     0    30    60    90   120
```

Determine Bin Placement


Distribute 100 random integers between -5 and 5 into bins, and specify `'BinMethod'` as `'integers'` to use unit-width bins centered on integers. Specify a third output for `histcounts` to return a vector representing the bin indices of the data.

```matlab
X = randi([-5,5],100,1);
[N,edges,bin] = histcounts(X,'BinMethod','integers');
```

Find the bin count for the third bin by counting the occurrences of the number `3` in the bin index vector, `bin`. The result is the same as `N(3)`.

```matlab
count = nnz(bin==3)
```

```
count = 8
```

Categorical Bin Counts


Create a categorical vector that represents votes. The categories in the vector are `'yes'`, `'no'`, or `'undecided'`.

```matlab
A = [0 0 1 1 1 0 0 0 0 NaN NaN 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1];
C = categorical(A,[1 0 NaN],{'yes','no','undecided'})
```

```
C = 1x27 categorical
  Columns 1 through 9

     no      no      yes      yes      yes      no      no      no      no

  Columns 10 through 16

     undecided      undecided      yes      no      no      no      yes

  Columns 17 through 25

     no      yes      no      yes      no      no      no      yes      yes

  Columns 26 through 27

     yes      yes
```

Determine the number of elements that fall into each category.

```matlab
[N,Categories] = histcounts(C)
```

```
N = 1×3

    11    14     2

Categories = 1x3 cell
    {'yes'}    {'no'}    {'undecided'}
```

Input Arguments


`X`—Data to distribute among bins
vector|matrix|multidimensional array

Data to distribute among bins, specified as a vector, matrix, or multidimensional array. If `X` is not a vector, then `histcounts` treats it as a single column vector, `X(:)`.

`histcounts` ignores all `NaN` values. Similarly, `histcounts` ignores `Inf` and `-Inf` values unless the bin edges explicitly specify `Inf` or `-Inf` as a bin edge.
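To see how non-finite values are treated, this sketch bins the same hypothetical data twice: once with finite edges and once with `-Inf` and `Inf` as the outer edges. `NaN` is excluded in both cases.

```matlab
% Hypothetical data containing NaN and infinite values
X = [1 2 NaN Inf -Inf 3];

% Finite edges: NaN, Inf, and -Inf are all left out of the counts
N1 = histcounts(X,[0 2 4])            % 1 2

% Infinite outer edges: -Inf and Inf are now counted, NaN still is not
N2 = histcounts(X,[-Inf 0 2 4 Inf])   % 1 1 2 1
```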

`C`—Categorical data
categorical array

Categorical data, specified as a categorical array. `histcounts` ignores undefined categorical values.

Data Types: `categorical`

`nbins`—Number of bins
positive integer

Number of bins, specified as a positive integer. If you do not specify `nbins`, then `histcounts` automatically calculates how many bins to use based on the values in `X`.

Example: `[N,edges] = histcounts(X,15)` uses 15 bins.

`edges`—Bin edges
vector

Bin edges, specified as a vector. `edges(1)` is the left edge of the first bin, and `edges(end)` is the right edge of the last bin.

For datetime and duration data, `edges` must be a datetime or duration vector in monotonically increasing order.

`Categories`—Categories included in count
all categories(default) |string vector|cell vector of character vectors|`pattern`scalar|categorical vector

Categories included in count, specified as a string vector, cell vector of character vectors, `pattern` scalar, or categorical vector. By default, `histcounts` uses a bin for each category in categorical array `C`. Use `Categories` to specify a unique subset of the categories instead.

Example: `h = histcounts(C,["Large","Small"])` counts only the categorical data in the categories `Large` and `Small`.

Example: `h = histcounts(C,"Y" + wildcardPattern)` counts categorical data in all the categories whose names begin with the letter `Y`.

Data Types: `string` | `cell` | `pattern` | `categorical`

Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.

Example: `[N,edges] = histcounts(X,'Normalization','probability')` normalizes the bin counts in `N`, such that `sum(N)` is 1.

`BinLimits`—Bin limits
two-element vector

Bin limits, specified as a two-element vector, `[bmin,bmax]`. This option bins only the values in `X` that fall between `bmin` and `bmax` inclusive; that is, `X(X>=bmin & X<=bmax)`.

This option does not apply to categorical data.

Example: `[N,edges] = histcounts(X,'BinLimits',[1,10])` bins only the values in `X` that are between `1` and `10` inclusive.
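A quick sketch of the inclusive limits on hypothetical data. The expectation that the outer edges coincide with `[bmin,bmax]` is an assumption about how the automatic bin width is fitted inside fixed limits; the count of binned values follows directly from the rule above.

```matlab
% Hypothetical data: 0 and 11 lie outside the limits [1,10]
X = [0 1 5 10 11];
[N,edges] = histcounts(X,'BinLimits',[1,10]);

nBinned = sum(N)          % 3: only 1, 5, and 10 are binned (limits are inclusive)
outerEdges = edges([1 end])   % assumed to match the limits, [1 10]
```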

`BinMethod`—Binning algorithm
`'auto'`(default) |`'scott'`|`'fd'`|`'integers'`|`'sturges'`|`'sqrt'`| ...

Binning algorithm, specified as one of the values in this table.

Value	Description
`'auto'`	The default `'auto'` algorithm chooses a bin width to cover the data range and reveal the shape of the underlying distribution.
`'scott'`	Scott’s rule is optimal if the data is close to being normally distributed, but is also appropriate for most other distributions. It uses a bin width of `3.5*std(X(:))*numel(X)^(-1/3)`.
`'fd'`	The Freedman-Diaconis rule is less sensitive to outliers in the data, and may be more suitable for data with heavy-tailed distributions. It uses a bin width of `2*IQR(X(:))*numel(X)^(-1/3)`, where `IQR` is the interquartile range of `X`.
`'integers'`	The integer rule is useful with integer data, as it creates a bin for each integer. It uses a bin width of 1 and places bin edges halfway between integers. To prevent accidentally creating too many bins, this rule creates at most 65536 bins (2¹⁶). If the data range is greater than 65536, then wider bins are used instead. Note: `'integers'` does not support datetime or duration data.
`'sturges'`	Sturges’ rule is popular due to its simplicity. It chooses the number of bins to be `ceil(1 + log2(numel(X)))`.
`'sqrt'`	The Square Root rule is another simple rule widely used in other software packages. It chooses the number of bins to be `ceil(sqrt(numel(X)))`.

`histcounts` does not always choose the number of bins using these exact formulas. Sometimes the number of bins is adjusted slightly so that the bin edges fall on "nice" numbers.
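The formulas in the table can be evaluated directly and compared with what `histcounts` actually uses. This sketch assumes a hypothetical sample of 1000 evenly spaced values; the Freedman-Diaconis width is left out to avoid depending on an interquartile-range helper. Because of the "nice number" adjustment, the printed bin counts used by `histcounts` may differ slightly from the formula values.

```matlab
% Hypothetical sample: 1000 evenly spaced values
X = linspace(0,1,1000);
n = numel(X);

% Textbook formulas from the table
wScott   = 3.5*std(X(:))*n^(-1/3);   % Scott's rule bin width
nSturges = ceil(1 + log2(n));        % Sturges' rule: 11 bins for n = 1000
nSqrt    = ceil(sqrt(n));            % Square Root rule: 32 bins for n = 1000

% What histcounts actually uses (may be adjusted for "nice" edges)
Nst = histcounts(X,'BinMethod','sturges');
Nsq = histcounts(X,'BinMethod','sqrt');

fprintf('Scott width (formula): %.4f\n', wScott);
fprintf('Sturges: formula %d, used %d\n', nSturges, numel(Nst));
fprintf('Sqrt:    formula %d, used %d\n', nSqrt, numel(Nsq));
```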

For datetime data, the bin method can be one of these units of time:

`'second'`	`'month'`
`'minute'`	`'quarter'`
`'hour'`	`'year'`
`'day'`	`'decade'`
`'week'`	`'century'`

For duration data, the bin method can be one of these units of time:

`'second'`	`'day'`
`'minute'`	`'year'`
`'hour'`

If you specify `BinMethod` with datetime or duration data, then `histcounts` can use a maximum of 65,536 bins (or 2¹⁶). If the specified bin duration requires more bins, then `histcounts` uses a larger bin width corresponding to the maximum number of bins.

This option does not apply to categorical data.

Example: `[N,edges] = histcounts(X,'BinMethod','integers')` uses bins centered on integers.
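For datetime data, a unit of time can be passed as the bin method. This sketch bins hypothetical hourly timestamps spanning two calendar days by `'day'`; the returned edges are datetime values.

```matlab
% Hypothetical hourly timestamps covering two calendar days
t = datetime(2024,1,1) + hours(0:47);

% Bin by calendar day; edges are returned as datetime values
[N,edges] = histcounts(t,'BinMethod','day')
```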

`BinWidth`—Width of bins
scalar

Width of bins, specified as a scalar. If you specify `BinWidth`, then `histcounts` can use a maximum of 65,536 bins (or 2¹⁶). If the specified bin width requires more bins, then `histcounts` uses a larger bin width corresponding to the maximum number of bins.

For datetime and duration data, the value of `'BinWidth'` can be a scalar duration or calendar duration.

This option does not apply to categorical data.

Example: `[N,edges] = histcounts(X,'BinWidth',5)` uses bins with a width of 5.
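A sketch of both behaviors described above, on hypothetical data: a requested width is honored when it yields a modest number of bins, and a width that would require more than 65,536 bins is widened.

```matlab
% Hypothetical data
X = [0.2 0.7 1.1 1.6 2.4];

% Request bins of width 0.5; every bin gets that width
[N,edges] = histcounts(X,'BinWidth',0.5);
uniformWidth = all(abs(diff(edges) - 0.5) < 1e-12)   % true

% Width 1 over a range of 1e6 would need about 1e6 bins,
% so histcounts falls back to a wider bin
N2 = histcounts([0 1e6],'BinWidth',1);
withinCap = numel(N2) <= 65536                       % true
```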

`BinEdges`—Edges of bins
numeric vector

Edges of bins, specified as a numeric vector. The first vector element specifies the left edge of the first bin. The last element specifies the right edge of the last bin. If you do not specify the bin edges, then `histcounts` automatically determines the location of the bin edges.

This option does not apply to categorical data.
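Explicit edges also act as hard limits: values outside `[edges(1), edges(end)]` are not counted. This sketch, on hypothetical data, also confirms that the name-value form gives the same result as passing `edges` positionally.

```matlab
% Hypothetical data: -1 and 3 lie outside the edges [0 1 2]
X = [-1 0.5 1.5 3];

N    = histcounts(X,'BinEdges',[0 1 2])   % 1 1: the two outside values are not counted
Npos = histcounts(X,[0 1 2]);             % positional form of the same request
isequal(N,Npos)                           % true
```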

`Normalization`—Type of normalization
`'count'`(default) |`'probability'`|`'countdensity'`|`'pdf'`|`'cumcount'`|`'cdf'`

Type of normalization, specified as one of the values in this table. For each bin $i$:

$v_{i}$ is the bin value.
$c_{i}$ is the number of elements in the bin.
$w_{i}$ is the width of the bin.
$N$ is the number of elements in the input data. This value can be greater than the number of binned elements if the data contains `NaN`, `NaT`, or `<undefined>` values, or if some of the data lies outside the bin limits.

Value	Bin Values	Notes
`'count'` (default)	$v_{i} = c_{i}$	Count or frequency of observations. The sum of bin values is less than or equal to `numel(X)`. The sum is less than `numel(X)` only when some of the input data is not included in the bins. For categorical data, the sum of bin values is either `numel(X)` or `sum(ismember(X(:),Categories))`.
`'countdensity'`	$v_{i} = \frac{c_{i}}{w_{i}}$	Count or frequency scaled by the width of the bin. For categorical data, this is the same as `'count'`. Note: `'countdensity'` does not support datetime or duration data.
`'cumcount'`	$v_{i} = \sum_{j = 1}^{i} c_{j}$	Cumulative count. Each bin value is the cumulative number of observations in that bin and all previous bins. The value of the last bin is less than or equal to `numel(X)`. For categorical data, the value of the last bin is less than or equal to `numel(X)` or `sum(ismember(X(:),Categories))`.
`'probability'`	$v_{i} = \frac{c_{i}}{N}$	Relative probability. The sum of the bin values is less than or equal to `1`.
`'pdf'`	$v_{i} = \frac{c_{i}}{N \cdot w_{i}}$	Probability density function estimate. For categorical data, this is the same as `'probability'`. Note: `'pdf'` does not support datetime or duration data.
`'cdf'`	$v_{i} = \sum_{j = 1}^{i} \frac{c_{j}}{N}$	Cumulative distribution function estimate. `N(end)` is less than or equal to `1`.

Example: `[N,edges] = histcounts(X,'Normalization','pdf')` bins the data using the probability density function estimate.
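Each row of the table can be reproduced from the raw counts $c_i$ and widths $w_i$. This sketch uses hypothetical data with unequal bin widths, all inside the edges, so $N$ equals `numel(X)`; it recomputes every normalization from the formulas and compares against `histcounts`.

```matlab
% Hypothetical data and edges; every value lies inside the edges
X = [0.1 0.4 0.5 1.2 1.8 2.5 3.7];
edges = [0 1 2 4];              % unequal widths: 1, 1, 2

c = histcounts(X,edges);        % raw counts c_i
w = diff(edges);                % bin widths w_i
n = numel(X);                   % N in the table (no NaN or out-of-range data here)

% Largest deviation between histcounts and each formula from the table
tol = 1e-12;
checks = [ ...
    max(abs(histcounts(X,edges,'Normalization','countdensity') - c./w)), ...
    max(abs(histcounts(X,edges,'Normalization','cumcount')     - cumsum(c))), ...
    max(abs(histcounts(X,edges,'Normalization','probability')  - c/n)), ...
    max(abs(histcounts(X,edges,'Normalization','pdf')          - c./(n*w))), ...
    max(abs(histcounts(X,edges,'Normalization','cdf')          - cumsum(c)/n))];
allMatch = all(checks < tol)    % true
```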

`NumBins`—Number of bins
positive integer

Number of bins, specified as a positive integer. If you do not specify `NumBins`, then `histcounts` automatically calculates how many bins to use based on the input data.

This option does not apply to categorical data.
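The `NumBins` name-value argument and the positional `nbins` argument request the same binning. A short check on hypothetical data:

```matlab
% Hypothetical data
X = [2 3 5 7 11 13 17 19 23 29];

% Positional nbins versus the NumBins name-value argument
[N1,e1] = histcounts(X,6);
[N2,e2] = histcounts(X,'NumBins',6);

sameResult = isequal(N1,N2) && isequal(e1,e2)   % true
```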

Output Arguments


`N`— Bin counts
row vector

Bin counts, returned as a row vector.

`edges`— Bin edges
vector

Bin edges, returned as a vector. `edges(1)` is the left edge of the first bin, and `edges(end)` is the right edge of the last bin.

`bin`— Bin indices
array

Bin indices, returned as an array of the same size as `X`. Each element in `bin` describes which numbered bin contains the corresponding element in `X`.

A value of `0` in `bin` indicates an element that does not belong to any of the bins (for example, a `NaN` value).
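A sketch showing where zeros appear in `bin`, using hypothetical data with one `NaN` and one value beyond the last edge:

```matlab
% Hypothetical data: one NaN and one value beyond the last edge
X = [0.5 NaN 1.5 5];
[N,~,bin] = histcounts(X,[0 1 2]);

bin              % 1 0 2 0: NaN and 5 are in no bin
binned = X(bin > 0)   % only the elements that fall in a bin
```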

`Categories`— Categories included in count
cell vector of character vectors

Categories included in count, returned as a cell vector of character vectors. `Categories` contains the categories in `C` that correspond to each count in `N`.

Tips

The behavior of `histcounts` is similar to that of the `discretize` function. Use `histcounts` to find the number of elements in each bin. On the other hand, use `discretize` to find which bin each element belongs to (without counting).
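A sketch of how the two functions relate, on hypothetical data that lies entirely inside the edges. For such data, the indices from `discretize` match the third output of `histcounts`, and counting them reproduces `N`. For values outside the edges, `discretize` returns `NaN` where `histcounts` returns `0`.

```matlab
% Hypothetical data, all inside the edges
X = [0.2 1.4 2.0 2.9 0.7];
edges = [0 1 2 3];

N   = histcounts(X,edges);      % how many elements per bin
idx = discretize(X,edges);      % which bin each element is in

% For in-range data the indices agree with the third output of histcounts
[~,~,bin] = histcounts(X,edges);
sameIdx = isequal(idx,bin)                                        % true
sameCounts = isequal(N, arrayfun(@(k) nnz(idx==k), 1:numel(N)))   % true
```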

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

Usage notes and limitations:

Some input options are not supported. The allowed options are:
- 'BinWidth'
- 'BinLimits'
- 'Normalization'
- 'BinMethod': The 'auto' and 'scott' bin methods are the same. The 'fd' bin method is not supported.
The `Categories` input argument does not support pattern expressions.

For more information, see Tall Arrays.

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

Code generation does not support sparse matrix inputs for this function.
If you do not supply bin edges, then code generation might require variable-size arrays and dynamic memory allocation.
The `Categories` input argument does not support pattern expressions.

GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

Usage notes and limitations:

Code generation does not support sparse matrix inputs for this function.
If you do not supply bin edges, then code generation might require variable-size arrays and dynamic memory allocation.
The `Categories` input argument does not support pattern expressions.

Thread-Based Environment
Run code in the background using MATLAB® `backgroundPool` or accelerate code with Parallel Computing Toolbox™ `ThreadPool`.

This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:

64-bit integers are not supported.

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2014b
