The Language of Technical Computing
The Statistics Toolbox is a collection of tools built on the MATLAB® numeric computing environment. The toolbox supports a wide range of common statistical tasks, from random number generation to curve fitting to design of experiments and statistical process control.
The probability density function (pdf) has a different meaning depending on whether the distribution is discrete or continuous.
For discrete distributions, the pdf is the probability of observing a particular outcome. Suppose the random variable $X$ takes the values $x_k\ (k=1,2,\ldots)$ and the probability of each value is $P\{X=x_k\}=p_k\ (k=1,2,\ldots)$. Then we have
\[ p_k\geq 0,\quad k=1,2,\ldots \] \[ \sum_{k=1}^{+\infty}p_k=1 \]
Unlike discrete distributions, the pdf of a continuous distribution at a value is not the probability of observing that value. For continuous distributions the probability of observing any particular value is zero. To get probabilities you must integrate the pdf over an interval of interest.
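As a minimal sketch of that last point, the snippet below integrates the standard normal pdf over $[-1,1]$ numerically and compares the result with a difference of cdf values. The distribution and the interval are illustrative assumptions, not part of the original text; integral is base MATLAB, while normpdf and normcdf come from the Statistics Toolbox.

```matlab
% Probability that a standard normal variable falls in [-1, 1],
% obtained by integrating the pdf over that interval.
f = @(x) normpdf(x, 0, 1);            % pdf as a function handle
pInt = integral(f, -1, 1);            % numerical integration of the pdf
pCdf = normcdf(1) - normcdf(-1);      % same probability from the cdf
fprintf('%.4f  %.4f\n', pInt, pCdf);  % both print about 0.6827
```

The value of the pdf at a single point, such as f(0), is a density and should not be read as a probability.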
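A pdf has the following theoretical properties: it is zero or positive for every possible outcome, and its integral (or, for a discrete distribution, its sum) over the entire range of values is one.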
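The binomial distribution models the total number of successes in repeated trials from an infinite population under the following conditions: only two outcomes are possible on each of the $n$ trials, the probability of success $p$ is the same for every trial, and all trials are independent of one another.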
James Bernoulli derived the binomial distribution in 1713 (Ars Conjectandi). Earlier, Blaise Pascal had considered the special case where p = 1/2.
The binomial pdf is \[ f(x)=\begin{cases} C_n^x p^x(1-p)^{n-x}& x=0,1,\ldots,n\\ 0& \text{otherwise} \end{cases} \] where \[ C_n^x=\frac{n!}{x!(n-x)!} \]
binopdf(k,n,p) % equivalent to pdf('bino',k,n,p)
where $k$ is the number of successes, $n$ is the total number of trials, and $p$ is the probability of success on each trial.
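The formula and the function call can be checked against each other. The sketch below evaluates the binomial pdf directly from its definition using nchoosek from base MATLAB and compares it with binopdf; the values of n, p, and k are illustrative assumptions.

```matlab
n = 10; p = 0.5; k = 3;               % illustrative values, not from the text
byFormula = nchoosek(n, k) * p^k * (1 - p)^(n - k);   % C_n^k p^k (1-p)^(n-k)
byToolbox = binopdf(k, n, p);         % same quantity from the toolbox
disp([byFormula, byToolbox])          % both are 120/1024, about 0.1172
```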
A large batch of electronic tubes contains 10% damaged tubes. We select 20 tubes at random to form a circuit. Find the probability that the circuit works properly (i.e., all 20 selected tubes are good).
    >> binopdf(20,20,0.9)
    ans =
        0.1216
A Quality Assurance inspector tests 200 circuit boards a day. If 2% of the boards have defects, what is the probability that the inspector will find no defective boards on any given day?
    >> binopdf(0,200,0.02)
    ans =
        0.0176
What is the most likely number of defective boards the inspector will find?
    >> defects=0:200;
    >> y=binopdf(defects,200,.02)
    y =
      Columns 1 through 9
        0.0176    0.0718    0.1458    0.1963    0.1973    0.1579    0.1047    0.0592    0.0292
      Columns 10 through 18
        0.0127    0.0049    0.0017    0.0006    0.0002    0.0000    0.0000    0.0000    0.0000
      ... (Columns 19 through 201 all display as 0.0000 or 0)
    >> [x,i]=max(y)
    x =
        0.1973
    i =
         5
    >> defects(i)
    ans =
         4
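The same answer follows from a standard closed form for the mode of a binomial distribution, given here as a cross-check rather than as part of the original example: when $(n+1)p$ is not an integer, the most likely number of successes is $\lfloor (n+1)p \rfloor$. With $n=200$ and $p=0.02$,
\[ \lfloor (n+1)p \rfloor=\lfloor 201\times 0.02 \rfloor=\lfloor 4.02 \rfloor=4. \]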
    >> x=0:10;
    >> y=binopdf(x,10,0.5)
    y =
      Columns 1 through 9
        0.0010    0.0098    0.0439    0.1172    0.2051    0.2461    0.2051    0.1172    0.0439
      Columns 10 through 11
        0.0098    0.0010
    >> plot(x,y,'+')
The Poisson distribution is appropriate for applications that involve counting the number of times a random event occurs in a given amount of time, distance, area, etc.
The Poisson pdf is \[ f(x)=\begin{cases} \frac{\lambda^x e^{-\lambda}}{x!}& x=0,1,2,\ldots\\ 0 & \text{otherwise} \end{cases} \]
Y = poisspdf(k,$\lambda$)
where $k$ is the number of times the random event occurs within the interval and $\lambda$ is the mean number of occurrences in that interval.
$k$ and $\lambda$ can be vectors, matrices, or multidimensional arrays that all have the same size. A scalar input is expanded to a constant array with the same dimensions as the other input. The parameters in $\lambda$ must all be positive.
An airline booking office receives 36 calls per hour on average. Find the probability of receiving exactly two calls within 5 minutes.
    >> poisspdf(2,3) % the office receives 36/(60/5) = 3 calls every 5 minutes on average
    ans =
        0.2240
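The result can be reproduced from the Poisson pdf formula itself. This is a small sketch using only exp and factorial from base MATLAB alongside poisspdf; the variable names are chosen here for illustration.

```matlab
lambda = 36 / (60 / 5);               % mean number of calls per 5 minutes, equals 3
k = 2;                                % number of calls of interest
byFormula = lambda^k * exp(-lambda) / factorial(k);   % lambda^k e^(-lambda) / k!
byToolbox = poisspdf(k, lambda);
disp([byFormula, byToolbox])          % both about 0.2240
```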
Like the chi-square distribution, the exponential distribution is a special case of the gamma distribution (obtained by setting the shape parameter $a = 1$).
If the pdf of a random variable $X$ is \[ f(x)=\begin{cases} \lambda e^{-\lambda x}, & x > 0\\ 0, & \text{otherwise}, \end{cases} \] where $\lambda > 0$, then $X$ is exponentially distributed.
Equivalently, writing $\lambda=\frac{1}{\mu}$, the pdf becomes \[ f(x)=\begin{cases} \frac{1}{\mu}e^{-\frac{x}{\mu}}, & x > 0\\ 0, & \text{otherwise}, \end{cases} \] where $\mu > 0$ is the mean of the distribution.
The distribution function is \[ F(x)=\begin{cases} 1-e^{-\frac{x}{\mu}},& x > 0 \\ 0, & \text{otherwise}. \end{cases} \]
Y = exppdf(X,$\mu$) returns the pdf of the exponential distribution with mean parameter $\mu$, evaluated at the values in $X$. $X$ and $\mu$ can be vectors, matrices, or multidimensional arrays that have the same size. A scalar input is expanded to a constant array with the same dimensions as the other input. The parameters in $\mu$ must be positive.
That is \[ \text{exppdf}(x,\mu)=\begin{cases} \frac{1}{\mu}e^{-\frac{x}{\mu}}, & x > 0\\ 0, & \text{otherwise}, \end{cases} \]
The lifetime $X$ (in hours) of a certain kind of electronic component is exponentially distributed with parameter $\mu=1000$. Consider three such components. Find the probability that at least one of them fails within the first 1000 hours.
Since the distribution function is \[ F(x)=\begin{cases} 1-e^{-\frac{x}{\mu}},& x > 0 \\ 0, & \text{otherwise}, \end{cases} \] the probability that a single component survives beyond 1000 hours is
\[ P\{X > 1000\}=1-P\{X\leq 1000\}=1-F(1000)=e^{-1} \]
The lifetimes of these electronic components are independent. So if $Y$ denotes the number of components that fail within the first 1000 hours, then $Y\sim b(3, 1-e^{-1})$.
Thus, \[ P\{Y\geq 1\}=1-P\{Y=0\}=1-C_3^0(1-e^{-1})^0 (e^{-1})^3=1-e^{-3}. \]
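The closed-form answer can be confirmed numerically. The following sketch combines expcdf and binopdf in the same way as the derivation; the variable names are assumptions made for this illustration.

```matlab
mu = 1000;                                % mean lifetime in hours
pFail = expcdf(1000, mu);                 % P{X <= 1000} = 1 - exp(-1)
pAtLeastOne = 1 - binopdf(0, 3, pFail);   % 1 - P{Y = 0}
disp([pAtLeastOne, 1 - exp(-3)])          % both about 0.9502
```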
The lifetime $X$ of a certain kind of electronic component is exponentially distributed with parameter $\mu=50$ (its mean lifetime is 50 hours). Find the probability density of failure at 25 hours and the probability that the component breaks down during the first 25 hours.
    >> exppdf(25,50) % density of failure at 25 hours (a density, not a probability)
    ans =
        0.0121
    >> expcdf(25,50) % probability of breaking down during the first 25 hours
    ans =
        0.3935
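Because the lifetime is continuous, the value returned by exppdf is a density per hour rather than a probability. A hedged way to interpret it is to compare it with the probability of failing inside a one-hour window centered at 25 hours; the window width is an assumption chosen for this sketch.

```matlab
mu = 50;                                         % mean lifetime in hours
density = exppdf(25, mu);                        % density at t = 25, per hour
pWindow = expcdf(25.5, mu) - expcdf(24.5, mu);   % P{24.5 < X <= 25.5}
disp([density, pWindow])                         % both about 0.0121
```

The two numbers agree closely only because the window is short compared with the mean lifetime; for wider windows the cdf difference is the quantity to use.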
The normal distribution is a two parameter family of curves. The first parameter, $\mu$, is the mean. The second, $\sigma$, is the standard deviation. The standard normal distribution (written $\Phi(x)$) sets $\mu$ to 0 and $\sigma$ to 1.
The usual justification for using the normal distribution for modeling is the Central Limit Theorem, which states (roughly) that the sum of independent samples from any distribution with finite mean and variance converges to the normal distribution as the sample size goes to infinity.
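A rough simulation can make the theorem concrete. The sketch below sums 12 uniform random numbers many times, standardizes the sums, and checks that they behave approximately like standard normal values; the number of terms, the number of samples, and the seed are all arbitrary choices for illustration.

```matlab
rng(0);                                % fix the seed so the run is repeatable
nTerms = 12; nSamples = 1e5;           % illustrative sizes
s = sum(rand(nTerms, nSamples), 1);    % each column sum is one realization
% A uniform(0,1) variable has mean 1/2 and variance 1/12, so standardize:
z = (s - nTerms/2) / sqrt(nTerms/12);
fracWithin1 = mean(abs(z) < 1);        % close to 0.68 for normal data
disp([mean(z), var(z), fracWithin1])   % roughly 0, 1, and 0.68
```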
\[ f(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
Find the probability density at the point $0.6578$ for a random variable $X$ that follows the standard normal distribution $N(0,1)$.
    >> normpdf(0.6578,0,1)
    ans =
        0.3213
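As a cross-check, the same value can be computed directly from the normal pdf formula given earlier. This is a minimal sketch; the variable names are chosen here for illustration.

```matlab
x = 0.6578; mu = 0; sigma = 1;
byFormula = exp(-(x - mu)^2 / (2 * sigma^2)) / (sqrt(2 * pi) * sigma);
byToolbox = normpdf(x, mu, sigma);
disp([byFormula, byToolbox])           % both about 0.3213
```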
To generate uniformly distributed random numbers in the interval $(0,1)$, we can use the command rand.
We can also use unifrnd, which draws from a general interval with given endpoints.
    >> R=rand(3,4)
    R =
        0.6324    0.5469    0.1576    0.4854
        0.0975    0.9575    0.9706    0.8003
        0.2785    0.9649    0.9572    0.1419
    >> R=unifrnd(5,15,3,4)
    R =
        9.2176   14.5949   13.4913   12.5774
       14.1574   11.5574   14.3399   12.4313
       12.9221    5.3571   11.7874    8.9223
    >> help unifrnd
     unifrnd Random arrays from continuous uniform distribution.
        R = unifrnd(A,B) returns an array of random numbers chosen from the
        continuous uniform distribution on the interval from A to B. The size
        of R is the common size of A and B if both are arrays. If either
        parameter is a scalar, the size of R is the size of the other
        parameter.
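Uniform numbers on a general interval can also be built from rand by shifting and scaling. The sketch below is an equivalent construction for illustration, not a description of how unifrnd is implemented internally.

```matlab
a = 5; b = 15;                         % interval endpoints used above
R = a + (b - a) * rand(3, 4);          % scale (0,1) samples onto the interval
disp(all(R(:) >= a & R(:) <= b))       % displays 1: every entry lies in [5, 15]
```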
Try command
Try the following commands.
Function | Usage |
---|---|
    >> random('norm',2,0.3,3,4)
    ans =
        2.0882    1.6559    1.1167    1.7735
        1.7638    1.6793    2.4315    2.4111
        2.2665    1.7572    2.0976    1.4865
    >> help random % try it
For vectors,
For matrices,
    >> A=[1 2 3; 4 5 2; 3 7 0]
    A =
         1     2     3
         4     5     2
         3     7     0
    >> sort(A)
    ans =
         1     2     0
         3     5     2
         4     7     3
    >> [Y,I]=sort(A)
    Y =
         1     2     0
         3     5     2
         4     7     3
    I =
         1     1     3
         3     2     2
         2     3     1
    >> A=[1 2 3; 4 5 2; 3 7 0]
    A =
         1     2     3
         4     5     2
         3     7     0
    >> sortrows(A)
    ans =
         1     2     3
         3     7     0
         4     5     2
    >> sortrows(A,1)
    ans =
         1     2     3
         3     7     0
         4     5     2
    >> sortrows(A,3)
    ans =
         3     7     0
         4     5     2
         1     2     3
    >> sortrows(A,[3 2])
    ans =
         3     7     0
         4     5     2
         1     2     3
    >> sortrows(A,[2 3])
    ans =
         1     2     3
         4     5     2
         3     7     0
    >> data = [1 2 3 4 50];
    >> mean(data)
    ans =
        12
    >> x=[174.5 165 180.6 174.5 179 163 175.3 190 174 177.9]
    x =
      Columns 1 through 9
      174.5000  165.0000  180.6000  174.5000  179.0000  163.0000  175.3000  190.0000  174.0000
      Column 10
      177.9000
    >> mean(x)
    ans =
      175.3800
    >> data=[1 2 3 4 5]
    data =
         1     2     3     4     5
    >> median(data)
    ans =
         3
    >> data=[1 2 3 4 5 6]
    data =
         1     2     3     4     5     6
    >> median(data)
    ans =
        3.5000
    >> data=[3 2 3 4 9 19]
    data =
         3     2     3     4     9    19
    >> median(data)
    ans =
        3.5000
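The two measures can differ sharply when the data contain an outlier. As a small illustration, the sketch below applies both functions to the vector used in the first mean example.

```matlab
data = [1 2 3 4 50];                   % the value 50 is an outlier
disp([mean(data), median(data)])       % displays 12 and 3
```

The mean is pulled toward the outlier, while the median stays at the middle value.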
\[ \text{var}(X)=s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2, \] where $\bar{x}$ is the sample mean.
    >> X=[165 180.6 174.5 179 163 175.3 190 174 177.9 160]
    X =
      Columns 1 through 9
      165.0000  180.6000  174.5000  179.0000  163.0000  175.3000  190.0000  174.0000  177.9000
      Column 10
      160.0000
    >> var(X)
    ans =
       82.1846
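The output of var can be reproduced from the sample variance formula. This is a minimal sketch using only base MATLAB functions on the same data.

```matlab
X = [165 180.6 174.5 179 163 175.3 190 174 177.9 160];
n = numel(X);
byFormula = sum((X - mean(X)).^2) / (n - 1);   % sample variance, n-1 in the denominator
disp([byFormula, var(X)])                      % both about 82.1846
```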
The square root of the variance is called the standard deviation.
    >> X
    X =
      Columns 1 through 9
      165.0000  180.6000  174.5000  179.0000  163.0000  175.3000  190.0000  174.0000  177.9000
      Column 10
      160.0000
    >> X=[165 180.6 174.5 179 163 175.3 190 174 177.9 160]
    X =
      Columns 1 through 9
      165.0000  180.6000  174.5000  179.0000  163.0000  175.3000  190.0000  174.0000  177.9000
      Column 10
      160.0000
    >> DX=var(X,1)
    DX =
       73.9661
    >> sig=std(X,1)
    sig =
        8.6004
    >> isequal(sig^2,DX)
    ans =
         0
    >> sig^2
    ans =
       73.9661
    >> DX1=var(X)
    DX1 =
       82.1846
    >> sig1=std(X)
    sig1 =
        9.0656
    >> sig1^2
    ans =
       82.1846
    >> isequal(sig1^2,DX1)
    ans =
         1
    >> isequal(sig^2,DX)
    ans =
         0
    >> format long
    >> sig^2
    ans =
      73.966100000000026
    >> DX
    DX =
      73.966100000000012
    >> sig1^2
    ans =
      82.184555555555562
    >> DX1
    DX1 =
      82.184555555555562
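The session shows that isequal can return 0 even though the two values agree to many digits, because squaring the rounded square root introduces a tiny floating-point error. A common remedy is to compare within a tolerance; the tolerance below is a hypothetical value chosen for this sketch.

```matlab
X = [165 180.6 174.5 179 163 175.3 190 174 177.9 160];
DX  = var(X, 1);                       % variance normalized by n
sig = std(X, 1);                       % standard deviation normalized by n
tol = 1e-10;                           % hypothetical tolerance for illustration
disp(abs(sig^2 - DX) < tol)            % displays 1: equal up to rounding error
```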
\[ M=(\prod_{i=1}^{n}x_i)^{\frac{1}{n}} \]
    >> A=[1 2 3 4]
    A =
         1     2     3     4
    >> M=geomean(A)
    M =
        2.2134
    >> B=[1 2 3 4;2 3 4 9; 2 9 0 5]
    B =
         1     2     3     4
         2     3     4     9
         2     9     0     5
    >> M2=geomean(B)
    M2 =
        1.5874    3.7798         0    5.6462
    >> %Let's test it
    >> T=[1 2 2]
    T =
         1     2     2
    >> geomean(T)
    ans =
        1.5874
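The geometric mean formula can be evaluated directly in two equivalent ways, sketched below for the vector A. The logarithm form is a common alternative for long vectors where the raw product might overflow; it is shown as an illustration and is not claimed to be how geomean is implemented.

```matlab
A = [1 2 3 4];
byProduct = prod(A)^(1 / numel(A));    % n-th root of the product
byLogs    = exp(mean(log(A)));         % same value computed through logarithms
disp([byProduct, byLogs, geomean(A)])  % all about 2.2134
```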
\[ M=\frac{n}{\sum_{i=1}^{n}\frac{1}{x_i}} \]
The arithmetic mean is greater than or equal to the harmonic mean, as the column-wise comparison with mean in the following session illustrates.
    >> A=[1 2 4 6; 3 4 5 7; 8 9 6 0; 4 6 8 1]
    A =
         1     2     4     6
         3     4     5     7
         8     9     6     0
         4     6     8     1
    >> M1=harmmean(A)
    M1 =
        2.3415    3.8919    5.3933         0
    >> Average=mean(A)
    Average =
        4.0000    5.2500    5.7500    3.5000
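For a single column the harmonic mean formula is easy to evaluate by hand. The sketch below uses the first column of A and compares the formula with harmmean and mean.

```matlab
x = [1 3 8 4];                         % first column of A
byFormula = numel(x) / sum(1 ./ x);    % n divided by the sum of reciprocals
disp([byFormula, harmmean(x), mean(x)])   % about 2.3415, 2.3415, 4.0000
```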
range(X) returns the difference between the maximum and the minimum of a sample.
    >> A=[1 2 3; 2 8 9; 3 6 2]
    A =
         1     2     3
         2     8     9
         3     6     2
    >> Y=range(A)
    Y =
         2     6     7
In [M,V] = unifstat(A,B), vector or matrix inputs for A and B must have the same size, which is also the size of the mean M and the variance V.
A scalar input for A or B is expanded to a constant matrix with the same dimensions as the other input.
The mean of the continuous uniform distribution with parameters a and b is $(a + b)/2$, and the variance is $(a-b)^2/12$.
    >> a=1:6
    a =
         1     2     3     4     5     6
    >> b=2.*a
    b =
         2     4     6     8    10    12
    >> [M,V]=unifstat(a,b)
    M =
        1.5000    3.0000    4.5000    6.0000    7.5000    9.0000
    V =
        0.0833    0.3333    0.7500    1.3333    2.0833    3.0000
    >> 1/12
    ans =
        0.0833
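The outputs of unifstat can be compared with the closed-form mean and variance of the continuous uniform distribution. This sketch reuses the same a and b and reports the largest deviation from each formula.

```matlab
a = 1:6; b = 2 .* a;
[M, V] = unifstat(a, b);
errMean = max(abs(M - (a + b) / 2));       % deviation from (a + b)/2
errVar  = max(abs(V - (b - a).^2 / 12));   % deviation from (b - a)^2/12
disp([errMean, errVar])                    % both zero or of rounding size
```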
In [M,V] = normstat($\mu$,$\sigma$), $\mu$ and $\sigma$ can be vectors, matrices, or multidimensional arrays that all have the same size, which is also the size of M and V. A scalar input for $\mu$ or $\sigma$ is expanded to a constant array with the same dimensions as the other input.
The mean of the normal distribution with parameters $\mu$ and $\sigma$ is $\mu$, and the variance is $\sigma^2$.
    >> n=1:5;
    >> A=n'*n
    A =
         1     2     3     4     5
         2     4     6     8    10
         3     6     9    12    15
         4     8    12    16    20
         5    10    15    20    25
    >> [M,V]=normstat(A,A)
    M =
         1     2     3     4     5
         2     4     6     8    10
         3     6     9    12    15
         4     8    12    16    20
         5    10    15    20    25
    V =
         1     4     9    16    25
         4    16    36    64   100
         9    36    81   144   225
        16    64   144   256   400
        25   100   225   400   625
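The printed matrices match the stated formulas element by element: M repeats the $\mu$ input and V is the element-wise square of the $\sigma$ input. A short sketch of that check on the same matrix:

```matlab
n = 1:5; A = n' * n;
[M, V] = normstat(A, A);               % A is used as both mu and sigma
disp(isequal(M, A))                    % displays 1: the mean equals mu
disp(isequal(V, A.^2))                 % displays 1: the variance equals sigma squared
```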