
LinearMixedModel class

Linear mixed-effects model class

Description

A LinearMixedModel object represents a model of a response variable with fixed and random effects. It comprises data, a model description, fitted coefficients, covariance parameters, design matrices, residuals, residual plots, and other diagnostic information for a linear mixed-effects model. You can predict model responses with the predict function and generate random data at new design points using the random function.

Construction

You can fit a linear mixed-effects model using fitlme(ds,formula) if your data is in a dataset array. Alternatively, if your model is not easily described using a formula, you can create matrices to define the fixed and random effects, and fit the model using fitlmematrix(X,y,Z,G).
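
As a minimal sketch of the two construction routes, the following fits the same random-intercept model once with fitlme and once with fitlmematrix. The synthetic data and the variable names (y, X1, g, gIdx) are assumptions made for illustration and do not come from this page.

```matlab
% Synthetic data (illustrative assumption): 10 groups, 20 observations each.
rng(0);
nGroups = 10; nPer = 20; n = nGroups*nPer;
gIdx = kron((1:nGroups)', ones(nPer,1));   % group index for each row
X1 = randn(n,1);
b = randn(nGroups,1);                      % simulated random intercepts
y = 2 + 3*X1 + b(gIdx) + 0.5*randn(n,1);
g = nominal(gIdx);

% Route 1: dataset array plus formula.
ds = dataset(y, X1, g);
lmeFormula = fitlme(ds, 'y ~ X1 + (1|g)');

% Route 2: explicit design matrices and grouping variable.
X = [ones(n,1) X1];                        % intercept and X1
Z = ones(n,1);                             % random intercept
lmeMatrix = fitlmematrix(X, y, Z, g);

% Both routes describe the same model, so the estimates should agree.
betaFormula = fixedEffects(lmeFormula);
betaMatrix = fixedEffects(lmeMatrix);
disp([betaFormula betaMatrix])
```

The matrix route needs the intercept column written out explicitly, whereas the formula route adds it implicitly.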

Input Arguments


ds — Input data (dataset array)

Input data, which includes the response variable, predictor variables, and grouping variables, specified as a dataset array. The predictor variables can be continuous or grouping variables (see Grouping Variables). You must specify the model for the variables using formula.

Data Types: single | double | char | cell

formula — Formula for model specification (string of the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)')

Formula for model specification, specified as a string of the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'. For a full description, see Formula.

Example: 'y ~ treatment + (1|block)'

X — Fixed-effects design matrix (n-by-p matrix)

Fixed-effects design matrix, specified as an n-by-p matrix, where n is the number of observations, and p is the number of fixed-effects predictor variables. Each row of X corresponds to one observation, and each column of X corresponds to one variable.

Data Types: single | double

y — Response values (n-by-1 vector)

Response values, specified as an n-by-1 vector, where n is the number of observations.

Data Types: single | double

Z — Random-effects design (n-by-q matrix | cell array of R n-by-q(r) matrices, r = 1, 2, ..., R)

Random-effects design, specified as either of the following.

• If there is one random-effects term in the model, then Z must be an n-by-q matrix, where n is the number of observations and q is the number of variables in the random-effects term.

• If there are R random-effects terms, then Z must be a cell array of length R. Each cell of Z contains an n-by-q(r) design matrix Z{r}, r = 1, 2, ..., R, corresponding to each random-effects term. Here, q(r) is the number of variables (columns) in the rth random-effects design matrix, Z{r}.

Data Types: single | double | cell

G — Grouping variable or variables (n-by-1 vector | cell array of R n-by-1 vectors)

Grouping variable or variables, specified as either of the following.

• If there is one random-effects term, then G must be an n-by-1 vector corresponding to a single grouping variable with M levels or groups.

G can be a categorical vector, numeric vector, character array, or cell array of strings.

• If there are multiple random-effects terms, then G must be a cell array of length R. Each cell of G contains a grouping variable G{r}, r = 1, 2, ..., R, with M(r) levels.

G{r} can be a categorical vector, numeric vector, character array, or cell array of strings.

Data Types: single | double | char | cell
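
The cell-array form of Z and G for several random-effects terms can be sketched as follows. The synthetic data, the two crossed grouping variables, and all variable names are assumptions chosen for illustration.

```matlab
% Synthetic data (illustrative assumption): two crossed grouping variables.
rng(1);
n = 240;
g1Idx = kron((1:12)', ones(20,1));   % first grouping variable, 12 levels
g2Idx = repmat((1:6)', 40, 1);       % second grouping variable, 6 levels
x = randn(n,1);
b1 = randn(12,1);                    % simulated random intercepts for g1
b2 = 0.5*randn(6,1);                 % simulated random intercepts for g2
y = 1 + 2*x + b1(g1Idx) + b2(g2Idx) + 0.3*randn(n,1);

X = [ones(n,1) x];                   % n-by-p fixed-effects design, p = 2
Z = {ones(n,1), ones(n,1)};          % R = 2 terms, each n-by-1 (q(r) = 1)
G = {g1Idx, g2Idx};                  % one grouping variable per term

lme = fitlmematrix(X, y, Z, G);
nRandom = numel(randomEffects(lme)); % one random intercept per level
```

Z{r} and G{r} are paired by position, so the rth design matrix is always grouped by the rth grouping variable.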

Properties

Coefficients

Fixed-effects coefficient estimates and related statistics, stored as a dataset array containing the following fields.

Name        Name of the term.
Estimate    Estimated value of the coefficient.
SE          Standard error of the coefficient.
tStat       t-statistic for testing the null hypothesis that the coefficient is equal to zero.
DF          Degrees of freedom for the t-test. The method to compute DF is specified by the 'DFMethod' name-value pair argument. Coefficients always uses the 'Residual' method for 'DFMethod'.
pValue      p-value for the t-test.
Lower       Lower limit of the confidence interval for the coefficient. Coefficients always uses the 95% confidence level, that is, 'alpha' is 0.05.
Upper       Upper limit of the confidence interval for the coefficient. Coefficients always uses the 95% confidence level, that is, 'alpha' is 0.05.

You can change 'DFMethod' and 'alpha' when computing confidence intervals for, or testing hypotheses involving, fixed and random effects, using the coefCI and coefTest methods.
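
A minimal sketch of changing these settings with coefCI follows. The synthetic data and variable names are assumptions for illustration; 'Alpha' and 'DFMethod' are the name-value pair arguments of coefCI.

```matlab
% Synthetic data (illustrative assumption): random-intercept model.
rng(4);
nGroups = 8; nPer = 15; n = nGroups*nPer;
gIdx = kron((1:nGroups)', ones(nPer,1));
x = randn(n,1);
b = randn(nGroups,1);
y = 1 + 2*x + b(gIdx) + 0.4*randn(n,1);
g = nominal(gIdx);
ds = dataset(y, x, g);
lme = fitlme(ds, 'y ~ x + (1|g)');

ci95 = coefCI(lme);                                % default: 95%, 'Residual' DF
ci99 = coefCI(lme, 'Alpha', 0.01);                 % 99% intervals
ciSat = coefCI(lme, 'DFMethod', 'satterthwaite');  % Satterthwaite DF

width95 = diff(ci95, 1, 2);                        % Upper minus Lower
width99 = diff(ci99, 1, 2);
```

With the default settings, coefCI should reproduce the Lower and Upper columns of the Coefficients property, and the 99% intervals are wider than the 95% ones.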

CoefficientCovariance

Covariance of the estimated fixed-effects coefficients of the linear mixed-effects model, stored as a p-by-p matrix, where p is the number of fixed-effects coefficients.

You can display the covariance parameters associated with the random effects using the covarianceParameters method.

CoefficientNames

1-by-p cell array of strings containing the names of the fixed-effects coefficients of a linear mixed-effects model.

DFE

Residual degrees of freedom, stored as a positive integer value. DFE = n – p, where n is the number of observations, and p is the number of fixed-effects coefficients.

This corresponds to the 'Residual' method of calculating degrees of freedom in the fixedEffects and randomEffects methods.

FitMethod

Method used to fit the linear mixed-effects model, stored as either of the following strings.

• ML, if the fitting method is maximum likelihood

• REML, if the fitting method is restricted maximum likelihood

Formula

Specification of the fixed-effects terms, random-effects terms, and grouping variables that define the linear mixed-effects model, stored as an object.

For more information on how to specify the model to fit using a formula, see Formula.

LogLikelihood

Maximized log likelihood or maximized restricted log likelihood of the fitted linear mixed-effects model depending on the fitting method you choose, stored as a scalar value.

ModelCriterion

Model criterion that can be used to compare fitted linear mixed-effects models, stored as a dataset with the following columns.

AIC              Akaike Information Criterion
BIC              Bayesian Information Criterion
Loglikelihood    Log likelihood value of the model
Deviance         –2 times the log likelihood of the model

If n is the number of observations used in fitting the model, and p is the number of fixed-effects coefficients, then for calculating AIC and BIC,

• The total number of parameters is nc + p + 1, where nc is the total number of parameters in the random-effects covariance excluding the residual variance

• The effective number of observations is

• n, when the fitting method is maximum likelihood (ML)

• n – p, when the fitting method is restricted maximum likelihood (REML)
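
The parameter and observation counts above can be checked against the standard definitions AIC = –2*logL + 2*k and BIC = –2*logL + k*log(neff), where k is the total number of parameters and neff is the effective number of observations. The sketch below plugs in the rounded values printed for the ML fit in the random-intercept flu example on this page; the variable names are illustrative.

```matlab
% Values read from the flu example display (ML fit); logL is rounded.
logL = -148.36;   % LogLikelihood
n = 468;          % number of observations
p = 9;            % fixed-effects coefficients
nc = 1;           % random-effects covariance parameters, excluding residual variance

k = nc + p + 1;                  % total number of parameters
neff = n;                        % ML uses n; REML would use n - p
aic = -2*logL + 2*k;
bic = -2*logL + k*log(neff);
fprintf('AIC = %.2f, BIC = %.2f\n', aic, bic);
```

The results agree with the displayed AIC (318.71) and BIC (364.35) up to the rounding of the log likelihood.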

MSE

ML or REML estimate of $\sigma^2$, depending on the fitting method used, stored as a positive scalar value. $\sigma^2$ is the residual variance, that is, the variance of the observation error term of the linear mixed-effects model.

NumCoefficients

Number of fixed-effects coefficients in the fitted linear mixed-effects model, stored as a positive integer value.

NumEstimatedCoefficients

Number of estimated fixed-effects coefficients in the fitted linear mixed-effects model, stored as a positive integer value.

NumObservations

Number of observations used in the fit, stored as a positive integer value. This is the number of rows in the dataset array or the design matrices, minus any excluded rows and rows with NaN values.

NumPredictors

Number of variables used as predictors in the linear mixed-effects model, stored as a positive integer value.

NumVariables

Total number of variables including the response and predictors, stored as a positive integer value.

• If the sample data is in a dataset array ds, NumVariables is the total number of variables in ds including the response variable.

• If the fit is based on matrix input, NumVariables is the total number of columns in the predictor matrix or matrices, and response vector.

NumVariables includes variables, if there are any, that are not used as predictors or as the response.

ObservationInfo

Information about the observations used in the fit, stored as a dataset array.

ObservationInfo has one row for each observation and the following four columns.

Weights     The value of the weighting variable for that observation. Default value is 1.
Excluded    true, if the observation was excluded from the fit using the 'Exclude' name-value pair argument; false, otherwise. 1 stands for true and 0 stands for false.
Missing     true, if the observation was excluded from the fit because any response or predictor value is missing; false, otherwise. Missing values include NaN for numeric variables, empty cells for cell arrays, blank rows for character arrays, and the <undefined> value for categorical arrays.
Subset      true, if the observation was used in the fit; false, if it was not used because it is missing or excluded.
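
A small sketch of how these columns relate to NumObservations follows. The synthetic data and variable names are assumptions for illustration: one response value is set to NaN and two rows are excluded through the 'Exclude' name-value pair argument of fitlme.

```matlab
% Synthetic data (illustrative assumption): 6 groups, 10 observations each.
rng(2);
n = 60;
gIdx = kron((1:6)', ones(10,1));
x = randn(n,1);
b = randn(6,1);
y = 1 + 2*x + b(gIdx) + 0.3*randn(n,1);
y(5) = NaN;                          % one missing response value
g = nominal(gIdx);
ds = dataset(y, x, g);

lme = fitlme(ds, 'y ~ x + (1|g)', 'Exclude', [1 2]);

info = lme.ObservationInfo;
nExcluded = sum(info.Excluded);      % rows dropped via 'Exclude'
nMissing = sum(info.Missing);        % rows dropped because of NaN
nUsed = sum(info.Subset);            % rows actually used in the fit
fprintf('%d excluded, %d missing, %d used\n', nExcluded, nMissing, nUsed);
```

The count of true values in Subset should equal the NumObservations property.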

ObservationNames

Names of observations used in the fit, stored as a cell array of strings.

• If the data is in a dataset array, ds, containing observation names, ObservationNames has those names.

• If the data is provided in matrices or a dataset array without observation names, then ObservationNames is an empty cell array.

PredictorNames

Names of the variables that you use as predictors in the fit, stored as a cell array of strings that has the same length as NumPredictors.

ResponseName

Name of the variable used as the response variable in the fit, stored as a character string.

Rsquared

Proportion of variability in the response explained by the fitted model, stored as a structure. It is the coefficient of determination, or R-squared. Rsquared has two fields.

Ordinary    R-squared value, stored as a scalar value in a structure.
            Rsquared.Ordinary = 1 – SSE./SST
Adjusted    R-squared value adjusted for the number of fixed-effects coefficients, stored as a scalar value in a structure.
            Rsquared.Adjusted = 1 – (SSE./SST)*(DFT./DFE), where DFE = n – p, DFT = n – 1, n is the total number of observations, and p is the number of fixed-effects coefficients.

SSE

Error sum of squares, that is, sum of the squared conditional residuals, stored as a positive scalar value.

SSE = sum((y – F).^2), where y is the response vector, and F is the fitted conditional response of the linear mixed-effects model. The conditional model has contributions from both fixed and random effects.

SSR

Regression sum of squares, that is, the sum of squares explained by the linear mixed-effects regression, stored as a positive scalar value. It is the sum of squared deviations of the conditional fitted values from their mean.

SSR = sum((F – mean(F)).^2), where F is the fitted conditional response of the linear mixed-effects model. The conditional model has contributions from both fixed and random effects.

SST

Total sum of squares, that is, the sum of the squared deviations of the observed response values from their mean, stored as a positive scalar value.

SST = sum((y – mean(y)).^2) = SSR + SSE, where y is the response vector.
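
The sums of squares can be recomputed from the fitted conditional response as a check. The following is a sketch on synthetic data; the data and variable names are assumptions for illustration.

```matlab
% Synthetic data (illustrative assumption): random-intercept model.
rng(5);
nGroups = 10; nPer = 12; n = nGroups*nPer;
gIdx = kron((1:nGroups)', ones(nPer,1));
x = randn(n,1);
b = randn(nGroups,1);
y = 1 + 2*x + b(gIdx) + 0.5*randn(n,1);
g = nominal(gIdx);
ds = dataset(y, x, g);
lme = fitlme(ds, 'y ~ x + (1|g)');

F = fitted(lme);                       % conditional fitted response
yObs = response(lme);                  % response vector used in the fit

sseManual = sum((yObs - F).^2);        % error sum of squares
ssrManual = sum((F - mean(F)).^2);     % regression sum of squares
r2 = 1 - lme.SSE/lme.SST;              % ordinary R-squared from the properties
```

Here sseManual and ssrManual should match the SSE and SSR properties, and r2 should match Rsquared.Ordinary, up to floating-point error.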

Variables

Variables, stored as a dataset array.

• If the fit is based on a dataset array ds, then Variables is identical to ds.

• If the fit is based on matrix input, then Variables is a dataset array containing all the variables in the predictor matrix or matrices, and response variable.

VariableInfo

Information about the variables used in the fit, stored as a dataset array.

VariableInfo has one row for each variable and contains the following four columns.

Class            Class of the variable ('double', 'cell', 'nominal', and so on).
Range            Value range of the variable. For a numerical variable, it is a two-element vector of the form [min,max]. For a cell or categorical variable, it is a cell or categorical array containing all unique values of the variable.
InModel          true, if the variable is a predictor in the fitted model. false, if the variable is not in the fitted model.
IsCategorical    true, if the variable has a type that is treated as a categorical predictor, such as cell, logical, or categorical, or if it is specified as categorical by the 'Categorical' name-value pair argument of the fit method. false, if it is a continuous predictor.

VariableNames

Names of the variables used in the fit, stored as a cell array of strings.

• If sample data is in a dataset array ds, VariableNames contains the names of the variables in ds.

• If sample data is in matrix format, then VariableNames includes variable names you supply while fitting the model. If you do not supply the variable names, then VariableNames contains the default names.

Methods

anova                   Analysis of variance for linear mixed-effects model
coefCI                  Confidence intervals for coefficients of linear mixed-effects model
coefTest                Hypothesis test on fixed and random effects of linear mixed-effects model
compare                 Compare linear mixed-effects models
covarianceParameters    Extract covariance parameters of linear mixed-effects model
designMatrix            Fixed- and random-effects design matrices
disp                    Display linear mixed-effects model
fit                     Fit linear mixed-effects model using dataset arrays
fitmatrix               Fit linear mixed-effects model using design matrices
fitted                  Fitted responses from a linear mixed-effects model
fixedEffects            Estimates of fixed effects and related statistics
plotResiduals           Plot residuals of linear mixed-effects model
predict                 Predict response of linear mixed-effects model
random                  Generate random responses from fitted linear mixed-effects model
randomEffects           Estimates of random effects and related statistics
residuals               Residuals of fitted linear mixed-effects model
response                Response vector of the linear mixed-effects model

Definitions

Formula

In general, a formula for model specification is a string of the form 'y ~ terms'. For the linear mixed-effects models, this formula is in the form 'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)', where fixed and random contain the fixed-effects and the random-effects terms. Suppose a dataset array ds contains the following:

• A response variable, y

• Predictor variables, Xj, which can be continuous or grouping variables

• Grouping variables, g1, g2, ..., gR,

where the grouping variables in Xj and gr can be categorical, logical, character arrays, or cell arrays of strings.

Then, in a formula of the form, 'y ~ fixed + (random1|g1) + ... + (randomR|gR)', the term fixed corresponds to a specification of the fixed-effects design matrix X, random1 is a specification of the random-effects design matrix Z1 corresponding to grouping variable g1, and similarly randomR is a specification of the random-effects design matrix ZR corresponding to grouping variable gR. You can express the fixed and random terms using Wilkinson notation.

Wilkinson notation describes the factors present in models. The notation relates to factors present in models, not to the multipliers (coefficients) of those factors.

Wilkinson Notation                    Factors in Standard Notation
1                                     Constant (intercept) term
X^k, where k is a positive integer    X, X^2, ..., X^k
X1 + X2                               X1, X2
X1*X2                                 X1, X2, X1.*X2 (elementwise multiplication of X1 and X2)
X1:X2                                 X1.*X2 only
- X2                                  Do not include X2
X1*X2 + X3                            X1, X2, X3, X1*X2
X1 + X2 + X3 + X1:X2                  X1, X2, X3, X1*X2
X1*X2*X3 - X1:X2:X3                   X1, X2, X3, X1*X2, X1*X3, X2*X3
X1*(X2 + X3)                          X1, X2, X3, X1*X2, X1*X3

Statistics Toolbox™ notation always includes a constant term unless you explicitly remove the term using -1. Here are some examples for linear mixed-effects model specification.

'y ~ X1 + X2'                                  Fixed effects for the intercept, X1 and X2. This is equivalent to 'y ~ 1 + X1 + X2'.
'y ~ -1 + X1 + X2'                             Fixed effects for X1 and X2. The implicit intercept term is suppressed by including -1.
'y ~ 1 + (1 | g1)'                             Intercept plus random effect for each level of the grouping variable g1.
'y ~ X1 + (1 | g1)'                            Random intercept model with a fixed slope.
'y ~ X1 + (X1 | g1)'                           Random intercept and slope, with possible correlation between them.
'y ~ X1 + (1 | g1) + (-1 + X1 | g1)'           Independent random intercept and slope.
'y ~ 1 + (1 | g1) + (1 | g2) + (1 | g1:g2)'    Random intercept model with independent main effects for g1 and g2, plus an independent interaction effect.
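
The difference between the correlated and the independent random intercept-and-slope formulas shows up in the shape of the estimated covariance parameters. The sketch below uses synthetic data; the data and variable names are assumptions for illustration.

```matlab
% Synthetic data (illustrative assumption): random intercept and slope per group.
rng(3);
nG = 15; nPer = 12; n = nG*nPer;
gIdx = kron((1:nG)', ones(nPer,1));
X1 = randn(n,1);
b0 = randn(nG,1);                      % simulated random intercepts
b1 = 0.5*randn(nG,1);                  % simulated random slopes
y = 1 + 2*X1 + b0(gIdx) + b1(gIdx).*X1 + 0.3*randn(n,1);
g1 = nominal(gIdx);
ds = dataset(y, X1, g1);

% One random-effects term: intercept and slope may be correlated.
lmeCorr = fitlme(ds, 'y ~ X1 + (X1 | g1)');
% Two random-effects terms: intercept and slope are independent.
lmeInd = fitlme(ds, 'y ~ X1 + (1 | g1) + (-1 + X1 | g1)');

psiCorr = covarianceParameters(lmeCorr);   % one cell holding a 2-by-2 matrix
psiInd = covarianceParameters(lmeInd);     % two cells, each holding a scalar
```

The single-term model estimates a full 2-by-2 covariance matrix, while the two-term model estimates two separate variances and no covariance.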

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB® documentation.

Examples


Random Intercept Model with Categorical Predictor

The flu dataset array has a Date variable, and 10 variables containing estimated influenza rates (in 9 different regions, estimated from Google® searches, plus a nationwide estimate from the Centers for Disease Control and Prevention, CDC).

To fit a linear mixed-effects model, your data must be in a properly formatted dataset array. To fit a linear mixed-effects model with the influenza rates as the responses and region as the predictor variable, combine the nine columns corresponding to the regions into a tall array. The new dataset array, flu2, must have the response variable, FluRate, the nominal variable, Region, that shows which region each estimate is from, and the grouping variable Date.

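load flu  % sample dataset array
% Stack the nine regional columns (variables 2 to 10) into one response column.
flu2 = stack(flu,2:10,'NewDataVarName','FluRate',...
'IndVarName','Region');
flu2.Date = nominal(flu2.Date);  % treat Date as a grouping variable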

Fit a linear mixed-effects model with fixed effects for region and a random intercept that varies by Date.

Because region is a nominal variable, fitlme takes the first region, NE, as the reference and creates eight dummy variables representing the other eight regions. For example, I[MidAtl] is the dummy variable representing the region MidAtl. For details, see Dummy Indicator Variables.

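The corresponding model is

$$y_{im} = \beta_0 + \beta_1 I[\mathrm{MidAtl}]_i + \beta_2 I[\mathrm{ENCentral}]_i + \cdots + \beta_8 I[\mathrm{Pac}]_i + b_{0m} + \varepsilon_{im}, \quad m = 1, 2, \ldots, 52,$$

where $y_{im}$ is observation i for level m of the grouping variable Date, $\beta_j$, j = 0, 1, ..., 8, are the fixed-effects coefficients, $b_{0m}$ is the random effect for level m of the grouping variable Date, and $\varepsilon_{im}$ is the observation error for observation i. The random effect has the prior distribution $b_{0m} \sim N(0, \sigma_b^2)$, and the error term has the distribution $\varepsilon_{im} \sim N(0, \sigma^2)$.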

lme = fitlme(flu2,'FluRate ~ 1 + Region + (1|Date)')
Linear mixed-effects model fit by ML

Model information:
Number of observations             468
Fixed effects coefficients           9
Random effects coefficients         52
Covariance parameters                2

Formula:
FluRate ~ 1 + Region + (1|Date)

Model fit statistics:
AIC       BIC       LogLikelihood    Deviance
318.71    364.35    -148.36          296.71

Fixed effects coefficients (95% CIs):
Name                      Estimate    SE          tStat      DF     pValue        Lower        Upper
'(Intercept)'               1.2233    0.096678     12.654    459     1.085e-31       1.0334       1.4133
'Region_MidAtl'           0.010192    0.052221    0.19518    459       0.84534    -0.092429      0.11281
'Region_ENCentral'        0.051923    0.052221     0.9943    459        0.3206    -0.050698      0.15454
'Region_WNCentral'         0.23687    0.052221     4.5359    459    7.3324e-06      0.13424      0.33949
'Region_SAtl'             0.075481    0.052221     1.4454    459       0.14902     -0.02714       0.1781
'Region_ESCentral'         0.33917    0.052221      6.495    459    2.1623e-10      0.23655      0.44179
'Region_WSCentral'           0.069    0.052221     1.3213    459       0.18705    -0.033621      0.17162
'Region_Mtn'              0.046673    0.052221    0.89377    459       0.37191    -0.055948      0.14929
'Region_Pac'              -0.16013    0.052221    -3.0665    459     0.0022936     -0.26276    -0.057514

Random effects covariance parameters (95% CIs):
Group: Date (52 Levels)
Name1                Name2                Type         Estimate    Lower     Upper
'(Intercept)'        '(Intercept)'        'std'        0.6443      0.5297    0.78368

Group: Error
Name             Estimate    Lower      Upper
'Res Std'        0.26627     0.24878    0.285

The p-values 7.3324e-06 and 2.1623e-10 respectively show that the fixed effects of the flu rates in regions WNCentral and ESCentral are significantly different relative to the flu rates in region NE.

The confidence limits for the standard deviation of the random-effects term, $\sigma_b$, do not include 0 (0.5297, 0.78368), which indicates that the random-effects term is significant. You can also test the significance of the random-effects terms using the compare method.

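The estimated value of an observation is the sum of the fixed effects and the random-effect value at the grouping variable level corresponding to that observation. For example, the estimated best linear unbiased predictor (BLUP) of the flu rate for region WNCentral in week 10/9/2005 is

$$\hat{y}_{\mathrm{WNCentral},\,10/9/2005} = \hat{\beta}_0 + \hat{\beta}_3 I[\mathrm{WNCentral}] + \hat{b}_{10/9/2005} = 1.2233 + 0.23687 - 0.1718 = 1.2884.$$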

This is the fitted conditional response, since it includes contribution to the estimate from both the fixed and random effects. You can compute this value as follows.

beta = fixedEffects(lme);
[~,~,STATS] = randomEffects(lme); % Compute the random-effects statistics (STATS)
STATS.Level = nominal(STATS.Level);
y_hat = beta(1) + beta(4) + STATS.Estimate(STATS.Level=='10/9/2005')
y_hat =

1.2884

You can simply display the fitted value using the fitted method.

F = fitted(lme);
F(flu2.Date == '10/9/2005' & flu2.Region == 'WNCentral')
ans =

1.2884

Compute the fitted marginal response for region WNCentral in week 10/9/2005.

F = fitted(lme,'Conditional',false);
F(flu2.Date == '10/9/2005' & flu2.Region == 'WNCentral')
ans =

1.4602
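
The marginal and conditional values above differ only by the estimated random effect for that week. As a quick arithmetic check using the rounded numbers printed in this example (the variable names are illustrative):

```matlab
% Rounded values copied from the displays in this example.
beta0 = 1.2233;            % '(Intercept)' estimate
betaWNC = 0.23687;         % 'Region_WNCentral' estimate
conditionalFit = 1.2884;   % fitted(lme) for WNCentral, week 10/9/2005

marginalFit = beta0 + betaWNC;              % fixed effects only
bWeek = conditionalFit - marginalFit;       % implied random effect for the week
fprintf('marginal = %.4f, random effect = %.4f\n', marginalFit, bWeek);
```

The marginal value reproduces 1.4602, and the implied random effect for week 10/9/2005 is about -0.17.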

Linear Mixed-Effects Model with a Random Slope

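Fit a linear mixed-effects model for miles per gallon (MPG), with fixed effects for acceleration and horsepower, and potentially correlated random effects for the intercept and acceleration, grouped by model year. This model corresponds to

$$\mathrm{MPG}_{im} = \beta_0 + \beta_1 \mathrm{Acc}_{im} + \beta_2 \mathrm{HP}_{im} + b_{0m} + b_{1m} \mathrm{Acc}_{im} + \varepsilon_{im}, \quad m = 1, 2, \ldots, 13,$$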

with the random-effects terms having the following prior distribution

$$b_m = (b_{0m}, b_{1m})^\top \sim N(0, D(\theta)),$$

where D(θ) is the covariance matrix.

First, prepare the design matrices for fitting the linear mixed-effects model.

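load carbig  % sample data with 406 observations
X = [ones(406,1) Acceleration Horsepower];  % fixed effects: intercept, acceleration, horsepower
Z = [ones(406,1) Acceleration];             % random effects: intercept and acceleration
Model_Year = nominal(Model_Year);
G = Model_Year;                             % grouping variable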

Now, fit the model using fitlmematrix with the defined design matrices and grouping variables. Use the restricted maximum likelihood ('REML') fitting method.

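% Name the fixed- and random-effects predictors and the grouping variable,
% and fit by restricted maximum likelihood.
lme = fitlmematrix(X,MPG,Z,G,'FixedEffectPredictors',...
{'Intercept','Acceleration','Horsepower'},'RandomEffectPredictors',...
{{'Intercept','Acceleration'}},'RandomEffectGroups',{'Model_Year'},...
'FitMethod','REML')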
lme =

Linear mixed-effects model fit by REML

Model information:
Number of observations             392
Fixed effects coefficients           3
Random effects coefficients         26
Covariance parameters                4

Formula:
y ~ Intercept + Acceleration + Horsepower + (Intercept + Acceleration | Model_Year)

Model fit statistics:
AIC       BIC       LogLikelihood    Deviance
2202.9    2230.7    -1094.5          2188.9

Fixed effects coefficients (95% CIs):
Name                  Estimate    SE           tStat      DF     pValue        Lower       Upper
'Intercept'             50.064       2.3176     21.602    389    1.4185e-68      45.507       54.62
'Acceleration'        -0.57897      0.13843    -4.1825    389    3.5654e-05    -0.85112    -0.30681
'Horsepower'          -0.16958    0.0073242    -23.153    389    3.5289e-75    -0.18398    -0.15518

Random effects covariance parameters (95% CIs):
Group: Model_Year (13 Levels)
Name1                 Name2                 Type          Estimate    Lower       Upper
'Intercept'           'Intercept'           'std'            3.72       1.5215      9.0954
'Acceleration'        'Intercept'           'corr'        -0.8769     -0.98275    -0.33845
'Acceleration'        'Acceleration'        'std'          0.3593      0.19418     0.66483

Group: Error
Name             Estimate    Lower     Upper
'Res Std'        3.6913      3.4331    3.9688

The fixed-effects coefficients display includes the estimates, standard errors (SE), and the 95% confidence interval limits (Lower and Upper). The p-values (pValue) indicate that all three fixed-effects coefficients are significant.

The confidence intervals for the standard deviations and the correlation between the random effects for intercept and acceleration do not include zeros, hence they seem significant. Use the compare method to test for the random effects.

Display the covariance matrix of the estimated fixed-effects coefficients.

lme.CoefficientCovariance
ans =

5.3711   -0.2809   -0.0126
-0.2809    0.0192    0.0005
-0.0126    0.0005    0.0001

The diagonal elements show the variances of the fixed-effects coefficient estimates. For example, the variance of the estimate of the intercept is 5.3711. Note that the standard errors of the estimates are the square roots of the variances. For example, the standard error of the intercept is 2.3176, which is sqrt(5.3711).

The off-diagonal elements show the covariances between the fixed-effects coefficient estimates. For example, the covariance between the intercept and acceleration estimates is –0.2809, and the covariance between the acceleration and horsepower estimates is 0.0005.
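
To put the off-diagonal elements on a correlation scale, divide each entry by the product of the two corresponding standard errors. The sketch below uses the rounded matrix printed above; the variable names are illustrative.

```matlab
% Covariance matrix copied from the display above (rounded to four decimals).
C = [ 5.3711  -0.2809  -0.0126
     -0.2809   0.0192   0.0005
     -0.0126   0.0005   0.0001];

se = sqrt(diag(C));        % standard errors of the coefficient estimates
R = C ./ (se*se');         % correlation matrix of the estimates
disp(se')
disp(R)
```

Because the displayed entries are rounded, especially the horsepower variance of 0.0001, the resulting correlations are only approximate. Applying the same two lines to lme.CoefficientCovariance would use the unrounded values.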

Display the coefficient of determination for the model.

lme.Rsquared
ans =

Ordinary: 0.7826