Desmat - interactions and contrasts

Direct all comments to: John Hendrickx

The latest version of desmat is available at SSC-IDEAS. It can be installed using
Introduction

This describes desmat, a replacement for xi (STB-52, STB-54, STB-59). The latest version allows desmat to be used as a command prefix, like xi. Other changes include the use of "@" as a prefix to flag continuous variables in the desmat model specification, and an option to write results to a tab-delimited ascii file.

Desmat is used to generate a design matrix, i.e. a set of dummy variables based on categorical and/or continuous variables. Desmat can be used as a command prefix, in which case normal output is suppressed and results are presented using desrep. When desmat is used as a command by itself, the dummy variables _x_* can be used in any appropriate Stata procedure and the results presented using desrep. Desmat therefore serves the same purpose as xi, but allows parameterizations other than the indicator contrast (i.e. dummy variables with a fixed reference category). In addition, desmat allows the specification of higher order interaction effects and an easier specification of the reference category.

Related programs included in the desmat package are:
- showtrms, which produces a legend of the dummy variables generated, the model terms these pertain to, and the contrast used;
- destest, which can be used to perform a Wald test on model terms.

Example

Knoke & Burke (1980: 23) present a four-way table of race by education by membership by vote turnout. Their loglinear model {VM}{VER}{ERM} could be specified as:
desmat: glm pop vote*memb vote*educ*race educ*race*memb, link(log) family(poisson)
Desmat will produce the following output:
-------------------------------------------------------------------------------
Generalized Linear Models
-------------------------------------------------------------------------------
Dependent variable pop
Variance function: Poisson
Link function: Log
Optimization: ML: Newton-Raphson
Number of observations: 24
Deviance: 4.756
Deviance dispersion: 0.951
Log likelihood: -65.841
Model degrees of freedom: 18
Residual degrees of freedom: 5
AIC: 7.070
BIC: -55.627
Prob: 0.446
-------------------------------------------------------------------------------
nr Effect Coeff s.e.
-------------------------------------------------------------------------------
pop
vote
1 Voted -0.021 0.111
memb
2 One or More -0.536** 0.119
vote.memb
3 Voted.One or More 0.768** 0.120
educ
4 High School Graduate -0.488** 0.126
5 College -1.638** 0.170
vote.educ
6 Voted.High School Graduate 0.192 0.141
7 Voted.College 0.844** 0.164
race
8 Black -1.441** 0.191
vote.race
9 Voted.Black -0.070 0.244
educ.race
10 High School Graduate.Black -0.946* 0.384
11 College.Black -0.465 0.478
vote.educ.race
12 Voted.High School Graduate.Black 0.500 0.432
13 Voted.College.Black -0.723 0.432
educ.memb
14 High School Graduate.One or More 0.648** 0.138
15 College.One or More 1.397** 0.161
race.memb
16 Black.One or More -0.525* 0.253
educ.race.memb
17 High School Graduate.Black.One or More 0.170 0.410
18 College.Black.One or More 0.783 0.507
19 _cons 4.781** 0.085
-------------------------------------------------------------------------------
* p < .05
** p < .01
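The significance symbols in this output can be approximately reproduced from the rounded coefficients and standard errors. The Python sketch below is illustrative only and is not desrep's code. It assumes a two-sided test based on the standard normal distribution (the z statistics that glm reports) and the cutpoints listed under the table. The function names are hypothetical.

```python
import math


def two_sided_p(coef, se):
    """Two-sided p-value for z = coef / se under a standard normal distribution."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2))  # P(|Z| > z)


def sig_symbol(coef, se, cutpoints=((0.01, "**"), (0.05, "*"))):
    """Return the symbol of the first (strictest) cutpoint that p falls below."""
    p = two_sided_p(coef, se)
    for cut, symbol in cutpoints:
        if p < cut:
            return symbol
    return ""


# Selected rows from the glm output above: (effect, coefficient, s.e.).
rows = [
    ("Voted", -0.021, 0.111),
    ("Voted.One or More", 0.768, 0.120),
    ("High School Graduate.Black", -0.946, 0.384),
    ("Black.One or More", -0.525, 0.253),
    ("College.Black.One or More", 0.783, 0.507),
]
for effect, b, se in rows:
    print(f"{effect:<28} {b:7.3f}{sig_symbol(b, se):<3}{se:6.3f}")
```

Because the inputs are rounded to three decimals, a coefficient lying very close to a cutpoint could in principle be flagged differently than in the output above.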
Alternatively, desmat could be used as a command by itself to generate a set of dummy variables for use in a subsequent Stata procedure. This would produce the same output, supplemented by a report from desmat on the dummy variables it generated and by the output of glm itself. The same output can also be obtained by specifying the verbose option when using desmat in command prefix mode.

The program destest can be used after estimating a model to perform a Wald test on selected model terms. In destest, using an asterisk in a model term does not cause nested effects to be tested as well. To test these, either list them explicitly or use destest without any arguments to test all model terms.
. * test only the highest order terms
. destest vote*memb vote*educ*race educ*race*memb
-------------------------------------------------------------------------------
Term                 Wald chi2    df    P > chi2
-------------------------------------------------------------------------------
vote.memb             41.146**     1       0.000
vote.educ.race         5.978       2       0.050
educ.race.memb         2.391       2       0.303
-------------------------------------------------------------------------------
* p < .05
** p < .01
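In this output, the df column equals the number of dummy variables in each term. For one or two degrees of freedom, the p-values follow from closed-form chi-square tail probabilities. The Python sketch below is not destest's implementation. It copies the term-to-dummy mapping from the showtrms legend shown further on and re-derives the df and p-value columns from the reported Wald statistics.

For a term with several dummies, the Wald statistic itself is b' V^-1 b, which requires the covariance matrix of the estimates. That matrix is not printed, so the statistic is only recomputed for the single-parameter term vote.memb. The names TERMS, normalize and chi2_sf are hypothetical.

```python
import math

# Dummy variables per term, copied from the showtrms legend for this model
# (desmat keeps this information in the globals $term1, $term2, ...).
TERMS = {
    "vote.memb": ["_x_3"],
    "vote.educ.race": ["_x_12", "_x_13"],
    "educ.race.memb": ["_x_17", "_x_18"],
}


def normalize(term):
    """As in destest, 'a*b*c' is read as 'a.b.c': only the highest-order term is tested."""
    return term.replace("*", ".")


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms exist for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Wald chi2 values as reported by destest above.
reported = {"vote.memb": 41.146, "vote.educ.race": 5.978, "educ.race.memb": 2.391}

results = {}
for spec in "vote*memb vote*educ*race educ*race*memb".split():
    term = normalize(spec)
    df = len(TERMS[term])  # one degree of freedom per dummy variable
    p = chi2_sf(reported[term], df)
    results[term] = (df, p)
    print(f"{term:<15} df={df}  p={p:.3f}")

# For a single coefficient the Wald statistic is (b / se)^2. The rounded glm
# estimates for vote.memb give a value close to, but not equal to, 41.146.
w = (0.768 / 0.120) ** 2
print(f"vote.memb from rounded estimates: {w:.2f}")
```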
Syntax

Like xi, desmat can be used either as a command or as a command prefix. When used as a command, desmat generates a set of dummy variables for use by subsequent Stata programs. When used as a command prefix, the model is estimated after the dummy variables are generated, and the results are presented using desrep.

Desmat command syntax:

desmat model [, colinf defcon(contrast_specification) ]

Desmat as a command prefix syntax:

desmat: stata_procedure depvar model [using] [if] [in] [fweight pweight aweight iweight] [, verbose defcon(contrast_specification) desrep(desrep_options) procedure_options ]

The model

The model consists of one or more terms separated by spaces. A term can be:
- a single variable;
- two or more variables joined by period(s) ".";
- two or more variables joined by asterisk(s) "*".

A period specifies only the interaction effect itself. An asterisk indicates hierarchical notation, in which the interaction effect is included together with all nested lower-order interactions and main effects. For example, the term "vote*educ*race" is expanded to "vote educ vote.educ race vote.race educ.race vote.educ.race".

Variables may be either string or numeric. All variables in the model are treated as categorical unless they are declared continuous. This can be done either through the pzat characteristic (discussed below) or by specifying a contrast for the term (also discussed below). Alternatively, a variable can be prefixed by an @ to flag it as a continuous variable. For example:

desmat: regress brate @medage @medagesq region

The variables medage and medagesq will be treated as continuous variables. The variable region will be treated as categorical, and dummy variables will be generated using its first category as the reference category.

Options

When desmat is used as a command prefix, if and in qualifiers as well as weights may be specified in the usual manner and will be passed on to the procedure in question. Any options besides verbose, defcon and desrep will be passed on to the procedure as well.
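The way the @ prefix described under "The model" separates continuous from categorical variables can be illustrated with a small Python sketch. It only looks at the @ prefix. Contrasts set through a pzat characteristic or an "=con" specification in the model are not handled. The function name classify is hypothetical.

```python
def classify(model):
    """Label each variable in a desmat-style model as continuous (@) or categorical.

    Only the @ prefix is considered; '=contrast' specifications are not parsed.
    """
    kinds = {}
    for term in model.split():
        # Interactions written with '*' or '.' contain several variables.
        for var in term.replace("*", ".").split("."):
            if var.startswith("@"):
                kinds[var[1:]] = "continuous"
            else:
                kinds[var] = "categorical"
    return kinds


print(classify("@medage @medagesq region"))
```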
Options for command mode only

For compatibility with earlier versions of desmat, a default parameterization may be specified as an option rather than as an argument of the defcon option.
Options for command prefix mode
For example, use

desmat: logistic vote memb educ*race [fw=pop], desrep(exp all)

to display odds ratios. See the section on desrep for further details.

Contrasts

By default, desmat generates dummy variables using the first category as the reference category, as does xi. However, it can also use different types of restrictions (contrasts) and different reference categories when generating the dummy variables. A restriction of some type is required for the effects of categorical variables to be identifiable. The restriction used does not affect the fit of the model, but it does determine the meaning of the parameters.

A common restriction, and the one used by xi, is to drop the dummy variable for a reference category. The parameters are then relative to the reference category. Another common restriction is the deviation contrast, in which the parameters sum to zero. One parameter can therefore be dropped as redundant during estimation. It can be recovered afterwards as minus the sum of the estimated parameters, or by re-estimating the model using a different omitted category. Bock (1975) and Finn (1974) discuss other types of parameterizations (or contrasts) and the technical details of implementing them.

A parameterization is specified as a name, of which only the first three characters are significant, optionally followed by the reference category in parentheses (no spaces). The reference category refers to the category number, not the category value. So for a variable with values 0 to 3, the parameterization "dev(1)" indicates that the deviation contrast is to be used with the first category (i.e. 0) as the reference.

If no reference category is specified, or the reference category is less than 1, then the first category is used as the reference category. If the value specified is larger than the number of categories, then the highest category is used. Note that for certain types of parameterizations, the "reference" specification has a different meaning.
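The reference-category rules above, and the recovery of the redundant deviation-contrast parameter, amount to a few lines of arithmetic. The Python sketch below is illustrative and uses hypothetical function names. The deviation estimates in it are made-up numbers, used only to show the calculation.

```python
def resolve_ref(ref, n_categories):
    """Map a requested reference category number onto 1..n_categories.

    No reference (None) or a value below 1 gives the first category;
    a value above the number of categories gives the highest category.
    """
    if ref is None or ref < 1:
        return 1
    return min(ref, n_categories)


def omitted_deviation_param(estimated):
    """Deviation-contrast parameters sum to zero, so the redundant one is
    minus the sum of the estimated ones."""
    return -sum(estimated)


values = [0, 1, 2, 3]                              # a variable with values 0 to 3
ref = resolve_ref(1, len(values))                  # "dev(1)" refers to category number 1,
print("reference value:", values[ref - 1])         # which is the value 0
print("dev(99) ->", resolve_ref(99, len(values)))  # the highest category, number 4

# Made-up estimates for the three non-redundant categories, for illustration only.
est = [0.25, -0.10, 0.40]
omitted = omitted_deviation_param(est)
print("omitted parameter:", round(omitted, 3))
print("sum of all four:", round(sum(est) + omitted, 10))
```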
The available parameterization types are:
| ind(ref) | Indicator contrast, i.e. dummy variables with ref as reference (omitted) category. This is the parameterization used by xi and is the default parameterization for desmat. |
| dir | A direct effect, i.e. used to include continuous variables in the model. |
| dev(ref) | Deviation contrast. Parameters sum to zero over the categories of the variable. The parameter for ref is omitted as redundant, but can be found from minus the sum of the estimated parameters. |
| sim(ref) | Simple contrast with ref as reference category. The highest order effects are the same as indicator contrast effects, but lower order effects and the constant will be different. |
| dif(ref) | Difference contrast, for ordered categories. Parameters are relative to the next category. If the first letter of ref is "b" then the backward difference contrast is used instead, and parameters are relative to the previous category. |
| hel(ref) | Helmert contrast, for ordered categories. Each estimate represents the contrast between that category and the mean value of the remaining categories. If the first letter of ref is "b" then the reverse Helmert contrast is used instead, and parameters are relative to the mean value of the preceding categories. |
| orp(ref) | Orthogonal polynomials of degree ref. The first parameter is a linear effect, the second quadratic, etc. This option calls orthpoly to generate the design (sub)matrix. |
| use(ref) | A user-defined contrast. Ref refers to a contrast matrix with the same number of columns as the variable has categories, and at least one row fewer. If rownames are specified for this matrix, these names will be used as variable labels for the resulting dummy variables. [Single lowercase letters as names for the contrast matrix currently cause problems, e.g. "use(c)". Use uppercase names or names of more than one letter, e.g. "use(cc)" or "use(C)".] |
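The difference and Helmert contrasts in the table are defined in words. They can also be written as contrast matrices with one column per category and one row fewer than the number of categories, which is the layout that use() expects for a user-defined contrast. The Python sketch below builds these matrices directly from the definitions in the table.

Whether desmat constructs them internally in this way is an assumption. The further step of turning a contrast matrix into dummy variables (see Bock 1975) is not shown. Function names are hypothetical.

```python
from fractions import Fraction


def difference_contrast(k, backward=False):
    """Row j compares category j with the next category (or, backward, the previous one)."""
    rows = []
    for j in range(k - 1):
        row = [Fraction(0)] * k
        if backward:
            row[j + 1], row[j] = Fraction(1), Fraction(-1)
        else:
            row[j], row[j + 1] = Fraction(1), Fraction(-1)
        rows.append(row)
    return rows


def helmert_contrast(k, reverse=False):
    """Row j compares one category with the mean of the remaining (or preceding) ones."""
    rows = []
    for j in range(k - 1):
        row = [Fraction(0)] * k
        if reverse:
            focal, others = j + 1, range(0, j + 1)  # vs. the preceding categories
        else:
            focal, others = j, range(j + 1, k)      # vs. the remaining categories
        row[focal] = Fraction(1)
        for o in others:
            row[o] = Fraction(-1, len(others))
        rows.append(row)
    return rows


for name, matrix in [("dif", difference_contrast(4)), ("hel", helmert_contrast(4))]:
    print(name)
    for row in matrix:
        print("  ", [str(x) for x in row])
```

Every row sums to zero, which is what makes each row a contrast among the category effects.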
To change the default contrast, use the defcon option. For example:

desmat: logistic vote memb educ*race [fw=pop], desrep(exp all) defcon(dev(99))

The deviation contrast will now be used, with the highest category as the redundant category.
The global macro $D_CON can be used to specify a default contrast for the current Stata session. For example:

global D_CON "dev(99)"

will cause desmat to use the deviation contrast, with the highest category as the redundant category, for the duration of the Stata session. By placing this command in their profile.do, users can set a different default contrast for all desmat models. The $D_CON global macro is overridden by the defcon option if this is specified.

Specifying contrasts using the pzat characteristic

A pzat characteristic can be assigned to a variable to specify the contrast to be used for that variable. For example, to use the backward difference contrast for education but the default indicator contrast for the other variables, use:
char educ[pzat] dif(b)
desmat: logistic vote memb educ*race [fw=pop], desrep(exp all)
The pzat characteristic will override the contrast specified by the defcon option. So in
char educ[pzat] dif(b)
desmat: logistic vote memb educ*race [fw=pop], desrep(exp all) defcon(dev(99))
the deviation contrast will be used for all variables except educ, which will use the backward difference contrast.
Specifying contrasts in the model specification

It is also possible to specify contrasts in the model specification, on a variable by variable basis if so desired. This is done by appending:
- "=con[(ref)]" to a single variable;
- "=con[(ref)].con[(ref)]" to an interaction effect;
- "=con[(ref)]*con[(ref)]" to an interaction using hierarchical notation.

A somewhat silly example:
desmat race=ind(1) educ=hel memb vote vote.memb=dif.dev(1), defcon(ind(99))
The indicator contrast with the highest category as reference will be used for "memb" and "vote". The variable "race" will also use the indicator contrast, but with the first category as reference. The other effects will use the contrasts specified. Interpreting this mishmash of parameterizations would, of course, be quite a chore.
A variable's pzat characteristic overrides the defcon option, but is itself overridden by a specification in the model. For example:
char educ[pzat] dif(b)
desmat vote*memb vote*educ*race=dev(99)*orp(1)*dev(99) educ*race*memb, defcon(dev(99))
Educ will use a first degree polynomial restriction in the vote*educ*race term and a backward difference contrast elsewhere. All other variables will use the deviation contrast.
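The order of precedence described above runs from the model specification, to the pzat characteristic, to defcon, to $D_CON, and finally to the indicator default. The Python sketch below applies this order to the example above. Two limitations apply:
- The placement of $D_CON below the pzat characteristic is inferred, not stated.
- The sketch resolves contrasts only for the terms as written, not for the nested terms produced by the asterisk notation.

All function names are hypothetical.

```python
def parse_term(spec):
    """Split e.g. 'a*b=c1*c2' or 'a.b=c1.c2' into variables and per-variable contrasts."""
    term, _, cons = spec.partition("=")
    sep = "*" if "*" in term else "."
    variables = term.split(sep)
    contrasts = cons.split(sep) if cons else [None] * len(variables)
    return variables, contrasts


def contrast_for(var, term_contrast, pzat, defcon, d_con):
    """Return the first contrast found: model > pzat > defcon > $D_CON > default.

    Assumption: $D_CON ranks below the pzat characteristic as well as below defcon.
    """
    for candidate in (term_contrast, pzat.get(var), defcon, d_con):
        if candidate:
            return candidate
    return "ind"  # desmat's default indicator contrast


pzat = {"educ": "dif(b)"}  # char educ[pzat] dif(b)
defcon = "dev(99)"         # ..., defcon(dev(99))
d_con = None               # $D_CON not set
model = "vote*memb vote*educ*race=dev(99)*orp(1)*dev(99) educ*race*memb"

resolved_terms = []
for spec in model.split():
    variables, cons = parse_term(spec)
    resolved = [contrast_for(v, c, pzat, defcon, d_con) for v, c in zip(variables, cons)]
    resolved_terms.append((".".join(variables), ".".join(resolved)))
    print(resolved_terms[-1])
```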
Specifying contrasts in the model statement tends to look messy and offers more flexibility than is usually needed. Using the pzat characteristic together with the defcon option and the @ prefix to flag continuous variables will usually be preferable.

Desrep

Desrep is a program for viewing the results of Stata estimation commands. It can be used after estimating any model but is particularly useful in conjunction with desmat. Desrep is called by desmat when desmat is used in command prefix mode. In that case, options for formatting the output can be specified using desmat's desrep option.

By default, desrep prints only the coefficients, their standard errors, and symbols indicating significance, which leaves room for longer descriptive labels. Used in conjunction with desmat, desrep prints labels for model terms and category values using the [varn] and [valn] characteristics that desmat assigns to its dummy variables. If desmat was not used, variable labels are printed instead. Estimates are preceded by a summary of model information, based on results saved in e() by the estimation command.

Syntax
By default, desrep prints model information, coefficients, standard errors, and symbols indicating significance. Additional statistics can be requested, and printing of standard errors and significance symbols can be suppressed. Defaults for some of these options can be modified using global macro variables (see below).

If "using filename" is specified, the results are written to a tab-delimited ascii file. The default extension for filename is ".out" (cf. outsheet). If filename already exists, desrep will attempt to find a valid filename by appending a number (this is done using the included outshee2.ado program). The replace option can be used to overwrite an existing file.

Options

A number of options can be used to specify which results are printed and how they are formatted.
The following two options apply only if "using" has
been specified to write the data to a tab-delimited ascii file:
Macro variables to control layout

Macro variables can be used to alter the defaults for certain desrep options. Options specified on the desrep command still override these macro variables. The global macros can be set once at the beginning of the Stata session, or in the user's profile.do for all sessions. The following global macros may be defined: $D_FW

For example, the following can be used to set the column width for estimates to 8, use 2 decimal places, and set the symbols and cutpoints for levels of significance:

global D_NDEC 2

Showtrms

Showtrms produces a legend of the dummy variables produced by desmat, the terms these pertain to, and the contrasts used.

Syntax

showtrms

The showtrms command has no options. Showtrms is called automatically when desmat is used as a command by itself, or when the verbose option is used with desmat as a command prefix. Showtrms can be used at any point after desmat to generate a legend for the last design matrix. For the example used above, showtrms would produce the following output:
. showtrms
Desmat generated the following design matrix:
nr Variables Term Parameterization
First Last
1 _x_1 vote ind(0)
2 _x_2 memb ind(1)
3 _x_3 vote.memb ind(0).ind(1)
4 _x_4 _x_5 educ ind(1)
5 _x_6 _x_7 vote.educ ind(0).ind(1)
6 _x_8 race ind(1)
7 _x_9 vote.race ind(0).ind(1)
8 _x_10 _x_11 educ.race ind(1).ind(1)
9 _x_12 _x_13 vote.educ.race ind(0).ind(1).ind(1)
10 _x_14 _x_15 educ.memb ind(1).ind(1)
11 _x_16 race.memb ind(1).ind(1)
12 _x_17 _x_18 educ.race.memb ind(1).ind(1).ind(1)
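The order of the terms in this legend follows from two steps: expanding the asterisk notation, then dropping terms that were already included. The Python sketch below is not desmat's code, and its function names are hypothetical. It reproduces both the expansion of "vote*educ*race" given under "The model" and the twelve terms listed above. It assumes that variables appear in the same order in every term, so that educ.race and race.educ are never both generated.

```python
def expand_term(term):
    """Expand 'a*b*c' into every nested term; a term without '*' is kept as is.

    Subsets are enumerated by binary counting, with the first variable as the
    lowest bit, which gives the order a, b, a.b, c, a.c, b.c, a.b.c.
    """
    if "*" not in term:
        return [term]
    variables = term.split("*")
    expanded = []
    for mask in range(1, 2 ** len(variables)):
        subset = [v for i, v in enumerate(variables) if mask & (1 << i)]
        expanded.append(".".join(subset))
    return expanded


def expand_model(model):
    """Expand all terms of a model, keeping only the first occurrence of each term.

    Assumes variables appear in the same order in every term, so that e.g.
    educ.race and race.educ are never both produced.
    """
    seen, result = set(), []
    for term in model.split():
        for t in expand_term(term):
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result


print(" ".join(expand_term("vote*educ*race")))
for nr, term in enumerate(expand_model("vote*memb vote*educ*race educ*race*memb"), start=1):
    print(nr, term)
```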
Destest

Destest is used after estimating a model with a design matrix generated by desmat, to perform a Wald test on model terms.

Syntax
The termlist consists of one or more terms as specified in desmat. A term can consist of a single variable, or of two or more variables separated by either asterisks or periods. If asterisks are used, destest changes them into periods, i.e. only the highest order interaction will be tested. Nested terms will be tested only if they are explicitly included. (This syntax makes it easier to copy the model syntax and test the highest order terms, which is what most people will be interested in.) If destest is specified without any arguments, all terms from the last desmat model will be tested.

If "using filename" is specified, the results are written to a tab-delimited ascii file. The default extension for filename is ".out" (cf. outsheet). If filename already exists, destest will attempt to find a valid filename by appending a number (this is done using the included outshee2.ado program). The replace option can be used to overwrite an existing file.

Desmat creates global macros "$term1", "$term2", etc., each containing the varlist for one term in the model. Destest runs through these terms, finds the terms corresponding to the termlist, and runs testparm with the corresponding varlist. If these global macros have not been defined, destest will do nothing. These global macros can of course also be used separately in testparm, sw, or related programs.

Options

The options ndec(), sigcut(), sigsym(), sigsep() have the same usage as in desrep:
The following two options apply only if "using" has
been specified to write the data to a tab-delimited ascii file:
Global macro variables can be used to specify different defaults for these options, either for the session or for all Stata sessions by placing the global macros in the user's profile.do. $D_NDEC

Options specified in the destest command string will override these global macros.

Note

The Stata version of desmat was derived from a SAS macro of the same name that I wrote during the course of my PhD dissertation (Hendrickx 1994). The SAS version is also available.

References