Title: | Correlation and Regression Analyses for Randomized Response Data |
---|---|
Description: | Univariate and multivariate methods to analyze randomized response (RR) survey designs (e.g., Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60, 63–69, <doi:10.2307/2283137>). Besides univariate estimates of true proportions, RR variables can be used for correlations, as dependent variable in a logistic regression (with or without random effects), or as predictors in a linear regression (Heck, D. W., & Moshagen, M. (2018). RRreg: An R package for correlation and regression analyses of randomized response data. Journal of Statistical Software, 85(2), 1–29, <doi:10.18637/jss.v085.i02>). For simulations and the estimation of statistical power, RR data can be generated according to several models. The implemented methods also allow to test the link between continuous covariates and dishonesty in cheating paradigms such as the coin-toss or dice-roll task (Moshagen, M., & Hilbig, B. E. (2017). The statistical analysis of cheating paradigms. Behavior Research Methods, 49, 724–732, <doi:10.3758/s13428-016-0729-x>). |
Authors: | Daniel W. Heck [cre, aut] , Morten Moshagen [ctb] |
Maintainer: | Daniel W. Heck <[email protected]> |
License: | GPL-3 |
Version: | 0.7.5 |
Built: | 2025-01-15 05:10:46 UTC |
Source: | https://github.com/danheck/rrreg |
Univariate and multivariate methods for randomized response (RR) survey
designs (e.g., Warner, 1965). Univariate estimates of true proportions can be
obtained using RRuni
. RR variables can be used in multivariate
analyses for correlations (RRcor
), as dependent variable in a
logistic regression (RRlog
), or as predictors in a linear
regression (RRlin
). The function RRgen
generates
single RR data sets, whereas RRsimu
generates and analyzes RR
data repeatedly for simulation and bootstrap purposes. An overview of the
available RR designs and examples can be found in the package vignette by
vignette('RRreg')
.
In case of issues or questions, please refer to the GitHub repository: https://github.com/danheck/RRreg
An introduction with examples is available via vignette("RRreg")
or
at the website: https://www.dwheck.de/vignettes/RRreg.html
If you use RRreg
in publications, please cite the package as follows:
Heck, D. W., & Moshagen, M. (2018). RRreg: An R package for correlation and regression analyses of randomized response data. Journal of Statistical Software. 85 (2), 1-29. doi:10.18637/jss.v085.i02
Daniel W. Heck [email protected]
Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60, 63–69.
Useful links:
Compute an analysis of deviance table for two logistic RR regression models.
## S3 method for class 'RRlog' anova(object, ...)
## S3 method for class 'RRlog' anova(object, ...)
object |
object of class |
... |
a second |
Daniel W. Heck
# generate data n <- 500 x <- data.frame(x1 = rnorm(n)) pi.true <- 1 / (1 + exp(.3 + 1.5 * x$x1)) true <- rbinom(n, 1, plogis(pi.true)) dat <- RRgen(n, trueState = true, model = "Warner", p = .1) x$response <- dat$response # fit and plot RR logistic regression mod1 <- RRlog(response ~ x1, data = x, model = "Warner", p = .1) mod0 <- RRlog(response ~ 1, data = x, model = "Warner", p = .1) anova(mod1, mod0)
# generate data n <- 500 x <- data.frame(x1 = rnorm(n)) pi.true <- 1 / (1 + exp(.3 + 1.5 * x$x1)) true <- rbinom(n, 1, plogis(pi.true)) dat <- RRgen(n, trueState = true, model = "Warner", p = .1) x$response <- dat$response # fit and plot RR logistic regression mod1 <- RRlog(response ~ x1, data = x, model = "Warner", p = .1) mod0 <- RRlog(response ~ 1, data = x, model = "Warner", p = .1) anova(mod1, mod0)
Given some randomization probabilities p
, each RR design corresponds
to a misclassification matrix PW. This square matrix has entries defined as:
PW[i,j] = P(respond i | true state j)
.
getPW(model, p, group = 1, par2 = NULL, Kukrep = 1)
getPW(model, p, group = 1, par2 = NULL, Kukrep = 1)
model |
one of the available models in the package |
p |
randomization probability |
group |
group index (1 or 2) for two-group designs such as
|
par2 |
the second, estimated parameter in two-group designs (e.g., the
unknown prevalence of the irrelevant question in |
Kukrep |
number of replications in Kuk's RR design (how many cards are drawn) |
The method is used internally for estimation. Moreover, the method might be useful to check the exact definition of the RR designs.
Note that for two-group designs, the matrix depends on a second parameter that is estimated from the data (e.g., the unknown prevalence of the unknown question in the unrelated question technique). Hence, the matrix itself is not constant, but an estimate of a random variable itself.
van den Hout, A., & Kooiman, P. (2006). Estimating the Linear Regression Model with Categorical Covariates Subject to Randomized Response. Computational Statistics & Data Analysis, 50(11), 3311–3323.
getPW(model = "Warner", p = 2 / 12) getPW(model = "UQTknown", p = c(2 / 12, .3)) getPW(model = "UQTunknown", p = c(2 / 12, .10 / 12), group = 2, par2 = .4)
getPW(model = "Warner", p = 2 / 12) getPW(model = "UQTknown", p = c(2 / 12, .3)) getPW(model = "UQTunknown", p = c(2 / 12, .10 / 12), group = 2, par2 = .4)
Data by Radukic and Musch (2014)
data(minarets)
data(minarets)
An object of class data.frame
with 1621 rows and 7 columns.
The following variables are included:
age
in years
leftRight
political left-right orientation on a scale from -5 to 5
rrt
response to RR question (SLD with randomization probabilities
p=c(2/12,10/12)
)
condition
group membership in SLD (either randomization probability
p[1]
or p[2]
)
RRdesign
whether the respondent answered to the RR question
(RRdesign=1) or to the direct question (RRdesign=-1)
leftRight.c
zero-centered political left-right orientation
age.c
zero-centered age
Plot estimated power from Monte Carlo simulation as a function of the sample size, separately for different effect sizes and multivariate RR methods
## S3 method for class 'powerplot' plot(x, ...)
## S3 method for class 'powerplot' plot(x, ...)
x |
a |
... |
ignored |
Plot predicted logit values/probabilities of a randomized response logistic regression model.
## S3 method for class 'RRlog' plot( x, predictor = NULL, type = c("link", "response", "attribute"), center.preds = TRUE, plot.mean = TRUE, ci = 0.95, xlim = NULL, steps = 50, ... )
## S3 method for class 'RRlog' plot( x, predictor = NULL, type = c("link", "response", "attribute"), center.preds = TRUE, plot.mean = TRUE, ci = 0.95, xlim = NULL, steps = 50, ... )
x |
a fitted RRlog object |
predictor |
character name of a predictor of the model to be fitted |
type |
|
center.preds |
whether to compute predictions by assuming that all other
predictors are at their respective mean values (if |
plot.mean |
whether to plot the mean of the predictor as a vertical line |
ci |
level for confidence intervals. Use |
xlim |
if provided, these boundaries are used for the predictor on the x-axis |
steps |
number of steps for plotting |
... |
other arguments passed to the function plot (e.g., |
# generate data n <- 500 x <- data.frame(x1 = rnorm(n)) pi.true <- 1 / (1 + exp(.3 + 1.5 * x$x1)) true <- rbinom(n, 1, plogis(pi.true)) dat <- RRgen(n, trueState = true, model = "Warner", p = .1) x$response <- dat$response # fit and plot RR logistic regression mod <- RRlog(response ~ x1, data = x, model = "Warner", p = .1) plot(mod, "x1", ci = .95, type = "attribute", ylim = 0:1)
# generate data n <- 500 x <- data.frame(x1 = rnorm(n)) pi.true <- 1 / (1 + exp(.3 + 1.5 * x$x1)) true <- rbinom(n, 1, plogis(pi.true)) dat <- RRgen(n, trueState = true, model = "Warner", p = .1) x$response <- dat$response # fit and plot RR logistic regression mod <- RRlog(response ~ x1, data = x, model = "Warner", p = .1) plot(mod, "x1", ci = .95, type = "attribute", ylim = 0:1)
Uses the function RRsimu
to estimate the power of the
multivariate RR methods (correlation RRcor
, logistic regression
RRlog
, and/or linear regression RRlin
.
powerplot( numRep, n = c(100, 500, 1000), pi, cor = c(0, 0.1, 0.3), b.log = NULL, model, p, method = c("RRcor", "RRlog", "RRlin"), complyRates = c(1, 1), sysBias = c(0, 0), groupRatio = 0.5, alpha = 0.05, nCPU = 1, show.messages = TRUE )
powerplot( numRep, n = c(100, 500, 1000), pi, cor = c(0, 0.1, 0.3), b.log = NULL, model, p, method = c("RRcor", "RRlog", "RRlin"), complyRates = c(1, 1), sysBias = c(0, 0), groupRatio = 0.5, alpha = 0.05, nCPU = 1, show.messages = TRUE )
numRep |
number of boostrap replications |
n |
vector of samples sizes |
pi |
true prevalence |
cor |
vector of true correlations |
b.log |
vector of true logistic regression coefficients |
model |
randomized response model |
p |
randomization probability |
method |
multivariate RR method |
complyRates |
probability of compliance within carriers/noncarriers of sensitive attribute |
sysBias |
probability of responding 'yes' in case of noncompliance |
groupRatio |
ratio of subgroups in two-group RR designs |
alpha |
type-I error used to estimate power |
nCPU |
either the number of CPU cores or a cluster initialized via
|
show.messages |
toggle printing of progress messages |
a list of the class powerplot
containing an array res
with the power estimates and details of the simulation (e.g., model, p, pi,
etc.)
RRsimu
for Monte-Carlo simulation / parametric bootstrap
# Not run # pplot <- powerplot(100, n=c(150,250), cor=c(0,.3,.5), # method="RRlog", pi=.6, model="Warner", p=.3) # plot(pplot)
# Not run # pplot <- powerplot(100, n=c(150,250), cor=c(0,.3,.5), # method="RRlog", pi=.6, model="Warner", p=.3) # plot(pplot)
Predictions of the RR logistic regression model for the individual probabilities (or logits) of having the sensitive RR attribute, or of the probability of the RR responses.
## S3 method for class 'RRlog' predict( object, newdata = NULL, type = c("link", "response", "attribute"), se.fit = FALSE, ci = 0.95, ... )
## S3 method for class 'RRlog' predict( object, newdata = NULL, type = c("link", "response", "attribute"), se.fit = FALSE, ci = 0.95, ... )
object |
A fitted |
newdata |
An optional vector, matrix, or data.frame with values on the predictor variables. Note that for matrices, the order of predictors should match the order of predictors in the formula. Uses the fitted values of the model if omitted. |
type |
|
se.fit |
Return standard errors for the predicted values in addition to confidence intervals. SEs on the logit scale are computed using the observed Fisher information from the fitted model. Standard errors for the probability scale are computed using the delta method. |
ci |
Confidence level for confidence interval. If |
... |
ignored |
either a vector of predicted values or a matrix with columns for the
point estimates, confidence interval, and standard errors (if se.fit=TRUE
and ci=.95
).
RRcor
calculates bivariate Pearson correlations of variables measured
with or without RR.
RRcor( x, y = NULL, models, p.list, group = NULL, bs.n = 0, bs.type = c("se.n", "se.p", "pval"), nCPU = 1 )
RRcor( x, y = NULL, models, p.list, group = NULL, bs.n = 0, bs.type = c("se.n", "se.p", "pval"), nCPU = 1 )
x |
a numeric vector, matrix or data frame. |
y |
|
models |
a vector defining which RR design is used for each variable.
Must be in the same order as variables appear in |
p.list |
a |
group |
a matrix defining the group membership of each participant
(values 1 and 2) for all multiple group models( |
bs.n |
number of samples used to get bootstrapped standard errors |
bs.type |
to get boostrapped standard errors, use |
nCPU |
only relevant for the bootstrap: either the number of CPU cores
or a cluster initialized via |
Correlations of RR variables are calculated by the method of Fox & Tracy (1984) by interpreting the variance induced by the RR procedure as uncorrelated measurement error. Since the error is independent, the correlation can be corrected to obtain an unbiased estimator.
Note that the continuous RR model mix.norm
with the randomization
parameter p=c(p.truth, mean, SD)
assumes that participants respond
either to the sensitive question with probability p.truth
or otherwise
to a known masking distribution with known mean and SD. The estimated
correlation only depends on the mean and SD and does not require normality.
However, the assumption of normality is used in the parametric bootstrap to
obtain standard errors.
RRcor
returns a list with the following components::
r
estimated correlation matrix
rSE.p
, rSE.n
standard errors from parametric/nonparametric
bootstrap
prob
two-sided p-values from parametric bootstrap
samples.p
, samples.n
sampled correlations from
parametric/nonparametric bootstrap (for the standard errors)
Fox, J. A., & Tracy, P. E. (1984). Measuring associations with randomized response. Social Science Research, 13, 188-197.
vignette('RRreg')
or
https://www.dwheck.de/vignettes/RRreg.html for a
detailed description of the RR models and the appropriate definition of
p
# generate first RR variable n <- 1000 p1 <- c(.3, .7) gData <- RRgen(n, pi = .3, model = "Kuk", p1) # generate second RR variable p2 <- c(.8, .5) t2 <- rbinom(n = n, size = 1, prob = (gData$true + 1) / 2) temp <- RRgen(model = "UQTknown", p = p2, trueState = t2) gData$UQTresp <- temp$response gData$UQTtrue <- temp$true # generate continuous covariate gData$cov <- rnorm(n, 0, 4) + gData$UQTtrue + gData$true # estimate correlations using directly measured / RR variables cor(gData[, c("true", "cov", "UQTtrue")]) RRcor( x = gData[, c("response", "cov", "UQTresp")], models = c("Kuk", "d", "UQTknown"), p.list = list(p1, p2) )
# generate first RR variable n <- 1000 p1 <- c(.3, .7) gData <- RRgen(n, pi = .3, model = "Kuk", p1) # generate second RR variable p2 <- c(.8, .5) t2 <- rbinom(n = n, size = 1, prob = (gData$true + 1) / 2) temp <- RRgen(model = "UQTknown", p = p2, trueState = t2) gData$UQTresp <- temp$response gData$UQTtrue <- temp$true # generate continuous covariate gData$cov <- rnorm(n, 0, 4) + gData$UQTtrue + gData$true # estimate correlations using directly measured / RR variables cor(gData[, c("true", "cov", "UQTtrue")]) RRcor( x = gData[, c("response", "cov", "UQTresp")], models = c("Kuk", "d", "UQTknown"), p.list = list(p1, p2) )
The method RRgen
generates data according to a specified
RR model, e.g., "Warner"
. True states are either provided by a
vector trueState
or drawn randomly from a Bernoulli distribution.
Useful for simulation and testing purposes, e.g., power analysis.
RRgen( n, pi.true, model, p, complyRates = c(1, 1), sysBias = c(0, 0), groupRatio = 0.5, Kukrep = 1, trueState = NULL )
RRgen( n, pi.true, model, p, complyRates = c(1, 1), sysBias = c(0, 0), groupRatio = 0.5, Kukrep = 1, trueState = NULL )
n |
sample size of generated data |
pi.true |
true proportion in population (a vector for m-categorical
|
model |
specifes the RR model, one of: |
p |
randomization probability (depending on model, see
|
complyRates |
vector with two values giving the proportions of carriers and non-carriers who adhere to the instructions, respectively |
sysBias |
probability of responding 'yes' (coded as 1) in case of
non-compliance for carriers and non-carriers of the sensitive attribute,
respectively. If |
groupRatio |
proportion of participants in group 1. Only required for
two-group models, e.g., |
Kukrep |
Number of repetitions of Kuk's procedure (how often red and black cards are drawn) |
trueState |
optional vector containing true states of participants
(i.e., 1 for carriers and 0 for noncarriers of sensitive attribute; for
|
If trueState
is specified, the randomized response procedure
will be simulated for this vector, otherwise a random vector of length
n
with true proportion pi.true
is drawn. Respondents answer
biases can be simulated by adjusting the compliance rates: if
complyRates
is set to c(1,1)
, all respondents adhere to the
randomization procedure. If one or both rates are smaller than 1,
sysBias
determines whether noncompliant respondents systematically
choose the nonsensitive category or whether they answer randomly.
SLD
- to generate data according to the stochastic lie detector with
the proportion t
of honest carriers, parameters are set to
complyRates=c(t,1)
and sysBias=c(0,0)
CDM
- to generate data according to the cheating detection model with
the proportion gamma
of cheaters, parameters are set to
complyRates=c(1-gamma,1-gamma)
and sysBias=c(0,0)
data.frame
including the variables true
and
response
(and for SLD
and CDM
a third variable
group
)
see vignette('RRreg')
for a detailed description of the
models and RRlog
, RRlin
and RRcor
for the multivariate analysis of RR data
# Generate responses of 1000 people according to Warner's model, # every participant complies to the RR procedure genData <- RRgen(n = 1000, pi.true = .3, model = "Warner", p = .7) colMeans(genData) # use Kuk's model with two decks of cards, # p gives the proportions of red cards for carriers/noncarriers genData <- RRgen(n = 1000, pi.true = .4, model = "Kuk", p = c(.4, .7)) colMeans(genData) # Stochastic Lie Detector (SLD): # Only 80% of carriers answer according to the RR procedure genData <- RRgen( n = 1000, pi.true = .2, model = "SLD", p = c(.2, .8), complyRates = c(.8, 1), sysBias = c(0, 0) ) colMeans(genData)
# Generate responses of 1000 people according to Warner's model, # every participant complies to the RR procedure genData <- RRgen(n = 1000, pi.true = .3, model = "Warner", p = .7) colMeans(genData) # use Kuk's model with two decks of cards, # p gives the proportions of red cards for carriers/noncarriers genData <- RRgen(n = 1000, pi.true = .4, model = "Kuk", p = c(.4, .7)) colMeans(genData) # Stochastic Lie Detector (SLD): # Only 80% of carriers answer according to the RR procedure genData <- RRgen( n = 1000, pi.true = .2, model = "SLD", p = c(.2, .8), complyRates = c(.8, 1), sysBias = c(0, 0) ) colMeans(genData)
Linear regression for a continuous criterion, using randomized-response (RR) variables as predictors.
RRlin( formula, data, models, p.list, group = NULL, Kukrep = 1, bs.n = 0, nCPU = 1, maxit = 1000, fit.n = 3, pibeta = 0.05 )
RRlin( formula, data, models, p.list, group = NULL, Kukrep = 1, bs.n = 0, nCPU = 1, maxit = 1000, fit.n = 3, pibeta = 0.05 )
formula |
a continuous criterion is predicted by one or more categorical
RR variables defined by |
data |
an optional data frame, list or environment, containing the variables in the model. |
models |
character vector specifying RR model(s) in order of appearance
in formula. Available models: |
p.list |
list of randomization probabilities for RR models in the same
order as specified in |
group |
vector or matrix specifying group membership by the indices 1
and 2. Only for multigroup RR models, e.g., |
Kukrep |
defines the number of repetitions in Kuk's card playing method |
bs.n |
Number of samples used for the non-parametric bootstrap |
nCPU |
only relevant for the bootstrap: either the number of CPU cores
or a cluster initialized via |
maxit |
maximum number of iterations in optimization routine |
fit.n |
number of fitting runs with random starting values |
pibeta |
approximate ratio of probabilities pi to regression weights beta (to adjust scaling). Can be used for speeding-up and fine-tuning ML estimation (i.e., choosing a smaller value for larger beta values). |
Returns an object RRlin
which can be analysed by the generic
method summary
Daniel W. Heck
van den Hout, A., & Kooiman, P. (2006). Estimating the linear regression model with categorical covariates subject to randomized response. Computational Statistics & Data Analysis, 50, 3311-3323.
vignette('RRreg')
or
https://www.dwheck.de/vignettes/RRreg.html for a
detailed description of the RR models and the appropriate definition of
p
# generate two RR predictors dat <- RRgen(n = 500, pi = .4, model = "Warner", p = .3) dat2 <- RRgen(n = 500, pi = c(.4, .6), model = "FR", p = c(.1, .15)) dat$FR <- dat2$response dat$trueFR <- dat2$true # generate a third predictor and continuous dependent variables dat$nonRR <- rnorm(500, 5, 1) dat$depvar <- 2 * dat$true - 3 * dat2$true + .5 * dat$nonRR + rnorm(500, 1, 7) # use RRlin and compare to regression on non-RR variables linreg <- RRlin(depvar ~ response + FR + nonRR, data = dat, models = c("Warner", "FR"), p.list = list(.3, c(.1, .15)), fit.n = 1 ) summary(linreg) summary(lm(depvar ~ true + trueFR + nonRR, data = dat))
# generate two RR predictors dat <- RRgen(n = 500, pi = .4, model = "Warner", p = .3) dat2 <- RRgen(n = 500, pi = c(.4, .6), model = "FR", p = c(.1, .15)) dat$FR <- dat2$response dat$trueFR <- dat2$true # generate a third predictor and continuous dependent variables dat$nonRR <- rnorm(500, 5, 1) dat$depvar <- 2 * dat$true - 3 * dat2$true + .5 * dat$nonRR + rnorm(500, 1, 7) # use RRlin and compare to regression on non-RR variables linreg <- RRlin(depvar ~ response + FR + nonRR, data = dat, models = c("Warner", "FR"), p.list = list(.3, c(.1, .15)), fit.n = 1 ) summary(linreg) summary(lm(depvar ~ true + trueFR + nonRR, data = dat))
A dichotomous variable, measured once or more per person by a randomized response method, serves as dependent variable using one or more continuous and/or categorical predictors.
RRlog( formula, data, model, p, group, n.response = 1, LR.test = TRUE, fit.n = 3, EM.max = 1000, optim.max = 500, ... )
RRlog( formula, data, model, p, group, n.response = 1, LR.test = TRUE, fit.n = 3, EM.max = 1000, optim.max = 500, ... )
formula |
specifying the regression model, see |
data |
|
model |
Available RR models: |
p |
randomization probability/probabilities (depending on model, see
|
group |
vector specifying group membership. Can be omitted for
single-group RR designs (e.g., Warner). For two-group RR designs (e.g.,
|
n.response |
number of responses per participant, e.g., if a participant
responds to 5 RR questions with the same randomization probability |
LR.test |
test regression coefficients by a likelihood ratio test, i.e.,
fitting the model repeatedly while excluding one parameter at a time (each
nested model is fitted only once, which can result in local maxima). The
likelihood-ratio test statistic |
fit.n |
Number of fitting replications using random starting values to avoid local maxima |
EM.max |
maximum number of iterations of the EM algorithm. If
|
optim.max |
Maximum number of iterations within each run of |
... |
ignored |
The logistic regression model is fitted first by an EM algorithm, in
which the dependend RR variable is treated as a misclassified binary
variable (Magder & Hughes, 1997). The results are used as starting values
for a Newton-Raphson based optimization by optim
.
Returns an object RRlog
which can be analysed by the generic
method summary
. In the table of coefficients, the column
Wald
refers to the Chi^2 test statistic which is computed as Chi^2 =
z^2 = Estimate^2/StdErr^2. If LR.test = TRUE
, the test statistic
deltaG2
is the likelihood-ratio-test statistic, which is computed by
fitting a nested logistic model without the corresponding predictor.
Daniel W. Heck
van den Hout, A., van der Heijden, P. G., & Gilchrist, R. (2007). The logistic regression model with response variables subject to randomized response. Computational Statistics & Data Analysis, 51, 6060-6069.
anova.RRlog
for model comparisons, plot.RRlog
for plotting predicted regression curves, and vignette('RRreg')
or
https://www.dwheck.de/vignettes/RRreg.html for a
detailed description of the RR models and the appropriate definition of
p
# generate data set without biases dat <- RRgen(1000, pi = .3, "Warner", p = .9) dat$covariate <- rnorm(1000) dat$covariate[dat$true == 1] <- rnorm(sum(dat$true == 1), .4, 1) # analyse ana <- RRlog(response ~ covariate, dat, "Warner", p = .9, fit.n = 1) summary(ana) # check with true, latent states: glm(true ~ covariate, dat, family = binomial(link = "logit"))
# generate data set without biases dat <- RRgen(1000, pi = .3, "Warner", p = .9) dat$covariate <- rnorm(1000) dat$covariate[dat$true == 1] <- rnorm(sum(dat$true == 1), .4, 1) # analyse ana <- RRlog(response ~ covariate, dat, "Warner", p = .9, fit.n = 1) summary(ana) # check with true, latent states: glm(true ~ covariate, dat, family = binomial(link = "logit"))
Uses the package lme4
to fit a generalized linear mixed model
(GLMM) with an adjusted link funciton.
RRmixed(formula, data, model, p, const = 1e-04, adjust_control = FALSE, ...)
RRmixed(formula, data, model, p, const = 1e-04, adjust_control = FALSE, ...)
formula |
two-sided formula including random and fixed effects (see
below or |
data |
an optional data frame with variables named in formula |
model |
type of RR design. Only 1-group RR designs are supported at the
moment (i.e., |
p |
randomization probability |
const |
the RR link function is not defined for small and/or large
probabilities (the boundaries depend on |
adjust_control |
whether to adjust the control arguments for
|
... |
further arguments passed to |
Some examples for formula:
random intercept:
response ~ covariate + (1 | group)
random slope:
response ~ covariate + (0 + covariate | group)
both random slope and intercept:
response ~ covariate +(covariate | group)
level-2 predictor (must have constant values within groups!):
response ~ lev2 + (1|group)
Note that parameter estimation will be unstable and might fail if the
observed responses are not in line with the model. For instance, a
Forced-Response model (model="FR"
) with p=c(0,1/4)
requires
that expected probabilities for responses are in the interval [.25,1.00]. If
the observed proportion of responses is very low (<<.25), intercepts will be
estimated to be very small (<<0) and/or parameter estimation might fail. See
glmer
for setting better starting values and
lmerControl
for further options to increase stability.
an object of class glmerMod
van den Hout, A., van der Heijden, P. G., & Gilchrist, R. (2007). The Logistic Regression Model with Response Variables Subject to Randomized Response. Computational Statistics & Data Analysis, 51, 6060–6069.
# generate data with a level-1 predictor set.seed(1234) d <- data.frame( group = factor(rep(LETTERS[1:20], each = 50)), cov = rnorm(20 * 50) ) # generate dependent data based on logistic model (random intercept): d$true <- simulate(~ cov + (1 | group), newdata = d, family = binomial(link = "logit"), newparams = list( beta = c("(Intercept)" = -.5, cov = 1), theta = c("group.(Intercept)" = .8) ) )[[1]] # scramble responses using RR: model <- "FR" p <- c(true0 = .1, true1 = .2) d$resp <- RRgen(model = "FR", p = p, trueState = d$true)$response # fit model: mod <- RRmixed(resp ~ cov + (1 | group), data = d, model = "FR", p = p) summary(mod)
# generate data with a level-1 predictor set.seed(1234) d <- data.frame( group = factor(rep(LETTERS[1:20], each = 50)), cov = rnorm(20 * 50) ) # generate dependent data based on logistic model (random intercept): d$true <- simulate(~ cov + (1 | group), newdata = d, family = binomial(link = "logit"), newparams = list( beta = c("(Intercept)" = -.5, cov = 1), theta = c("group.(Intercept)" = .8) ) )[[1]] # scramble responses using RR: model <- "FR" p <- c(true0 = .1, true1 = .2) d$resp <- RRgen(model = "FR", p = p, trueState = d$true)$response # fit model: mod <- RRmixed(resp ~ cov + (1 | group), data = d, model = "FR", p = p) summary(mod)
Simulate and analyse bivariate data including either one RR variable (either correlation, logistic, or linear regression model) or two RR variables (only correlations). Useful for power analysis, parametric bootstraps or for testing the effects of noncompliance on the stability of estimates.
RRsimu( numRep, n, pi, model, p, cor = 0, b.log = 0, complyRates = c(1, 1), sysBias = c(0, 0), method = c("RRuni", "RRcor", "RRlog", "RRlin"), alpha = 0.05, groupRatio = 0.5, MLest = FALSE, getPower = TRUE, nCPU = 1 )
RRsimu( numRep, n, pi, model, p, cor = 0, b.log = 0, complyRates = c(1, 1), sysBias = c(0, 0), method = c("RRuni", "RRcor", "RRlog", "RRlin"), alpha = 0.05, groupRatio = 0.5, MLest = FALSE, getPower = TRUE, nCPU = 1 )
numRep |
number of replications |
n |
sample size |
pi |
true proportion of carriers of sensitive attribute (for 2 RR
variables: |
model |
either one or two RR model (as |
p |
randomization probability (for 2 RR variables: a |
cor |
true Pearson-correlation used for data generation (for
|
b.log |
true regression coefficient in logistic regression (for
|
complyRates |
vector with two values giving the proportions of
participants who adhere to the instructions in the subset with or without
the sensitive attribute, respectively (for 2 RR variables: a |
sysBias |
probability of responding 'yes' (coded as 1 in the RR
variable) in case of non-compliance for carriers and non-carriers,
respectively. See |
method |
vector specifying which RR methods to be used in each
replication. For a single RR variable, the methods |
alpha |
significance threshold for testing the logistic regression
parameter |
groupRatio |
proportion of participants in group 1. Only for two-group
models (e.g., |
MLest |
concerns |
getPower |
whether to compute power for |
nCPU |
either the number of CPU cores or a cluster initialized via
|
For a single RR variable:
The parameter b.log
is the slope-coefficient for the true, latent
values in a logistic regression model that is used for data generation.
The argument cor
is used for data generation for linear models. The
directly measured covariate is sampled from a normal distribution with
shifted means, depending on the true state on the sensitive attribute
(i.e., the true, underlying values on the RR variable). For dichotomous RR
variables, this corresponds to the assumption of an ordinary t-test, where
the dependent variable is normally distributed within groups with equal
variance. The difference in means is chosen in a way, to obtain the
point-biserial correlation defined by cor
.
For two RR variables:
cor
has to be used. In case of two dichotomous RR variables, the
true group membership of individuals is sampled from a 2x2 cross table.
Within this table, probabilities are chosen in a way, to obtain the
point-tetrachoric correlation defined by cor
Note, that for the FR model with multiple response categories (e.g., from 0
to 4), the specified cor
is not the exact target of the sampling
procedure. It assumes a normal distribution for each true state, with
constant differences between the groups (i.e., it assumes an interval
scaled variable).
A list containing
parEsts |
matrix containing the estimated parameters |
results |
matrix with mean parameters, standard errors, and number of samples to which the respective method could not be fitted |
power |
vector with the estimated power of the selected randomized response procedures |
# Not run: Simulate data according to the Warner model # mcsim <- RRsimu(numRep=100, n=300, pi=.4, # model="Warner", p=.2, cor=.3) # print(mcsim)
# Not run: Simulate data according to the Warner model # mcsim <- RRsimu(numRep=100, n=300, pi=.4, # model="Warner", p=.2, cor=.3) # print(mcsim)
Analyse a data vector response
with a specified RR model (e.g.,
Warner
) with known randomization probability p
RRuni(response, data, model, p, group = NULL, MLest = TRUE, Kukrep = 1)
RRuni(response, data, model, p, group = NULL, MLest = TRUE, Kukrep = 1)
response |
either vector of responses containing 0='no' and 1='yes' or
name of response variable in |
data |
optional |
model |
defines RR model. Available models: |
p |
randomization probability (see details or |
group |
a group vector of the same length as |
MLest |
whether to use |
Kukrep |
number of repetitions of Kuk's card-drawing method |
Each RR design model
differs in the definition of the
randomization probability p
, which is defined as a single probability
for
"Warner"
: Probability to get sensitive Question
"Mangat"
: Prob. for non-carriers to respond truthfully (i.e., with No=0)
"Crosswise"
: Probability to respond 'yes' to irrelevant second
question (coding of responses: 1=['no-no' or 'yes-yes']; 0=['yes-no' or 'no-yes'])
"Triangular"
: Probability to respond 'yes' to irrelevant second
question (coding of responses: 0='no' to both questions (='circle'); 1='yes'
to at least one question ('triangle'))
and as a two-valued vector of probabilities for
"Kuk"
: Probability of red cards in first and second set,
respectively (red=1, black=0);
Unrelated Question ("UQTknown"
): Prob. to respond to sensitive
question and known prevalence of 'yes' responses to unrelated question
Unrelated Question ("UQTunknown"
): Prob. to respond to
sensitive question in group 1 and 2, respectively
Cheating Detection ("CDM"
): Prob. to be prompted to say yes
in group 1 and 2, respectively
Symmetric CDM ("CDMsym"
): 4-valued vector: Prob. to be
prompted to say 'yes'/'no' in group 1 and 'yes'/'no' in group 2
Stochastic Lie Detector ("SLD"
): Prob. for noncarriers to
reply with 0='no' in group 1 and 2, respectively
Forced Response model ("FR"
): m-valued vector (m=number of
response categories) with the probabilities of being prompted to select
response categories 0,1,..,m-1, respectively (requires sum(p)<1
)
RR as misclassification ("custom"
): a quadratic misclassification
matrix is specified, where the entry p[i,j]
defines the
probability of responding i (i-th row) given a true state of j
(j-th column)) (see getPW
)
For the continuous RR models:
"mix.norm"
: 3-valued vector - Prob. to respond to sensitive
question and mean and SD of the masking normal distribution of the
unrelated question
"mix.exp"
: 2-valued vector - Prob. to respond to sensitive
question and mean of the masking exponential distribution of the
unrelated question
"mix.unknown"
: 2-valued vector - Prob. of responding to
sensitive question in group 1 and 2, respectively
an RRuni
object, can by analyzed by using summary
vignette('RRreg')
or
https://www.dwheck.de/vignettes/RRreg.html for a
detailed description of the RR models and the appropriate definition of p
# Generate responses of 1000 people according to Warner's model # with an underlying true proportion of .3 df <- RRgen(n = 1000, pi = .3, model = "Warner", p = .7) head(df) # Analyse univariate data to estimate prevalence 'pi' estimate <- RRuni(response = df$response, model = "Warner", p = .7) summary(estimate) # Generate data in line with the Stochastic Lie Detector # assuming that 90% of the respondents answer truthfully df2 <- RRgen( n = 1000, pi = .3, model = "SLD", p = c(.2, .8), complyRates = c(.8, 1), groupRatio = 0.4 ) estimate2 <- RRuni( response = df2$response, model = "SLD", p = c(.2, .8), group = df2$group ) summary(estimate2)
# Generate responses of 1000 people according to Warner's model # with an underlying true proportion of .3 df <- RRgen(n = 1000, pi = .3, model = "Warner", p = .7) head(df) # Analyse univariate data to estimate prevalence 'pi' estimate <- RRuni(response = df$response, model = "Warner", p = .7) summary(estimate) # Generate data in line with the Stochastic Lie Detector # assuming that 90% of the respondents answer truthfully df2 <- RRgen( n = 1000, pi = .3, model = "SLD", p = c(.2, .8), complyRates = c(.8, 1), groupRatio = 0.4 ) estimate2 <- RRuni( response = df2$response, model = "SLD", p = c(.2, .8), group = df2$group ) summary(estimate2)