PyRake is a Python library for calculating balancing weights to adjust surveys for non-response bias. Balancing weights try to accomplish three things:
- Reduce bias (by using an estimator emphasizing under-represented respondents)
- Improve representativeness (by balancing important covariates)
- Reduce variance (by controlling the sizes of the weights)
If we knew the probability each person would respond to the survey, a
natural choice of weight is the inverse of that probability:

$$w_i = \frac{1}{\pi_i}$$

where $\pi_i$ is person $i$'s response propensity. With these weights, $\frac{1}{N} \sum_{i \in S} w_i y_i$ is an unbiased estimator of the population mean.
(NB: the Horvitz-Thompson estimator is often written in its population-total form, $\hat{Y} = \sum_{i \in S} y_i / \pi_i$.)
Typically the propensity scores are unknown and must be estimated, but in that scenario we cannot expect the corresponding estimator to be unbiased. The best we can hope is for the bias to be small. Given that an unbiased estimator is unattainable, we might as well see what else weights can give us.
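As a toy numerical illustration of inverse-propensity weighting (all propensities and outcomes here are made up, and nothing below is part of PyRake's API):

```python
import numpy as np

# Hypothetical response propensities and observed outcomes for 4 respondents
pi = np.array([0.8, 0.5, 0.25, 0.4])  # probability each person responds
y = np.array([10.0, 12.0, 8.0, 9.0])  # observed outcomes

# Inverse-propensity weights: under-represented respondents count for more
w = 1.0 / pi

# Horvitz-Thompson-style estimate of the population total, and the
# corresponding estimate of the population mean (population size assumed known)
N = 10
total_hat = np.sum(w * y)  # 91.0
mean_hat = total_hat / N   # 9.1
```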
We might ask that the weights exactly balance important covariates. If $X$ is the $M \times p$ matrix of covariates and $\mu$ is the vector of target population means, this is the linear constraint $\frac{1}{M} X^\top w = \mu$.
The design effect is a common way of quantifying the efficiency of
an unrepresentative sample. It's the ratio of the adjusted and
unadjusted variances, and for simple weighted estimators is approximately (Kish's formula):

$$\mathrm{deff}(w) = \frac{M \sum_i w_i^2}{\left( \sum_i w_i \right)^2}$$

where $M$ is the sample size.
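Kish's approximate design effect is easy to compute directly; a standalone sketch (not a PyRake function):

```python
import numpy as np

def design_effect(w: np.ndarray) -> float:
    """Kish's approximate design effect: M * sum(w^2) / (sum(w))^2."""
    M = len(w)
    return M * np.sum(w**2) / np.sum(w) ** 2

# Equal weights give a design effect of 1 (no variance inflation)
print(design_effect(np.ones(100)))  # 1.0

# Unequal weights inflate the variance of the weighted estimator
print(design_effect(np.array([1.0, 1.0, 4.0])))  # 1.5
```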
PyRake can be used for solving problems of the form:

$$\begin{array}{ll} \text{minimize} & D(w, v) \\ \text{subject to} & \frac{1}{M} X^\top w = \mu \\ & \frac{1}{M} \lVert w \rVert_2^2 \le \varphi \\ & w \ge 0 \end{array}$$

where $D$ is a distance metric that keeps $w$ close to baseline weights $v$ (for example, inverses of estimated propensity scores). The constraint on the mean square of the weights controls the variance: when the mean weight is 1, it caps the design effect at $\varphi$.

PyRake can also be used to solve a sequence of these problems, with
varying $\varphi$, tracing out an efficient frontier of distance versus design effect.
While it may seem desirable to stay as close as possible to the baseline weights (while enforcing balance on important covariates), graphs like the one above show that it is often possible to dramatically reduce variance, by deviating only slightly farther from the baseline weights.
Popular methods for reweighting include Raking, Entropy Balancing, and Stable Balancing Weights. These can be seen as special cases or modifications of the problem family PyRake solves.
Raking (Deming and Stephan, 1940) solves:

$$\begin{array}{ll} \text{minimize} & D(w, v) \\ \text{subject to} & \frac{1}{M} X^\top w = \mu \\ & w \ge 0 \end{array}$$

where $D(w, v)$ is the KL divergence; $v$ is typically chosen to be
uniform (that is, all entries of $v$ are ones). As the name implies, the classical algorithm repeatedly "rakes" the weights across each covariate margin in turn until all margins match.
PyRake can be used to calculate Raking weights, if we set the distance to KL divergence and leave the variance effectively unconstrained (a sufficiently large $\varphi$).
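The classical raking algorithm can be sketched in a few lines of NumPy (toy data; PyRake instead solves the convex program directly):

```python
import numpy as np

# Toy sample: two binary covariates, one row per respondent
X = np.array([
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 0],
])
targets = np.array([0.5, 0.6])  # population means of each covariate
w = np.ones(len(X))             # uniform baseline weights

# Iterative proportional fitting: rescale the weights to match one
# margin at a time, cycling until the weighted means converge
for _ in range(100):
    for j in range(X.shape[1]):
        in_margin = X[:, j] == 1
        current = w[in_margin].sum() / w.sum()
        w[in_margin] *= targets[j] / current
        w[~in_margin] *= (1 - targets[j]) / (1 - current)

weighted_means = X.T @ w / w.sum()  # matches `targets` after convergence
```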
Like Raking, Entropy Balancing (Hainmueller, 2012) uses KL divergence
as the distance metric and omits the variance constraint. An
additional constraint is applied: the mean of the weights must equal the mean of the baseline weights. To get this behavior in PyRake, pass `constrain_mean_weight_to=np.mean(v)` to the `Rake` constructor.
(We tend to think in terms of mean weights rather than sums of
weights; a mean weight of 1 corresponds to a true weighted average.
Note that if we knew the true propensity scores, unbiased weights
would not necessarily have mean 1, and would not correspond to a
true weighted average.)
Stable Balancing Weights (Zubizarreta, 2015) use the variance of the weights as the objective, minimizing it subject to approximate covariate balance constraints; relative to the problem family above, the roles of the distance and the variance are swapped.
Once suitable weights have been chosen, we can use them to estimate
the mean of some quantity in a population based on a sample. As noted
above, a natural estimator is the IPW estimator:

$$\hat{\bar{y}}_{\mathrm{IPW}} = \frac{1}{N} \sum_{i \in S} w_i y_i$$

Its weights need not sum to $N$, so it is not a true weighted average. Many analysts therefore normalize the weights to enforce a true
weighted average. The stabilized, or SIPW, estimator is:

$$\hat{\bar{y}}_{\mathrm{SIPW}} = \frac{\sum_{i \in S} w_i y_i}{\sum_{i \in S} w_i}$$
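The difference between the two estimators in a toy example (made-up weights and outcomes, independent of PyRake's estimator classes):

```python
import numpy as np

w = np.array([1.25, 2.0, 4.0, 2.5])   # hypothetical weights
y = np.array([10.0, 12.0, 8.0, 9.0])  # hypothetical outcomes
N = 10                                # population size

ipw = np.sum(w * y) / N           # weighted sum divided by population size
sipw = np.sum(w * y) / np.sum(w)  # stabilized: a true weighted average
# The two differ whenever sum(w) != N
```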
Sometimes we may have some kind of model, $\hat{y}(x)$, predicting the outcome from covariates. The augmented (AIPW) estimator is:

$$\hat{\bar{y}}_{\mathrm{AIPW}} = \frac{1}{N} \sum_{i=1}^{N} \hat{y}(x_i) + \frac{1}{N} \sum_{i \in S} w_i \bigl( y_i - \hat{y}(x_i) \bigr)$$
where the first sum is over everyone in the population, and the second
limited to the sample. This estimator uses the outcome model to
estimate the outcome for everyone in the population, averages those
outcomes to estimate the mean, then adjusts this estimate with a
weighted sum of residuals from the sample. The latter term can use a
weighted average (SIPW) rather than a weighted sum (IPW).
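A numerical sketch of the augmented estimator with a toy outcome model (all data hypothetical; in PyRake this logic lives in the AIPW estimator classes):

```python
import numpy as np

# Covariate for everyone in a small population, and a hypothetical
# fitted outcome model's predictions for all of them
x_pop = np.linspace(0.0, 1.0, 10)
y_hat = 5.0 + 4.0 * x_pop

# Observed sample: population indices, observed outcomes, weights
sample = np.array([1, 4, 7, 9])
y_obs = np.array([5.9, 7.0, 8.5, 9.2])
w = np.array([2.0, 2.5, 2.5, 3.0])

N = len(x_pop)
model_term = np.mean(y_hat)  # average model prediction over the population
# Weighted sum of sample residuals (the IPW-style correction term)
residual_term = np.sum(w * (y_obs - y_hat[sample])) / N
aipw = model_term + residual_term
```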
These augmented methods have a doubly-robust property: if the
propensity scores are correct, or the outcome model is correct (in
the sense that $\hat{y}(x) = \mathbb{E}[y \mid x]$), the estimator is consistent.
PyRake defines estimators for each of these scenarios: the simplest
IPWEstimator; a stabilized variant, SIPWEstimator; an augmented
variant requiring an outcome model, AIPWEstimator; and a stabilized
augmented estimator, SAIPWEstimator. These estimators calculate
point estimates, the variance, Wald-style confidence intervals (point
estimate plus or minus some multiple of the square root of the
variance), and p-values against some hypothesized population mean.
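For example, a Wald-style interval is just the point estimate plus or minus a critical value times the standard error (a generic sketch, not PyRake's implementation):

```python
import math

def wald_interval(point: float, variance: float, z: float = 1.96) -> tuple[float, float]:
    """Wald-style confidence interval: point estimate +/- z * sqrt(variance)."""
    half_width = z * math.sqrt(variance)
    return point - half_width, point + half_width

lo, hi = wald_interval(9.1, 0.04)  # approximately (8.71, 9.49)
```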
Estimating the average treatment effect can be thought of as
estimating the population averages of two potential outcomes, and then
subtracting them. Given a population of units, some of which select
treatment (not necessarily at random), we can reweight outcomes in
each group to estimate the population mean of its potential outcome, then take the difference. PyRake defines ATEEstimator for doing just that.
We may wish instead to estimate the average effect of treatment on the
units selecting treatment, or not selecting treatment. PyRake defines
ATTEstimator and ATCEstimator for these scenarios.
Sometimes the units being analyzed in a causal inference are themselves a sample from a population, and perhaps that sample is not representative of the population of interest. We may then have two sets of weights: one adjusting for non-random treatment selection, and one making the sample more representative of the target population. All of the treatment effect estimators in PyRake permit specifying sampling propensity scores to estimate population average effects.
The weights used to make the sample more closely resemble the target population, or to adjust for non-random treatment selection, are typically estimated, not known. They are subject to two distinct sources of uncertainty: haphazard and systematic. Haphazard uncertainty arises whenever we learn a relationship from a finite sample, and can often be quantified with a bootstrap procedure. Systematic uncertainty arises when we do not observe all the relevant characteristics influencing the relationship. In a causal inference setting, these unobserved characteristics represent unobserved confounding, and lead to hidden biases because we typically would not know their extent.

A similar phenomenon occurs in survey sampling: if important characteristics correlated with the outcome influence the sampling procedure, but are not included in the propensity score model, perhaps because they are unobserved, the estimators described above are biased.
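A percentile-bootstrap sketch of the haphazard component (simplified and synthetic: a full treatment would re-estimate the weights within each replicate):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: outcomes and (estimated) weights
y = rng.normal(10.0, 2.0, size=200)
w = rng.uniform(0.5, 3.0, size=200)

# Resample respondents with replacement and recompute the stabilized
# weighted mean in each replicate
estimates = []
for _ in range(2000):
    idx = rng.integers(0, len(y), size=len(y))
    estimates.append(np.sum(w[idx] * y[idx]) / np.sum(w[idx]))

lo, hi = np.quantile(estimates, [0.05, 0.95])  # 90% percentile interval
```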
The estimators in PyRake include methods to explore the sensitivity of estimates to these hidden biases, using the procedures from (Zhao, Small, and Bhattacharya, 2019). We can calculate a range of point estimates that reflect the uncertainty in the estimate due to errors in the weights, or expand confidence intervals to account for this additional uncertainty.
W. Edwards Deming and Frederick F. Stephan, "On a least squares adjustment of a sampled frequency table when the expected marginal totals are known" (1940). Annals of Mathematical Statistics.
Daniel G. Horvitz and Donovan J. Thompson, "A generalization of sampling without replacement from a finite universe" (1952). Journal of the American Statistical Association.
Jens Hainmueller, "Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies" (2012). Political Analysis.
Qingyuan Zhao, Dylan S. Small, and Bhaswar B. Bhattacharya, "Sensitivity analysis for inverse probability weighting estimators via the percentile bootstrap" (2019). Journal of the Royal Statistical Society: Series B.
José R. Zubizarreta, "Stable weights that balance covariates for estimation with incomplete outcome data." (2015). Journal of the American Statistical Association.
To install:

```shell
pip install .
```

Calculating weights:

```python
from pyrake import Rake, KLDivergence, EfficientFrontier

# Inputs: X (M x p), mu (p,), v (M,)
rake = Rake(
    distance=KLDivergence(),
    X=X,
    mu=mu,
    phi=2.0,
)
frontier = EfficientFrontier(rake)
res = frontier.trace()
res.plot()
```

Estimating a population mean:

```python
import pandas as pd

from pyrake import SIPWEstimator

df = pd.DataFrame(...)
estimator = SIPWEstimator(
    propensity_scores=df["score"],
    outcomes=df["outcome"],
)
estimator.point_estimate()
estimator.confidence_interval(alpha=0.10)
estimator.plot_sensitivity()
```

I used ChatGPT to write the original commit (it did a pretty good job!)
We use poetry to manage dependencies. Test cases use pytest. We use black and ruff for formatting.
After cloning the repository, run `poetry shell` to create a virtual
environment. Then run `poetry install --no-root` to install all the
libraries needed to run the test cases (and the package itself).
Finally, run `python -m pytest` to run the test cases.

We use both black and ruff as Python linters. To check whether the code is
properly formatted, run `python -m ruff check pyrake test` and
`python -m black --check pyrake/ test/`.
Apache

