Methodology

Proposed Debiasing Inference Procedure

Consider the observed data \(\{(Y_i,R_i,X_i)\}_{i=1}^n \subset \mathbb{R}\times \{0,1\} \times \mathbb{R}^d\), where \(Y_i\) is the outcome variable that could be missing, \(R_i\) is the binary variable indicating the missingness of \(Y_i\), and \(X_i\) is the fully observed high-dimensional covariate vector for \(i=1,...,n\).

Our proposed debiasing method aims at conducting valid statistical inference on the linear regression function \(m_0(x)=\mathrm{E}(Y|X=x)=x^T\beta_0\) under the “missing at random (MAR)” assumption. Specifically, the method consists of the following procedures.

Step 1: Compute the Lasso pilot estimate \(\widehat{\beta}\) of the regression coefficient using the complete-case data as:

\[\widehat{\beta}=\mathrm{argmin}_{\beta \in \mathbb{R}^d} \left[\frac{1}{n}\sum\limits_{i=1}^n R_i (Y_i-X_i^T \beta)^2+ \lambda \|\beta\|_1 \right],\]

where \(\lambda >0\) is a regularization parameter.

Step 2: Obtain consistent estimates \(\widehat{\pi}_i, i=1,...,n\) of the propensity scores \(\pi_i = \mathrm{P}(R_i=1|X_i)\) by any machine learning method (not necessarily a parametric model) applied on the data \(\{(X_i,R_i)\}_{i=1}^n \subset \mathbb{R}^d \times \{0,1\}\). See the scikit-learn package and the related probability calibration for potential propensity score estimation methods.
Step 3: Solve for the debiasing weight vector \(\widehat{\mathbf{w}}\equiv \widehat{\mathbf{w}}(x) = \left(\widehat{w}_1(x),...,\widehat{w}_n(x)\right)^T \in \mathbb{R}^n\) through a debiasing program defined as:

\[\min_{\mathbf{w}\equiv \mathbf{w}(x) \in \mathbb{R}^n} \left\{\sum_{i=1}^n \widehat{\pi}_iw_i(x)^2: \left\|x- \frac{1}{\sqrt{n}}\sum_{i=1}^n w_i(x)\cdot \widehat{\pi}_i\cdot X_i \right\|_{\infty} \leq \frac{\gamma}{n} \right\},\]

where \(\gamma >0\) is a tuning parameter.

Step 4: Define the debiasing estimator for \(m_0(x)\) as:

\[\widehat{m}^{\text{debias}}(x;\widehat{\mathbf{w}}) = x^T \widehat{\beta} + \frac{1}{\sqrt{n}} \sum_{i=1}^n \widehat{w}_i(x)R_i \left(Y_i-X_i^T \widehat{\beta} \right).\]

Step 5: Construct the asymptotic \((1-\alpha)\)-level confidence interval for \(m_0(x)\) as:

\[\left[\widehat{m}^{\text{debias}}(x;\widehat{\mathbf{w}}) - \Phi^{-1}\left(1-\frac{\tau}{2}\right) \sigma_{\epsilon}\sqrt{\frac{1}{n}\sum_{i=1}^n \widehat{\pi}_i \widehat{w}_i(x)^2},\; \widehat{m}^{\text{debias}}(x;\widehat{\mathbf{w}}) + \Phi^{-1}\left(1-\frac{\tau}{2}\right) \sigma_{\epsilon} \sqrt{\frac{1}{n}\sum_{i=1}^n \widehat{\pi}_i \widehat{w}_i(x)^2} \right],\]

where \(\Phi(\cdot)\) denotes the cumulative distribution function (CDF) of \(\mathcal{N}(0,1)\). If \(\sigma_{\epsilon}^2\) is unknown, then it can be replaced by any consistent estimator \(\widehat{\sigma}_{\epsilon}^2\).

For the implementation of this package Debias-Infer, we fit the Lasso pilot estimate \(\widehat{\beta}\) in Step 1 by the scaled Lasso [2] so as to automatically select the regularization parameter \(\lambda >0\) and simultaneously produce a consistent estimator of the noise level \(\sigma_{\epsilon}^2\).

As for solving the debiasing program in Step 3, we leverage the CVXPY package. To select the tuning parameter \(\gamma >0\), we use the cross-validation on the dual formulation of the debiasing program as:

\[\min_{\ell(x) \in \mathbb{R}^d} \left\{\frac{1}{4n} \sum_{i=1}^n \widehat{\pi}_i \left[X_i^T \ell(x)\right]^2 + x^T \ell(x) +\frac{\gamma}{n}\|\ell(x)\|_1 \right\}.\]

This dual program is solved via the coordinate descent algorithm [3] in our package.

In the reference paper [1], we prove that the confidence interval in Step 5 is asymptotically valid and our debiased estimator in Step 4 is semi-parametrically efficient among all asymptotically linear estimators with MAR outcomes; see Figure 1 for how our debiased estimator performs (under two different rules for the cross-validation) when compared with the Lasso estimate.

Figure 1: Comparison of our debiased estimators under two different choices of the tuning parameters (“1SE” and “min-feas”) with the conventional Lasso estimates based on complete-case or oracle data.

Methodology

Proposed Debiasing Inference Procedure

References