Proposed Debiasing Method for Inferring High-Dimensional Linear Models (Example Code via Debias-Infer)
[1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
import scipy.stats
from sklearn.linear_model import LogisticRegressionCV
from sklearn.neural_network import MLPClassifier
from sklearn.calibration import CalibratedClassifierCV
from debiasing.debias_prog import ScaledLasso, DebiasProg, DualObj, DualCD, DebiasProgCV
Simulate Random Sample
We generate the i.i.d. data \(\{(Y_i,R_i,X_i)\}_{i=1}^n \subset \mathbb{R}\times \{0,1\} \times\mathbb{R}^d\) from the following linear model
where \(d=1000\) and \(n=900\).
We adopt the circulant symmetric matrix \(\Sigma^{\mathrm{cs}}\) in Javanmard and Montanari (2014) that is defined as \(\Sigma_{jj}=1\), \(\Sigma_{jk}=0.1 \text{ when } j+1\leq k\leq j+5 \text{ or } j+d-5\leq k \leq j+d-1\) with \(\Sigma_{jk}=0\) elsewhere for \(j\leq k\), and \(\Sigma_{jk}=\Sigma_{kj}\).
The true regression coefficient is \(\beta_0^{sp}=\left(\underbrace{\sqrt{5},...,\sqrt{5}}_5,0,...,0\right)^T \in \mathbb{R}^d\), and the query point is \(x=\left(1,\frac{1}{2},\frac{1}{4},0,0,0,\frac{1}{2},\frac{1}{8},0,...,0\right)^T\in\mathbb{R}^d\) to infer the joint effects of a few components of \(\beta_0\).
[2]:
d = 1000
n = 900
# Circulant symmetric covariance
Sigma = np.zeros((d,d)) + np.eye(d)
rho = 0.1
for i in range(d):
for j in range(i+1, d):
if (j < i+6) or (j > i+d-6):
Sigma[i,j] = rho
Sigma[j,i] = rho
sig = 1
# True regression coefficient
s_beta = 5
beta_0 = np.zeros((d,))
beta_0[:s_beta] = np.sqrt(5)
X_sim = np.random.multivariate_normal(mean=np.zeros(d), cov=Sigma, size=n)
eps_err_sim = sig*np.random.randn(n)
Y_sim = np.dot(X_sim, beta_0) + eps_err_sim
[3]:
# Query point
x = np.zeros((d,))
x[0] = 1
x[1] = 1/2
x[2] = 1/4
x[6] = 1/2
x[7] = 1/8
# True regression function
m_true = np.dot(x, beta_0)
To increase the complexity of estimating the propensity score, we generate the missing indicators \(R_i,i=1,...,n\) for the outcome variables \(Y_i,i=1,...,n\) as:
where \(\Phi(\cdot)\) is the CDF of \(\mathcal{N}(0,1)\) and the vector \((Z_{i1},...,Z_{iK})\) contains all polynomial combinations of the first eight components \(X_{i1},...,X_{i8}\) of the covariate vector \(X_i\) with degrees less than or equal to two (i.e., including the linear, quadratic, and one-way interaction terms).
[4]:
# MAR
inter_mat = PolynomialFeatures(degree=2, interaction_only=False,
include_bias=False).fit_transform(X_sim[:,:8])
obs_prob = scipy.stats.norm.cdf(-4 + np.sum(inter_mat, axis=1))
R = np.ones((n,))
R[np.random.rand(n) >= obs_prob] = 0
Detailed Procedures of Applying Our Proposed Debiasing Method
1. Lasso pilot estimate:
To select the regularization parameter \(\lambda>0\) in a data adaptive way, we adopt the scaled Lasso (Sun and Zhang, 2012) with its universal regularization parameter \(\lambda_0=\sqrt{\frac{2\log d}{n}}\) as the initialization. Specifically, it provides an iterative algorithm to obtain the Lasso estimate \(\hat{\beta}\) and a consistent estimator \(\hat{\sigma}_{\epsilon}\) of the noise level \(\sigma_{\epsilon}\) from the following jointly convex optimization problem:
where the regularization parameter \(\tilde{\lambda} >0\) is updated iteratively as well.
[5]:
# Lasso pilot estimator
beta_pilot, sigma_hat = ScaledLasso(X=X_sim[R == 1,:], Y=Y_sim[R == 1], lam0='univ')
2. Propensity Score Estimation:
We consider both the oracle/true propensity scores and the estimated propensity scores by the Lasso-type logistic regression, two-layer neural network (with and without the Platt’s logistic calibration (Platt, 1999)).
[6]:
## Propensity score estimation (Oracle, LR, MLP neural network)
for non_met in ['Oracle', 'LR', 'NN', 'NNcal']:
if non_met == 'Oracle':
prop_score1 = obs_prob.copy()
MAE_prop = np.mean(abs(prop_score1 - obs_prob))
if non_met == 'LR':
zeta2 = np.logspace(-1, np.log10(300), 40)*np.sqrt(np.log(d)/n)
lr2 = LogisticRegressionCV(Cs=1/zeta2, cv=5, penalty='l1', scoring='neg_log_loss',
solver='liblinear', tol=1e-6, max_iter=10000).fit(X_sim, R)
prop_score2 = lr2.predict_proba(X_sim)[:,1]
MAE_prop = np.mean(abs(prop_score2 - obs_prob))
if non_met == 'NN':
lr2_NN = MLPClassifier(hidden_layer_sizes=(80,50,), activation='relu',
random_state=None, learning_rate='adaptive',
learning_rate_init=0.001, max_iter=1000).fit(X_sim, R)
prop_score3 = lr2_NN.predict_proba(X_sim)[:,1]
MAE_prop = np.mean(abs(prop_score3 - obs_prob))
if non_met == 'NNcal':
NN_base = MLPClassifier(hidden_layer_sizes=(80,50,), activation='relu', random_state=None,
learning_rate='adaptive', learning_rate_init=0.001, max_iter=1000)
lr2_NN = CalibratedClassifierCV(NN_base, method='sigmoid', cv=5).fit(X_sim, R)
prop_score4 = lr2_NN.predict_proba(X_sim)[:,1]
MAE_prop = np.mean(abs(prop_score4 - obs_prob))
3. Solving Our Debiasing Program
We solve for the weights through our debiasing program, where the tuning parameter \(\frac{\gamma}{n}>0\) is selected via 5-fold cross-validations.
[7]:
## Oracle
w_1se1, ll_1se1, gamma_n_1se1 = DebiasProgCV(X=X_sim.copy(), x=x, prop_score=prop_score1, gamma_lst=None,
cv_fold=5, cv_rule='1se')
# w_minfeas1, ll_minfeas1, gamma_n_minfeas1 = DebiasProgCV(X=X_sim.copy(), x=x, prop_score=prop_score1,
# gamma_lst=None, cv_fold=5, cv_rule='minfeas')
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
[8]:
## LR
w_1se2, ll_1se2, gamma_n_1se2 = DebiasProgCV(X=X_sim.copy(), x=x, prop_score=prop_score2, gamma_lst=None,
cv_fold=5, cv_rule='1se')
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
[9]:
## NN
w_1se3, ll_1se3, gamma_n_1se3 = DebiasProgCV(X=X_sim.copy(), x=x, prop_score=prop_score3, gamma_lst=None,
cv_fold=5, cv_rule='1se')
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The coordinate descent algorithm has reached its maximum number of iterations: 5000! Reiterate one more times without small perturbations to the scaled design matrix...
The strong duality between primal and dual programs does not satisfy when \gamma/n=0.051!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
[10]:
## NNcal
w_1se4, ll_1se4, gamma_n_1se4 = DebiasProgCV(X=X_sim.copy(), x=x, prop_score=prop_score4, gamma_lst=None,
cv_fold=5, cv_rule='1se')
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.001!
The primal debiasing program is infeasible!
The primal debiasing program for this fold of the data is not feasible when \gamma/n=0.026!
If the tuning parameter \(\frac{\gamma}{n}>0\) is pre-selected, then we can run the primal/dual debiasing program without cross-validations.
[11]:
w_1se1_new = DebiasProg(X=X_sim.copy(), x=x, Pi=np.diag(prop_score1), gamma_n=gamma_n_1se1)
ll_1se1_new = DualCD(X=X_sim.copy(), x=x, Pi=np.diag(prop_score1), gamma_n=gamma_n_1se1, ll_init=None,
eps=1e-9, max_iter=5000)
4. Construct the Debiased Estimator and its Asymptotic 95% Confidence Interval
[13]:
print('The true regression function is '+str(m_true)+'.\n')
## Oracle
m_deb1 = np.dot(x, beta_pilot) + np.sum(w_1se1 * R * (Y_sim - np.dot(X_sim, beta_pilot)))/np.sqrt(n)
asym_sd1 = np.sqrt(np.sum(prop_score1 * w_1se1**2)/n)
print('The 95% confidence interval of our debiasing method under the oracle propensity scores is '\
+str([m_deb1 - asym_sd1*sigma_hat*scipy.stats.norm.ppf(1-0.05/2),
m_deb1 + asym_sd1*sigma_hat*scipy.stats.norm.ppf(1-0.05/2)])+'.\n')
## LR
m_deb2 = np.dot(x, beta_pilot) + np.sum(w_1se2 * R * (Y_sim - np.dot(X_sim, beta_pilot)))/np.sqrt(n)
asym_sd2 = np.sqrt(np.sum(prop_score2 * w_1se2**2)/n)
print('The 95% confidence interval of our debiasing method when the oracle propensity scores are estimated by '\
'the Lasso-type logistic regression is \n'\
+str([m_deb2 - asym_sd2*sigma_hat*scipy.stats.norm.ppf(1-0.05/2),
m_deb2 + asym_sd2*sigma_hat*scipy.stats.norm.ppf(1-0.05/2)])+'.\n')
## NN
m_deb3 = np.dot(x, beta_pilot) + np.sum(w_1se3 * R * (Y_sim - np.dot(X_sim, beta_pilot)))/np.sqrt(n)
asym_sd3 = np.sqrt(np.sum(prop_score3 * w_1se3**2)/n)
print('The 95% confidence interval of our debiasing method when the oracle propensity scores are estimated by '\
'the two-layer neural network is \n'\
+str([m_deb3 - asym_sd3*sigma_hat*scipy.stats.norm.ppf(1-0.05/2),
m_deb3 + asym_sd3*sigma_hat*scipy.stats.norm.ppf(1-0.05/2)])+'.\n')
## NNcal
m_deb4 = np.dot(x, beta_pilot) + np.sum(w_1se4 * R * (Y_sim - np.dot(X_sim, beta_pilot)))/np.sqrt(n)
asym_sd4 = np.sqrt(np.sum(prop_score4 * w_1se4**2)/n)
print("The 95% confidence interval of our debiasing method when the oracle propensity scores are estimated by "\
"the two-layer neural network with Platt's logistic calibration is \n"\
+str([m_deb4 - asym_sd4*sigma_hat*scipy.stats.norm.ppf(1-0.05/2),
m_deb4 + asym_sd4*sigma_hat*scipy.stats.norm.ppf(1-0.05/2)])+'.\n')
The true regression function is 3.913118960624632.
The 95% confidence interval of our debiasing method under the oracle propensity scores is [3.8227718343812698, 3.9852952726087865].
The 95% confidence interval of our debiasing method when the oracle propensity scores are estimated by the Lasso-type logistic regression is
[3.8531350930448305, 4.029273820154903].
The 95% confidence interval of our debiasing method when the oracle propensity scores are estimated by the two-layer neural network is
[3.824591171878295, 3.9872310918920717].
The 95% confidence interval of our debiasing method when the oracle propensity scores are estimated by the two-layer neural network with Platt's logistic calibration is
[3.8490969609870653, 4.02376413689852].