Papers & Code
Published
-
Graphical models are widely used in diverse application domains to model the conditional dependencies amongst a collection of random variables. In this paper, we consider settings where the graph structure is covariate-dependent, and investigate a deep neural network-based approach to estimate it. The method allows for flexible functional dependency on the covariate, and fits the data reasonably well without requiring a Gaussianity assumption. Theoretical results with PAC guarantees are established for the method, under assumptions commonly used in an Empirical Risk Minimization framework. The performance of the proposed method is evaluated on several synthetic data settings and benchmarked against existing approaches. The method is further illustrated on real datasets from neuroscience and finance, respectively, and produces interpretable results.
-
In this work, we study the weighted empirical risk minimization (weighted ERM) scheme, in which an additional data-dependent weight function is incorporated into the minimization of the empirical risk. We show that under a general “balanceable” Bernstein condition, one can design a weighted ERM estimator that achieves superior performance in certain sub-regions over the one obtained from standard ERM, with the superiority manifesting itself through a data-dependent constant term in the error bound. These sub-regions correspond to large-margin ones in classification settings and low-variance ones in heteroscedastic regression settings, respectively. Our findings are supported by evidence from synthetic data experiments.
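The contrast between standard and weighted ERM can be illustrated on a toy heteroscedastic regression, the setting where the low-variance sub-regions above arise. The inverse-variance weight function below is a hypothetical choice for illustration (with the noise scale assumed known), not the paper's estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy heteroscedastic regression: y = 2*x + noise whose scale grows with |x|.
n = 2000
x = rng.uniform(-1, 1, n)
sigma = 0.1 + 2.0 * np.abs(x)          # observation noise scale varies with x
y = 2.0 * x + sigma * rng.normal(size=n)

def erm_slope(x, y, w=None):
    """Minimize the (weighted) empirical squared-error risk over the slope."""
    if w is None:
        w = np.ones_like(x)
    return np.sum(w * x * y) / np.sum(w * x * x)

b_standard = erm_slope(x, y)                    # standard ERM
b_weighted = erm_slope(x, y, w=1.0 / sigma**2)  # data-dependent weights

# Both estimate the true slope 2; the weighted fit typically has smaller
# error, driven by the low-variance sub-region (small |x|).
```

The closed-form minimizer keeps the sketch short; the same weighting idea applies to iterative minimization of any empirical risk.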
-
Granger causality has been widely used in various application domains to capture lead-lag relationships amongst the components of complex dynamical systems, and the focus in extant literature has been on a single dynamical system. In certain applications in macroeconomics and neuroscience, one has access to data from a collection of such related systems, wherein the modeling task of interest is to extract the shared common structure that is embedded across them, as well as to identify the idiosyncrasies within individual ones. This paper introduces a Variational Autoencoder (VAE) based framework that jointly learns Granger-causal relationships amongst components in a collection of related-yet-heterogeneous dynamical systems, and handles the aforementioned task in a principled way. The performance of the proposed framework is evaluated on several synthetic data settings and benchmarked against existing approaches designed for individual system learning. The method is further illustrated on a real dataset involving time series data from a neurophysiological experiment and produces interpretable results.
-
Market impact is an important problem faced by large institutional investors and active market participants. In this paper, we rigorously investigate whether price trajectory data from the metaorder increases the efficiency of estimation, through the lens of the Fisher information, which is directly related to the asymptotic efficiency of statistical estimation. We show that, for popular market impact models, estimation methods based on partial price trajectory data, especially those containing early trade prices, can asymptotically outperform established estimation methods (e.g. VWAP-based). We discuss theoretical and empirical implications of this phenomenon, and how they could be readily incorporated into practice.
Quantitative Finance, 2024 PDF -
Structural discovery amongst a set of variables is of interest in both static and dynamic settings. In the presence of lead-lag dependencies in the data, the dynamics of the system can be represented through a structural equation model (SEM) that simultaneously captures the contemporaneous and temporal relationships amongst the variables, with the former encoded through a directed acyclic graph (DAG) for model identification. In many real applications, a partial ordering amongst the nodes of the DAG is available, which makes it either beneficial or imperative to incorporate it as a constraint in the problem formulation. This paper develops an algorithm that can seamlessly incorporate a priori partial ordering information for solving a linear SEM (also known as Structural Vector Autoregression) under a high-dimensional setting. The proposed algorithm is provably convergent to a stationary point, and exhibits competitive performance on both synthetic and real data sets.
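The partial-ordering constraint itself is easy to make concrete. The sketch below encodes a hypothetical three-layer ordering over five variables as an edge mask on the contemporaneous coefficient matrix; it illustrates the feasible set, not the paper's algorithm:

```python
import numpy as np

# Hypothetical partial ordering over 5 variables: rank[i] gives the layer of
# node i; edges are only allowed from a strictly earlier layer to a later one,
# which automatically guarantees acyclicity of the contemporaneous DAG.
rank = np.array([0, 0, 1, 2, 2])
p = len(rank)

# mask[i, j] == 1  iff an edge j -> i is permitted by the partial ordering.
mask = (rank[:, None] > rank[None, :]).astype(float)

# Any contemporaneous coefficient matrix can be projected onto the feasible
# set by elementwise masking, e.g. inside each iteration of a gradient scheme.
A_unconstrained = np.random.default_rng(1).normal(size=(p, p))
A = A_unconstrained * mask

# Under a topological reordering of the nodes, A is strictly block lower
# triangular, so I - A is invertible and the SEM is well defined.
```

Because paths through the masked graph must strictly increase in layer, the mask is nilpotent, which is one quick sanity check that no cycles can be introduced.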
-
Mixed-frequency data prediction tasks are pertinent in various application domains, in which one leverages progressively available high-frequency data to forecast/nowcast the low-frequency ones. Existing methods in the literature tailored to such tasks are mostly linear in nature; depending on the specific formulation, they largely rely on the assumption that the (latent) processes that govern the dynamics of the high- and low-frequency blocks of variables evolve at the same frequency, either the low or the high one. This paper develops a neural network-based multi-task shared-encoder-dual-decoder framework for joint multi-horizon prediction of both the low- and high-frequency blocks of variables, wherein the encoder/decoder modules can be either long short-term memory or transformer ones. It addresses forecast/nowcast tasks in a unified manner, leveraging the encoder–decoder structure that can naturally accommodate the mixed-frequency nature of the data. The proposed framework exhibited competitive performance when assessed on both synthetic data experiments and two real datasets of US macroeconomic indicators and electricity data.
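The shapes involved in the shared-encoder-dual-decoder layout can be sketched with linear stand-ins; the actual modules are LSTM or transformer ones, which this toy does not model, and all dimensions below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes: m = 3 high-frequency steps per low-frequency period.
m, d_high, d_low, d_latent, T = 3, 4, 2, 8, 12   # T low-frequency periods

# One input "window": the high-frequency block flattened per period,
# concatenated with the lagged low-frequency block.
X_high = rng.normal(size=(T, m * d_high))
X_low = rng.normal(size=(T, d_low))
X = np.concatenate([X_high, X_low], axis=1)

# Linear stand-ins for the shared encoder and the two decoder heads.
W_enc = rng.normal(size=(X.shape[1], d_latent))
W_dec_low = rng.normal(size=(d_latent, d_low))        # low-frequency head
W_dec_high = rng.normal(size=(d_latent, m * d_high))  # high-frequency head

Z = np.tanh(X @ W_enc)          # shared latent representation
y_low_hat = Z @ W_dec_low       # forecast/nowcast of the low-frequency block
y_high_hat = Z @ W_dec_high     # forecast of the next m high-frequency steps
```

The point of the layout is that one shared representation Z serves both prediction targets, which is what lets a single model handle forecast and nowcast tasks in a unified manner.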
-
Quantifying aleatoric uncertainty is a challenging task in machine learning, and is important for decision making associated with data-dependent uncertainty in model outcomes. Recent empirical studies on modeling aleatoric uncertainty in regression settings rely primarily on either a Gaussian likelihood or moment matching. However, the performance of these methods varies across datasets, while discussions of their theoretical guarantees are lacking. In this work, we investigate the theoretical aspects of these approaches and establish risk bounds for their estimates. We provide conditions that are sufficient to guarantee the PAC-learnability of the aleatoric uncertainty. The study suggests that the likelihood- and moment matching-based methods enjoy different types of guarantees in their risk bounds, i.e., they calibrate different aspects of the uncertainty and thus exhibit distinct properties in different regimes of the parameter space. Finally, we conduct an empirical study that shows promising results and supports our theorems.
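The two families of losses under comparison can be written down directly. The sketch below contrasts them in a toy one-dimensional setting with known true mean and variance; the specific moment-matching form shown is one common variant, not necessarily the exact objective analyzed in the paper:

```python
import numpy as np

def gaussian_nll(y, mu, var):
    """Heteroscedastic Gaussian negative log-likelihood (up to a constant)."""
    return np.mean(0.5 * np.log(var) + 0.5 * (y - mu) ** 2 / var)

def moment_matching(y, mu, var):
    """A common moment-matching variant: fit the mean, then match the
    squared residuals with the predicted variance."""
    return np.mean((y - mu) ** 2) + np.mean(((y - mu) ** 2 - var) ** 2)

# Toy check: at the true mean and variance, both losses beat a
# mis-specified variance estimate.
rng = np.random.default_rng(0)
y = 1.0 + 0.5 * rng.normal(size=10_000)      # true mu = 1.0, var = 0.25

nll_true = gaussian_nll(y, 1.0, 0.25)
nll_wrong = gaussian_nll(y, 1.0, 1.0)        # overestimated variance
mm_true = moment_matching(y, 1.0, 0.25)
mm_wrong = moment_matching(y, 1.0, 1.0)
```

Both objectives penalize the wrong variance here, but they weight errors differently (relative vs. absolute in the squared residuals), which is one intuition for why they calibrate different aspects of the uncertainty.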
-
We consider identification of linear dynamical systems comprising high-dimensional signals, where the output noise components exhibit strong serial and cross-sectional correlations. Although such settings occur in many modern applications, this dependency structure has not been fully incorporated in existing approaches in the literature. In this paper, we explicitly incorporate the dependency structure present in the output noise through lagged values of the observed multivariate signals. We formulate a constrained optimization problem to simultaneously identify the space spanned by the latent states and the transition matrices of the lagged values, wherein the constraints reflect the low-rank nature of the state information and the sparsity of the transition matrices. We establish theoretical properties of the estimators and introduce an easy-to-implement computational procedure for empirical applications. The performance of the proposed approach and the implementation procedure is evaluated on synthetic data, compared with competing approaches, and further illustrated on a data set involving weekly stock returns of 75 US large financial institutions for the 2001–2017 period.
IEEE Transactions on Signal Processing (TSP), 2020 PDF -
A factor-augmented vector autoregressive (FAVAR) model is defined by a VAR equation that captures lead-lag correlations amongst a set of observed variables X and latent factors F, and a calibration equation that relates another set of observed variables Y with F and X. The latter equation is used to estimate the factors that are subsequently used in estimating the parameters of the VAR system. The FAVAR model has become popular in applied economic research, since it can summarize a large number of variables of interest as a few factors through the calibration equation and subsequently examine their influence on core variables of primary interest through the VAR equation. However, there is an increasing need for examining lead-lag relationships between a large number of time series, while incorporating information from another high-dimensional set of variables. Hence, in this paper we investigate the FAVAR model under high-dimensional scaling. We introduce an appropriate identification constraint for the model parameters, which when incorporated into the formulated optimization problem yields estimates with good statistical properties. Further, we address a number of technical challenges introduced by the fact that estimates of the VAR system model parameters are based on estimated rather than directly observed quantities. The performance of the proposed estimators is evaluated on synthetic data. Further, the model is applied to commodity prices and reveals interesting and interpretable relationships between the prices and the factors extracted from a set of global macroeconomic indicators.
-
Dynamical systems comprising multiple components that can be partitioned into distinct blocks originate in many scientific areas. A pertinent example is the interactions between financial assets and selected macroeconomic indicators, which have been studied extensively in the macroeconomics literature at the aggregate level, e.g. between a stock index and an employment index. A key shortcoming of this approach is that it ignores potential influences from other related components (e.g. Gross Domestic Product) that may impact the system’s dynamics and structure, and thus produces incorrect results. To mitigate this issue, we consider a multi-block linear dynamical system with Granger-causal ordering between blocks, wherein the blocks’ temporal dynamics are described by vector autoregressive processes and are influenced by blocks higher in the system hierarchy. We derive the maximum likelihood estimator for the posited model for Gaussian data in the high-dimensional setting, based on appropriate regularization schemes for the parameters of the block components. To optimize the underlying non-convex likelihood function, we develop an iterative algorithm with convergence guarantees. We establish theoretical properties of the maximum likelihood estimates, leveraging the decomposability of the regularizers and a careful analysis of the iterates. Finally, we develop testing procedures for the null hypothesis of whether a block “Granger-causes” another block of variables. The performance of the model and the testing procedures are evaluated on synthetic data, and illustrated on a data set involving log-returns of the US S&P100 component stocks and key macroeconomic variables for the 2001–16 period.
Journal of Machine Learning Research (JMLR), 2017 PDF -
Analyzing multi-layered graphical models provides insight into the conditional relationships among nodes within layers, after adjusting for and quantifying the effects of nodes from other layers. We obtain the penalized maximum likelihood estimator for Gaussian multi-layered graphical models, based on a computational approach involving screening of variables, iterative estimation of the directed edges between layers and the undirected edges within layers, and a final refitting and stability selection step that provides improved performance in finite sample settings. We establish the consistency of the estimator in a high-dimensional setting. To obtain this result, we develop a strategy that leverages the biconvexity of the likelihood function to ensure convergence of the developed iterative algorithm to a stationary point, as well as careful uniform error control of the estimates over iterations. The performance of the maximum likelihood estimator is illustrated on synthetic data.
Preprints
-
Nowcasting arises in various application domains such as economics, finance and healthcare. For a dynamical system comprising multiple time series, under a nowcasting setting one has access to the most recent observed values corresponding to a (potentially arbitrary) subset of the series, and the goal is to leverage such partially observed information to optimally infer the currently unobserved values. In this work, we propose a nowcasting method leveraging neural state space models. The proposed method has a state space backbone, which promotes interpretability and resonates with some widely adopted modeling frameworks in econometrics (such as dynamic factor models), while allowing for flexible parameterization of the dynamics. With a VAE-based training pipeline that properly masks the unobserved values, the method can readily process missing values while simultaneously learning to consume guidance information. The effectiveness of the proposed method is demonstrated through synthetic data settings. Ongoing work involves applying the method to real-world macroeconomic nowcasting and illiquid bond pricing settings.
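The ragged-edge input that a nowcaster consumes can be made concrete with a masking sketch. The mask-and-concatenate preprocessing below is a standard illustrative pattern, not necessarily the paper's exact pipeline:

```python
import numpy as np

# Toy ragged-edge snapshot for 5 series at the current period: series 0 and 3
# have released their latest values; the rest are still pending.
x_t = np.array([0.7, np.nan, np.nan, -1.2, np.nan])
observed = ~np.isnan(x_t)                     # nowcasting observation mask

# Unobserved entries are zeroed out and the mask itself is appended, so the
# model can distinguish a genuine zero from a missing value.
model_input = np.concatenate(
    [np.where(observed, x_t, 0.0), observed.astype(float)]
)

# A trained nowcaster would map model_input (together with the state
# history) to an estimate of the full vector x_t, missing entries included.
```

Because the observed subset can be arbitrary, the same preprocessing handles any release pattern without changing the model.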
TBD; Preliminary version presented at 2025 NBER-NSF Time Series Conference, 2025+ Slides -
Diffusion language models (DLMs) enable parallel, order-agnostic generation with iterative refinement, offering a flexible alternative to autoregressive large language models (LLMs). However, adapting reinforcement learning (RL) fine-tuning to DLMs remains an open challenge because of the intractable likelihood. Pioneering work such as diffu-GRPO (Zhao et al., 2025) estimated token-level likelihoods via one-step unmasking. While computationally efficient, this approach is severely biased. A more principled foundation lies in sequence-level likelihoods, where the evidence lower bound (ELBO) serves as a surrogate. Yet, despite this clean mathematical connection, ELBO-based methods have seen limited adoption due to the prohibitive cost of likelihood evaluation. In this work, we revisit ELBO estimation and disentangle its sources of variance. This decomposition motivates reducing variance through fast, deterministic integral approximations along a few pivotal dimensions. Building on this insight, we introduce Group Diffusion Policy Optimization (GDPO), a new RL algorithm tailored for DLMs. GDPO leverages simple yet effective Semi-deterministic Monte Carlo schemes to mitigate the variance explosion of ELBO estimators under vanilla double Monte Carlo sampling, yielding a provably lower-variance estimator under tight evaluation budgets. Empirically, GDPO achieves consistent gains over pretrained checkpoints and outperforms diffu-GRPO, one of the state-of-the-art baselines, on the majority of math, reasoning, and coding benchmarks.
Under Review, 2025+ PDF -
State-space models (SSMs) offer a powerful framework for dynamical system analysis, wherein the temporal dynamics of the system are assumed to be captured through the evolution of the latent states, which govern the values of the observations. This paper provides a selective review of recent advancements in deep neural network-based approaches for SSMs, and presents a unified perspective on discrete-time deep state space models and continuous-time ones such as latent neural Ordinary Differential and Stochastic Differential Equations. It starts with an overview of the classical maximum likelihood-based approach for learning SSMs, reviews the variational autoencoder as a general learning pipeline for neural network-based approaches in the presence of latent variables, and discusses in detail representative deep learning models that fall under the SSM framework. Very recent developments, where SSMs are used as standalone architectural modules for improving efficiency in sequence modeling, are also examined. Finally, examples involving mixed-frequency and irregularly-spaced time series data are presented to demonstrate the advantage of SSMs in these settings.
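As a concrete anchor for the classical special case the review starts from, a linear-Gaussian SSM can be simulated in a few lines; the dimensions and parameter values below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal linear-Gaussian SSM: the latent state z_t evolves linearly, and
# the observation x_t is a noisy linear readout of z_t.
d_z, d_x, T = 2, 3, 50
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # stable state transition matrix
C = rng.normal(size=(d_x, d_z))          # emission matrix

z = np.zeros((T, d_z))
x = np.zeros((T, d_x))
for t in range(1, T):
    z[t] = A @ z[t - 1] + 0.1 * rng.normal(size=d_z)   # state evolution
    x[t] = C @ z[t] + 0.1 * rng.normal(size=d_x)       # noisy observation
```

The deep variants surveyed in the paper replace the linear maps A and C with neural networks, which is what necessitates the VAE-style learning pipelines discussed there.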
Major Revision, 2025+ PDF -
For many complex simulation tasks spanning areas such as healthcare, engineering, and finance, Monte Carlo (MC) methods are invaluable due to their unbiased estimates and precise error quantification. Nevertheless, MC simulations often become computationally prohibitive, especially for nested, multi-level, or path-dependent evaluations lacking effective variance reduction techniques. While machine learning (ML) surrogates appear as natural alternatives, naïve replacements typically introduce unquantifiable biases. We address this challenge by introducing Prediction-Enhanced Monte Carlo (PEMC), a framework that leverages modern ML models as learned predictors, using cheap and parallelizable simulations as features, to output unbiased evaluations with reduced variance and runtime. As a result, PEMC eliminates the closed-form-mean requirement that constrains classical control-variate methods, preserves unbiasedness and explicit confidence intervals of MC, and achieves scheme-wide variance reduction. Our theoretical analysis quantifies the optimal allocation between expensive evaluations and large batches of cheap, parallelizable feature draws. Across three representative applications—variance-swap pricing under stochastic-local-volatility, swaption pricing under Heath–Jarrow–Morton models, and ambulance diversion policy evaluation—we show that PEMC reduces root-mean-squared error by 30–55% relative to standard MC at similar computational cost.
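The structure of the PEMC estimator can be demonstrated on a toy problem; the "expensive" simulator and the polynomial surrogate below are stand-ins for the paper's nested simulations and ML predictors:

```python
import numpy as np

rng = np.random.default_rng(0)

def expensive_sim(x, rng):
    """Stand-in for a costly simulation whose mean depends on feature x."""
    return np.sin(3 * x) + 2 * x + 0.1 * rng.normal(size=x.shape)

# Step 1: paired runs (feature, expensive output) train a cheap predictor g.
x_train = rng.uniform(-1, 1, 500)
y_train = expensive_sim(x_train, rng)
coef = np.polyfit(x_train, y_train, deg=5)   # toy ML surrogate g
g = lambda x: np.polyval(coef, x)

# Step 2: the PEMC estimator stays unbiased as long as the cheap feature
# draws follow the same distribution as the paired ones -- no closed-form
# mean of g is ever needed, unlike classical control variates.
n, N = 200, 100_000                          # expensive runs vs cheap draws
x_paired = rng.uniform(-1, 1, n)
y_paired = expensive_sim(x_paired, rng)
x_cheap = rng.uniform(-1, 1, N)

theta_mc = y_paired.mean()                                # plain Monte Carlo
theta_pemc = (y_paired - g(x_paired)).mean() + g(x_cheap).mean()
```

Here the true mean is 0 by symmetry; the PEMC estimate averages low-variance residuals plus a cheaply estimated surrogate mean, so it concentrates much more tightly than plain MC at the same number of expensive runs.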
Major Revision, 2025+ PDF
Miscellaneous
-
PaperBot: Chat With My Papers Code
Here’s a small, lightweight PaperBot I built — a fun experiment that lets me chat with my own research papers using an LLM + RAG pipeline.
It’s more of a playful prototype than a production tool — a weekend just-for-fun thingy that I’m still tweaking and improving to make it behave a little better.
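The general retrieve-then-generate pattern it follows can be sketched in plain Python; the bag-of-words retriever and prompt assembly below are illustrative stand-ins, not PaperBot's actual chunking, embedding model, or LLM stack:

```python
import math
from collections import Counter

# Toy "index": in the real pipeline these would be embedded paper chunks.
chunks = [
    "We propose a VAE-based framework for Granger causality.",
    "PEMC combines ML predictors with unbiased Monte Carlo estimates.",
]

def bow(text):
    """Bag-of-words counts as a stand-in for an embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Return the k chunks most similar to the query."""
    q = bow(query)
    return sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)[:k]

def answer(query):
    """Assemble the augmented prompt; a real bot would send it to an LLM."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

top = retrieve("How does PEMC relate to Monte Carlo?")[0]
```

Swapping the bag-of-words similarity for real embeddings and wiring `answer` to an LLM API is essentially all that separates this sketch from a working chat-with-your-papers loop.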