Published
-
– Motivation: Diffusion language models support parallel, order-agnostic generation through iterative refinement, but RL fine-tuning remains difficult because their likelihoods are intractable. Existing token-level approximations, such as one-step unmasking in diffu-GRPO, are computationally efficient but can introduce severe bias.
– Method: We revisit sequence-level likelihood estimation for DLMs using the ELBO as a surrogate objective. By decomposing the variance of ELBO estimators, we identify key sources of instability and propose fast, deterministic integral approximations along pivotal dimensions.
– Theory and results: We introduce Group Diffusion Policy Optimization (GDPO), an RL algorithm for DLMs that uses semi-deterministic Monte Carlo schemes to reduce the variance of ELBO estimation under tight evaluation budgets. GDPO yields a provably lower-variance estimator than vanilla double Monte Carlo sampling and empirically improves over pretrained checkpoints and diffu-GRPO on most math, reasoning, and coding benchmarks.
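As a rough illustration of the semi-deterministic idea (a toy sketch, not the GDPO implementation), the snippet below compares two ways of estimating an integral over a time level t of a noisy loss. The integrand `toy_loss`, the choice of t as the dimension handled deterministically, and the 3-point Gauss-Legendre rule are all assumptions made for illustration.

```python
import math
import random
import statistics

# Toy integrand standing in for a per-timestep loss: a smooth function of the
# time level t plus noise z from a random corruption. Purely illustrative.
def toy_loss(t, z):
    return t * t + 0.1 * z

# 3-point Gauss-Legendre rule mapped from [-1, 1] to [0, 1].
_OFFSET = 0.5 * math.sqrt(3.0 / 5.0)
NODES = (0.5 - _OFFSET, 0.5, 0.5 + _OFFSET)
WEIGHTS = (5.0 / 18.0, 8.0 / 18.0, 5.0 / 18.0)

def double_mc(rng, budget=3):
    """Sample both t ~ U(0, 1) and the noise z at every evaluation."""
    total = sum(toy_loss(rng.random(), rng.gauss(0.0, 1.0)) for _ in range(budget))
    return total / budget

def semi_deterministic(rng):
    """Fix t at quadrature nodes; only the noise z is sampled."""
    return sum(w * toy_loss(t, rng.gauss(0.0, 1.0)) for t, w in zip(NODES, WEIGHTS))

rng = random.Random(0)
mc_draws = [double_mc(rng) for _ in range(5000)]
sd_draws = [semi_deterministic(rng) for _ in range(5000)]
mc_var = statistics.variance(mc_draws)
sd_var = statistics.variance(sd_draws)
print(f"double MC variance:          {mc_var:.4f}")
print(f"semi-deterministic variance: {sd_var:.4f}")
```

Both estimators use three loss evaluations per draw. In this toy setting the quadrature rule integrates the smooth part exactly, so the only remaining variance comes from the sampled noise.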
-
– Motivation: Graphical models are widely used to describe conditional dependencies among random variables, but in many applications the dependency structure can change with observed covariates. This motivates methods that estimate covariate-dependent graph structures rather than a single static graph.
– Method: We propose a deep neural network-based approach for estimating covariate-dependent graphical models. The method flexibly models how the graph structure depends on covariates and can fit data well without relying on a Gaussianity assumption.
– Theory and results: We establish PAC guarantees under assumptions commonly used in empirical risk minimization. The method is evaluated on several synthetic settings, benchmarked against existing approaches, and illustrated on neuroscience and finance datasets, where it produces interpretable results.
-
– Motivation: Standard empirical risk minimization can yield overly broad guarantees that do not reflect improved behavior in favorable sub-regions of the data distribution, such as large-margin regions in classification or low-variance regions in heteroscedastic regression.
– Method: We study weighted empirical risk minimization, where a data-dependent weight function is incorporated into the empirical risk objective. Under a general balanceable Bernstein condition, we design weighted ERM estimators that improve performance in targeted sub-regions relative to standard ERM.
– Theory and results: We show that the improvement appears through a data-dependent constant in the error bound, yielding sharper conditional risk guarantees in the relevant sub-regions. The theoretical findings are supported by synthetic data experiments.
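To make the weighted objective concrete, here is a minimal sketch of weighted ERM for a one-parameter least-squares problem with heteroscedastic noise. The noise model `noise_sd`, the inverse-variance weights (which use the true noise level as an oracle), and the through-the-origin regression are illustrative assumptions, not the weight design analyzed in the paper.

```python
import random

def weighted_slope(xs, ys, ws):
    """Minimize sum_i w_i * (y_i - b * x_i)^2 over b (closed form)."""
    num = sum(w * x * y for x, y, w in zip(xs, ys, ws))
    den = sum(w * x * x for x, w in zip(xs, ws))
    return num / den

def noise_sd(x):
    # Hypothetical heteroscedastic noise: quiet for x < 0, loud for x >= 0.
    return 0.1 if x < 0 else 2.0

rng = random.Random(0)
TRUE_SLOPE = 2.0
xs = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
ys = [TRUE_SLOPE * x + rng.gauss(0.0, noise_sd(x)) for x in xs]

# Uniform weights recover standard ERM (ordinary least squares).
unweighted = weighted_slope(xs, ys, [1.0] * len(xs))
# Assumed weight function: inverse noise variance, favoring the quiet region.
weighted = weighted_slope(xs, ys, [1.0 / noise_sd(x) ** 2 for x in xs])

print(f"standard ERM slope: {unweighted:.3f}")
print(f"weighted ERM slope: {weighted:.3f}")
```

With these weights the fit is driven mostly by the low-variance region, which is the kind of targeted sub-region the summary above refers to.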
-
– Motivation: Granger causality is widely used to capture lead-lag relationships in complex dynamical systems, but most existing methods focus on a single system and often impose restrictive assumptions on the underlying dynamics. In applications such as macroeconomics and neuroscience, data are often collected from multiple related systems with nonlinear dynamics, where the goal is to recover both shared Granger-causal structure and system-specific idiosyncrasies.
– Method: We introduce a variational autoencoder-based framework for jointly learning Granger-causal relationships across a collection of related but heterogeneous dynamical systems with nonlinear dynamics. The framework models common structure across systems while allowing individual systems to retain their own distinctive connectivity patterns.
– Results and applications: The proposed method is evaluated on several synthetic settings and benchmarked against approaches designed for individual-system learning. It is further illustrated on neurophysiological time-series data, where it produces interpretable results.
-
– Motivation: Market impact is a central issue for large institutional investors and active market participants, since executing large orders can move prices and affect trading costs. Existing estimation methods often rely on aggregate summaries such as VWAP, potentially discarding useful information contained in the price trajectory during the metaorder.
– Method: We study whether incorporating metaorder price trajectory data improves estimation efficiency, using Fisher information as the main criterion because of its connection to asymptotic efficiency. The analysis compares trajectory-based estimators with established approaches such as VWAP-based estimation under popular market impact models.
– Theory and results: We show that estimators using partial price trajectory data can be asymptotically more efficient than standard alternatives, especially when early trade prices are included. We discuss the theoretical and empirical implications of this finding and explain how trajectory-based information can be incorporated into practical estimation procedures.
-
– Motivation: Structural discovery is important for understanding relationships among variables in dynamic systems with lead-lag dependencies. Such systems can be represented using structural equation models that capture both contemporaneous relationships through a DAG and temporal relationships through lagged effects. In many applications, partial ordering information among variables is available as a priori knowledge and should be incorporated into the model.
– Method: We develop an algorithm for estimating a high-dimensional linear structural equation model (SEM), also known as a structural vector autoregression, while incorporating a priori partial ordering constraints among the DAG nodes.
– Theory and results: The proposed algorithm is provably convergent to a stationary point and achieves competitive performance on both synthetic and real datasets.
-
– Motivation: Mixed-frequency prediction arises when progressively available high-frequency data are used to forecast or nowcast low-frequency variables. Existing methods are mostly linear and, depending on the formulation, typically assume that the latent processes for the high- and low-frequency variables evolve at one of the observed frequencies.
– Method: We develop a neural network-based multi-task framework with a shared encoder and dual decoders for joint multi-horizon prediction of both low- and high-frequency variables. The encoder and decoder modules can be implemented using either LSTM or transformer architectures, and the encoder–decoder structure naturally accommodates mixed-frequency observations.
– Results and applications: The framework treats forecasting and nowcasting in a unified way and achieves competitive performance on synthetic experiments and two real datasets involving US macroeconomic indicators and electricity data.
-
– Motivation: Quantifying aleatoric uncertainty is important for decision making when model outcomes depend on data-dependent noise. Existing regression methods often rely on Gaussian likelihoods or moment matching, but their empirical performance varies across datasets and their theoretical guarantees are not fully understood.
– Method: We study likelihood-based and moment-matching-based approaches for estimating aleatoric uncertainty in regression. We analyze their theoretical properties and establish risk bounds for the resulting uncertainty estimates.
– Theory and results: We provide sufficient conditions for PAC-learnability of aleatoric uncertainty. The analysis shows that likelihood-based and moment-matching-based methods calibrate different aspects of uncertainty, leading to different guarantees and behavior across parameter regimes; empirical results support the theoretical findings.
-
– Motivation: High-dimensional linear dynamical systems often exhibit output noise with strong serial and cross-sectional dependence. Existing system-identification methods do not fully account for this dependence, which can affect the recovery of the latent state structure.
– Method: We explicitly model the dependency in the output noise using lagged values of the observed multivariate signals. We formulate a constrained optimization problem to jointly estimate the latent state space and the transition matrices for the lagged observations, with constraints reflecting the low-rank structure of the latent states and sparsity of the transition matrices.
– Theory and results: We establish theoretical properties of the estimators and introduce an easy-to-implement computational procedure. The method is evaluated on synthetic data, compared with competing approaches, and applied to weekly stock returns of 75 large US financial institutions from 2001 to 2017.
-
– Motivation: FAVAR models are useful for studying how a small set of latent factors, extracted from many auxiliary variables, affect a set of core time series through a VAR system. However, standard FAVAR methods are not designed for settings where both the core time series and the auxiliary variables are high-dimensional.
– Method: We study the FAVAR model under high-dimensional scaling, introducing an identification constraint for the model parameters and incorporating it into the estimation problem to obtain statistically well-behaved estimates.
– Theory and results: We address technical challenges arising because the VAR parameters are estimated using latent factors that must first be estimated rather than directly observed. The proposed estimators are evaluated on synthetic data and applied to commodity prices, revealing interpretable relationships between commodity price dynamics and factors extracted from global macroeconomic indicators.
-
– Motivation: Many dynamical systems consist of groups of variables whose interactions are directional across groups. In financial systems, for example, stock returns may be influenced not only by market-level indicators such as a stock index or employment index, but also by other macroeconomic variables such as GDP. Ignoring these additional blocks can distort the inferred dynamics.
– Method: We propose a multi-block linear dynamical system with Granger-causal ordering between blocks, where each block follows a vector autoregressive process and is influenced by higher-level blocks. For Gaussian high-dimensional data, we derive a regularized maximum likelihood estimator and optimize the resulting non-convex likelihood using an iterative algorithm with convergence guarantees.
– Theory and results: We establish theoretical properties of the estimator by using decomposable regularizers and carefully analyzing the algorithm’s iterates. We also develop tests for block-level Granger causality. The method is evaluated on synthetic data and applied to S&P 100 stock log-returns and macroeconomic variables from 2001–2016.
-
– Motivation: Multi-layered graphical models help characterize conditional relationships within each layer while adjusting for and quantifying the effects of nodes from other layers.
– Method: We propose a penalized maximum likelihood estimator for Gaussian multi-layered graphical models. The procedure combines variable screening with iterative estimation of between-layer directed edges and within-layer undirected edges, followed by refitting and stability selection to improve finite-sample performance.
– Theory and results: We establish high-dimensional consistency by exploiting the biconvexity of the likelihood to ensure convergence to a stationary point and by controlling estimation error uniformly over iterations; synthetic experiments illustrate the estimator’s performance.
Preprints
-
– Scope: State-space models provide a general framework for studying dynamical systems by representing temporal dependence through latent states that generate the observations. This review focuses on recent deep learning-based approaches for SSMs, covering both discrete-time deep state-space models and continuous-time formulations such as latent neural ODEs and SDEs.
– Coverage: We review classical maximum likelihood learning for SSMs, variational autoencoders as a general learning pipeline for latent-variable models, and representative deep SSM formulations. We also discuss recent developments where SSMs are used as standalone architectural modules for efficient sequence modeling, and illustrate the advantages of SSMs in mixed-frequency and irregularly spaced time-series settings.
-
– Motivation: Monte Carlo methods are widely used in complex simulation tasks because they provide unbiased estimates and explicit error quantification, but they can become computationally prohibitive for nested, multi-level, or path-dependent evaluations. Machine learning surrogates can reduce cost, but naïvely replacing MC with ML often introduces biases that are difficult to quantify.
– Method: We introduce Prediction-Enhanced Monte Carlo (PEMC), a framework that uses modern ML models as learned predictors while retaining the unbiasedness and confidence intervals of standard MC. By using cheap, parallelizable simulations as features, PEMC removes the closed-form-mean requirement in classical control variates and provides scheme-wide variance reduction.
– Theory and results: We analyze the optimal allocation between expensive evaluations and large batches of cheap feature draws. Across variance-swap pricing, swaption pricing, and ambulance diversion policy evaluation, PEMC reduces root-mean-squared error by 30–55% relative to standard MC at comparable computational cost.
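The general pattern can be sketched as a predictor-corrected estimator: average the residual between expensive outputs and predictions on a small paired sample, then add the predictor's mean estimated from many cheap feature draws. This is a minimal toy under stated assumptions, not the PEMC implementation; `predictor`, `expensive_eval`, and the sample sizes are hypothetical.

```python
import random
import statistics

def predictor(x):
    # Hypothetical stand-in for a pre-trained ML model g(x); it must be fixed
    # before the evaluation samples below are drawn.
    return x * x

def expensive_eval(x, rng):
    # Toy "expensive" simulation output whose mean we want; E[Y] = 1 here.
    return x * x + rng.gauss(0.0, 0.3)

rng = random.Random(0)
n_expensive, n_cheap = 200, 20000

# Small paired sample: cheap feature plus the matching expensive output.
paired_features = [rng.gauss(0.0, 1.0) for _ in range(n_expensive)]
paired_outputs = [expensive_eval(x, rng) for x in paired_features]
# Large independent batch of cheap feature draws only.
cheap_features = [rng.gauss(0.0, 1.0) for _ in range(n_cheap)]

plain_mc = statistics.mean(paired_outputs)
residuals = [y - predictor(x) for x, y in zip(paired_features, paired_outputs)]
predictions = [predictor(x) for x in cheap_features]
# Unbiased for E[Y] for any fixed predictor: E[Y - g(X)] + E[g(X)] = E[Y].
corrected_estimate = statistics.mean(residuals) + statistics.mean(predictions)

print(f"plain MC estimate:           {plain_mc:.3f}")
print(f"predictor-corrected estimate: {corrected_estimate:.3f}")
```

Because the predictor's mean is itself estimated from cheap draws, no closed-form mean is needed; the better the predictor tracks the expensive output, the smaller the residual variance.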
-
– Motivation: Nowcasting is important in domains such as economics, finance, and healthcare, where a dynamical system contains multiple time series but only a subset may be observed at the current time. The goal is to use the most recent partially observed information to infer the currently unobserved values.
– Method: We propose a nowcasting method based on neural state-space models. The model uses a state-space backbone that supports interpretability and connects to widely used econometric frameworks such as dynamic factor models, while allowing flexible neural parameterization of the system dynamics.
– Results and applications: Using a VAE-based training pipeline with appropriate masking, the method can handle missing values and learn to incorporate guidance information. Its effectiveness is demonstrated on synthetic data, with ongoing applications to macroeconomic nowcasting and illiquid bond pricing.
Miscellaneous
-
This is a small, lightweight PaperBot I built as a fun experiment: it lets me chat with my own research papers using an LLM + RAG pipeline.
It’s more of a playful prototype than a production tool, a weekend project that I occasionally tweak to make it behave a little better.