Dynamic mechanism design provides a principled framework for coordinating self-interested agents whose private information evolves over time, but general solutions remain analytically intractable due to history dependence and dynamic incentive constraints. We propose a computational approach to optimal dynamic mechanism design based on an inverse game perspective: rather than characterizing equilibria induced by candidate mechanisms, we fix truthful reporting as the target equilibrium behavior and directly search for parameterized mechanisms that rationalize it.
Restricting attention to expressive direct mechanisms, we formulate the design problem as a bilevel optimization in which the principal maximizes her objective subject to incentive compatibility and individual rationality constraints. To verify the incentive compatibility constraint, we solve each agent’s unilateral deviation problem, which is modeled as a parameterized partially observable Markov decision process and reduced to a sufficient-statistics Markov decision process, enabling efficient approximation via stochastic natural policy gradient methods with finite-sample guarantees. This reduction accommodates both analytically derived sufficient statistics, such as Bayesian belief states, and learned representations, including recurrent neural embeddings.
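To make the inner verification step concrete, the following is a minimal sketch, in our own toy notation rather than the paper's actual setup, of a single agent's unilateral-deviation problem written as a belief-state MDP and approximated with a plain score-function policy gradient (a simple stand-in for the natural policy gradient mentioned above); all dynamics, payoffs, and parameter names are illustrative assumptions.
\begin{verbatim}
import numpy as np

# Minimal sketch (not the paper's implementation) of the inner IC-verification
# oracle: the agent's deviation problem as a POMDP whose Bayesian belief over
# a hidden type is the sufficient statistic. All numbers are made up.

rng = np.random.default_rng(0)
HORIZON = 4
TRANS = np.array([[0.9, 0.1], [0.2, 0.8]])   # hidden type dynamics
OBS = np.array([[0.8, 0.2], [0.3, 0.7]])     # P(signal | type): agent sees noisy signals
theta_mech = np.array([0.2, 0.9])            # mechanism parameters: payment per report

def flow_utility(hidden_type, report):
    value = np.array([[0.3, 0.4], [0.3, 1.2]])[hidden_type, report]
    return value - theta_mech[report]

def belief_update(belief, signal):
    post = belief * OBS[:, signal]           # Bayes filter ...
    post /= post.sum()
    return post @ TRANS                      # ... plus one-step prediction

def rollout(w):
    """Deviation policy: logistic in the belief; returns (utility, score gradient)."""
    belief = np.array([0.5, 0.5])
    hidden = rng.choice(2, p=belief)
    total, grad = 0.0, np.zeros_like(w)
    for _ in range(HORIZON):
        feat = np.array([1.0, belief[1]])
        p1 = 1.0 / (1.0 + np.exp(-w @ feat))
        report = int(rng.random() < p1)
        total += flow_utility(hidden, report)
        grad += (report - p1) * feat         # d log pi(report | belief) / d w
        signal = rng.choice(2, p=OBS[hidden])
        belief = belief_update(belief, signal)
        hidden = rng.choice(2, p=TRANS[hidden])
    return total, grad

w = np.zeros(2)
for _ in range(5000):                        # stochastic policy-gradient ascent
    ret, g = rollout(w)
    w += 0.02 * ret * g
dev_value = np.mean([rollout(w)[0] for _ in range(2000)])
print("estimated value of the best deviation policy:", round(dev_value, 3))
\end{verbatim}
Comparing this estimate against the agent's value from truthful reporting yields the approximate incentive-compatibility check consumed by the outer optimization.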
We develop two computational frameworks—a direct penalty method and a binary-search-based acceptance method—and establish non-asymptotic convergence guarantees to approximate stationary solutions despite nonconvexity, nonsmooth incentive constraints, and oracle error. In dynamic bandit auction experiments, our approach recovers known single-item revenue benchmarks and discovers new approximately incentive-compatible mechanisms in multi-item settings where analytical solutions are unavailable.
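As a rough illustration of the first framework, one plausible penalized form of the outer problem (stated in our own notation; the paper's exact objective and constants may differ) is
\[
\max_{\theta}\; U_P(\theta)\;-\;\lambda \sum_{i=1}^{n}\Big[\widehat{V}_i^{\mathrm{dev}}(\theta)-V_i^{\mathrm{truth}}(\theta)\Big]_{+}\;-\;\mu \sum_{i=1}^{n}\Big[-V_i^{\mathrm{truth}}(\theta)\Big]_{+},
\]
where $U_P(\theta)$ is the principal's objective under truthful reporting, $\widehat{V}_i^{\mathrm{dev}}(\theta)$ is the inner oracle's estimate of agent $i$'s best-deviation value, $V_i^{\mathrm{truth}}(\theta)$ is agent $i$'s value from reporting truthfully, $[\cdot]_{+}=\max\{\cdot,0\}$, and $\lambda,\mu>0$ are penalty weights; the first penalty targets incentive compatibility and the second individual rationality.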
We study the online resource allocation problem, in which a seller with limited supplies of $m$ types of resources sequentially receives independent requests from $n$ heterogeneous agents arriving in an unknown order. Each agent's request can be fulfilled in several ways, each consuming resources in $[0,1]^m$ and generating a different value for the agent. The seller's objective is to maximize the social welfare, i.e., the sum of the values obtained by the agents.
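For concreteness, the seller's offline benchmark can be written (in our own notation, which is not taken from the paper) as the integer program
\[
\max_{x}\;\sum_{i=1}^{n}\sum_{o\in O_i} v_{i,o}\,x_{i,o}
\quad\text{s.t.}\quad
\sum_{i=1}^{n}\sum_{o\in O_i} c_{i,o,j}\,x_{i,o}\le B_j\;\;\forall j\in[m],
\qquad
\sum_{o\in O_i} x_{i,o}\le 1\;\;\forall i\in[n],
\qquad x_{i,o}\in\{0,1\},
\]
where $O_i$ is agent $i$'s set of fulfillment options, option $o$ consumes $c_{i,o}\in[0,1]^m$ of the resources and yields value $v_{i,o}$, and $B_j$ is the supply of resource $j$; the online algorithm must commit to one option (or none) for each agent upon arrival.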
Recently, [Ghuge, Singla, and Wang STOC'25] studied the learnability of the online resource allocation problem with heterogeneous agents and proposed a pricing algorithm that is learnable from only a single sample. However, their core algorithm is a dynamic pricing algorithm, which may raise fairness concerns, since different agents face different prices; moreover, it crucially requires knowing the agents' arrival order in advance. To address these issues, in this paper we study the learnability of anonymous pricing algorithms for online resource allocation using samples from, and queries to, the agents' value distributions. First, we show that a polynomial number of samples suffices to learn the classic dual pricing algorithm. Second, we show that a polynomial number of pricing queries suffices to learn a near-optimal anonymous pricing algorithm, in which the item pricing vector faced by each agent is drawn from the same predetermined distribution.
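To illustrate the anonymous pricing format in the second result, the following is a small, self-contained sketch (our own toy, not the paper's algorithm or its learned price distribution) in which every arriving agent faces an item-price vector drawn from the same fixed distribution and buys its utility-maximizing affordable option.
\begin{verbatim}
import numpy as np

# Toy anonymous posted-price allocation (illustrative only). The price
# distribution, values, and consumption vectors below are made-up inputs.

rng = np.random.default_rng(1)
m, n = 3, 5                                  # resource types, agents
supply = np.array([2.0, 1.0, 3.0])           # remaining supply per resource

def sample_price_vector():
    # stand-in for the learned "predetermined distribution" over price vectors
    return rng.uniform(0.1, 1.0, size=m)

def best_option(options, prices):
    """Pick the feasible option maximizing value minus payment; None if none is worthwhile."""
    best, best_utility = None, 0.0
    for value, consumption in options:
        if np.all(consumption <= supply):
            utility = value - consumption @ prices
            if utility > best_utility:
                best, best_utility = (value, consumption), utility
    return best

welfare = 0.0
for _ in range(n):                           # agents arrive in an unknown order
    # each agent has a few fulfillment options: (value, consumption in [0,1]^m)
    options = [(rng.uniform(0.5, 2.0), rng.uniform(0, 1, size=m)) for _ in range(3)]
    prices = sample_price_vector()           # anonymous: same distribution for every agent
    choice = best_option(options, prices)
    if choice is not None:
        value, consumption = choice
        supply -= consumption
        welfare += value
print("realized social welfare:", round(welfare, 3))
\end{verbatim}
Learning then amounts to choosing the price distribution (here a placeholder uniform draw) from samples or pricing queries so that the resulting welfare is provably near-optimal.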
We provide the first analysis of (deferred acceptance) clock auctions in the learning-augmented framework. These auctions satisfy a unique combination of very appealing properties, including obvious strategyproofness, transparency, and unconditional winner privacy, making them particularly well-suited for real-world applications. However, early work that evaluated their performance from a worst-case perspective concluded that no deterministic clock auction with $n$ bidders can achieve an $O(\log^{1-\epsilon} n)$ approximation of the optimal social welfare for any constant $\epsilon>0$, even in very simple settings. This overly pessimistic impossibility result depends heavily on the assumption that the designer has no information regarding the bidders' values. Leveraging the learning-augmented framework, we instead consider a designer equipped with some (machine-learned) advice regarding the optimal solution; this advice can provide useful guidance if accurate, but it may be unreliable.
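For readers unfamiliar with the format, the following is a highly simplified sketch of an ascending clock auction (our own toy illustration; the deferred-acceptance auctions analyzed here are more refined): each active bidder faces a personal price clock, quitting is irrevocable, bidders communicate only whether they stay or quit, and the auction ends once the remaining active set is feasible.
\begin{verbatim}
# Toy ascending clock auction for welfare maximization (illustrative only).
# The single-question "stay or quit" interaction is what underlies the
# properties listed above (obvious strategyproofness, winner privacy).

def clock_auction(values, is_feasible, step=0.1):
    """values[i]: bidder i's private value; is_feasible: set of bidders -> bool."""
    active = set(range(len(values)))
    clocks = {i: 0.0 for i in active}
    while not is_feasible(active):
        if not active:
            return set(), clocks              # no feasible winner set remains
        i = min(active, key=lambda b: clocks[b])  # raise the lowest clock
        clocks[i] += step
        if clocks[i] > values[i]:             # bidder i declines the new price and quits
            active.remove(i)
    return active, clocks                     # winners and their final clock prices

# toy run: feasibility constraint allows at most two winners
winners, clocks = clock_auction([0.9, 0.4, 0.7], is_feasible=lambda S: len(S) <= 2)
print("winners:", sorted(winners))
\end{verbatim}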
Our main results are learning-augmented clock auctions that use this advice to achieve much stronger performance guarantees whenever the advice is accurate (known as \emph{consistency}), while maintaining worst-case guarantees even if this advice is arbitrarily inaccurate (known as \emph{robustness}). Our first clock auction achieves the best of both worlds: $(1+\epsilon)$-consistency for any desired constant $\epsilon>0$ and $O(\log{n})$ robustness; we also extend this auction to achieve error tolerance. We then consider a much stronger notion of consistency, which we refer to as consistency$^\infty$, and provide an auction that achieves a near-optimal trade-off between consistency$^\infty$ and robustness. Finally, using our impossibility results regarding this trade-off, we prove lower bounds on the ``cost of smoothness,'' i.e., on the robustness that is achievable if we also require that the performance of the auction degrades smoothly as a function of the prediction error.
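For reference, these notions can be stated formally as follows (the symbols are ours): an auction $\mathcal{A}$ is $\alpha$-consistent and $\beta$-robust if, for every instance with values $\mathbf{v}$ and optimal welfare $\mathrm{OPT}(\mathbf{v})$,
\[
\mathrm{SW}\big(\mathcal{A}(\mathbf{v},\hat{s})\big)\;\ge\;\frac{\mathrm{OPT}(\mathbf{v})}{\alpha}\ \text{ whenever the advice } \hat{s} \text{ is accurate},
\qquad
\mathrm{SW}\big(\mathcal{A}(\mathbf{v},\hat{s})\big)\;\ge\;\frac{\mathrm{OPT}(\mathbf{v})}{\beta}\ \text{ for arbitrary advice } \hat{s};
\]
consistency$^\infty$ strengthens the first requirement, and smoothness asks that the guarantee degrade gracefully between these two extremes as a function of the prediction error.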