We study the impact of strategic behavior in labor markets characterized by algorithmic monoculture, where firms compete for a shared pool of applicants using a common algorithmic evaluation. In this setting, ``naive'' hiring strategies lead to severe congestion, as firms collectively target the same high-scoring candidates. We model this competition as a game with capacity-constrained firms and fully characterize the set of Nash equilibria. We demonstrate that strategic differentiation significantly outperforms naive selection, increasing social welfare for both firms and applicants. Specifically, the Price of Naive Selection (the welfare gain from strategic play) grows linearly with the number of firms, while the Price of Anarchy (the efficiency loss from decentralization) approaches 1, implying that the decentralized equilibrium is nearly socially optimal. Finally, we analyze convergence and show that a simple sequential best-response process converges to the desired equilibrium. However, firms generally cannot infer the key input needed to compute best responses, namely the congestion on specific candidates, from their own historical data alone. Consequently, to realize the welfare gains of strategic differentiation, algorithmic platforms must explicitly reveal congestion information to participating firms.
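To make the dynamics concrete, the following is a minimal sketch of such a sequential best-response process on a stylized instance, not the paper's model: we assume each firm targets $k$ candidates, each candidate accepts one received offer uniformly at random, and a firm's expected value from candidate $i$ is therefore $s_i/(c_i+1)$ when $c_i$ other firms also target $i$. The function names and the payoff rule here are illustrative assumptions.

\begin{verbatim}
import numpy as np

def best_response(scores, other_counts, k):
    # Greedy best response: target the k candidates maximizing the
    # congestion-discounted value s_i / (c_i + 1), where c_i counts
    # the other firms currently targeting candidate i.
    marginal = scores / (other_counts + 1)
    return set(np.argsort(-marginal)[:k])

def sequential_best_response(scores, n_firms, k, max_rounds=100):
    # Round-robin best-response dynamics; stops once no firm deviates.
    scores = np.asarray(scores, dtype=float)
    targets = [set() for _ in range(n_firms)]
    counts = np.zeros(len(scores), dtype=int)
    for _ in range(max_rounds):
        changed = False
        for f in range(n_firms):
            for i in targets[f]:       # withdraw firm f's current offers
                counts[i] -= 1
            new = best_response(scores, counts, k)
            changed |= (new != targets[f])
            targets[f] = new
            for i in new:              # re-post the updated offers
                counts[i] += 1
        if not changed:
            break
    return targets, counts

rng = np.random.default_rng(0)
targets, counts = sequential_best_response(rng.uniform(size=20), n_firms=5, k=3)
print("max congestion at equilibrium:", counts.max())
\end{verbatim}

Note that the congestion counts $c_i$ are exactly the quantity the abstract argues firms cannot recover from their own histories, which is why the platform must disclose them.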
AI agents increasingly operate in multi-agent environments where outcomes depend on coordination and miscoordination. We distinguish primary algorithmic monoculture (baseline action similarity) from strategic algorithmic monoculture, in which agents adjust their similarity in response to incentives. We implement a simple experimental design that cleanly separates these two forces and deploy it with both human and large language model (LLM) subjects. LLMs exhibit high levels of baseline similarity (primary monoculture) and, like humans, regulate it in response to coordination incentives (strategic monoculture). While LLMs coordinate extremely well on similar actions, they lag behind humans in sustaining heterogeneity when divergence is rewarded.
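As one way to see what such a design must measure, here is a minimal sketch of the implied decomposition (the data and names are hypothetical; the actual tasks and metrics are not specified in the abstract): primary monoculture is pairwise action similarity absent coordination payoffs, and strategic monoculture is the incentive-driven shift away from that baseline.

\begin{verbatim}
from itertools import combinations

def pairwise_similarity(actions):
    # Fraction of agent pairs that chose the same action.
    pairs = list(combinations(actions, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

# Hypothetical action logs from three payoff conditions of one task.
baseline  = ["A", "A", "A", "B", "A", "A"]   # no coordination payoff
matching  = ["A", "A", "A", "A", "A", "A"]   # matching rewarded
diverging = ["A", "B", "C", "A", "B", "A"]   # divergence rewarded

primary = pairwise_similarity(baseline)
print("primary monoculture:", primary)
print("strategic shift (up):", pairwise_similarity(matching) - primary)
print("strategic shift (down):", primary - pairwise_similarity(diverging))
\end{verbatim}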
Standard methods for aligning large language models with human preferences learn from pairwise comparisons among sampled candidate responses and regularize toward a reference policy. Despite their effectiveness, the effects of sampling and reference choices are poorly understood theoretically. We investigate these effects through Identity Preference Optimization (IPO), a widely used preference-alignment framework, and show that proper instance-dependent sampling can yield stronger ranking guarantees, while skewed on-policy sampling can induce excessive concentration under structured preferences. We then analyze iterative alignment dynamics in which the learned policy feeds back into future sampling and reference policies, reflecting the common practice of training on model-generated preference data. We prove that these dynamics can exhibit persistent oscillations or entropy collapse for certain parameter choices, and we characterize regimes that guarantee stability. Our theoretical insights extend to Direct Preference Optimization (DPO), indicating that the phenomena we identify are common to a broader class of preference-alignment methods. Experiments on real-world preference data validate our findings.
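For reference, and to fix notation, we recall the standard IPO objective from the literature (our symbols for the reference policy $\pi_{\mathrm{ref}}$ and regularization strength $\tau$ follow common usage and may differ from the paper's): given a prompt $x$ with preferred and dispreferred responses $y_w \succ y_l$, IPO regresses the implicit log-ratio margin onto the fixed target $\tfrac{1}{2\tau}$,
\[
\mathcal{L}_{\mathrm{IPO}}(\pi_\theta) \;=\; \mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[\left(\log\frac{\pi_\theta(y_w\mid x)\,\pi_{\mathrm{ref}}(y_l\mid x)}{\pi_\theta(y_l\mid x)\,\pi_{\mathrm{ref}}(y_w\mid x)} \;-\; \frac{1}{2\tau}\right)^{\!2}\right],
\]
whereas DPO minimizes $-\,\mathbb{E}\!\left[\log \sigma\!\big(\beta\, h_\theta(x, y_w, y_l)\big)\right]$ for the same margin $h_\theta(x, y_w, y_l) = \log\frac{\pi_\theta(y_w\mid x)\,\pi_{\mathrm{ref}}(y_l\mid x)}{\pi_\theta(y_l\mid x)\,\pi_{\mathrm{ref}}(y_w\mid x)}$. Both objectives depend on the data distribution $\mathcal{D}$ and on $\pi_{\mathrm{ref}}$, which is what makes the sampling and reference choices analyzed above directly visible in the loss.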
Large Language Models (LLMs) exhibit diverse and stable risk preferences in economic decision tasks, yet the drivers of this variation are unclear. Studying 50 LLMs, we show that alignment tuning for harmlessness, helpfulness, and honesty systematically increases risk aversion: a ten percent increase in ethics scores reduces risk appetite by two to eight percent. This induced caution persists across prompt variations and carries over into economic forecasts. Alignment therefore promotes safety but can dampen valuable risk taking, revealing a tradeoff that risks suboptimal economic outcomes. Our framework provides an adaptable and enduring benchmark for tracking model risk preferences and this emerging tradeoff.
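Since the abstract does not specify the elicitation instrument, the following sketch shows one standard option for such a benchmark: a Holt-Laury multiple price list, with a model's risk preference summarized by the CRRA coefficient that best rationalizes its number of safe choices. All names and payoff amounts here are illustrative, not the paper's.

\begin{verbatim}
import numpy as np

def crra_utility(x, r):
    # CRRA utility u(x) = x^(1 - r) / (1 - r), with log utility at r = 1.
    return np.log(x) if np.isclose(r, 1.0) else x ** (1 - r) / (1 - r)

def predicted_safe_choices(r, p_grid):
    # Count the rows of a Holt-Laury list where the safe lottery
    # (p: $2.00, 1-p: $1.60) beats the risky one (p: $3.85, 1-p: $0.10)
    # in expected utility under CRRA with coefficient r.
    n = 0
    for p in p_grid:
        eu_safe = p * crra_utility(2.00, r) + (1 - p) * crra_utility(1.60, r)
        eu_risky = p * crra_utility(3.85, r) + (1 - p) * crra_utility(0.10, r)
        n += eu_safe > eu_risky
    return n

def estimate_r(observed_safe_choices, p_grid):
    # Grid-search the CRRA coefficient whose predicted count of safe
    # choices best matches what the model actually chose.
    grid = np.linspace(-1.0, 2.0, 301)
    errors = [abs(predicted_safe_choices(r, p_grid) - observed_safe_choices)
              for r in grid]
    return grid[int(np.argmin(errors))]

p_grid = np.arange(0.1, 1.0, 0.1)  # winning probabilities, rows 1-9
print("implied CRRA coefficient:", estimate_r(6, p_grid))
\end{verbatim}

Administering such a list repeatedly, before and after alignment tuning, would surface the shift in risk appetite the abstract reports.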