Gary Qiurui Ma
Ph.D. in Computer Science, Harvard University
qiurui_ma [at] g.harvard.edu

LinkedIn | Twitter | GitHub

I am a third-year Ph.D. student at Harvard University, where I am fortunate to be advised by Prof. David C. Parkes and Prof. Yannai A. Gonczarowski. Prior to Harvard, I earned my undergraduate degree in Computer Science from HKUST, and spent a year at the University of Michigan, Ann Arbor, where I was advised by Prof. Michael Wellman.

My research lies at the intersection of game theory, computer science, and economics. I love using theoretical computer science frameworks, such as graph theory and stable matching theory, to analyze economic systems. When a system is too complex to analyze theoretically, I turn to AI and algorithmic simulation to study its outcomes. A topic that particularly interests me is the platform economy, where a digital platform mediates an online buyer-seller market.

Papers and Publications  (* for equal contribution)

D’Amico-Wong, L.*, Ma, G. Q.*, & Parkes, D. C. (2024). Strategic Recommendation: Revenue Optimal Matching for Online Platforms. In AAAI-24 Student Abstract and Poster Program (3-minute presentation).

We consider a platform in a two-sided market with unit-supply sellers and unit-demand buyers. Each buyer can transact with a subset of sellers it knows off platform, as well as one additional seller that the platform recommends. Given the choice of sellers, transactions and prices form a competitive equilibrium. The platform selects one seller for each buyer, and takes a fixed percentage of the prices of all transactions that it recommends. The platform seeks to maximize total revenue. We show that the platform’s problem is NP-hard, even when each buyer knows at most two sellers off platform. Finally, when each buyer values all sellers equally and knows only one seller off platform, we provide a polynomial-time algorithm that optimally solves the problem.

Eden, A.*, Ma, G. Q.*, & Parkes, D. C. (2023). Platform Equilibrium: Analyzing Social Welfare in Online Marketplaces. Working Paper.

We introduce the theoretical study of a Platform Equilibrium. There are unit-demand buyers and unit-supply sellers. Each seller chooses to join or remain off a trading platform, and each buyer can transact with any on-platform seller but with only a subset of the off-platform sellers. Given the choices of sellers, prices form a competitive equilibrium and clear the market, subject to the constraints on trade. Further, the platform charges a transaction fee to all on-platform sellers, in the form of a fraction of each on-platform seller’s price. The platform chooses the fraction α ∈ [0, 1] to maximize revenue. A Platform Equilibrium is a Nash equilibrium of the game in which each seller decides whether or not to join the platform, balancing the effect of a possibly larger trading network against the imposition of the transaction fee.
Our main insights are: (i) in homogeneous (identical) good markets, pure equilibria always exist and can be found by a polynomial-time algorithm; (ii) when the platform is unregulated, the resulting Platform Equilibrium guarantees a tight Θ(log(min{m, n}))-approximation of the ideal welfare in homogeneous good markets; (iii) even light regulation helps: when the platform’s fee is capped at α, the price of anarchy is (2 − α)/(1 − α) for general unit-demand valuations. For example, if the platform takes 30% of sellers’ revenue, a rather high fee, our analysis implies that the welfare in a Platform Equilibrium is still a 0.412-fraction of the ideal welfare. Some of our analysis extends beyond unit-demand buyers, as well as to markets where sellers have production costs.
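The 30% example above follows directly from the stated price-of-anarchy bound; a quick numeric check (a toy calculation for the reader, not code from the paper):

```python
def welfare_fraction(alpha: float) -> float:
    """Lower bound on the fraction of ideal welfare retained in a
    Platform Equilibrium when the platform's fee is capped at alpha.
    This is the reciprocal of the price of anarchy (2 - alpha)/(1 - alpha)."""
    return (1 - alpha) / (2 - alpha)

# With a 30% fee cap, at least ~41.2% of the ideal welfare is guaranteed.
print(round(welfare_fraction(0.3), 3))  # -> 0.412
```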

Wang, X., Ma, G. Q., Eden, A., Li, C., Trott, A., Zheng, S., & Parkes, D. C. (2023, April). Platform Behavior under Market Shocks: A Simulation Framework and Reinforcement-Learning Based Study. In Proceedings of the ACM Web Conference 2023 (pp. 3592-3602).

We study the behavior of an economic platform (e.g., Amazon, Uber Eats, Instacart) under shocks, such as COVID-19 lockdowns, and the effect of different regulation considerations imposed on a platform. To this end, we develop a multi-agent Gym environment of a platform economy in a dynamic, multi-period setting, with the possible occurrence of economic shocks. Buyers and sellers are modeled as economically-motivated agents, choosing whether or not to pay corresponding fees to use the platform. We formulate the platform’s problem as a partially observable Markov decision process, and use deep reinforcement learning to model its fee setting and matching behavior. We consider two major types of regulation frameworks: (1) taxation policies and (2) platform fee restrictions, and offer extensive simulated experiments to characterize regulatory tradeoffs under optimal platform responses. Our results show that while many interventions are ineffective with a sophisticated platform actor, we identify a particular kind of regulation—fixing fees to optimal, pre-shock fees while still allowing a platform to choose how to match buyer demands to sellers—as promoting the efficiency, seller diversity, and resilience of the overall economic system.

Wang, Y., Ma, Q., & Wellman, M. P. (2022, May). Evaluating Strategy Exploration in Empirical Game-Theoretic Analysis. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems (pp. 1346-1354).

In empirical game-theoretic analysis (EGTA), game models are extended iteratively through a process of generating new strategies based on learning from experience with prior strategies. The strategy exploration problem in EGTA is how to direct this process so as to construct effective models with minimal iteration. A variety of approaches have been proposed in the literature, including methods based on classic techniques and novel concepts. Comparing the performance of these alternatives can be surprisingly subtle, depending sensitively on the criteria adopted and measures employed. We investigate some of the methodological considerations in evaluating strategy exploration, defining key distinctions and identifying a few general principles based on examples and experimental observations. In particular, we emphasize that empirical games create a space of strategies that should be evaluated as a whole. Based on this fact, we suggest that the minimum regret constrained profile (MRCP) provides a particularly robust basis for evaluating a space of strategies, and propose a local search method for MRCP that outperforms previous approaches. However, the computation of MRCP is not always feasible, especially in large games. In this scenario, we highlight consistency considerations for comparing across different approaches. Surprisingly, we find that recent works violate these consistency considerations, which may result in misleading conclusions about the performance of different approaches. For proper evaluation, we propose a new evaluation scheme and demonstrate that it can reveal the true learning performance of different approaches, in contrast to previous evaluation methods.

Huang, J., Xie, S., Sun, J., Ma, Q., Liu, C., Lin, D., & Zhou, B. (2021, October). Learning a Decision Module by Imitating Driver's Control Behaviors. In Conference on Robot Learning (pp. 1-10). PMLR.

Autonomous driving systems have a pipeline of perception, decision, planning, and control. The decision module processes information from the perception module and directs the execution of the downstream planning and control modules. On the other hand, the recent success of deep learning suggests that this pipeline could be replaced by end-to-end neural control policies; however, safety cannot be well guaranteed for data-driven neural networks. In this work, we propose a hybrid framework to learn neural decisions in the classical modular pipeline through end-to-end imitation learning. This hybrid framework can preserve the merits of the classical pipeline, such as the strict enforcement of physical and logical constraints, while learning complex driving decisions from data. To circumvent the ambiguous annotation of human driving decisions, our method learns high-level driving decisions by imitating low-level control behaviors. We show in simulation experiments that our modular driving agent can generalize its driving decisions and control to various complex scenarios where rule-based programs fail. It can also generate smoother and safer driving trajectories than end-to-end neural policies. Demo and code are available at https://decisionforce.github.io/modulardecision/.

Workshops and Non-archival Presentations  (* for equal contribution)

Zhang, D.*, Ma, G. Q.*, Drew, B., & Sankararaman, S. (2019). TCA-TWAS: Identification of Cell-Type-Specific Genetic Regulation of Gene Expression for Transcriptome-Wide Association Studies (Poster, Code, Presentation, Report). In UCLA CSST Summer Research Program.

We deconvolve tissue-level gene expression into cell-type-specific expression with Tensor Component Analysis. We then perform Transcriptome-wide Association Studies (TWAS) on the resulting cell-type-specific gene expression using UK Biobank data. By optimizing the TWAS parameter estimation procedure, we can enforce sparsity on SNP effect sizes and enforce correlation between heritability and genetic effects.

Events

Teaching

2023 Fall, Teaching Fellow for CS 136: Economics and Computation, Harvard
2023 Fall, Teaching Fellow for Data-Driven Marketing, Harvard Business Analytics Program