I research how innovations move from discovery to market, studying why some technologies advance while others stall and how AI is transforming the search, evaluation, and commercialization of innovations. Most of my work centers on scientific discoveries, which are crucial for social progress.
Prior to academia, I founded a startup providing software and machine learning translation solutions for enterprise clients. I also spent years in M&A, leading acquisitions of innovative companies developing technologies in areas such as advanced materials, biologics, energy generation, and water treatment.
I am on the 2025-2026 job market.
We develop an ex-ante measure of the commercial potential of science, an otherwise unobservable variable that drives the performance of innovation-intensive firms. To do so, we rely on LLMs and neural networks to predict whether scientific articles will influence firms' use of science. The measure incorporates time-varying models and uncertainty quantification, and we validate it through both traditional methods and out-of-sample exercises that leverage a major university’s technology transfer data. To illustrate the methodological contributions of our measure, we apply it to examine the impact of university reputation and of university privatization of science. We find that firms’ reliance on reputation may lead to forgone opportunities, and that privatization (i.e., patenting) appears to increase firms’ use of one university’s science. We make our measure and method available to researchers.
Startups commercializing science-based innovations are crucial for tackling pressing challenges, yet in critical sectors such as energy, industrials, and materials, entrepreneurial activity remains limited. This paper investigates whether limited value capture at exit constrains these ventures. I estimate value creation and capture in startup acquisitions by combining acquisition prices with acquirer stock returns, adjusting for market noise to isolate the economic signal attributable to the acquisition. Science-based startups capture 46 cents per dollar of acquisition-induced surplus, compared to 61 cents for non-science startups, a 24% penalty. At the same time, they create 20% more joint surplus, consistent with continued entry despite the capture penalty. To explain these patterns, I examine a central mechanism: the structure of a startup’s exit conditions. I argue that science-based startups face thinner, more concentrated acquisition markets and a limited ability to scale independently, features that weaken their bargaining power. Indeed, I find that science-based startups face up to 40% fewer potential acquirers, who are 53% larger on average, and that their value capture is more sensitive to buyer concentration. Concentrated markets have a dual effect: large incumbents enable greater surplus creation but also shift bargaining power away from startups, allowing acquirers to extract most of the gains from innovation. Finally, I find that the capture penalty diminishes when startups can scale commercialization independently. The results suggest that constrained exit environments limit returns to science-based entrepreneurship, highlighting the importance of competitive acquisition markets, markets for technologies, and alternative commercialization pathways in incentivizing upstream innovation.
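The surplus split behind the 46-versus-61-cent comparison can be sketched as a simple decomposition. This is a minimal illustration under stated assumptions and not the paper's estimator. It assumes three things: the startup's gain is the acquisition price minus a hypothetical standalone value, the acquirer's gain is its dollar abnormal return at announcement, and joint surplus is the sum of the two. The names `Deal`, `standalone_value`, `target_gain`, `joint_surplus`, and `capture_share` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Deal:
    """One startup acquisition (all field names and inputs are illustrative)."""
    price: float             # acquisition price paid for the startup (USD)
    standalone_value: float  # ASSUMPTION: startup's value absent the deal (USD)
    acquirer_gain: float     # acquirer's dollar abnormal return at announcement (USD)


def target_gain(deal: Deal) -> float:
    """Value captured by the startup: price in excess of its standalone value."""
    return deal.price - deal.standalone_value


def joint_surplus(deal: Deal) -> float:
    """Acquisition-induced surplus: startup gain plus acquirer gain."""
    return target_gain(deal) + deal.acquirer_gain


def capture_share(deal: Deal) -> float:
    """Fraction of each surplus dollar captured by the startup."""
    surplus = joint_surplus(deal)
    if surplus <= 0:
        raise ValueError("capture share is only defined for positive joint surplus")
    return target_gain(deal) / surplus


if __name__ == "__main__":
    # Hypothetical deal: $100M price, $60M standalone value, $40M acquirer gain.
    deal = Deal(price=100.0, standalone_value=60.0, acquirer_gain=40.0)
    print(f"joint surplus: {joint_surplus(deal):.1f}")  # prints "joint surplus: 80.0"
    print(f"startup share: {capture_share(deal):.2f}")  # prints "startup share: 0.50"
```

In this toy deal, a share of 0.50 means the two parties split the gains evenly. In the paper's terms, science-based startups capture about 0.46 of each surplus dollar and non-science startups about 0.61.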
Scientific innovation depends on effective matching between discoveries and commercial applications, with intermediaries playing a crucial role in bridging this gap. We study digital platforms, a growing class of such intermediaries whose algorithms increasingly shape how scientific information is accessed and consumed. In principle, these platforms could facilitate efficient matching by ensuring that the most relevant scientific information reaches potential users. In practice, however, intermediaries also face incentives to monetize user demand, which can distort this matching process. We formalize this tension in a simple model and test its predictions using Google Search, measuring traffic acquisition for an online platform that provides scientific knowledge. In a field experiment with randomized advertisements, we show that Google's search engine systematically inserts ads under high-incentive conditions. Importantly, this behavior reduces the visibility and subsequent consumption of relevant scientific information. Our findings show how algorithmic intermediation can undermine efficient matching when revenue incentives are strong. They suggest that search frictions are not merely technical problems to be solved but economic outcomes to be governed. This raises the challenge of designing innovation intermediaries, from technology transfer offices to emerging AI platforms, in ways that align private incentives with the efficient diffusion of knowledge.
AI is increasingly used to generate predictive measures that guide consequential decisions across domains. The performance of these tools is well understood in areas such as credit scoring and hiring. There is limited evidence, however, on how well they perform in evaluating scientific research and innovation, a domain where prediction can support the identification of commercially promising discoveries. This setting presents unique challenges: scientific outputs are highly heterogeneous and growing rapidly; signals of downstream value are often weak, delayed, and noisy; and positive outcomes are rare. In this paper, we evaluate the accuracy of different AI-based approaches in forecasting downstream outcomes from early-stage biomedical research. We compare three classes of models: (i) generative models, including both prompt-based and retrieval-augmented variants; (ii) general-purpose supervised models trained on cross-domain data; and (iii) domain-specific supervised models using in-domain representations and tailored features. We find that the best-performing approach is the supervised, domain-specific model, followed by the general-purpose supervised model. Surprisingly, despite recent advances in large generative models, both OpenAI’s GPT-5 and Meta’s open-source Llama 4 perform notably worse. We argue this reflects a mismatch between generative training objectives and predictive tasks, as well as limited sensitivity to technical and translational cues. These findings highlight both the promise and the limitations of current AI approaches in evaluating early-stage research, and offer guidance on when specialization is likely to matter.
Crafting high-quality ideas is crucial for entrepreneurial success, yet evidence about the factors that shape the idea-generation process is scarce. A long-standing question is whether differences across entrepreneurs in market judgment (the ability to evaluate business ideas) explain differences in the quality and composition of their ideas. We conduct an experiment with an intervention that improves subjects’ ability to evaluate an idea’s market potential. We find that improved judgment leads subjects to generate ideas that are 15% higher in quality and more complete, with stronger effects among initially poorly calibrated subjects. Our results support a potential mechanism: individuals with better-developed judgment mentally test more ideas and filter them more effectively before committing to one. Simple training can improve judgment and idea quality, complementing ex-post experimental methods by reducing the costs of testing ideas.
There is growing concern that a large share of scientific discoveries produced at research institutions never reach practical use, but the scale and distribution of this gap have not been systematically mapped. We quantify the extent to which U.S. research outputs with the potential to inform commercially relevant technologies fail to move toward application. Using artificial intelligence, we identify high-potential discoveries at the time of publication and track whether they are later taken up in downstream technologies. We find that realization rates vary widely across institutions, researcher profiles, and regional ecosystems. High-potential discoveries are realized more than twice as often at top-tier R1 universities as at lower-tier ones. Likewise, senior researchers and those located in regions with strong commercialization infrastructure see significantly higher translation rates. These patterns show that high-potential science often stalls not because it lacks economic promise but because the conditions needed to realize it are unevenly distributed.
We developed scientifiq.ai, an AI-based platform for advancing research on innovation. It supports decision-makers in firms, policy organizations, universities, and other research institutions by integrating large-scale data on scientific publications, researchers, patents, grants, and related outputs with machine learning models and AI tools that help identify and evaluate emerging research and technologies. Equally important, it serves as infrastructure for studying the translation and commercialization of science, enabling the development of novel datasets and methods for research. We gratefully acknowledge generous support from the Kauffman Foundation, NC Biotech, Duke University, and OpenAI.
This dataset provides machine-learning-based predictions of the commercial and scientific potential of over 5 million scientific articles in applied sciences and engineering, published between 2000 and 2020 across 126 U.S. universities. Using text data, our models estimate the likelihood that each publication will be used in commercial innovation or contribute to future scientific research. The dataset supports research on how science translates into market and policy outcomes. It was made possible by funding from the Kauffman Foundation and Duke University. It is available as a downloadable CSV via Zenodo and is publicly accessible on BigQuery (nber-i3).
Coming soon: measures for earlier years (from 1980) and global coverage of over 30 million publications.
This dataset provides refined measures of stock market reactions to over 100,000 acquisition announcements. Using a signal-extraction approach, it adjusts raw abnormal returns for market noise to generate cleaner estimates of the abnormal gains, or acquirer surplus, associated with each deal. For every acquisition, the dataset reports both raw announcement-window abnormal returns and refined measures. This enables systematic investigation of the distribution and determinants of acquisition gains across firms, industries, time periods, and transaction types. It offers large-scale, event-level data for empirical research in economics, finance, and related disciplines. This dataset was made possible through funding from Duke University and is coming soon.
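One common way to adjust announcement returns for market noise is empirical-Bayes (normal-normal) shrinkage. This is an assumption about the general form of a signal-extraction step, not a description of how this dataset is constructed. The sketch makes three assumptions:

- Each raw cumulative abnormal return (CAR) is a true deal effect plus noise.
- Each deal's noise variance is taken as given, for example estimated from pre-event residual variance.
- Each CAR is pulled toward the cross-sectional mean in proportion to its noisiness.

The function name `shrink_abnormal_returns` is hypothetical.

```python
from statistics import fmean


def shrink_abnormal_returns(cars: list[float], noise_vars: list[float]) -> list[float]:
    """Empirical-Bayes shrinkage of raw announcement-window abnormal returns.

    Assumed model: car_i = signal_i + noise_i, with signal_i ~ N(mu, s2) and
    noise_i ~ N(0, noise_vars[i]), independent across deals.
    Returns E[signal_i | car_i], with mu and s2 estimated from the cross-section.
    """
    if len(cars) != len(noise_vars) or len(cars) < 2:
        raise ValueError("need at least two deals with matching noise variances")

    mu = fmean(cars)                                # prior mean of true deal effects
    total_var = fmean((c - mu) ** 2 for c in cars)  # dispersion of raw CARs
    # Method-of-moments estimate of true-effect variance, floored at zero.
    signal_var = max(total_var - fmean(noise_vars), 0.0)

    refined = []
    for car, noise_var in zip(cars, noise_vars):
        denom = signal_var + noise_var
        weight = signal_var / denom if denom > 0 else 0.0  # reliability of this CAR
        refined.append(mu + weight * (car - mu))           # shrink toward the mean
    return refined


if __name__ == "__main__":
    raw = [0.05, 0.05, -0.05, -0.05]          # hypothetical raw CARs
    noise = [0.0001, 0.0009, 0.0001, 0.0009]  # 2nd and 4th deals are noisier
    print(shrink_abnormal_returns(raw, noise))
```

Deals with noisier returns receive a smaller weight and are pulled more strongly toward the mean. This is the sense in which such an adjustment separates the economic signal from market noise. A fuller version might let the prior mean vary by industry or period; that extension is not shown here.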
This dataset identifies startups engaged in the commercialization of science-based innovations. It applies an open-source large language model, Meta’s Llama 3.3 70B, to diverse text sources, including company descriptions, press releases, funding announcements, and other materials. Traditional measures rely primarily on patents and patent citations to scientific articles; these data provide a complementary view of how ventures’ technologies are rooted in scientific research. Covering thousands of firms across industries, the dataset links firm-level identifiers with classifications of science reliance and field tags. It enables new research on the scale and scope of science-based entrepreneurship, the pathways through which scientific discoveries are commercialized, and the conditions that shape their outcomes in markets and society. This dataset was made possible through funding from Duke University and is coming soon.