OpenAI’s Data Strategy: A Dive into ChatGPT’s Business Strategy

This paper was developed during the course AI and Data Strategy (VT25, 5 ECTS). It was part of my academic work at Halmstad University in spring 2025.

I analysed how OpenAI evolved from an open-source nonprofit into a commercial AI powerhouse at the center of the AI ecosystem, and how its data strategy drove that transformation. I explored this change through the lens of data strategy and appropriability. Using case-based research, I examined how OpenAI leverages public, user-generated, and partner data to capture value, and how this strategy affects innovation, transparency, and fairness in AI ecosystems.

Strategic Use of Data in Generative AI

OpenAI transitioned from a nonprofit advocating open access to a powerful data and model provider. Using scraped web content, user prompts, and now paid partnerships, it built its LLMs, especially GPT-4 and GPT-5, on a complex mix of public and proprietary data.

As of 2025, OpenAI is no longer just a nonprofit or a model provider. It operates as infrastructure in the AI economy: its APIs power Microsoft 365 Copilot, Apple integrations, and hundreds of startups. In effect, OpenAI created a layer on top of the internet, which is itself a form of open innovation.

Appropriability and Ecosystem Control

Inspired by Teece’s (1986) concept of appropriability, the paper shows how OpenAI:

Initially exploited non-rivalrous public data.
Later enforced control through API layers, licensing deals, and technical opacity.
Uses exclusivity, infrastructure, and scale as barriers to entry, creating a “kill zone” for AI startups that lack equivalent data or compute access.

Legal and Ethical Tensions

The article addresses legal cases (e.g. New York Times vs. OpenAI) and introduces the information paradox. It raises questions around:

Copyright vs. fair use in LLM training
The opacity of model provenance
How even user prompts can be repurposed for further training

Frameworks and Theories Applied

Appropriability Regimes (Teece, 1986)
Data as Infrastructure vs. Property (Mayer-Schönberger & Ramge, 2022)
Information Paradox (Burstein)

Why This Matters

This project showcases my ability to:

Translate complex theoretical concepts into real-world analysis.
Evaluate how data access, regulation, and platform strategy shape AI innovation.
Write about emerging tech with a balance of research, strategy, and communication.

This work is relevant to roles at the intersection of service design, UX strategy, digital policy, and innovation management.

👉 [Contact me or read the full article about OpenAI’s Data Strategy on my blog]
