Yin, Yiqiao
Article History
Received: 17 May 2025
Accepted: 4 February 2026
First Online: 11 February 2026
Declarations
Competing interests: The authors declare no competing interests.
Warren Buffett Letters Dataset - Copyright Compliance. The Warren Buffett Letters dataset used in this research is derived from shareholder letters published by Berkshire Hathaway Inc. and made freely available to the public on its official website. These letters are published annually for public dissemination without access restrictions. Our use of this material constitutes fair use under 17 U.S.C. § 107 for the following reasons: (1) Transformative Purpose: We do not republish the original letters but create derivative question-answer-reasoning triplets for academic research in artificial intelligence and natural language processing, a fundamentally different purpose from the original corporate communication. (2) Limited Extraction: We extract small portions at the paragraph level to generate training samples, not entire letters. (3) Non-Commercial Research: This work is conducted for academic research and educational purposes. (4) No Market Substitution: Our dataset does not serve as a substitute for reading the original comprehensive letters; rather, it may increase interest in Buffett’s investment philosophy.
Attribution and Licensing. The original shareholder letters are copyrighted by Berkshire Hathaway Inc. and authored by Warren Buffett. Our derivative dataset includes proper attribution to the original author and source. The dataset is released for non-commercial academic research and educational purposes only under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). Users planning commercial applications should consult legal counsel regarding appropriate licensing. Full attribution and links to original source materials are provided in the dataset repository documentation.
Responsible Use and Takedown Policy. We have established a clear takedown policy in our dataset documentation. Should Berkshire Hathaway Inc., Warren Buffett, or their authorized representatives express concerns about our use of this material, we commit to: (1) immediately removing the dataset from public repositories, (2) notifying users who have downloaded the dataset, and (3) updating this publication accordingly. Contact information for such requests is provided in the dataset repository.
Precedent in Academic Research. Our approach follows established practices in computational finance and NLP research, where numerous academic datasets have been created from publicly available corporate documents, SEC filings, earnings calls, and annual reports for research purposes. Examples include the Financial PhraseBank dataset, the ECTSum corpus of earnings call transcripts, and various SEC filing datasets used widely in the research community.