Important external papers

Models and datasets/corpora

Blenderbot 2.0 (Facebook): Komeili et al. (2021). Internet-Augmented Dialogue Generation. (PDF)

GPT-4: OpenAI. (PDF) Note: TBA.

Wudao 2.0 (BAAI): Zou & Tang et al. (2021). Controllable Generation from Pre-trained Language Models via Inverse Prompting. (Note: As of July 2021, this is the latest Wudao 2.0 paper, and it shows an extract of WDC-Text. Full paper TBA.) (PDF)

Wudao 1.0 (BAAI): Yuan & Tang et al. (2021). WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models. (PDF)

PanGu Alpha (Huawei): Zeng et al. (2021). PanGu-Alpha: Large-scale autoregressive pretrained Chinese language models with auto-parallel computation. (PDF)

The Pile v1 (EleutherAI): Gao et al. (2020). The Pile: An 800GB Dataset of Diverse Text for Language Modeling. EleutherAI. (PDF)

Common Crawl: Dodge et al. (2021). Documenting the English Colossal Clean Crawled Corpus. (PDF)

GPT-3 (OpenAI): Brown et al. (2020). Language Models are Few-Shot Learners. OpenAI. (PDF)

GPT-2 (OpenAI): Radford et al. (2019). Language Models are Unsupervised Multitask Learners. OpenAI. (PDF)

GPT-1 (OpenAI): Radford et al. (2018). Improving Language Understanding by Generative Pre-Training. OpenAI. (PDF)

Fine-tuning (Howard): Howard & Ruder (2018). Universal Language Model Fine-tuning for Text Classification. (PDF)

Transformer (Google): Vaswani et al. (2017). Attention Is All You Need. Google. (PDF)

The Turing Test: Turing, A. M. (1950). Computing Machinery and Intelligence. Mind 59: 433-460. (PDF)


Inside OpenAI and Neuralink offices: Hao, K. (2020). The messy, secretive reality behind OpenAI’s bid to save the world. MIT Technology Review. (PDF)

Ethics and data quality guidance

Parrots: Bender et al. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (Note: Banned by Google.) (PDF)

GPT-3 quality: Strickland, E. (2021). OpenAI’s GPT-3 Speaks! (Kindly Disregard Toxic Language). IEEE Spectrum. (PDF)

GPT-J quality: HN discussion (2021). A discussion about GPT-J, Books3 creation, and the exclusion of datasets like Literotica and the US Congressional Record… (PDF) (Original HN link)

See also my 2021 paper: Integrated AI: Dataset quality vs quantity via bonum (GPT-4 and beyond).

Intergovernmental and governmental guidance

AI Ethics: WHO (2021). Ethics and governance of artificial intelligence for health. (PDF)

Australian Govt AI: Australian Government (2021). Australia’s AI Action Plan: June 2021. (External PDF)

International AI Strategies: The team at hosts the most comprehensive list of all international AI strategies, from Australia to Vietnam. (External link)


Wudao usage agreement: BAAI (2021). Data Usage Agreement of Zhiyuan Enlightenment Platform. (Note: Translated to English.) (PDF)

Dr Alan D. Thompson is an Australian AI expert and consultant. He has held positions as chairman for Mensa International, consultant to GE and Warner Bros, and memberships with the IEEE and IET. He is open to major AI projects with intergovernmental organisations and impactful companies. Contact.

This page last updated: 23/Jul/2021.