Research Papers
Core Architecture Papers
➙ "Attention Is All You Need" (Google Brain, Vaswani et al., 2017)
➙ "BERT: Pre-training of Deep Bidirectional Transformers" (Google AI Language, Devlin et al., 2018)
➙ "Language Models are Few-Shot Learners" (OpenAI, Brown et al., 2020)
➙ "LLaMA: Open and Efficient Foundation Language Models" (AI at Meta, Touvron et al., 2023)
Scaling & Efficiency Papers
➙ "Scaling Laws for Neural Language Models" (OpenAI Kaplan et al., 2020)
➙ "Training Compute-Optimal Large Language Models" (Google DeepMind Hoffmann et al., 2022)
➙ "Flash Attention: Fast and Memory-Efficient Exact Attention" (Stanford University Department of Computer Science, Dao et al., 2022)
➙ "Mixture of Experts with Expert Choice" (Google, Zhou et al., 2022)
➙ "Constitutional AI: Harmlessness from AI Feedback" (Anthropic, Bai et al., 2022)
➙ "Training Language Models to Follow Instructions" (OpenAI, Ouyang et al):
➙ "Learning to Summarize from Human Feedback" (OpenAI, Stiennon et al., 2020)
➙ "LoRA: Low-Rank Adaptation of Large Language Models" (Microsoft, Carnegie Mellon University, Hu et al., 2021)
➙ "Spectrum: Targeted Training on Signal to Noise Ratio" (Arcee.ai, Hartford et al., 2024):