Research Papers
Core Architecture Papers
➙ "Attention Is All You Need" (Google Brain, Vaswani et al., 2017)
➙ "BERT: Pre-training of Deep Bidirectional Transformers" (Google AI Language, Devlin et al., 2018)
➙ "Language Models are Few-Shot Learners" (OpenAI, Brown et al., 2020)
➙ "LLaMA: Open and Efficient Foundation Language Models" (AI at Meta, Touvron et al., 2023)
Scaling & Efficiency Papers
➙ "Scaling Laws for Neural Language Models" (OpenAI Kaplan et al., 2020)
➙ "Training Compute-Optimal Large Language Models" (Google DeepMind Hoffmann et al., 2022)
➙ "Flash Attention: Fast and Memory-Efficient Exact Attention" (Stanford University Department of Computer Science, Dao et al., 2022)
➙ "Mixture of Experts with Expert Choice" (Google, Zhou et al., 2022)
➙ "Constitutional AI: Harmlessness from AI Feedback" (Anthropic, Bai et al., 2022)
➙ "Training Language Models to Follow Instructions" (OpenAI, Ouyang et al):
➙ "Learning to Summarize from Human Feedback" (OpenAI, Stiennon et al., 2020)
➙ "LoRA: Low-Rank Adaptation of Large Language Models" (Microsoft, Carnegie Mellon University, Hu et al., 2021)
➙ "Spectrum: Targeted Training on Signal to Noise Ratio" (Arcee.ai, Hartford et al., 2024):