A social path to human-like artificial intelligence

https://www.nature.com/articles/s42256-023-00754-x