A scientist put a head cam on his daughter to capture the human experience of acquiring words.

When Luna was seven months old, she began wearing, at the behest of her scientist father, a hot-pink helmet topped with a camera that would, for about an hour at a time, capture everything she saw, heard, and said.

Her dad, Brenden Lake, is a cognitive scientist at New York University, where he thinks about better ways to train artificial intelligence. At home, he trains human intelligence, by which I just mean that he’s a dad. On a recent Sunday morning, he held up a robot puppet and asked Luna, who was meting out her wooden toys, “That’s for robot?” “Oh, goodness!” he added in a silly Muppet voice. Luna seemed only half-interested—in the way small children are always sort of on their own planet—but a couple of minutes later, she returned to pick up the puppet. “Robot,” she said. “Robot,” she repeated, dispelling any doubt about her intentions. Her dad turned to me, surprised; he’d never heard her say “robot” before. Had she learned the word just now?

At one and a half years old, Luna has mastered a technique that current AI models still struggle with. Humans are able to learn from very few examples, meaning that even a single encounter can solidify the connection between a silver hand puppet and the phonemes that make up robot. Artificial intelligence, by contrast, might need dozens or hundreds of examples; large language models such as the one powering ChatGPT are trained on hundreds of billions, if not trillions, of words—an inhuman amount of data. “It would take 1,000 years to hear a word count of that magnitude,” Lake told me. Given that humans require far less time—and far fewer words—to master language, could AI be trained more efficiently? Could it learn more like, say, a toddler?

These questions are what initially motivated Lake to record his daughter’s early life. (He convinced his wife with a more sentimental pitch: They could capture and replay Luna’s baby milestones.) Along with 25 or so other babies, Luna is part of the BabyView study, a project run out of Stanford that aims to capture exactly what young kids see and hear in the crucial period when they’re picking up language at a shocking speed. Lake hopes to one day feed the data from Luna and others back into his own models—to find better ways of training AI, and to find better ways of understanding how children pull off the ubiquitous yet remarkable feat of learning language.

Recent technological leaps—in artificial intelligence but also in hardware—have given scientists new tools to study developmental psychology. Cameras and microphones are now small and light enough for infants to wear for longer stretches, including at home. In the early 2010s, Michael Frank, a developmental psychologist at Stanford who now leads the BabyView study, decided along with two colleagues to put head cams on their own babies. They would track their kids’ development from about six months, when babies have enough neck strength not to be bothered by a camera, to around two and a half years, when toddlers really start to protest. Frank’s baby, however, refused to consent from the start; she absolutely loathed having anything on her head. “I didn’t have the fortitude” to continue, he told me, and his daughter dropped out. But the data collected from the two other babies—and later a third—were released in 2021 as a research data set called SAYCam.

Not long after, Frank decided to go bigger and more ambitious with BabyView, which follows the same idea but features more babies, crisper audio, and higher-resolution video. The resulting data will be shared online, but to protect the babies’ privacy, it’ll be accessible only to institutional researchers, and participants can choose to delete videos well before they are shared.

Lake decided to sign his daughter up for BabyView—fortunately, Luna tolerates a head cam just fine—because he was immediately interested in using the SAYCam corpus to train AI. On a basic level, would it even work? His group at NYU published a much-publicized paper in Science this past winter, which showed that even AI models trained on 61 hours of low-res video, or just 1 percent of the waking hours of one SAYCam baby, could classify images that showed objects including a ball, a cat, and a car. A suite of other studies from his lab has found that AI models trained on SAYCam can form their own categories such as “food,” “vehicle,” and “clothing,” or clusters of words that correspond to nouns or verbs—as you might expect a young toddler to do as they learn about the world.

To be clear, Lake and his colleagues do not claim to have replicated in silico how toddlers actually learn. The models are trained, after all, on snippets of video and text—a poor imitation of the rich sensory experience of being in a physical world. But the studies are most interesting as proof of concept. In the field of language acquisition, for example, experts have long debated the extent to which babies are born with innate knowledge, strategies, and biases that prime them for language. On one extreme, one could posit that babies are born as blank slates. The AI models definitely started as blank slates; if training them with just a small percentage of a baby’s audiovisual experience can get them to classify balls and cats, that shows how a neural network can learn “starting from nothing,” says Wai Keen Vong, a research scientist with Lake at NYU who was the lead author on the paper. By adult-human standards, though, the model might not be that impressive; its overall accuracy was just over 60 percent. Maybe it needs more data, or maybe it needs a different way of learning.
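For readers curious what learning “starting from nothing” can look like in practice, here is a minimal, hypothetical sketch of one common way to learn from paired frames and words: a contrastive setup in which frames from a head cam are matched with the words heard at the same moment, and two randomly initialized encoders are trained so that matching pairs land close together in a shared embedding space. The model sizes, layer choices, and training details below are illustrative assumptions, not the NYU team’s actual code.

```python
# Illustrative only: a toy contrastive image-word model trained "from scratch."
# All architecture and hyperparameter choices here are assumptions for the sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameEncoder(nn.Module):
    """Tiny convolutional encoder for video frames, randomly initialized."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, dim),
        )
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class WordEncoder(nn.Module):
    """Embeds word indices from a small toy vocabulary."""
    def __init__(self, vocab_size=100, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
    def forward(self, w):
        return F.normalize(self.emb(w), dim=-1)

def contrastive_loss(frame_vecs, word_vecs, temperature=0.07):
    """Push matching frame-word pairs together and mismatched pairs apart."""
    logits = frame_vecs @ word_vecs.t() / temperature
    targets = torch.arange(len(frame_vecs))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# One toy training step on random stand-in data; real input would be
# frame/utterance pairs extracted from head-cam recordings.
frames = torch.randn(8, 3, 64, 64)    # batch of 8 frames
words = torch.randint(0, 100, (8,))   # word index heard with each frame
f_enc, w_enc = FrameEncoder(), WordEncoder()
opt = torch.optim.Adam(list(f_enc.parameters()) + list(w_enc.parameters()), lr=1e-3)
opt.zero_grad()
loss = contrastive_loss(f_enc(frames), w_enc(words))
loss.backward()
opt.step()
```

After training on many such pairs, asking which word embedding sits closest to a new frame’s embedding amounts to the kind of image classification described above.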

This is where things could get interesting. Lake would like to equip artificial intelligence with some of the strategies babies seem to display in lab experiments. For example, when young children are presented with a new word—such as kettle—they seem to instinctively know that kettle refers to the entirety of the kettle, not just to its handle or its material or its color. When they are presented with two objects—one familiar and one unfamiliar—they will assume that a new word they hear refers to the new object. These strategies likely help babies sift through the cluttered, chaotic world of their everyday life, and they might help artificial intelligence learn more like a child too, though AI is far, far from actually imitating a child.
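To make that second heuristic concrete, here is a toy, hypothetical sketch of a “mutual exclusivity” rule: when a new word arrives and exactly one visible object has no known name, assign the word to that object. The object names and the guess_referent function are invented for illustration; a real learner would be working from raw sights and sounds, not tidy labels.

```python
# Illustrative sketch of a mutual-exclusivity heuristic. The strings stand in
# for objects the learner can already tell apart; they are not real model inputs.
known_objects = {"ball", "cup"}  # objects the learner can already name

def guess_referent(new_word, visible_objects, known_objects):
    """Guess which visible object a newly heard word refers to."""
    unnamed = [obj for obj in visible_objects if obj not in known_objects]
    if len(unnamed) == 1:
        return unnamed[0]  # exactly one unfamiliar object: map the word to it
    return None            # otherwise, withhold judgment

# A familiar cup and an unfamiliar kettle are in view; the child hears "kettle."
print(guess_referent("kettle", ["cup", "kettle"], known_objects))  # -> "kettle"
```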

That said, AI models could also inspire new ideas about how children learn. Chen Yu, a developmental psychologist at the University of Texas at Austin, told me about a study he conducted with his collaborators, in which parents and children wore head cams as they played with toys in a lab. Curiously, Yu and his collaborators noticed that a computer vision model trained on the child’s POV outperformed one trained on the parents’. What about a child’s perspective is more conducive to learning? They wondered if children were manipulating the toys more thoroughly, turning them back and forth to see the objects from different angles. These AI-enabled approaches, Yu said, can generate new hypotheses that can then be tested back in the lab. Linda Smith, a frequent collaborator of Yu’s and a longtime researcher of children’s cognitive development at Indiana University, told me that when she got her start, decades ago, “artificial intelligence and human cognition were one field. It was all the same people.” The fields may have since diverged, but the overlap still makes perfect sense.

In his academic career, Lake, who had previously taught an AI model how handwriting works, has also been seeking out ways to create an AI that learns more like a human. This naturally led him to how children learn. “Children are the most impressive learners in the known universe,” he told me. After having kids of his own, he thought parenting might inspire fresh insights for his research. Has it? I probed, curious because I too have a 1-year-old at home, whose intellectual progression is possibly the most remarkable thing I have ever witnessed. Not really, he admitted. Watching children learn is so fascinating, so surprising, so fun. But the process is also so intuitive—if it were that easy for any parent to understand how their child learns, wouldn’t we have figured it out already?
