Yet it is hard, for several reasons, to fathom what LLMs "think". Details of the programming and training data of commercial ones like ChatGPT are proprietary. And not even the programmers know exactly what is going on inside.
Linguists have, however, found clever ways to test LLMs' underlying knowledge, in effect tricking them with probing tests. And indeed, LLMs seem to learn nested, hierarchical grammatical structures, even though they are exposed to only linear input, ie, strings of text. They can handle novel words and grasp parts of speech. Tell ChatGPT that "dax" is a verb meaning to eat a slice of pizza by folding it, and the system deploys it easily: "After a long day at work, I like to relax and dax on a slice of pizza while watching my favourite TV show." (The imitative element can be seen in "dax on", which ChatGPT probably patterned on the likes of "chew on" or "munch on".)
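For readers who want to run such a novel-word probe themselves, a minimal sketch using the OpenAI Python client follows. The model name and prompt wording are illustrative assumptions; the article does not specify how the test was administered.

```python
# A minimal "dax" probe: define a made-up verb, then ask the model to use
# it. If it slots "dax" into a grammatical sentence, it has inferred the
# word's part of speech from a one-line definition, not from training data.
# Requires: pip install openai, and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    '"dax" is a verb meaning to eat a slice of pizza by folding it. '
    'Use "dax" in a sentence about your evening.'
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name, for illustration only
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Varying the follow-up question (asking for the past tense of "dax", say, or for the verb used intransitively) turns the same probe into a test of morphology and argument structure.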
What about the "poverty of the stimulus"?
那么"语言刺激贫瘠"方面又是什么情况呢?
After all, GPT-3 (the LLM underlying ChatGPT until the recent release of GPT-4) is estimated to be trained on about 1,000 times the data a human ten-year-old is exposed to.
毕竟,GPT-3(在最近发布GPT-4之前,它是ChatGPT的底层语言模型)接受的训练数据大约是一个十岁人类儿童所接触的数据量的1000倍。
That leaves open the possibility that children have an inborn tendency to grammar, making them far more proficient than any LLM.
这就带来了一种可能性,那就是儿童天生就有语法倾向,这使他们比任何大型语言模型都更擅长语言。
In a forthcoming paper in Linguistic Inquiry, researchers claim to have trained an LLM on no more text than a human child is exposed to, finding that it can use even rare bits of grammar.
在即将发表在《语言研究》杂志上的一篇论文中,研究人员声称,用不超过人类儿童所接触的文本量的数据训练一个大型语言模型,发现它甚至可以使用很少见的语法。
But other researchers have tried to train an LLM on a database of only child-directed language (that is, of transcripts of carers speaking to children).
但其他研究人员试图用只面向儿童的语言数据库(即照顾孩子的人对儿童所说话语的文字稿)来训练大型语言模型。
Here LLMs fare far worse.
在这种情况下,大型语言模型的表现要糟糕得多。
Perhaps the brain really is built for language, as Professor Chomsky says.
也许正如乔姆斯基教授所说,大脑真的是为语言而生的。
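A rough sketch of the child-directed-language experiment, assuming a Hugging Face-style setup: train a deliberately small causal language model from scratch on nothing but carers' speech. The toy corpus, model size and hyperparameters below are placeholders; the actual studies used real transcript corpora (such as CHILDES) and far more careful evaluation.

```python
# Toy version of training an LLM on child-directed speech only.
# The three-line "corpus" is a hypothetical stand-in for real transcripts.
# Requires: pip install torch transformers
import torch
from torch.utils.data import DataLoader
from transformers import GPT2Config, GPT2LMHeadModel, GPT2TokenizerFast

# Hypothetical stand-in for a corpus of carers speaking to children.
child_directed = [
    "look at the doggy! do you see the doggy?",
    "where did the ball go? the ball went under the chair.",
    "you want more juice? say please.",
] * 200  # repeated so the toy training loop has batches to draw

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# A deliberately tiny GPT-2: the point is limited data, not scale.
config = GPT2Config(vocab_size=tokenizer.vocab_size,
                    n_positions=64, n_embd=128, n_layer=4, n_head=4)
model = GPT2LMHeadModel(config)

def collate(batch):
    enc = tokenizer(batch, padding=True, truncation=True,
                    max_length=64, return_tensors="pt")
    enc["labels"] = enc["input_ids"].clone()
    enc["labels"][enc["attention_mask"] == 0] = -100  # ignore padding in loss
    return enc

loader = DataLoader(child_directed, batch_size=16, shuffle=True,
                    collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

model.train()
for epoch in range(3):
    for batch in loader:
        loss = model(**batch).loss  # next-token cross-entropy
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```

The trained model would then be scored on grammatical minimal pairs (benchmarks such as BLiMP do this) to see which constructions it has, and has not, acquired from so little input.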
It is difficult to judge. Both sides of the argument are marshalling LLMs to make their case. Professor Chomsky, the eponymous founder of his school of linguistics, has offered only a brusque riposte. For his theories to survive this challenge, his camp will have to put up a stronger defence.