The Meta team's crucial contribution was therefore to augment reinforcement learning with natural-language processing.
Large language models, trained on vast amounts of data to predict deleted words, have an uncanny ability to mimic the patterns of real language and say things that humans might.
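To make "predict deleted words" concrete, here is a minimal sketch of how one such training example might be built. A few words are hidden, and the originals are kept as the answers the model must recover. The `[MASK]` token and the 15% masking rate are illustrative assumptions. Many models are trained instead to predict the next word, and nothing here describes how Cicero's own model was trained.

```python
import random

MASK = "[MASK]"  # illustrative placeholder token (an assumption, not a specific model's vocabulary)

def make_masked_example(sentence, mask_rate=0.15, rng=None):
    """Hide a random subset of words; return the masked text and the hidden words by position."""
    rng = rng or random.Random()
    words = sentence.split()
    n = max(1, round(len(words) * mask_rate))  # always hide at least one word
    positions = sorted(rng.sample(range(len(words)), n))
    masked = words[:]
    targets = {}
    for i in positions:
        targets[i] = masked[i]  # the answer the model is trained to predict
        masked[i] = MASK
    return " ".join(masked), targets

example, answers = make_masked_example(
    "I will support your fleet into Belgium this spring", rng=random.Random(0)
)
```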
For Cicero, the team started with a pre-trained model with a baseline understanding of language, and fine-tuned this on dialogues from more than 40,000 past games, to teach it Diplomacy-specific patterns of speech.
To play the game, Cicero looks at the board, remembers past moves and makes an educated guess as to what everyone else will want to do next.
Then it tries to work out what makes sense for its own move, by choosing different goals, simulating what might happen, and also simulating how all the other players will react to that.
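The look-ahead described above can be sketched as a toy expected-value search. Each combination of opponents' moves is weighted by its predicted probability, each candidate move is scored against those simulated reactions, and the best is kept. The moves, probabilities and `toy_payoff` scoring are hypothetical. This simple averaging is a stand-in for illustration, not Meta's actual planning algorithm.

```python
from itertools import product

def expected_value(my_move, predicted_policy, payoff):
    """Average payoff of my_move over every combination of opponents' moves,
    weighted by the predicted probability of that combination."""
    opponents = list(predicted_policy)
    total = 0.0
    for combo in product(*(predicted_policy[o].items() for o in opponents)):
        prob, joint = 1.0, {}
        for opp, (move, p) in zip(opponents, combo):
            prob *= p
            joint[opp] = move
        total += prob * payoff(my_move, joint)
    return total

def choose_move(candidates, predicted_policy, payoff):
    """Simulate each candidate against the predicted reactions and keep the best."""
    return max(candidates, key=lambda m: expected_value(m, predicted_policy, payoff))

def toy_payoff(my_move, others):
    """Hypothetical scoring: attacking pays only if nobody else attacks."""
    attackers = sum(1 for m in others.values() if m == "attack")
    if my_move == "attack":
        return 3 - 4 * attackers
    if my_move == "support":
        return 1 + attackers
    return 1  # hold

# The "educated guess" about everyone else, as probabilities (illustrative numbers).
policy = {"France": {"attack": 0.7, "hold": 0.3}, "Germany": {"hold": 1.0}}
best = choose_move(["attack", "hold", "support"], policy, toy_payoff)
# best == "support" (expected values: attack 0.2, hold 1.0, support 1.7)
```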
Once it has come up with a move, it must work out what words to say to the others.
To that end, the language model spits out possible messages, throws away the bad ideas and anything that is actual gobbledygook, and chooses the ones, appropriate to the recipients concerned, that its experience and algorithms suggest will most persuasively further its agenda.
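The generate, filter and rank step can be sketched in the same spirit. Here the candidate messages arrive with hand-written scores standing in for model outputs. `coherence`, `matches_plan` and `persuasiveness` are hypothetical fields, and the 0.5 threshold is an arbitrary choice.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    recipient: str
    text: str
    coherence: float       # hypothetical sense-check score in [0, 1]; low means gobbledygook
    matches_plan: bool     # hypothetical check that the message fits the chosen move
    persuasiveness: float  # hypothetical estimate of how well it furthers the agenda

def pick_messages(candidates, recipient, min_coherence=0.5, k=1):
    """Keep messages for this recipient, drop nonsense and bad ideas, return the k most persuasive."""
    usable = [
        c for c in candidates
        if c.recipient == recipient and c.coherence >= min_coherence and c.matches_plan
    ]
    usable.sort(key=lambda c: c.persuasiveness, reverse=True)
    return usable[:k]

drafts = [
    Candidate("France", "Let's split the Low Countries: Belgium for you, Holland for me.", 0.9, True, 0.8),
    Candidate("France", "Belgium banana fleet tomorrow yes.", 0.1, True, 0.6),
    Candidate("France", "I plan to attack you this spring.", 0.95, False, 0.1),
    Candidate("England", "Will you support me into Norway?", 0.9, True, 0.7),
]
chosen = pick_messages(drafts, "France")
# chosen[0].text == "Let's split the Low Countries: Belgium for you, Holland for me."
```

Filtering before ranking means a persuasive but incoherent draft (the second one) can never be sent, however high its score.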
Cicero, then, can negotiate, convince, cooperate and compete.
Seasoned Diplomacy players will, though, want to know something else: has it learned how to stab?
Stabbing (saying one thing and doing another, especially attacking a current ally) is seen by many as Diplomacy's defining feature.
But though Cicero did “strategically withhold information from players in gameplay”, it did not actually stab any of its opponents.
Perhaps it is this final lack of Machiavellian ruthlessness that explains why Cicero finished only in the top 10%, and not as victor ludorum.