So there’s a methodology called k-nearest neighbors in big data analysis
大数据分析中有一种方法论叫做k最近邻算法
where you can find a person who looks similar to another person.
可以让你找到和一个人长得很像的另一人
Who’s the most similar on a number of traits?
在某些特性上 哪些人最相似?
But I kind of renamed the search a doppelganger search
不过我把这种搜索重命名为二重身搜索
because I think that’s a cooler name for it and also accurate.
因为这个说法更酷 也很准确
So you basically look in a huge data set, you take a person and say
你只需要在庞大的数据库中选择一个人 然后问
“Who is the person who looks most similar to that person?”
“哪个人跟这个人最像?”
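The search described above can be sketched in a few lines. This is a minimal illustration, not the speaker's own code: the people, the three numeric traits, and the choice of Euclidean distance as the measure of "similar" are all assumptions made for the example.

```python
import math

def nearest(target, people, k=1):
    """Return the names of the k people whose trait vectors are closest to target."""
    # Sort names by Euclidean distance between their traits and the target's traits.
    ranked = sorted(people, key=lambda name: math.dist(people[name], target))
    return ranked[:k]

# Hypothetical people, each described by three made-up numeric traits.
people = {
    "ana": [0.9, 0.1, 0.4],
    "ben": [0.2, 0.8, 0.5],
    "cho": [0.8, 0.2, 0.5],
}
you = [0.85, 0.15, 0.5]

print(nearest(you, people))       # the single closest match, i.e. the "doppelganger"
print(nearest(you, people, k=2))  # the two closest matches
```

With k=1 this is the doppelganger search; a larger k returns a small group of look-alikes instead of a single person.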
So one way you might use this is if Amazon’s looking for what books to recommend.
亚马逊在向消费者推荐书籍的时候可能会用到这种方法
They may find your book-reading doppelganger.
他们可能会找到你的读书二重身
So across the whole universe of Amazon customers,
通过亚马逊消费者的巨大网络可以得知
who’s the person who tends to buy books like you have bought?
谁有可能买你已经买过的书?
And then what books has that person recently read and enjoyed that you haven’t read and enjoyed?
有哪些书是那个人最近读过并且喜欢 但你还没读过的?
And that’s sort of how they recommend books to you.
他们会通过这种方式给你推荐书籍
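A rough sketch of the book example follows. It is not Amazon's actual system, which the transcript does not describe. It assumes each customer is just a set of purchased titles, measures similarity with Jaccard overlap (an assumption), and ignores whether the other reader enjoyed the books. All names and titles are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets of books: 0.0 (nothing shared) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(mine, others):
    """Find the most similar customer and list their books that I do not have."""
    twin = max(others, key=lambda name: jaccard(mine, others[name]))
    return twin, sorted(others[twin] - mine)

# Hypothetical purchase histories.
purchases = {
    "reader_a": {"Dune", "Neuromancer", "Foundation", "Hyperion"},
    "reader_b": {"Emma", "Persuasion", "Dune"},
}
mine = {"Dune", "Neuromancer", "Foundation"}

print(recommend(mine, purchases))
```

Here reader_a shares three of four titles with the target reader, so the one title they have that the target lacks becomes the recommendation.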
And this can be used in a lot of other areas.
很多其他领域也可以利用这一点
People are just starting to use this in health where you can say,
人们开始在健康领域利用这一点
across the entire universe of patients, who has symptoms very similar to your symptoms,
在所有病人中 谁的症状与你极为相似
and what has worked for those people? Those people are your health doppelgangers.
他们吃了什么药特别管用 这些人就是你的健康二重身
So it’s a very powerful methodology and it gets more powerful the more data you have.
所以这种方法论很强大 而且数据库越大 它就越强大
Because the more data you have,
因为你拥有的数据越多 就会越像
the more likely you’re going to find someone in that data set who’s really, really similar to you.
你就越有可能在数据库中找到跟你非常像的人
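The point about data size can be checked with a small simulation. This is a sketch under made-up assumptions: people are random points with five numeric traits, and "similar" means small Euclidean distance. Because each larger sample contains the smaller ones, the distance to the closest match can only stay the same or shrink as the sample grows.

```python
import math
import random

def nearest_distance(target, points):
    """Distance from target to its closest point in the sample."""
    return min(math.dist(target, p) for p in points)

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical population: 10,000 people with five random traits each.
population = [[rng.random() for _ in range(5)] for _ in range(10_000)]
you = [0.5] * 5

sizes = (10, 100, 1_000, 10_000)
# Each sample is a prefix of the population, so larger samples include smaller ones.
gaps = [nearest_distance(you, population[:n]) for n in sizes]

for n, gap in zip(sizes, gaps):
    print(f"{n:>6} people: closest match is {gap:.3f} away")
```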
Some of this big data analysis is stuff we have always kind of done.
有些大数据分析是我们经常做的事
That’s kind of what doctors try to do.
医生好像一直都想这么做
They try to say, “Who are you similar to?
他们想说“你跟谁比较相似?
Of all the patients I’ve seen, which ones remind me of your case, and what worked for them?”
在我见过的所有病人中 哪些人会让我想起你的病例 什么治疗对他们有效?”
But they’ve been doing this on a small number of patients, namely the ones they’ve seen.
但他们只会在一小部分病人身上这么做 也就是他们见过的那些病人
Whereas the potential for big data is you can do it over the entire universe of patients
而大数据的潜力就在于 你可以对全球所有的病人这么做
and get people who are, really, much, much more similar to you.
找到那个跟你最最相似的人
Really zoom in on the tiny subset of people who have a very similar path to you.
真正聚焦到跟你有着极其相似的经历的那一小波人身上
Instead of saying “You have the condition depression”
我们不会再说“你有抑郁症”
which might remind a doctor of a hundred depressed patients that he’s seen over the past couple of years,
这只会让医生想起他这些年来见过的上百名抑郁症患者
you can maybe say, “You have a particular type of depression.”
而要说“你患有某种具体类型的抑郁症”
So you maybe sleep all the time whereas other depressed patients don’t sleep all the time,
你可能一直嗜睡 但其他抑郁症患者却总失眠
and you feel guilty whereas other depressed patients don’t feel guilty,
你会感到愧疚 但其他抑郁症患者不会愧疚
and then really find these people who are really, really similar
从而找到那些真正特别相似的人
whose depression has taken a much more similar path to yours than other people’s depressions have.
他们的抑郁症跟你非常像 而不是普普通通的一名抑郁症患者