TED Talk: Do your words reveal your future mental state? (2)
Time: 2018-09-30 03:12:58
And the problem of how to go about this is quite obvious.
It's not like Plato woke up one day and wrote, "Hello, I'm Plato, and as of today, I have a fully introspective consciousness."
And this actually tells us the essence of the problem.
We need to find the emergence of a concept that's never said.
The word "introspection" does not appear a single time in the books we want to analyze.
So our way to solve this is to build the space of words.
This is a huge space that contains all words in such a way that the distance between any two of them indicates how closely related they are.
So, for instance, you want the words "dog" and "cat" to be very close together, but the words "grapefruit" and "logarithm" to be very far away.
And this has to be true for any two words within the space.
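To make "distance" concrete, a common choice is to represent each word as a vector and measure cosine distance between vectors. The talk does not name its metric, so cosine distance is an assumption here, and the three-dimensional vectors below are invented toy values meant only to show "dog" and "cat" landing close together while "grapefruit" and "logarithm" land far apart.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity: 0 means same direction, larger means less related."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical 3-dimensional toy vectors, set by hand purely for illustration.
toy_space = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.1],
    "grapefruit": [0.1, 0.2, 0.9],
    "logarithm": [0.9, -0.1, -0.3],
}

close = cosine_distance(toy_space["dog"], toy_space["cat"])
far = cosine_distance(toy_space["grapefruit"], toy_space["logarithm"])
print(f"dog-cat: {close:.3f}  grapefruit-logarithm: {far:.3f}")
```

Cosine distance ignores vector length and compares only direction, which is why it is a popular default for word vectors built from counts.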
And there are different ways that we can construct the space of words.
One is just asking the experts, a bit like we do with dictionaries.
Another possibility is to follow the simple assumption that when two words are related, they tend to appear in the same sentences, in the same paragraphs, in the same documents, more often than would be expected by pure chance.
And this simple hypothesis, this simple method, together with some computational tricks needed because this is a very complex and high-dimensional space, turns out to be quite effective.
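The co-occurrence hypothesis can be sketched with positive pointwise mutual information (PPMI), which scores a pair of words by how much more often they share a sentence than chance would predict: PMI(w, c) = log[P(w, c) / (P(w) P(c))]. This is one standard way to encode "more often than expected by pure chance", not necessarily the method used in the talk. The "computational tricks" for the high-dimensional space (for example truncated SVD, as in latent semantic analysis) are deliberately left out, and the mini-corpus is made up.

```python
import math
from collections import Counter
from itertools import combinations


def ppmi_vectors(sentences):
    """Word vectors whose entries are positive PMI scores of sentence co-occurrence."""
    word_counts = Counter()   # number of sentences containing each word
    pair_counts = Counter()   # number of sentences containing both words
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        word_counts.update(words)
        for a, b in combinations(words, 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1

    n = len(sentences)
    vocab = sorted(word_counts)
    vectors = {}
    for w in vocab:
        row = []
        for c in vocab:
            joint = pair_counts[(w, c)]
            if joint == 0:
                row.append(0.0)  # never seen together: no evidence of relatedness
                continue
            # PMI = log( P(w, c) / (P(w) * P(c)) ), with probabilities per sentence
            pmi = math.log((joint / n) / ((word_counts[w] / n) * (word_counts[c] / n)))
            row.append(max(pmi, 0.0))  # keep only "more often than chance"
        vectors[w] = row
    return vocab, vectors


# A made-up mini-corpus; a real study would use a very large corpus.
corpus = [
    "the dog chased the cat",
    "the cat ignored the dog",
    "a dog and a cat slept",
    "grapefruit juice is sour",
    "the logarithm of one is zero",
]
vocab, vectors = ppmi_vectors(corpus)
print("PPMI(dog, cat) =", round(vectors["dog"][vocab.index("cat")], 3))
print("PPMI(grapefruit, logarithm) =", vectors["grapefruit"][vocab.index("logarithm")])
```

Each word's vector is one row of a vocabulary-by-vocabulary matrix, so its length grows with the vocabulary; that size is exactly why real systems compress the space before measuring distances.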
And just to give you a flavor of how well this works, this is the result we get when we analyze this for some familiar words.
And you can see first that words automatically organize into semantic neighborhoods.
So you get the fruits, the body parts, the computer parts, the scientific terms and so on.
The algorithm also identifies that we organize concepts in a hierarchy.
So, for instance, you can see that the scientific terms break down into two subcategories: the astronomical and the physics terms.
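The talk does not say which algorithm recovers the hierarchy. Agglomerative (bottom-up) clustering is one standard way to obtain such a tree, and it is assumed here: start with every word as its own cluster and repeatedly merge the two clusters with the smallest average cosine distance. The four-dimensional vectors are invented so that the fruits separate from the scientific terms, which in turn split into astronomy and physics.

```python
import math


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def agglomerate(vectors):
    """Average-linkage agglomerative clustering; returns a nested-tuple tree."""
    clusters = [(word, [word]) for word in vectors]  # (subtree, member words)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                left, right = clusters[i][1], clusters[j][1]
                # Average distance over all cross-cluster word pairs.
                d = sum(cosine_distance(vectors[a], vectors[b])
                        for a in left for b in right) / (len(left) * len(right))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        del clusters[j]  # j > i, so deleting j first keeps index i valid
        del clusters[i]
        clusters.append(merged)
    return clusters[0][0]


def leaves(tree):
    """Flatten a subtree back into its list of words."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree for word in leaves(child)]


# Hypothetical toy vectors, invented for illustration only.
toy_space = {
    "apple": [0.9, 0.1, 0.0, 0.0],
    "banana": [1.0, 0.0, 0.1, 0.0],
    "planet": [0.0, 0.0, 1.0, 0.3],
    "galaxy": [0.0, 0.1, 0.9, 0.1],
    "energy": [0.0, 0.0, 0.3, 1.0],
    "quantum": [0.1, 0.0, 0.2, 0.9],
}
tree = agglomerate(toy_space)
print(tree)
```

This brute-force version recomputes all pairwise averages at every merge, which is fine for a handful of words; for a real vocabulary a library implementation of hierarchical clustering would be the practical choice.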
And then there are very fine details.
For instance, the word "astronomy," which seems a bit bizarre where it is, is actually exactly where it should be: between what it is, an actual science, and what it describes, the astronomical terms.
And we could go on and on with this.
Actually, if you stare at this for a while and just build random trajectories, you will see that it actually feels a bit like doing poetry.
And this is because, in a way, walking in this space is like walking in the mind.
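One way to read "build random trajectories" is as a random walk through the space: from the current word, hop to one of its nearest neighbours, then repeat. The sketch below assumes that interpretation; the neighbour count `k`, the seed and the toy vectors are all hypothetical choices, not details from the talk.

```python
import math
import random


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(vectors, word, k):
    """The k words most similar to `word`, excluding the word itself."""
    others = [w for w in vectors if w != word]
    others.sort(key=lambda w: cosine_similarity(vectors[w], vectors[word]), reverse=True)
    return others[:k]


def random_trajectory(vectors, start, steps, k=2, seed=None):
    """Random walk: at each step hop to one of the current word's k nearest neighbours."""
    rng = random.Random(seed)  # seeded generator so a walk can be reproduced
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(nearest(vectors, path[-1], k)))
    return path


# Hypothetical toy vectors, invented for illustration only.
toy_space = {
    "sea": [0.9, 0.1, 0.2],
    "wave": [0.8, 0.2, 0.3],
    "ship": [0.9, 0.0, 0.0],
    "wind": [0.6, 0.3, 0.6],
    "night": [0.1, 0.9, 0.4],
    "moon": [0.2, 0.8, 0.6],
    "star": [0.1, 0.7, 0.7],
}
print(" -> ".join(random_trajectory(toy_space, "sea", steps=6, seed=42)))
```

Because every hop stays among close neighbours, consecutive words remain semantically related while the walk as a whole drifts, which is one way to see why such paths can read like free association.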
And the last thing is that this algorithm also identifies our intuitions about which words should lie in the neighborhood of introspection.
So, for instance, words such as "self," "guilt," "reason" and "emotion" are very close to "introspection," but other words, such as "red," "football," "candle" and "banana," are just very far away.
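Checking that intuition amounts to ranking every word by its similarity to "introspection" and reading off the top of the list. A minimal sketch, assuming cosine similarity; the three toy dimensions (loosely "thinking", "feeling", "physical objects") and all vector values are invented for illustration.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def neighborhood(vectors, concept, k):
    """Return the k words closest to `concept`, most similar first."""
    target = vectors[concept]
    others = [w for w in vectors if w != concept]
    others.sort(key=lambda w: cosine_similarity(vectors[w], target), reverse=True)
    return others[:k]


# Hypothetical toy vectors, invented for illustration only.
toy_space = {
    "introspection": [0.9, 0.8, 0.0],
    "self": [0.9, 0.6, 0.1],
    "guilt": [0.6, 0.9, 0.1],
    "reason": [0.9, 0.4, 0.2],
    "emotion": [0.5, 0.9, 0.1],
    "red": [0.0, 0.1, 0.9],
    "football": [0.1, 0.0, 1.0],
    "candle": [0.1, 0.2, 0.8],
    "banana": [0.0, 0.1, 1.0],
}

near = neighborhood(toy_space, "introspection", 4)
far = [w for w in toy_space if w != "introspection" and w not in near]
print("near:", near)
print("far:", far)
```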
And so once we've built the space, the question of the history of introspection, or of the history of any concept which before could seem abstract and somehow vague, becomes concrete, becomes amenable to quantitative science.
All that we have to do is take the books, digitize them, take this stream of words as a trajectory and project it into the space, and then ask whether this trajectory spends significant time circling closely around the concept of introspection.
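The projection step can be sketched as follows: map each word of a text to its vector, slide a window along the resulting path, and count the fraction of windows whose centroid lies close to the concept. The talk only says the trajectory should spend "significant time circling closely" around introspection, so the window size, the centroid, the cosine threshold and the toy vectors are all assumptions made for illustration; punctuation handling is also omitted.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def time_near_concept(text, vectors, concept, window=3, threshold=0.5):
    """Fraction of sliding windows whose centroid lies close to `concept`."""
    # Project the text: keep the words that exist in the space, in reading order.
    path = [vectors[t] for t in text.lower().split() if t in vectors]
    if len(path) < window:
        return 0.0
    target = vectors[concept]
    total = len(path) - window + 1
    hits = 0
    for start in range(total):
        chunk = path[start:start + window]
        # The window's centroid stands for where the trajectory is at this moment.
        centroid = [sum(values) / window for values in zip(*chunk)]
        if cosine_similarity(centroid, target) >= threshold:
            hits += 1
    return hits / total


# Hypothetical 2-D toy space: first axis "inner life", second axis "outer world".
toy_space = {
    "introspection": [1.0, 0.0],
    "self": [0.9, 0.1], "guilt": [0.8, 0.2], "reason": [0.85, 0.15], "emotion": [0.9, 0.2],
    "ship": [0.0, 1.0], "sea": [0.05, 0.95], "spear": [0.1, 1.0], "war": [0.1, 0.9],
}
outward = "the ship crossed the sea and the spear went to war"
inward = "she weighed guilt against reason and emotion within the self"
print(time_near_concept(outward, toy_space, "introspection"))
print(time_near_concept(inward, toy_space, "introspection"))
```

Note that neither sentence contains the word "introspection": the score comes entirely from how close the text's path passes to that point in the space, which is the whole idea behind detecting a concept that is never said.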