
Commit 1553e62

Merge pull request #212 from eclipse/ag_google_new_update
Update link to google news
2 parents 145a097 + 048e044 commit 1553e62


2 files changed: 10 additions & 10 deletions


cn/word2vec.html

Lines changed: 9 additions & 9 deletions
@@ -56,7 +56,7 @@
<p>Let's look at what other associations Word2vec can produce.</p>
<p>Instead of plus, minus and equals signs, we give the results in the notation of logical analogies, where <code>:</code> means &ldquo;is to&rdquo; and <code>::</code> means &ldquo;as&rdquo;; for example, &ldquo;Rome is to Italy as Beijing is to China&rdquo; = <code>Rome:Italy::Beijing:China</code>. Rather than supplying &ldquo;the answer&rdquo;, we then give the list of words that a Word2vec model proposes when given the first three words:</p>
<pre class="line-numbers"><code class="language-java">
- king:queen::man:[woman, Attempted abduction, teenager, girl]
+ king:queen::man:[woman, Attempted abduction, teenager, girl]
//A little strange, but you can see some connection

China:Taiwan::Russia:[Ukraine, Moscow, Moldova, Armenia]
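The analogy notation above corresponds to arithmetic on word vectors: for `king:queen::man:?` one computes queen - king + man and looks for the vocabulary words closest to that point by cosine similarity. The Java sketch below illustrates this vector-offset idea on a tiny hand-made vocabulary. The class name `AnalogySketch` and every vector value are hypothetical, and this is not the Deeplearning4j API; its implementation may differ in details such as normalization.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy example: hand-made 3-d vectors, NOT real Word2vec output.
class AnalogySketch {

    // Cosine similarity between two vectors of equal length.
    static double cosine(double[] x, double[] y) {
        double dot = 0, nx = 0, ny = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return dot / (Math.sqrt(nx) * Math.sqrt(ny));
    }

    // Solves a:b::c:? by returning the word closest to (b - a + c),
    // excluding the three query words themselves.
    static String solve(Map<String, double[]> vecs, String a, String b, String c) {
        double[] va = vecs.get(a), vb = vecs.get(b), vc = vecs.get(c);
        double[] target = new double[va.length];
        for (int i = 0; i < target.length; i++) {
            target[i] = vb[i] - va[i] + vc[i];
        }
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : vecs.entrySet()) {
            String w = e.getKey();
            if (w.equals(a) || w.equals(b) || w.equals(c)) {
                continue;
            }
            double sim = cosine(target, e.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = w;
            }
        }
        return best;
    }

    // Toy vocabulary; dimensions loosely mean [royalty, femaleness, adulthood].
    static Map<String, double[]> demoVectors() {
        Map<String, double[]> v = new LinkedHashMap<>();
        v.put("king",   new double[] {0.9, 0.1, 0.7});
        v.put("queen",  new double[] {0.9, 0.9, 0.7});
        v.put("man",    new double[] {0.1, 0.1, 0.6});
        v.put("woman",  new double[] {0.1, 0.9, 0.6});
        v.put("girl",   new double[] {0.1, 0.9, 0.1});
        v.put("prince", new double[] {0.8, 0.1, 0.2});
        return v;
    }

    public static void main(String[] args) {
        // king:queen::man:?  is computed as  queen - king + man
        System.out.println(solve(demoVectors(), "king", "queen", "man")); // prints "woman" for these toy vectors
    }
}
```

With real embeddings the top hits are rarely this clean, which is why the lists above also contain odd neighbours such as "Attempted abduction".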
@@ -68,9 +68,9 @@

New York Times:Sulzberger::Fox:[Murdoch, Chernin, Bancroft, Ailes]
//The Sulzberger-Ochs family owns and runs the New York Times.
- //The Murdoch family controls News Corp., which owns Fox News.
+ //The Murdoch family controls News Corp., which owns Fox News.
//Peter Chernin was News Corp.'s chief operating officer for 13 consecutive years.
- //Roger Ailes is the president of Fox News.
+ //Roger Ailes is the president of Fox News.
//The Bancroft family sold the Wall Street Journal to News Corp.

love:indifference::fear:[apathy, callousness, timidity, helplessness, inaction]
@@ -81,7 +81,7 @@
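//Word2vec also sees Trump as the opposite of the concept of Republicans.

monkey:human::dinosaur:[fossil, fossilized, Ice_Age_mammals, fossilization]
- //Humans are fossilized monkeys? Humans are
+ //Humans are fossilized monkeys? Humans are
//what's left over from monkeys? Humans are the species that beat monkeys,
//the way Ice Age mammals beat the dinosaurs? Sort of makes sense.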

@@ -192,7 +192,7 @@
System.out.println(lst);
UiServer server = UiServer.getInstance();
System.out.println("Started on port " + server.getPort());
-
+
//Output: [night, week, year, game, season, during, office, until, -]
</code></pre>
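A list like the one printed above (`[night, week, year, ...]`) is a ranking of vocabulary words by cosine similarity to a query word. As a minimal sketch of that ranking, assuming an in-memory `Map` from words to vectors (the class `NearestWordsSketch` and its toy values are hypothetical, and this is not the Deeplearning4j implementation):

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical toy example: ranks words by cosine similarity to a query word.
class NearestWordsSketch {

    static double cosine(double[] x, double[] y) {
        double dot = 0, nx = 0, ny = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return dot / (Math.sqrt(nx) * Math.sqrt(ny));
    }

    // Returns up to n words ranked by similarity to `query`, excluding the query itself;
    // a query word missing from the vocabulary yields an empty list.
    static List<String> nearest(Map<String, double[]> vecs, String query, int n) {
        double[] q = vecs.get(query);
        if (q == null) {
            return Collections.emptyList();
        }
        return vecs.entrySet().stream()
                .filter(e -> !e.getKey().equals(query))
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> cosine(q, e.getValue())).reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Made-up 2-d vectors, chosen so that time words cluster together.
    static Map<String, double[]> demoVectors() {
        Map<String, double[]> v = new LinkedHashMap<>();
        v.put("day",    new double[] {0.9, 0.1});
        v.put("night",  new double[] {0.8, 0.2});
        v.put("week",   new double[] {0.7, 0.3});
        v.put("banana", new double[] {0.0, 1.0});
        return v;
    }

    public static void main(String[] args) {
        System.out.println(nearest(demoVectors(), "day", 2)); // prints [night, week]
    }
}
```

Sorting the whole vocabulary is fine for a toy map; a model with millions of words would normally keep only a bounded top-n heap instead.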

@@ -249,7 +249,7 @@
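<p>If the word isn't in the vocabulary, Word2vec returns zeros.</p><br>

<p><h3>Importing Word2vec Models</h3></p>
- <p>The <a href="https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz" target="_blank">Google News Corpus model</a> we use to test the accuracy of our trained nets is hosted on S3. Users whose current hardware takes a long time to train on large corpora can simply download it to explore a Word2vec model without the preliminary work.</p>
+ <p>The <a href="https://github.com/mmihaltz/word2vec-GoogleNews-vectors" target="_blank">Google News Corpus model</a> we use to test the accuracy of our trained nets is hosted on S3. Users whose current hardware takes a long time to train on large corpora can simply download it to explore a Word2vec model without the preliminary work.</p>
<p>If you trained with the <a href="https://docs.google.com/file/d/0B7XkCwpI5KDYaDBDQm1tZGNDRHc/edit">C vectors</a> or Gensim, this line will import the model.</p>
<pre class="line-numbers"><code class="language-java">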
File gModel = new File("/Developer/Vector Models/GoogleNews-vectors-negative300.bin.gz");
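For readers who want to look inside `GoogleNews-vectors-negative300.bin.gz` directly, the sketch below reads the word2vec binary layout: a text header with vocabulary size and dimension, then each word followed by a space and its vector as little-endian 32-bit floats (the C tool also writes a newline after each vector). It returns an all-zero vector for unknown words, mirroring the behaviour described above. The class and method names are hypothetical, it reads only the first `maxWords` entries, and it is not the Deeplearning4j loader; loading all 3 million 300-dimensional vectors this way would need several gigabytes of heap.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for the word2vec binary format; not part of any library.
class Word2VecBinarySketch {

    // Reads up to maxWords entries from a decompressed stream:
    // header "<vocabSize> <dim>\n", then per word: word, ' ', dim little-endian floats.
    static Map<String, float[]> read(InputStream raw, int maxWords) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
        String[] header = readUntil(in, '\n').trim().split("\\s+");
        int vocabSize = Integer.parseInt(header[0]);
        int dim = Integer.parseInt(header[1]);
        Map<String, float[]> vectors = new LinkedHashMap<>();
        byte[] buf = new byte[dim * Float.BYTES];
        for (int w = 0; w < Math.min(vocabSize, maxWords); w++) {
            String word = readUntil(in, ' ');
            in.readFully(buf);
            ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
            float[] v = new float[dim];
            for (int i = 0; i < dim; i++) {
                v[i] = bb.getFloat();
            }
            vectors.put(word, v);
        }
        return vectors;
    }

    // Reads bytes up to (not including) delim, skipping the '\n' written between entries.
    private static String readUntil(InputStream in, char delim) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != delim) {
            if (b != '\n') {
                out.write(b);
            }
        }
        if (b == -1) {
            throw new EOFException("truncated word2vec file");
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Unknown words map to an all-zero vector of the model's dimension.
    static float[] vectorOrZeros(Map<String, float[]> vectors, String word, int dim) {
        float[] v = vectors.get(word);
        return v != null ? v : new float[dim];
    }

    // Writes vectors in the same layout and gzips them (useful for a tiny test file).
    static byte[] writeGzipped(Map<String, float[]> vectors, int dim) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write((vectors.size() + " " + dim + "\n").getBytes(StandardCharsets.UTF_8));
            ByteBuffer bb = ByteBuffer.allocate(dim * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
            for (Map.Entry<String, float[]> e : vectors.entrySet()) {
                gz.write((e.getKey() + " ").getBytes(StandardCharsets.UTF_8));
                bb.clear();
                for (float f : e.getValue()) {
                    bb.putFloat(f);
                }
                gz.write(bb.array());
                gz.write('\n');
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: Word2VecBinarySketch <path/to/GoogleNews-vectors-negative300.bin.gz>");
            return;
        }
        try (InputStream in = new GZIPInputStream(new FileInputStream(args[0]))) {
            read(in, 10).forEach((w, v) -> System.out.println(w + " dim=" + v.length));
        }
    }
}
```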
@@ -259,7 +259,7 @@
<p>With larger models, you may run into trouble with your heap space. The Google model may take as much as 10G of RAM, and the JVM starts with only 256 MB of RAM, so you have to adjust your heap space. You can do that either with a <code>bash_profile</code> file (see <a href="hgettingstarted.html#trouble">Troubleshooting</a>), or through IntelliJ itself:</p>
<pre class="line-numbers"><code class="language-java">
//Click:
- IntelliJ Preferences > Compiler > Command Line Options
+ IntelliJ Preferences > Compiler > Command Line Options
//Then paste:
-Xms1024m
-Xmx10g
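To confirm that the `-Xms`/`-Xmx` flags actually reached the JVM that runs your code, you can print the maximum heap size at startup. A minimal sketch; the class name `HeapCheck` and the 9 GiB warning threshold are hypothetical, and `Runtime.maxMemory()` can report somewhat less than the `-Xmx` value depending on the garbage collector.

```java
// Hypothetical startup check: prints the JVM's maximum heap size.
class HeapCheck {

    // Maximum heap the running JVM may use, in MiB (roughly the -Xmx setting).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long mib = maxHeapMiB();
        System.out.println("Max heap: " + mib + " MiB");
        // Hypothetical threshold, a little under 10 GiB to allow for GC bookkeeping.
        long wantedMiB = 9L * 1024L;
        if (mib < wantedMiB) {
            System.out.println("Heap is probably too small for the Google News model; try -Xmx10g.");
        }
    }
}
```

Running it once with `java -Xmx10g HeapCheck` and once without the flag shows whether your IDE or shell settings are being applied.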
@@ -291,9 +291,9 @@
</code></pre>
<p><strong>A:</strong> Look inside the directory where you started your Word2vec application. This could be, for example, an IntelliJ project home directory or the directory where you typed Java at the command line. It should contain some directories that look like:</p>
<pre class="line-numbers"><code class="language-java">
- ehcache_auto_created2810726831714447871diskstore
+ ehcache_auto_created2810726831714447871diskstore
ehcache_auto_created4727787669919058795diskstore
- ehcache_auto_created3883187579728988119diskstore
+ ehcache_auto_created3883187579728988119diskstore
ehcache_auto_created9101229611634051478diskstore
</code></pre>
<p>You can shut down your Word2vec application and try deleting these directories.</p><br>
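Deleting those directories can also be scripted. The sketch below is a hypothetical helper, not part of Word2vec or Deeplearning4j: it removes every sub-directory of a given start directory whose name matches the glob `ehcache_auto_created*diskstore`, deleting contents before their parent directories. Run it only while the Word2vec application is shut down.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical cleanup helper for leftover ehcache disk-store directories.
class EhcacheCleanup {

    // Glob matching names such as ehcache_auto_created2810726831714447871diskstore
    static final String PATTERN = "ehcache_auto_created*diskstore";

    // Deletes every matching sub-directory of startDir (recursively)
    // and returns the deleted directory names, sorted.
    static List<String> clean(Path startDir) throws IOException {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(startDir, PATTERN)) {
            for (Path p : ds) {
                if (Files.isDirectory(p)) {
                    matches.add(p);
                }
            }
        }
        List<String> deleted = new ArrayList<>();
        for (Path dir : matches) {
            deleteRecursively(dir);
            deleted.add(dir.getFileName().toString());
        }
        Collections.sort(deleted);
        return deleted;
    }

    // Files.walk lists parents before children; reversing deletes children first.
    private static void deleteRecursively(Path root) throws IOException {
        List<Path> paths = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(root)) {
            walk.forEach(paths::add);
        }
        Collections.reverse(paths);
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: EhcacheCleanup <directory the Word2vec app was started from>");
            return;
        }
        System.out.println("Deleted: " + clean(Paths.get(args[0])));
    }
}
```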

docs/_100-beta2/deeplearning4j-nlp-word2vec.md

Lines changed: 1 addition & 1 deletion
@@ -332,7 +332,7 @@ If the word isn't in the vocabulary, Word2vec returns zeros.

### <a name="import">Importing Word2vec Models</a>

- The [Google News Corpus model](https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz) we use to test the accuracy of our trained nets is hosted on S3. Users whose current hardware takes a long time to train on large corpora can simply download it to explore a Word2vec model without the prelude.
+ The [Google News Corpus model](https://github.com/mmihaltz/word2vec-GoogleNews-vectors) we use to test the accuracy of our trained nets is hosted on S3. Users whose current hardware takes a long time to train on large corpora can simply download it to explore a Word2vec model without the prelude.

If you trained with the [C vectors](https://docs.google.com/file/d/0B7XkCwpI5KDYaDBDQm1tZGNDRHc/edit) or Gensim, this line will import the model.
