Commit 1553e62

authored

Merge pull request #212 from eclipse/ag_google_new_update

Update link to google news

2 parents 145a097 + 048e044 commit 1553e62

File tree

2 files changed

+10

-10

lines changed

cn
- word2vec.html
docs/_100-beta2
- deeplearning4j-nlp-word2vec.md

`cn/word2vec.html`

Lines changed: 9 additions & 9 deletions

Original file line number	Diff line number	Diff line change
`@@ -56,7 +56,7 @@`
`56`	`56`	`<p>让我们来看看Word2vec可以得出哪些其他的关联。</p>`
`57`	`57`	`<p>我们不用加号、减号和等号,而是用逻辑类比符号表示结果,其中 <code>:</code> 代表“...与...的关系”,而 <code>:: </code>代表“相当于”;比如“罗马与意大利的关系相当于北京与中国的关系” = <code>Rome:Italy::Beijing:China</code>。接下来我们不会直接提供“答案”,而是给出一个Word2vec模型在给定最初三个词后生成的词表:</p>`
`58`	`58`	`<pre class="line-numbers"><code class="language-java">`
`59`		`-king:queen::man:[woman, Attempted abduction, teenager, girl]`
	`59`	`+king:queen::man:[woman, Attempted abduction, teenager, girl]`
`60`	`60`	`//有点奇怪,但能看出有些关联`
`61`	`61`
`62`	`62`	`China:Taiwan::Russia:[Ukraine, Moscow, Moldova, Armenia]`
`@@ -68,9 +68,9 @@`
`68`	`68`
`69`	`69`	`New York Times:Sulzberger::Fox:[Murdoch, Chernin, Bancroft, Ailes]`
`70`	`70`	`//Sulzberger-Ochs家族是《纽约时报》所有人和管理者。`
`71`		`-//Murdoch家族持有新闻集团,而福克斯新闻频道为新闻集团所有。`
	`71`	`+//Murdoch家族持有新闻集团,而福克斯新闻频道为新闻集团所有。`
`72`	`72`	`//Peter Chernin曾连续13年担任新闻集团的首席运营官。`
`73`		`-//Roger Ailes是福克斯新闻频道的总裁。`
	`73`	`+//Roger Ailes是福克斯新闻频道的总裁。`
`74`	`74`	`//Bancroft家族将华尔街日报出售给新闻集团。`
`75`	`75`
`76`	`76`	`love:indifference::fear:[apathy, callousness, timidity, helplessness, inaction]`
`@@ -81,7 +81,7 @@`
`81`	`81`	`//Word2vec认为特朗普也与共和党人这个概念对立。`
`82`	`82`
`83`	`83`	`monkey:human::dinosaur:[fossil, fossilized, Ice_Age_mammals, fossilization]`
`84`		`-//人类是变成化石的猴子?人类是`
	`84`	`+//人类是变成化石的猴子?人类是`
`85`	`85`	`//猴子遗留下来的东西?人类是打败了猴子的物种,`
`86`	`86`	`//就像冰川世纪的哺乳动物打败了恐龙那样?好像有点道理。`
`87`	`87`
`@@ -192,7 +192,7 @@`
`192`	`192`	`System.out.println(lst);`
`193`	`193`	`UiServer server = UiServer.getInstance();`
`194`	`194`	`System.out.println("Started on port " + server.getPort());`
`195`		`-`
	`195`	`+`
`196`	`196`	`//输出:[night, week, year, game, season, during, office, until, -]`
`197`	`197`	`</code></pre>`
`198`	`198`
`@@ -249,7 +249,7 @@`
`249`	`249`	`<p>如果词不属于已知的词汇,Word2vec会返回一串零。</p><br>`
`250`	`250`
`251`	`251`	`<p><h3>导入Word2vec模型</h3></p>`
`252`		`-<p>我们用来测试已定型网络准确度的<a href="https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz" target="_blank">谷歌新闻语料模型</a>由S3托管。如果用户当前的硬件定型大规模语料需要很长时间,可以下载这个模型,跳过前期准备直接探索Word2vec。</p>`
	`252`	`+<p>我们用来测试已定型网络准确度的<a href="https://github.com/mmihaltz/word2vec-GoogleNews-vectors" target="_blank">谷歌新闻语料模型</a>由S3托管。如果用户当前的硬件定型大规模语料需要很长时间,可以下载这个模型,跳过前期准备直接探索Word2vec。</p>`
`253`	`253`	`<p>如果你是使用<a href="https://docs.google.com/file/d/0B7XkCwpI5KDYaDBDQm1tZGNDRHc/edit">C向量</a>或Gensimm定型的,那么可以用下面这行代码导入模型。</p>`
`254`	`254`	`<pre class="line-numbers"><code class="language-java">`
`255`	`255`	`File gModel = new File("/Developer/Vector Models/GoogleNews-vectors-negative300.bin.gz");`
`@@ -259,7 +259,7 @@`
`259`	`259`	`<p>较大的模型可能会遇到堆空间的问题。谷歌模型可能会占据多达10G的RAM,而JVM只能以256MB的RAM启动,所以必须调整你的堆空间。方法可以是使用一个<code>bash_profile</code>文件(参见<a href="hgettingstarted.html#trouble">疑难解答</a>),或通过IntelliJ本身来解决:</p>`
`260`	`260`	`<pre class="line-numbers"><code class="language-java">`
`261`	`261`	`//点击:`
`262`		`- IntelliJ Preferences > Compiler > Command Line Options`
	`262`	`+ IntelliJ Preferences > Compiler > Command Line Options`
`263`	`263`	`//然后粘贴:`
`264`	`264`	`-Xms1024m`
`265`	`265`	`-Xmx10g`
`@@ -291,9 +291,9 @@`
`291`	`291`	`</code></pre>`
`292`	`292`	`<p><strong>答:</strong>检查Word2vec应用的启动目录内部。这可能是一个IntelliJ项目的主目录,或者你在命令行中键入了Java的那个目录。其中应当有这样一些目录:</p>`
`293`	`293`	`<pre class="line-numbers"><code class="language-java">`
`294`		`-ehcache_auto_created2810726831714447871diskstore`
	`294`	`+ehcache_auto_created2810726831714447871diskstore`
`295`	`295`	`ehcache_auto_created4727787669919058795diskstore`
`296`		`- ehcache_auto_created3883187579728988119diskstore`
	`296`	`+ ehcache_auto_created3883187579728988119diskstore`
`297`	`297`	`ehcache_auto_created9101229611634051478diskstore`
`298`	`298`	`</code></pre>`
`299`	`299`	`<p>你可以关闭Word2vec应用并尝试删除这些目录。</p><br>`

`docs/_100-beta2/deeplearning4j-nlp-word2vec.md`

Lines changed: 1 addition & 1 deletion

Original file line number	Diff line number	Diff line change
`@@ -332,7 +332,7 @@ If the word isn't in the vocabulary, Word2vec returns zeros.`
`332`	`332`
`333`	`333`	`### <a name="import">Importing Word2vec Models</a>`
`334`	`334`
`335`		`-The [Google News Corpus model](https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz) we use to test the accuracy of our trained nets is hosted on S3. Users whose current hardware takes a long time to train on large corpora can simply download it to explore a Word2vec model without the prelude.`
	`335`	`+The [Google News Corpus model](https://github.com/mmihaltz/word2vec-GoogleNews-vectors) we use to test the accuracy of our trained nets is hosted on S3. Users whose current hardware takes a long time to train on large corpora can simply download it to explore a Word2vec model without the prelude.`
`336`	`336`
`337`	`337`	`If you trained with the [C vectors](https://docs.google.com/file/d/0B7XkCwpI5KDYaDBDQm1tZGNDRHc/edit) or Gensimm, this line will import the model.`
`338`	`338`
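
The context lines in the `cn/word2vec.html` hunk at 259 say that the Google model may take up as much as 10G of RAM and that the JVM heap has to be raised (`-Xms1024m`, `-Xmx10g`). A minimal sketch for checking the heap before loading a large model follows. It is not part of this commit: the class name `HeapCheck`, the helper `isHeapSufficient`, and the 9 GiB threshold are hypothetical, and the threshold sits below 10G on the assumption that `Runtime.maxMemory()` can report slightly less than the `-Xmx` value.

```java
// Hypothetical helper, not from the repository: checks whether the running
// JVM was started with enough heap to plausibly hold a large Word2vec model.
class HeapCheck {
    static final long GIB = 1024L * 1024L * 1024L;

    // Pure comparison so the rule can be exercised without a large heap.
    static boolean isHeapSufficient(long maxHeapBytes, long requiredBytes) {
        return maxHeapBytes >= requiredBytes;
    }

    public static void main(String[] args) {
        // Upper bound on heap the JVM will try to use (governed by -Xmx).
        long maxHeap = Runtime.getRuntime().maxMemory();

        // Assumed threshold: a little under the "up to 10G" figure in the docs,
        // because maxMemory() may report somewhat less than the -Xmx setting.
        long required = 9L * GIB;

        System.out.printf("Max heap: %.2f GiB%n", maxHeap / (double) GIB);
        if (isHeapSufficient(maxHeap, required)) {
            System.out.println("Heap looks large enough for the Google News model.");
        } else {
            System.out.println("Heap may be too small; consider starting the JVM with -Xmx10g.");
        }
    }
}
```

Running it as `java -Xms1024m -Xmx10g HeapCheck` uses the same options the documentation asks readers to paste into IntelliJ.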

0 commit comments
