doc2vecjava实现_doc2vec 跑出的结果怎么聚类

① 如何在原有词向量的基础上fine-tune

② 如何通过词向量技术来计算2个文档的相似度

最近正好组内做了一个文档相似度的分享。决定回答一发。
首先，如果不局限于NN的方法，可以用BOW+tf-idf+LSI/LDA的体系搞定，也就是俗称的01或one hot representation。
其次，如果楼主指定了必须用流行的NN，俗称word-embedding的方法，当然首推word2vec（虽然不算是DNN）。然后得到了word2vec的词向量后，可以通过简单加权/tag加权/tf-idf加权等方式得到文档向量。这算是一种方法。当然，加权之前一般应该先干掉stop word，词聚类处理一下。
还有，doc2vec中的paragraph vector也属于直接得到doc向量的方法。特点就是修改了word2vec中的cbow和skip-gram模型。依据论文《Distributed Representations of Sentences and Documents》(ICML 2014)。
还有一种根据句法树加权的方式，是ICML2011提出的，见论文《Parsing Natural Scenes and Natural Language with Recursive Neural Networks》，后续也有多个改编的版本。
当然，得到词向量的方式不局限于word2vec，RNNLM和glove也能得到传说中高质量的词向量。
ICML2015的论文《From Word Embeddings To Document Distances, Kusner, Washington University》新提出一种计算doc相似度的方式，大致思路是将词之间的余弦距离作为ground distance，词频作为权重，在权重的约束条件下，求WMD的线性规划最优解。
最后，kaggle101中的一个word2vec题目的tutorial里作者如是说：他试了一下简单加权和各种加权，不管如何处理，效果还不如01，归其原因作者认为加权的方式丢失了最重要的句子结构信息（也可以说是词序信息），而doc2vec的方法则保存了这种信息。
在刚刚结束的ACL2015上，似乎很多人提到了glove的方法，其思想是挖掘词共现信息的内在含义，据说是基于全局统计的方法（LSI为代表）与基于局部预测的方法（word2vec为代表）的折衷，而且输出的词向量在词聚类任务上干掉了word2vec的结果，也可以看看。《GloVe: Global Vectors forWord Representation》

③ 如何用 word2vec 计算两个句子之间的相似度

word2vec这个代名词也好计算软件也好，对于一个不太懂软件的人来说真的是很陌生，也可以说是一窍不通，但是从朋友那了解了很多，所以我觉得计算两个句子之间的相似度我觉得定义句子相似度是这个问题的关键。

我觉得文档相似度取决于文档的长度,如果是一个简短的文本,传统方法tf-idf，相反如果是长文本，可以使用word2vec。

④ 如何通过词向量技术来计算2个文档的相似度

最近正好组内做了一个文档相似度的分享。决定回答一发。首先，如果不局限于NN的方法，可以用BOW+tf-idf+LSI/LDA的体系搞定，也就是俗称的01或one hot representation。其次，如果楼主指定了必须用流行的NN，俗称word-embedding的方法，当然首推word2vec（虽然不算是DNN）。然后得到了word2vec的词向量后，可以通过简单加权/tag加权/tf-idf加权等方式得到文档向量。这算是一种方法。当然，加权之前一般应该先干掉stop word，词聚类处理一下。还有，doc2vec中的paragraph vector也属于直接得到doc向量的方法。特点就是修改了word2vec中的cbow和skip-gram模型。依据论文《Distributed Representations of Sentences and Documents》(ICML 2014)。还有一种根据句法树加权的方式，是ICML2011提出的，见论文《Parsing Natural Scenes and Natural Language with Recursive Neural Networks》，后续也有多个改编的版本。当然，得到词向量的方式不局限于word2vec，RNNLM和glove也能得到传说中高质量的词向量。 ICML2015的论文《From Word Embeddings To Document Distances, Kusner, Washington University》新提出一种计算doc相似度的方式，大致思路是将词之间的余弦距离作为ground distance，词频作为权重，在权重的约束条件下，求WMD的线性规划最优解。最后，kaggle101中的一个word2vec题目的tutorial里作者如是说：他试了一下简单加权和各种加权，不管如何处理，效果还不如01，归其原因作者认为加权的方式丢失了最重要的句子结构信息（也可以说是词序信息），而doc2vec的方法则保存了这种信息。在刚刚结束的ACL2015上，似乎很多人提到了glove的方法，其思想是挖掘词共现信息的内在含义，据说是基于全局统计的方法（LSI为代表）与基于局部预测的方法（word2vec为代表）的折衷，而且输出的词向量在词聚类任务上干掉了word2vec的结果，也可以看看。《GloVe: Global Vectors forWord Representation》

⑤ Word2vec的词聚类结果与LDA的主题词聚类结果，有什么不同

所以Word2vec的一些比较精细的应用，LDA是做不了的。比如：
1）计算词的相似度。同样在电子产品这个主题下，“苹果”是更接近于“三星”还是“小米”？
2）词的类比关系：vector（小米）- vector（苹果）+ vector（乔布斯）近似于 vector（雷军）。
3）计算文章的相似度。这个LDA也能做但是效果不好。而用词向量，即使在文章topic接近的情况下，计算出的相似度也能体现相同、相似、相关的区别。
反过来说，想用词向量的聚类去得到topic这一级别的信息也是很难的。很有可能，“苹果”和“小米”被聚到了一类，而“乔布斯”和“雷军”则聚到另一类。
这种差别，本质上说是因为Word2vec利用的是词与上下文的共现，而LDA利用的是词与文章之间的共现。
PS. 说起来，拿LDA和doc2vec比较才比较合理啊～～

⑥ doc2vec 跑出的结果怎么聚类

K-means或者别的聚类算法啊（就是效果感觉不太好哎！）

⑦ java编一个计算器的代码

用Applet还是用Application?

我以前老师好像也布置过一个计算器是Applet的

import java.applet.Applet;
import java.awt.Button;
import java.awt.Color;
import java.awt.FlowLayout;
import java.awt.TextField;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.JTextField;
import javax.swing.SwingConstants;

//import com.sun.org.apache.xerces.internal.impl.xpath.regex.ParseException;

public class calculator extends Applet implements ActionListener {

/**
*
*/
//private static final long serialVersionUID = 3114597813287650283L;
/* (non-Javadoc)
* @see java.applet.Applet#init()
*/
public int operator = 1; //运算符判断：1－加，2－减，3－乘，4－除；
public double accumulator=0; //运算的累加器；
public double digit = 0;
public boolean bool = true;
public boolean boolca = false;
public String text="0.0";

/*窗体控件*/
public JTextField tf_AccepteData = new JTextField();
public Button bt_Digit;
public Button bt_dot = new Button(".");
public Button bt_add = new Button("+");
public Button bt_plus = new Button("-");
public Button bt_mul = new Button("*");
public Button bt_div = new Button("/");
public Button bt_equal = new Button("=");
public Button bt_clear = new Button("C");
public Button bt_inverse = new Button("1/x");
public Button bt_non = new Button("+/-");
public Button bt_sqrt = new Button("sqrt");

public double scan()
{
text = tf_AccepteData.getText();
try{
digit = Double.parseDouble(text);
}catch(Exception msg){
msg.printStackTrace();
}
return digit;
}
public void accumulate()
{
if(!tf_AccepteData.getText().equals("")){
switch(operator){
case 1:
accumulator += digit;
tf_AccepteData.setText("" + accumulator);
break;
case 2:
accumulator -= digit;
tf_AccepteData.setText("" + accumulator);
break;
case 3:
accumulator *= digit;
tf_AccepteData.setText("" + accumulator);
break;
case 4:
if(digit == 0){
tf_AccepteData.setText("除数不能为零");
operator = 1;
accumulator = 0;
bool = true;
digit = 0;
text="";
}else{
accumulator /=digit;
tf_AccepteData.setText("" + accumulator);
}
break;
default:
break;
}
operator = 1;
}
}

public void init() {
/*初始化窗体*/
setLayout(null);
setSize(225,200);
setBackground(Color.orange);
add(tf_AccepteData);
tf_AccepteData.setHorizontalAlignment(SwingConstants.RIGHT);
tf_AccepteData.setSize(200, 25);
tf_AccepteData.setLocation(10, 10);
tf_AccepteData.setEnabled(false);
tf_AccepteData.setText("0.0");
text = tf_AccepteData.getText();
for(int i= 0; i<=9; i++){
bt_Digit = new Button("" + i);
add(bt_Digit);
bt_Digit.setSize(30, 30);
if(i>=7&&i<=9)
bt_Digit.setLocation((15+40*(i-7)), 50);
if(i>=4&&i<=6)
bt_Digit.setLocation((15+40*(i-4)), 85);
if(i>=1&&i<=3)
bt_Digit.setLocation((15+40*(i-1)), 120);
if(i==0)
bt_Digit.setLocation(15, 155);
bt_Digit.addActionListener(this);
}
/*
* 小数点
*/
add(bt_dot);
bt_dot.setSize(30, 30);
bt_dot.setLocation(95, 155);
bt_dot.addActionListener(this/*new ActionListener()*/);/*{
public void actionPerformed(ActionEvent e) {
}
});*/
/*
* 加法运算
*/
add(bt_add);
bt_add.setSize(30, 30);
bt_add.setLocation(135, 155);
bt_add.addActionListener(new ActionListener()
{
public void actionPerformed(ActionEvent e) {

accumulate();
operator = 1;
bool =true;
boolca = true;
digit = 0;
}
});
/*
* 减法运算
*/
add(bt_plus);
bt_plus.setSize(30, 30);
bt_plus.setLocation(135, 120);
bt_plus.addActionListener(new ActionListener()
{
public void actionPerformed(ActionEvent e)
{
accumulate();
operator = 2;
bool =true;
boolca = true;
digit = 0;
}
});
/*
* 乘法运算
*/
add(bt_mul);
bt_mul.setSize(30, 30);
bt_mul.setLocation(135, 85);
bt_mul.addActionListener(new ActionListener()
{
public void actionPerformed(ActionEvent e)
{
accumulate();
operator = 3;
bool =true;
boolca = true;
digit = 1;
}
});
/*
* 除法运算
*/
add(bt_div);
bt_div.setSize(30, 30);
bt_div.setLocation(135, 50);
bt_div.addActionListener(new ActionListener()
{
public void actionPerformed(ActionEvent e)
{
accumulate();
operator = 4;
bool =true;
boolca = true;
digit = 1;
}
});
/*
* 倒数
*/
add(bt_inverse);
bt_inverse.setSize(30, 30);
bt_inverse.setLocation(175, 120);
bt_inverse.addActionListener(new ActionListener(){
double data=0;
public void actionPerformed(ActionEvent e) {
data = scan();
if(digit==0){
tf_AccepteData.setText("取倒分母不能为零!");
operator = 1;
accumulator = 0;
bool = true;
digit = 0;
text="";
}else{
tf_AccepteData.setText("" + 1.0/data);
}
if(!boolca)
operator = 1;
}
});
/*
* 数学平方根
*/
add(bt_sqrt);
bt_sqrt.setSize(30, 30);
bt_sqrt.setLocation(175, 85);
bt_sqrt.addActionListener(new ActionListener(){
public void actionPerformed(ActionEvent e) {
scan();
if(digit<0){
tf_AccepteData.setText("函数输入无效!");
operator = 1;
accumulator = 0;
bool = true;
digit = 0;
text="";
}else{
digit = Math.sqrt(digit);
tf_AccepteData.setText("" + digit);
}
if(!boolca)
operator = 1;
}
});
/*
* 取反
*/
add(bt_non);
bt_non.setSize(30, 30);
bt_non.setLocation(55, 155);
bt_non.addActionListener(new ActionListener()
{
public void actionPerformed(ActionEvent e) {
try{
digit = -Double.parseDouble(tf_AccepteData.getText());
tf_AccepteData.setText("" + digit);
}catch(Exception msg){
msg.printStackTrace();
}
if(!boolca)
operator = 1;
}
});
/*
* 等于
*/
add(bt_equal);
bt_equal.setSize(30, 30);
bt_equal.setLocation(175, 155);
bt_equal.addActionListener(new ActionListener()
{
public void actionPerformed(ActionEvent e) {
accumulate();
operator = 1;
bool =true;
digit = 0;
}
});
/*
* 清零运算
*/
add(bt_clear);
bt_clear.setSize(30, 30);
bt_clear.setLocation(175, 50);
bt_clear.addActionListener(new ActionListener()
{
public void actionPerformed(ActionEvent e) {
operator = 1;
accumulator = 0;
bool = true;
digit = 0;
text="";
tf_AccepteData.setText("0.0");
}
});
}
public void actionPerformed(ActionEvent e){
if(bool){
tf_AccepteData.setText("");
bool = false;
}
text = tf_AccepteData.getText();
if((text+e.getActionCommand()).indexOf(".")!=(text+e.getActionCommand()).lastIndexOf(".")) return;
text += e.getActionCommand();
if(text.equals("."))
text = "0"+text;
tf_AccepteData.setText(text);
scan();
}
}

还要写一个CalculatorTest.html网页文件代码如下，写好双击运行CalculatorTest.html就可以了



<HTML>
<HEAD>
<TITLE>
小应用程序——一个简单的计算器模型
</TITLE>
</HEAD>

<BODY bgcolor = blue text = yellow >
<center>
<marquee behavior=alternate width = 300 height = 40> 实现加、减、乘、除四种基本运算</marquee>
 
<APPLET ALIGN = middle CODE = "calculator.class" WIDTH = 225 HEIGHT = 200></APPLET>
 
<input type = button value = " 退出 " onClick="javascript:window.close()">
</center>
 
</BODY>
</HTML>

⑧ Doc2vec的优缺点是什么

优点很多比如

⑨ wordtune怎么用

wordtune：word可以插入图片，但是插入的gif是只能显示一帧。

doc2vec中的paragraph vector也属于直接得到doc向量的方法。特点就是修改了word2vec中的cbow和skip-gram模型。依据论文《Distributed Representations of Sentences and Documents》(ICML 2014)。

Microsoft office Word 2007：

这次发行包含了许多变化，包括新的支持XML的文件格式，重新设计的界面，集成的公式编辑器和书目管理。此外，提出了XML数据包，经过对象模型和文件格式可以得到的，称作自定义的XML——这可以利用在同一个新的被称作连接控制工具结构文档的特性的关联之中。

它也有只针对带有焦点的对象有特殊功能的内容标签，还有一些其他特性如Live Preview（通过它你可以在不作出任何永久变化时查看文件）、迷你工具栏，超级工具提示，快速访问工具栏SmartArt等。

⑩ 请问一个java中Vector类的问题

Vector<Token> vector=new Vector<Token>（）；

Token token1 = new Token();
Token token2 = new Token();

vector.add（token1）；
vector.add（token2）；

这样子尝试一下吧

取出的话

如果像上面那样的话

Token a = vector.get(0);
Token b = vector.get(1);

就可以了

导航:首页 > 编程语言 > doc2vecjava实现

doc2vecjava实现

与doc2vecjava实现相关的资料

友情链接