"instead of lecturing about SVD I want to show you how things work --step by step"
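-- If you agree with this sentence, then this tutorial by Dr. E. Garcia is the LSI / LSA tutorial best suited to you.
The original is fairly long, so here is the link directly: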
http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-1-understanding.html
If the original feels too long, you can also read Garcia's condensed versions:
Latent Semantic Indexing (LSI) Fast Track Tutorial
Singular Value Decomposition (SVD) Fast Track Tutorial
Some excerpts:
I. Common misconceptions about LSI:
1) is theming (analysis of themes).
2) is used by search engines to find all the nouns and verbs, and then associate them with related (substitution-useful) nouns and verbs.
3) allows search engines to "learn" which words are related and which noun concepts relate to one another.
4) is a form of on-topic analysis (term scope/subject analysis).
5) can be applied to collections of any size.
6) has no problem addressing polysemy (terms with different meanings).
Pasted from <http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-1-understanding.html>
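II. In essence, LSI identifies words that have second-order co-occurrence at the document level and places them in the same subspace. Therefore:
1) Words that fall in the same subspace are not necessarily synonyms, nor even words that appear in the same context; this is especially true for long documents.
2) LSI simply cannot handle words with multiple meanings (polysemous words); polysemy degrades LSI's results.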
A persistent myth in search marketing circles is that LSI grants contextuality; i.e., terms occurring in the same context. This is not always the case. Consider two documents X and Y and three terms A, B and C and wherein:
A and B do not co-occur.
X mentions terms A and C
Y mentions terms B and C.
∴ A---C---B
The common denominator is C, so we define this relation as an in-transit co-occurrence since both A and B occur while in transit with C. This is called second-order co-occurrence and is a special case of high-order co-occurrence.
However, only because terms A and B are in-transit with C this does not grant contextuality, as the terms can be mentioned in different contexts in documents X and Y. For example, this would be the case of X and Y discussing different topics. Long documents are more prone to this.
Even if X and Y are monotopic these might be discussing different subjects. Thus, it would be fallacious to assume that high-order co-occurrence between A and B while in-transit with C equates to a contextuality relationship between terms. Add polysemy to this and the scenario worsens, as LSI can fail to address polysemy.
Pasted from <http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-1-understanding.html>
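To make the A---C---B case concrete, here is a minimal sketch in Python with NumPy. It is not taken from Garcia's tutorial: the binary term-document matrix, the choice of k = 1, and the helper name `cosine` are all assumptions made for illustration. It compares the similarity of terms A and B before and after a truncated SVD.

```python
import numpy as np

# Hypothetical toy term-document matrix (binary term counts).
# Rows are terms A, B, C; columns are documents X, Y.
# X mentions A and C; Y mentions B and C; A and B never co-occur.
M = np.array([
    [1.0, 0.0],  # A
    [0.0, 1.0],  # B
    [1.0, 1.0],  # C
])

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# In the raw term space, A and B share no document.
raw_sim = cosine(M[0], M[1])

# Truncated SVD: keep only the k largest singular values (k = 1 is an assumption).
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 1
term_vecs = U[:, :k] * s[:k]  # term coordinates in the reduced space

# In the reduced space, A and B are pulled together through C.
lsi_sim = cosine(term_vecs[0], term_vecs[1])

print("cosine(A, B) in raw space:    ", raw_sim)
print("cosine(A, B) in reduced space:", lsi_sim)
```

In the raw space the cosine between A and B is 0. After truncation to one dimension it comes out as 1 (up to floating-point rounding), simply because both terms are linked through C. Nothing in the matrix says whether X and Y use A and B in the same context, which is the point of the excerpt: proximity in the reduced space reflects in-transit co-occurrence, not contextuality.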