1、什么是倒排索引。

e>>>(⊙o⊙)… 这是我见过最垃圾的翻译了，完全让人误解他的意思。

这个名称很容易让人理解为从A-Z的排序颠倒成Z-A，其实根本不是这么回事。

英文原版为 inverted index 个人感觉翻译成 反向索引 比较合适。

倒排索引是区别于正排索引(forward index)来说的。

解释：

文档是有许多的单词组成的，其中每个单词也可以在同一个文档中重复出现很多次，当然，同一个单词也可以出现在不同的文档中。

正排索引(forward index)：从文档角度看其中的单词，表示每个文档（用文档ID标识）都含有哪些单词，以及每个单词出现了多少次（词频）及其出现位置（相对于文档首部的偏移量）。

倒排索引(inverted index，或inverted files)：从单词角度看文档，标识每个单词分别在那些文档中出现(文档ID)，以及在各自的文档中每个单词分别出现了多少次（词频）及其出现位置（相对于该文档首部的偏移量）。

简单记为：
正排索引：文档 ---> 单词 常规的索引是文档到关键词的映射
倒排索引：单词 ---> 文档 倒排索引是关键词到文档的映射

应用场景：

倒排索引有着广泛的应用场景，比如搜索引擎、大规模数据库索引、文档检索、多媒体检索/信息检索领域等等。总之，倒排索引在检索领域是很重要的一种索引机制。

2、inverted index 的java实现

假设有3篇文章，file1, file2, file3，文件内容如下：

那么建立的倒排索引就是这个样子：

下面是对于倒排索引的一个简单的实现。该程序对于输入的一段文字，查找出该词所出现的行号以及出现的次数。

import java.io.*;  
import java.util.HashMap;  
import java.util.Map;  
  
  
public class InvertedIndex {  
      
    private Map<String, Map<Integer, Integer>> index;  
    private Map<Integer, Integer> subIndex;  
      
    public void createIndex(String filePath) {  
        index = new HashMap<String, Map<Integer, Integer>>();  
  
        try {  
            File file = new File(filePath);  
            InputStream is = new FileInputStream(file);  
            BufferedReader read = new BufferedReader(new InputStreamReader(is));  
              
            String temp = null;  
            int line = 1;  
            while ((temp = read.readLine()) != null) {  
                String[] words = temp.split(" ");  
                for (String word : words) {  
                    if (!index.containsKey(word)) {  
                        subIndex = new HashMap<Integer, Integer>();  
                        subIndex.put(line, 1);  
                        index.put(word, subIndex);  
                    } else {  
                        subIndex = index.get(word);  
                        if (subIndex.containsKey(line)) {  
                            int count = subIndex.get(line);  
                            subIndex.put(line, count+1);  
                        } else {  
                            subIndex.put(line, 1);  
                        }  
                    }  
                }  
                line++;  
            }  
            read.close();  
            is.close();  
        } catch (IOException e) {  
            System.out.println("error in read file");  
        }  
    }  
      
    public void find(String str) {  
        String[] words = str.split(" ");  
        for (String word : words) {  
            StringBuilder sb = new StringBuilder();  
            if (index.containsKey(word)) {  
                sb.append("word: " + word + " in ");  
                Map<Integer, Integer> temp = index.get(word);  
                for (Map.Entry<Integer, Integer> e : temp.entrySet()) {  
                    sb.append("line " + e.getKey() + " [" + e.getValue() + "] , ");   
                }  
            } else {  
                sb.append("word: " + word + " not found");  
            }  
            System.out.println(sb);  
        }  
    }  
      
    public static void main(String[] args) {  
        InvertedIndex index = new InvertedIndex();  
        index.createIndex("news.txt");  
        index.find("I love Shanghai today");  
    }  
}

其中，输入文件news.txt内容为：

I am eriol  
I live in Shanghai and I love Shanghai  
I also love travelling  
life in Shanghai  
is beautiful

输出结果为：

word: I in line 1 [1] , line 2 [2] , line 3 [1] ,   
word: love in line 2 [1] , line 3 [1] ,   
word: Shanghai in line 2 [2] , line 4 [1] ,   
word: today not found

参考来自！倒排索引简单实现

知乎：倒排索引为什么叫倒排索引？

另外的资源学习（本文并未涉及）

倒排索引的java实现

MapReduce实现倒排索引（类似协同过滤）

hadoop倒排索引

来源：oschina

链接：https://my.oschina.net/u/2885163/blog/1554246

标签

inverted index

倒排索引 inverted index

1、什么是倒排索引。

2、inverted index 的java实现

MapReduce实现倒排索引（类似协同过滤）

hadoop倒排索引