Java what's the best data structure to search objects by keywords [closed]

好久不见. Submitted on 2019-12-05 10:10:31

Question


Suppose I have a "journal article" class with fields such as year, author(s), title, journal name, keyword(s), etc.

Fields such as authors and keywords might be declared as String[] authors and String[] keywords.

What's the best data structure for searching a group of "journal article" objects by one or several keywords, by one of several author names, or by part of the title?

Thanks!

==========================================================================

Following everybody's help, here is my test code, written in the Processing environment. Any advice is greatly appreciated. Thanks!

ArrayList<Paper> papers = new ArrayList<Paper>();

HashMap<String, ArrayList<Paper>> hm = new HashMap<String, ArrayList<Paper>>();

void setup(){
  Paper paperA = new Paper();
  paperA.title = "paperA";
  paperA.keywords.append("cat");
  paperA.keywords.append("dog");
  paperA.keywords.append("egg");
  //println(paperA.keywords);
  papers.add(paperA);

  Paper paperC = new Paper();
  paperC.title = "paperC";
  paperC.keywords.append("egg");
  paperC.keywords.append("cat");
  //println(paperC.keywords);
  papers.add(paperC);

  Paper paperB = new Paper();
  paperB.title = "paperB";
  paperB.keywords.append("dog");
  paperB.keywords.append("egg");
  //println(paperB.keywords); 
  papers.add(paperB);

  for (Paper p : papers) {
    // get a list of keywords for the current paper
    StringList keywords = p.keywords;

    // go through each keyword of the current paper
    for (int i=0; i<keywords.size(); i++) {
      String keyword = keywords.get(i);

      // look up the paper list stored under this keyword;
      // the local is named "matches" so it does not shadow the global "papers"
      ArrayList<Paper> matches = hm.get(keyword);
      if (matches == null) {
        // first time this keyword is seen: create its list and register it
        matches = new ArrayList<Paper>();
        hm.put(keyword, matches);
      }
      // the map holds a reference to the list, so no second put() is needed
      matches.add(p);
    }

  }

  // hm.get() returns null for a keyword that was never indexed
  ArrayList<Paper> paperList = hm.get("egg");
  for (Paper p : paperList) {
    println(p.title);
  }
}

void draw(){}

class Paper 
{
  //===== variables =====
  int ID;
  int year;
  String title;
  StringList authors  = new StringList();
  StringList keywords = new StringList();
  String DOI;
  String typeOfRef;
  String nameOfSource;
  String abs; // abstract


  //===== constructor =====

  //===== update =====

  //===== display =====
}

Answer 1:


Use a HashMap<String, JournalArticle> data structure.

For example:

Map<String, JournalArticle> journals = new HashMap<String, JournalArticle>();
journals.put("keyword1", testJA);

if (journals.containsKey("keyword1"))
{
    return journals.get("keyword1");
}

You can use your keywords as the String keys of this map. However, it only supports exact-match search: the search term has to be identical to a key stored in the HashMap.

If you need a "like" kind of search, I suggest saving your objects in a database that supports LIKE queries.

Edit: on second thought, you can do a kind of "like" query (similar to the LIKE clause in SQL), but it will not be very efficient, because every query iterates through all the keys in the HashMap. If you know regex, you can build all kinds of queries by modifying the following example code (e.g. key.matches(pattern)):

    List<JournalArticle> results = new ArrayList<JournalArticle>(); // must not be null, or add() below throws NullPointerException

    for (String key : journals.keySet())
    {
        if (key.contains("keyword"))  /* keyword has to be part of the key stored in the HashMap, but does not have to be an exact match any more */
            results.add(journals.get(key));
    }

    return results;
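
The key scan above can be extended to regex matching. The following is only a minimal sketch under stated assumptions: the class name KeyScan and the method findByPattern are hypothetical, and plain title strings stand in for JournalArticle objects. It still visits every key, so the cost grows linearly with the number of keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper; article titles stand in for JournalArticle objects.
final class KeyScan {
    // Returns the values whose key contains a match for the regex,
    // ignoring case. Every key is checked, so this is a linear scan.
    static List<String> findByPattern(Map<String, String> index, String regex) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> entry : index.entrySet()) {
            if (pattern.matcher(entry.getKey()).find()) {
                results.add(entry.getValue());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the result order is predictable.
        Map<String, String> index = new LinkedHashMap<>();
        index.put("machine learning", "paperA");
        index.put("learning theory", "paperB");
        index.put("databases", "paperC");
        System.out.println(findByPattern(index, "learn")); // prints [paperA, paperB]
    }
}
```

Using find() rather than matches() gives the "contains" behaviour of the loop above; matches() would require the whole key to fit the pattern.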



Answer 2:


For simple cases you can use a Multimap<String, Article>. There's one in the Guava library.

For larger amounts of data, Apache Lucene will be a better fit.




Answer 3:


I would create a map from each keyword (and likewise from each author, title, etc.) to a set of JournalArticles.

Map<String, Set<JournalArticle>> keyWordMap = new HashMap<>();
Map<String, Set<JournalArticle>> authorMap = new HashMap<>();

When you create a new JournalArticle, you'd add that article to the set for each of its keywords.

JournalArticle ja = new JournalArticle();
for (String keyWord : ja.getKeyWords())
{
    // create the set the first time this keyword is seen
    if (!keyWordMap.containsKey(keyWord))
        keyWordMap.put(keyWord, new HashSet<JournalArticle>());
    keyWordMap.get(keyWord).add(ja);
}

To do a lookup, you'd do something like:

String keyWord = "....";
Set<JournalArticle> matchingSet = keyWordMap.get(keyWord);
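
The question also asks about searching by several keywords at once. The map-of-sets idea above can be combined with set union and intersection to do this. This is only a sketch under stated assumptions: the class KeywordIndex and its methods add, matchAny and matchAll are hypothetical names, and plain title strings stand in for JournalArticle objects.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical index; article titles stand in for JournalArticle objects.
final class KeywordIndex {
    private final Map<String, Set<String>> byKeyword = new HashMap<>();

    // Registers an article under each of its keywords.
    void add(String article, List<String> keywords) {
        for (String keyword : keywords) {
            byKeyword.computeIfAbsent(keyword, k -> new LinkedHashSet<>()).add(article);
        }
    }

    // Articles tagged with at least one of the keywords (set union).
    Set<String> matchAny(List<String> keywords) {
        Set<String> result = new LinkedHashSet<>();
        for (String keyword : keywords) {
            result.addAll(byKeyword.getOrDefault(keyword, Collections.emptySet()));
        }
        return result;
    }

    // Articles tagged with every one of the keywords (set intersection).
    Set<String> matchAll(List<String> keywords) {
        if (keywords.isEmpty()) {
            return new LinkedHashSet<>();
        }
        // Start from a copy so the sets stored in the index are never modified.
        Set<String> result = new LinkedHashSet<>(
                byKeyword.getOrDefault(keywords.get(0), Collections.emptySet()));
        for (String keyword : keywords.subList(1, keywords.size())) {
            result.retainAll(byKeyword.getOrDefault(keyword, Collections.emptySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Same sample data as the test code in the question.
        KeywordIndex index = new KeywordIndex();
        index.add("paperA", Arrays.asList("cat", "dog", "egg"));
        index.add("paperC", Arrays.asList("egg", "cat"));
        index.add("paperB", Arrays.asList("dog", "egg"));
        System.out.println(index.matchAll(Arrays.asList("cat", "egg"))); // prints [paperA, paperC]
        System.out.println(index.matchAny(Arrays.asList("dog")));        // prints [paperA, paperB]
    }
}
```

A separate index of the same shape could be kept for authors. Matching part of a title is not an exact-key lookup, so it would still need a scan or a dedicated text index.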


Source: https://stackoverflow.com/questions/24414595/java-whats-the-best-data-structure-to-search-objects-by-keywords
