Question
The word counts change on every run for the same set of files, so the following code is still not producing consistent data. How can I make it thread safe? It is a simple word-count program.
package ConcurrentHashMapDemo;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class FileReaderTask implements Runnable {
    private String filePath;
    private String fileName;
    private ConcurrentMap<String, Integer> wordCountMap;

    public FileReaderTask(String filePath, String fileName,
            ConcurrentMap<String, Integer> wordCountMap) {
        this.filePath = filePath;
        this.fileName = fileName;
        this.wordCountMap = wordCountMap;
    }

    public void run() {
        File jobFile = new File(filePath + fileName);
        try {
            BufferedReader bReader = new BufferedReader(new FileReader(jobFile));
            String line = "";
            while ((line = bReader.readLine()) != null) {
                String[] strArray = line.split(" ");
                for (String str : strArray) {
                    if (wordCountMap.containsKey(str)) {
                        wordCountMap.replace(str.trim(),
                                wordCountMap.get(str.trim()) + 1);
                    } else {
                        wordCountMap.putIfAbsent(str.trim(), 1);
                    }
                }
            }
            //Thread.sleep(10000);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
public class Main {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> wordCountMap = new ConcurrentHashMap<String, Integer>();
        File fileDir = new File("c://job_files");
        Thread[] threads = new Thread[fileDir.listFiles().length];
        for (int i = 0; i < threads.length; i++) {
            FileReaderTask frt = new FileReaderTask("c:/job_files/", fileDir.listFiles()[i].getName(), wordCountMap);
            threads[i] = new Thread(frt);
            threads[i].start();
        }

        for (int i = 0; i < threads.length; i++) {
            try {
                threads[i].join();
            } catch (InterruptedException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }

        for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
            String key = entry.getKey();
            System.out.println(key + " - - " + wordCountMap.get(key));
        }
        System.out.println("Main");
    }
}
Answer 1:
The concurrent containers ensure internal consistency (for example, not adding the same key twice), but they do nothing to protect the stored values. Your code as it stands has a race condition: another thread can increment the counter between your call to get and your call to replace. The replace then puts the wrong value into the map, losing the increment performed by the other thread. For example, if the current count is 5, two threads can both read 5 and both write back 6, so one increment is lost.
You need to make your increment atomic. Something like this, which uses the version of replace that ensures the value in the map is still the same before performing the replacement:
str = str.trim();
while (true) {
    Integer oldValue = wordCountMap.putIfAbsent(str, 1);
    if (oldValue != null) {
        if (wordCountMap.replace(str, oldValue, oldValue + 1))
            break; // Successfully incremented the existing count
    } else {
        break; // Added new count of 1
    }
}
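This retry loop would take the place of the containsKey/get/replace block inside run(). If you are on Java 8 or later, a shorter sketch of the same idea is to let the map do the atomic read-modify-write for you with merge, which ConcurrentHashMap performs atomically:

str = str.trim();
// Atomically inserts 1 for a new word, or adds 1 to the existing count
wordCountMap.merge(str, 1, Integer::sum);

Either way, the read and the write of the count happen as a single atomic step, so no increment can be lost.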
Source: https://stackoverflow.com/questions/30937750/data-inconsistency-using-concurrenthashmap