Question
I have the following question regarding Hazelcast Jet.
The use case is as follows:
One application (Application A, deployed in a cluster) uses Hazelcast IMDG and puts millions of records/transactions into a Hazelcast IMap.
The Event Journal has been configured for this IMap.
Another application (Application B, deployed in a cluster) instantiates a JetInstance and runs the job separately on each node to process the records.
Currently, this job reads data from the event journal and adds it to an IList (reference: hazelcast-jet-0.5.1\code-samples\streaming\map-journal-source\src\main\java\RemoteMapJournalSource.java).
Because the job runs on multiple nodes, the records from the event journal are processed by multiple nodes. This results in duplicate entries in the IList.
Is it possible to ensure that a record is processed by only one node of Application B, and not by the other nodes, to avoid duplicates?
If not, does this mean the job has to be run by a single node of the Application B cluster?
Here is the sample code (Application B):
Pipeline p = Pipeline.create();
p.drawFrom(Sources.<Integer, Integer, Integer>remoteMapJournal(MAP_NAME, clientConfig,
        e -> e.getType() == EntryEventType.ADDED, EventJournalMapEvent::getNewValue, true))
 .peek()
 .drainTo(Sinks.list(SINK_NAME));
JobConfig jc = new JobConfig();
jc.setProcessingGuarantee(ProcessingGuarantee.EXACTLY_ONCE);
localJet.newJob(p, jc);
Here is the complete code.
Application A source code:
import com.hazelcast.config.Config;
import com.hazelcast.config.EventJournalConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import java.util.concurrent.TimeUnit;

public class RemoteMapJournalSourceSrv1 {
    private static final String MAP_NAME = "map";
    private static final String SINK_NAME = "list";

    public static void main(String[] args) throws Exception {
        System.setProperty("hazelcast.logging.type", "log4j");
        Config hzConfig = getConfig();
        HazelcastInstance remoteHz = startRemoteHzCluster(hzConfig);
        try {
            IMap<Integer, Integer> map = remoteHz.getMap(MAP_NAME);
            System.out.println("*************** Initial map size " + map.size());
            // Print the map size every 20 seconds until the process is killed
            while (true) {
                System.out.println("***************map size " + map.size());
                TimeUnit.SECONDS.sleep(20);
            }
        } finally {
            Hazelcast.shutdownAll();
        }
    }

    private static HazelcastInstance startRemoteHzCluster(Config config) {
        HazelcastInstance remoteHz = Hazelcast.newHazelcastInstance(config);
        return remoteHz;
    }

    private static Config getConfig() {
        Config config = new Config();
        // Add an event journal config for the map with a capacity of 10_000 events (the default)
        // and a time-to-live of 100 seconds (default 0, which means infinite)
        config.addEventJournalConfig(new EventJournalConfig().setEnabled(true)
                .setMapName(MAP_NAME)
                .setCapacity(10000)
                .setTimeToLiveSeconds(100));
        return config;
    }
}
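The journal settings above (`setCapacity`, `setTimeToLiveSeconds`) bound how much history a reader can replay. As a rough, hypothetical model (not Hazelcast's implementation), a journal can be pictured as a ring buffer with sequence numbers: once `capacity` events are stored, the oldest is overwritten, and a reader starts either at the oldest retained sequence or at the latest one. Assuming the last boolean argument of `remoteMapJournal` in the Application B programs is Jet 0.5's "start from latest sequence" flag, that is roughly the choice between `false` (Node 1) and `true` (Node 2). TTL expiry is left out, and the class `RingJournal` is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-partition event journal: a fixed-capacity ring buffer. */
public class RingJournal {
    private final long[] values;     // stored event payloads
    private long nextSequence = 0;   // sequence number the next append will get

    public RingJournal(int capacity) {
        this.values = new long[capacity];
    }

    /** Appends an event; when the buffer is full, the oldest event is overwritten. */
    public long append(long value) {
        values[(int) (nextSequence % values.length)] = value;
        return nextSequence++;
    }

    /** Oldest sequence still retained in the buffer. */
    public long oldestSequence() {
        return Math.max(0, nextSequence - values.length);
    }

    /** Where a reader that starts "from the latest" begins: it only sees future events. */
    public long latestStart() {
        return nextSequence;
    }

    /** Reads retained events from fromSequence onward (clamped to the oldest retained). */
    public List<Long> readFrom(long fromSequence) {
        List<Long> out = new ArrayList<>();
        for (long seq = Math.max(fromSequence, oldestSequence()); seq < nextSequence; seq++) {
            out.add(values[(int) (seq % values.length)]);
        }
        return out;
    }

    public static void main(String[] args) {
        RingJournal journal = new RingJournal(3);
        for (long v = 1; v <= 5; v++) {
            journal.append(v);                       // events 1 and 2 get overwritten
        }
        System.out.println(journal.readFrom(journal.oldestSequence())); // [3, 4, 5]
        long start = journal.latestStart();
        journal.append(6);
        System.out.println(journal.readFrom(start));                    // [6]
    }
}
```

A reader that falls more than `capacity` events behind silently loses the overwritten ones, which matters when millions of records flow through a journal sized at 10_000.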
Here is the Application B, Node 1 sample code:
public class RemoteMapJournalSourceCL1 {
    private static final String MAP_NAME = "map";
    private static final String SINK_NAME = "list";

    public static void main(String[] args) throws Exception {
        System.setProperty("hazelcast.logging.type", "log4j");
        JetInstance localJet = startLocalJetCluster();
        try {
            // Client config for connecting to Application A's Hazelcast cluster
            ClientConfig clientConfig = new ClientConfig();
            GroupConfig groupConfig = new GroupConfig();
            clientConfig.getNetworkConfig().addAddress("localhost:5701");
            clientConfig.setGroupConfig(groupConfig);
            IList list1 = localJet.getList(SINK_NAME);
            int size1 = list1.size();
            System.out.println("***************List Initial size " + size1);
            Pipeline p = Pipeline.create();
            p.drawFrom(Sources.<Integer, Integer, Integer>remoteMapJournal(MAP_NAME, clientConfig,
                    e -> e.getType() == EntryEventType.ADDED, EventJournalMapEvent::getNewValue, false))
             .peek()
             .drainTo(Sinks.list(SINK_NAME));
            JobConfig jc = new JobConfig();
            jc.setProcessingGuarantee(ProcessingGuarantee.EXACTLY_ONCE);
            localJet.newJob(p, jc);
            // Print the sink list size every 10 seconds
            while (true) {
                TimeUnit.SECONDS.sleep(10);
                System.out.println("***************Read " + list1.size() + " entries from remote map journal.");
            }
        } finally {
            Hazelcast.shutdownAll();
            Jet.shutdownAll();
        }
    }

    private static String getAddress(HazelcastInstance remoteHz) {
        Address address = remoteHz.getCluster().getLocalMember().getAddress();
        System.out.println("***************Remote address " + address.getHost() + ":" + address.getPort());
        return address.getHost() + ":" + address.getPort();
    }

    private static JetInstance startLocalJetCluster() {
        JetInstance localJet = Jet.newJetInstance();
        return localJet;
    }
}
Here is the Application B, Node 2 sample code:
public class RemoteMapJournalSourceCL2 {
    private static final String MAP_NAME = "map";
    private static final String SINK_NAME = "list";

    public static void main(String[] args) throws Exception {
        System.setProperty("hazelcast.logging.type", "log4j");
        JetInstance localJet = startLocalJetCluster();
        try {
            // Client config for connecting to Application A's Hazelcast cluster
            ClientConfig clientConfig = new ClientConfig();
            GroupConfig groupConfig = new GroupConfig();
            clientConfig.getNetworkConfig().addAddress("localhost:5701");
            clientConfig.setGroupConfig(groupConfig);
            IList list1 = localJet.getList(SINK_NAME);
            int size1 = list1.size();
            System.out.println("***************List Initial size " + size1);
            Pipeline p = Pipeline.create();
            p.drawFrom(Sources.<Integer, Integer, Integer>remoteMapJournal(MAP_NAME, clientConfig,
                    e -> e.getType() == EntryEventType.ADDED, EventJournalMapEvent::getNewValue, true))
             .peek()
             .drainTo(Sinks.list(SINK_NAME));
            JobConfig jc = new JobConfig();
            jc.setProcessingGuarantee(ProcessingGuarantee.EXACTLY_ONCE);
            localJet.newJob(p, jc);
            // Print the sink list size every 10 seconds
            while (true) {
                TimeUnit.SECONDS.sleep(10);
                System.out.println("***************Read " + list1.size() + " entries from remote map journal.");
            }
        } finally {
            Hazelcast.shutdownAll();
            Jet.shutdownAll();
        }
    }

    private static JetInstance startLocalJetCluster() {
        JetInstance localJet = Jet.newJetInstance();
        return localJet;
    }
}
Hazelcast client that puts entries into the Hazelcast map (Application A):
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.config.GroupConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import java.util.Scanner;

public class HZClient {
    public static void main(String[] args) {
        ClientConfig clientConfig = new ClientConfig();
        GroupConfig groupConfig = new GroupConfig();
        clientConfig.getNetworkConfig().addAddress("localhost:5701");
        clientConfig.setGroupConfig(groupConfig);
        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
        IMap<Integer, Integer> map = client.getMap("map");
        Scanner in = new Scanner(System.in);
        int startIndex = 0;
        int endIndex = 0;
        while (true) {
            if (args != null && args.length > 0 && args[0].equals("BATCH")) {
                // Batch mode: put keys endIndex+1 .. endIndex+b, where b is read from the console
                System.out.println("Please input the batch size");
                int b = in.nextInt();
                startIndex = endIndex + 1;
                endIndex += b;
                System.out.println("Batch starts from " + startIndex + " ends at " + endIndex);
                putBatch(map, startIndex, endIndex);
            } else {
                // Single mode: put one entry whose key and value are the entered integer
                System.out.println("Please input the map entry");
                int a = in.nextInt();
                System.out.println("You entered integer " + a);
                put(map, a, a);
            }
        }
    }

    public static void putBatch(IMap<Integer, Integer> map, int startIndex, int endIndex) {
        int index = startIndex;
        System.out.println("Start Index " + startIndex + " End Index " + endIndex);
        while (index <= endIndex) {
            System.out.println("Map Values " + index);
            put(map, index, index);
            index += 1;
        }
    }

    public static void put(IMap<Integer, Integer> map, int key, int value) {
        map.set(key, value);
    }
}
Here are the steps to run this:
1. Run Application A: the Java program RemoteMapJournalSourceSrv1.
2. Run Application B, Node 1: the Java program RemoteMapJournalSourceCL1.
3. Run Application B, Node 2: the Java program RemoteMapJournalSourceCL2.
4. Run the Hazelcast client for Application A: the Java program HZClient.
This client program puts entries into the map based on console input, so enter integers on the console.
Observations
When this runs, .peek() logs each value on both nodes of Application B, and inserting a single entry into the Application A map makes the list size 2.
Answer 1:
It appears that you are submitting two independent jobs from two Jet clients. Each job receives all the IMap event journal items and pushes them to the same IList, so the expected outcome is for the IList to contain two instances of each item.
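The duplication described above can be reproduced with a small stdlib-only model. This is a hypothetical sketch, not Jet code: `IndependentJobs` and `run` are made-up names, the journal is a plain list, and the `sink` list stands in for the shared IList. Each job keeps its own read position, so every job emits every event.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model: each independently submitted job tracks its own
 * offset in the journal, so every job writes every event to the shared sink.
 */
public class IndependentJobs {

    /** Runs jobCount independent jobs over the same journal; returns the sink contents. */
    public static List<Integer> run(List<Integer> journal, int jobCount) {
        List<Integer> sink = new ArrayList<>();        // stands in for the shared IList
        for (int job = 0; job < jobCount; job++) {
            int position = 0;                          // each job starts with its own offset
            while (position < journal.size()) {
                sink.add(journal.get(position++));     // every job emits every event
            }
        }
        return sink;
    }

    public static void main(String[] args) {
        List<Integer> journal = List.of(42);           // one map entry was added
        System.out.println(run(journal, 1).size());    // 1: the job submitted once
        System.out.println(run(journal, 2).size());    // 2: the same job submitted twice
    }
}
```

This matches the observation in the question: one inserted entry, two submitted jobs, list size 2.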
Remember that you only submit the job from a Jet client, but it actually runs inside the Jet cluster, on all its members simultaneously. Do not submit the same job twice if you want just one copy of the data in the sink.
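For contrast, here is a rough model of one job running on several members. It assumes, as a simplification, that the job's source splits the journal's partitions among the members so that each partition, and therefore each event, is read by exactly one of them. The round-robin assignment and the class name `SingleDistributedJob` are hypothetical, not Jet's actual scheduling code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of one job spread over several members: journal
 * partitions are split so that each partition is read by exactly one member.
 */
public class SingleDistributedJob {

    /** Assigns each partition id to one member index, round-robin (an assumption). */
    public static Map<Integer, List<Integer>> assign(int partitionCount, int memberCount) {
        Map<Integer, List<Integer>> byMember = new TreeMap<>();
        for (int m = 0; m < memberCount; m++) {
            byMember.put(m, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            byMember.get(p % memberCount).add(p);
        }
        return byMember;
    }

    /** Each member reads only its own partitions; returns the combined sink contents. */
    public static List<Integer> run(List<List<Integer>> journalPartitions, int memberCount) {
        List<Integer> sink = new ArrayList<>();        // stands in for the shared IList
        Map<Integer, List<Integer>> owned = assign(journalPartitions.size(), memberCount);
        for (List<Integer> partitionIds : owned.values()) {
            for (int p : partitionIds) {
                sink.addAll(journalPartitions.get(p)); // no other member reads partition p
            }
        }
        return sink;
    }

    public static void main(String[] args) {
        // Four journal partitions holding six events in total
        List<List<Integer>> partitions =
                List.of(List.of(1, 5), List.of(2), List.of(3, 6), List.of(4));
        System.out.println(assign(4, 2));              // {0=[0, 2], 1=[1, 3]}
        System.out.println(run(partitions, 2).size()); // 6: each event exactly once
    }
}
```

Adding members only changes which member reads which partition, not how many times an event is read; duplicates appear only when the whole job is submitted more than once.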
Source: https://stackoverflow.com/questions/49747170/hazelcast-jet-query