How to execute multiple queries in parallel instead of sequentially?

南笙 2021-01-07 09:51

I am querying all 10 of my tables to get the user IDs from them, and loading all of those IDs into a HashSet so that I end up with a set of unique user IDs.

As of now, the queries run sequentially, one table after another.

2 Answers
  • 2021-01-07 10:16

    You may be able to make it multithreaded, but with the overhead of thread creation and multiple connections you probably won't see a significant benefit. Instead, use a UNION statement in MySQL and get all the IDs at once, letting the database engine figure out how to fetch them efficiently. UNION (as opposed to UNION ALL) also removes duplicate rows, so the result is already unique:

    String sql = "select user_id from testkeyspace.test_table_1 UNION select user_id from testkeyspace.test_table_2 UNION select user_id from testkeyspace.test_table_3 ....";
    

    Of course, you'll have to create the SQL query string programmatically. Don't actually put "...." in your query.
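
    As a minimal sketch of building that string in Java: the class name `UnionQueryBuilder`, the method `buildUnionQuery`, and the assumption that the tables are numbered from 1 with a common prefix are all hypothetical, chosen only to match the table names shown above.

```java
class UnionQueryBuilder {
    // Builds "select user_id from <prefix>1 UNION select user_id from <prefix>2 ...;"
    // for tables numbered 1..tableCount. The prefix and the 1-based numbering are
    // assumptions for illustration; adjust them to the real table names.
    static String buildUnionQuery(String tablePrefix, int tableCount) {
        if (tableCount < 1) {
            throw new IllegalArgumentException("tableCount must be at least 1");
        }
        StringBuilder sql = new StringBuilder();
        for (int i = 1; i <= tableCount; i++) {
            if (i > 1) {
                sql.append(" UNION ");
            }
            sql.append("select user_id from ").append(tablePrefix).append(i);
        }
        return sql.append(';').toString();
    }

    public static void main(String[] args) {
        System.out.println(buildUnionQuery("testkeyspace.test_table_", 3));
    }
}
```

    Because the table names are concatenated directly into the SQL text, this is only safe when the prefix comes from a trusted constant in the program, never from user input.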

  • 2021-01-07 10:24

    If you're able to use Java 8, you could probably do this with parallelStream over a list of the tables: use a lambda to map each table to its set of unique IDs, then merge the results into a single HashSet.
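
    A rough sketch of the parallelStream idea, assuming Java 8 or later. The class name `ParallelStreamFetch` is made up, and `fetchFromTable` here is a hypothetical stand-in that returns fake IDs so the example runs without a database; in real code it would run the per-table query.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ParallelStreamFetch {
    // Hypothetical stand-in for the real per-table query: returns one ID that is
    // unique to the table and one ID shared by every table.
    static Set<String> fetchFromTable(int table) {
        return new HashSet<>(Arrays.asList("user_" + table, "user_shared"));
    }

    // Queries each table on the parallel stream's worker threads and merges
    // the per-table sets into one set, which drops the duplicates.
    static Set<String> fetchFromAllTables(List<Integer> tables) {
        return tables.parallelStream()
                .map(ParallelStreamFetch::fetchFromTable)
                .flatMap(Set::stream)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(fetchFromAllTables(Arrays.asList(1, 2, 3)));
    }
}
```

    Parallel streams run on the common fork/join pool, which is sized for CPU-bound work, so for blocking database calls the achievable parallelism may be lower than the number of tables.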

    Without Java 8, I'd use Google Guava's ListenableFuture together with an executor service, something like this:

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Executors;

    import com.google.common.util.concurrent.Futures;
    import com.google.common.util.concurrent.ListenableFuture;
    import com.google.common.util.concurrent.ListeningExecutorService;
    import com.google.common.util.concurrent.MoreExecutors;

    public class UserIdFetcher {

        public static Set<String> fetchFromTable(int table) {
            // Only the user_id column is needed, so avoid "select *"
            String sql = "select user_id from testkeyspace.test_table_" + table + ";";
            Set<String> result = new HashSet<String>();
            // populate result with your SQL statements
            // ...
            return result;
        }

        public static Set<String> fetchFromAllTables() throws InterruptedException, ExecutionException {
            // Create a ListeningExecutorService (Guava) by wrapping a
            // normal ExecutorService (Java)
            ListeningExecutorService executor =
                    MoreExecutors.listeningDecorator(Executors.newCachedThreadPool());
            try {
                List<ListenableFuture<Set<String>>> list =
                        new ArrayList<ListenableFuture<Set<String>>>();
                // For each table, submit an independent task that will
                // query just that table and return a set of user IDs from it.
                // Adjust the loop bounds to match your table numbering.
                for (int i = 0; i < 10; i++) {
                    final int table = i;
                    ListenableFuture<Set<String>> future = executor.submit(new Callable<Set<String>>() {
                        public Set<String> call() throws Exception {
                            return fetchFromTable(table);
                        }
                    });
                    // Add the future to the list
                    list.add(future);
                }
                // We want to know when ALL the tasks have completed,
                // so we use a Guava function to turn a list of ListenableFutures
                // into a single ListenableFuture
                ListenableFuture<List<Set<String>>> combinedFutures = Futures.allAsList(list);

                // The get on the combined ListenableFuture will now block until
                // ALL the individual tasks have completed their work.
                List<Set<String>> tableSets = combinedFutures.get();

                // Now all we have to do is combine the individual sets into a
                // single result
                Set<String> userList = new HashSet<String>();
                for (Set<String> tableSet : tableSets) {
                    userList.addAll(tableSet);
                }

                return userList;
            } finally {
                // Release the pool's threads once the submitted tasks are done
                executor.shutdown();
            }
        }
    }
    

    The Executors and Futures used here are all core Java. The only thing Guava adds is wrapping the executor so that it hands back ListenableFutures instead of plain Futures, which Futures.allAsList can then combine into one. See here for a discussion of why the latter is better.

    There are probably still ways to improve the parallelism of this approach, but if the bulk of your time is spent waiting for the DB to respond or processing network traffic, then this approach may help.
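
    One possible refinement, sketched here with core Java only (no Guava): use a fixed-size pool so that the number of simultaneous queries never exceeds the number of database connections you are willing to open, and let `ExecutorService.invokeAll` wait for all the tasks. The class name `BoundedPoolFetch`, the `maxThreads` parameter, the 1-based table numbering, and the fake `fetchFromTable` are assumptions made for illustration, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BoundedPoolFetch {
    // Hypothetical stand-in for the real per-table query, so the sketch
    // runs without a database.
    static Set<String> fetchFromTable(int table) {
        return new HashSet<String>(Arrays.asList("user_" + table, "user_shared"));
    }

    static Set<String> fetchFromAllTables(int tableCount, int maxThreads)
            throws InterruptedException, ExecutionException {
        // At most maxThreads queries run at the same time; the rest wait in the queue.
        ExecutorService executor = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Callable<Set<String>>> tasks = new ArrayList<Callable<Set<String>>>();
            for (int i = 1; i <= tableCount; i++) {
                final int table = i;
                tasks.add(new Callable<Set<String>>() {
                    public Set<String> call() {
                        return fetchFromTable(table);
                    }
                });
            }
            // invokeAll blocks until every task has finished (or failed).
            Set<String> userIds = new HashSet<String>();
            for (Future<Set<String>> future : executor.invokeAll(tasks)) {
                // get() rethrows a task's failure as an ExecutionException
                userIds.addAll(future.get());
            }
            return userIds;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchFromAllTables(3, 2));
    }
}
```

    Whether a bounded pool actually beats the cached pool depends on how many concurrent connections the database handles well, so the pool size is something to measure rather than guess.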
