How to process/run a list of items in parallel in Spark?

Question

I am using spark-sql 2.4.1 with Java 8 in my PoC.

I have the following student data, organized by standard/class, as below:

public static class Student implements Serializable {
            private String className;
            private String studentName;
            private Integer paperOneMarks;
            private Integer paperTwoMarks;
            private Integer paperThreeMarks;
            private Integer paperFourMarks;

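            // Encoders.bean needs a public no-argument constructor to rebuild Student objects.
            public Student() {
            }

            public Student(String className, String studentName, Integer paperOneMarks, Integer paperTwoMarks,
                    Integer paperThreeMarks, Integer paperFourMarks) {
                super();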
                this.className = className;
                this.studentName = studentName;
                this.paperOneMarks = paperOneMarks;
                this.paperTwoMarks = paperTwoMarks;
                this.paperThreeMarks = paperThreeMarks;
                this.paperFourMarks = paperFourMarks;
            }

            public String getClassName() {
                return className;
            }
            public void setClassName(String className) {
                this.className = className;
            }
            public String getStudentName() {
                return studentName;
            }
            public void setStudentName(String studentName) {
                this.studentName = studentName;
            }
            public Integer getPaperOneMarks() {
                return paperOneMarks;
            }
            public void setPaperOneMarks(Integer paperOneMarks) {
                this.paperOneMarks = paperOneMarks;
            }
            public Integer getPaperTwoMarks() {
                return paperTwoMarks;
            }
            public void setPaperTwoMarks(Integer paperTwoMarks) {
                this.paperTwoMarks = paperTwoMarks;
            }
            public Integer getPaperThreeMarks() {
                return paperThreeMarks;
            }
            public void setPaperThreeMarks(Integer paperThreeMarks) {
                this.paperThreeMarks = paperThreeMarks;
            }
            public Integer getPaperFourMarks() {
                return paperFourMarks;
            }
            public void setPaperFourMarks(Integer paperFourMarks) {
                this.paperFourMarks = paperFourMarks;
            }
        }

                List<Student> data = Arrays.asList(
                        new Student("4th-Class", "Kiran", 23, 19, 26, 22),
                        new Student("4th-Class", "Peter", 32, 28, 21, 31),
                        new Student("4th-Class", "John", 21, 27, 26, 33),
                        new Student("4th-Class", "Alex", 17, 28, 25, 34),

                        new Student("3rd-Class", "Tony", 32, 17, 26, 22),
                        new Student("3rd-Class", "Fred", 19, 30, 25, 34),
                        new Student("3rd-Class", "Danny", 27, 28, 31, 30),
                        new Student("3rd-Class", "Sunny", 30, 31, 26, 21),

                        new Student("2nd-Class", "Stella", 19, 23, 22, 30),
                        new Student("2nd-Class", "Diya", 33, 28, 26, 17),
                        new Student("2nd-Class", "Amber", 32, 17, 25, 21),
                        new Student("2nd-Class", "Tanvish", 27, 28, 33, 23),
                        new Student("2nd-Class", "April", 32, 22, 26, 34),

                        new Student("1st-Class", "Maria", 27, 28, 22, 34),
                        new Student("1st-Class", "Justin", 30, 31, 19, 23),
                        new Student("1st-Class", "Peter", 32, 28, 18, 34),
                        new Student("1st-Class", "Anny", 22, 25, 26, 21),
                        new Student("1st-Class", "Kim", 19, 28, 32, 30),
                        new Student("1st-Class", "Akio", 17, 33, 26, 27)

                        );



  Encoder<Student> dataEncoder = Encoders.bean(Student.class);
  Dataset<Student> ds = spark.createDataset(data, dataEncoder);

I will receive a list of class names, e.g. "2nd-Class", "3rd-Class", and for each class I need to find the first ranker. This is sample data, but in production I will get hundreds of class names, i.e. one list for each institute. Hence I need to run this in parallel.

How can I run this calculation in parallel, i.e. for many classes in one go?
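
To make the expected result concrete, here is a minimal plain-Java sketch (no Spark involved) of what I mean by "first ranker per class". It assumes that rank is decided by the sum of the four paper marks, which is my assumption and may not match the real rule. The names `TopRankerSketch`, `Row`, `sampleRows` and `topRankers` are hypothetical and exist only for this illustration; the rows are a subset of the sample data above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class TopRankerSketch {

    // Hypothetical stand-in for the Student bean: only the fields this sketch needs.
    public static final class Row {
        public final String className;
        public final String studentName;
        public final int total; // assumption: rank by the sum of the four papers

        public Row(String className, String studentName, int p1, int p2, int p3, int p4) {
            this.className = className;
            this.studentName = studentName;
            this.total = p1 + p2 + p3 + p4;
        }
    }

    // A subset of the sample data from the question.
    public static List<Row> sampleRows() {
        return Arrays.asList(
                new Row("4th-Class", "Kiran", 23, 19, 26, 22),
                new Row("3rd-Class", "Tony", 32, 17, 26, 22),
                new Row("3rd-Class", "Fred", 19, 30, 25, 34),
                new Row("3rd-Class", "Danny", 27, 28, 31, 30),
                new Row("3rd-Class", "Sunny", 30, 31, 26, 21),
                new Row("2nd-Class", "Stella", 19, 23, 22, 30),
                new Row("2nd-Class", "Diya", 33, 28, 26, 17),
                new Row("2nd-Class", "Amber", 32, 17, 25, 21),
                new Row("2nd-Class", "Tanvish", 27, 28, 33, 23),
                new Row("2nd-Class", "April", 32, 22, 26, 34));
    }

    // Maps each requested class name to the student with the highest total.
    // All requested classes are handled in one pass over a parallel stream,
    // instead of one separate pass per class name.
    public static Map<String, String> topRankers(List<Row> rows, Set<String> wantedClasses) {
        return rows.parallelStream()
                .filter(r -> wantedClasses.contains(r.className))
                .collect(Collectors.groupingBy(
                        r -> r.className,
                        Collectors.collectingAndThen(
                                Collectors.maxBy(Comparator.comparingInt((Row r) -> r.total)),
                                best -> best.get().studentName)));
    }

    public static void main(String[] args) {
        Set<String> wanted = new HashSet<>(Arrays.asList("2nd-Class", "3rd-Class"));
        System.out.println(topRankers(sampleRows(), wanted));
    }
}
```

With the input list "2nd-Class", "3rd-Class", this gives April for 2nd-Class (total 114) and Danny for 3rd-Class (total 116), and ignores the 4th-Class row. What I am looking for is the equivalent on the Spark `Dataset<Student>`: one job that covers every requested class name, rather than a loop that launches one query per class.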

Source: https://stackoverflow.com/questions/60223691/how-to-process-run-a-list-items-parallel-in-spark

Tags

dataframe

apache-spark

spark-streaming