I am new to Python. I am trying to reduce a 2D RDD in PySpark by row number, with each row mapped to the mean of the number of observations in that row. The RDD contains the n