Dropping the first and last row of an RDD with Spark
Question

I'm reading in a text file using Spark with sc.textFile(fileLocation) and need to be able to quickly drop the first and last rows (they could be a header or a trailer). I've found good ways of returning the first and last rows, but no good way of removing them. Is this possible?

Answer 1:

One way of doing this would be to call zipWithIndex, and then filter out the records with indices 0 and count - 1:

// We're going to perform multiple actions on this RDD,
// so it's usually better to cache it so we don
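
Since the answer's own code is cut off above, here is a minimal sketch of the zipWithIndex approach it describes, not the answerer's original code. The object name DropFirstAndLast, the helper dropFirstAndLast, the application name, and the local[*] master are all assumptions made for illustration; only sc.textFile, zipWithIndex, the filter on indices 0 and count - 1, and the advice to cache come from the text.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object DropFirstAndLast {

  // Hypothetical helper (not from the original answer): returns the input
  // without its first and last records.
  def dropFirstAndLast(lines: RDD[String]): RDD[String] = {
    // The RDD is read more than once (count() here, then the indexed pass),
    // so cache it to avoid re-reading the source each time.
    val cached = lines.cache()
    val count = cached.count()

    cached
      .zipWithIndex()                                             // pairs of (line, index)
      .filter { case (_, idx) => idx != 0L && idx != count - 1 }  // drop first and last
      .map { case (line, _) => line }                             // discard the index
  }

  def main(args: Array[String]): Unit = {
    // Assumed local setup for illustration; a real job would normally get
    // its master from spark-submit.
    val conf = new SparkConf().setAppName("drop-first-last").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val fileLocation = args(0) // path to the text file, supplied by the caller
      val body = dropFirstAndLast(sc.textFile(fileLocation))
      body.take(5).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Note that an RDD with a single record loses that record, because index 0 is also index count - 1. On an RDD with more than one partition, zipWithIndex itself launches a Spark job to compute per-partition offsets, which is one more reason caching tends to help here.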