Question
I am writing unit tests for a Spark job, and some of the outputs are named tuples: pyspark.sql.Row
How can I assert their equality?
actual = get_data(df)
expected = Row(total=4, unique_ids=2)
self.assertEqual(actual, expected)
When I do this, the values come out rearranged in an order I cannot determine.
Answer 1:
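Your code should work as written because, according to the docs, when a Row is created from keyword arguments:
the fields will be sorted by names.
Note that this sorting applies to Spark versions before 3.0; from 3.0 onward, the fields keep the order in which they were passed. Row is a tuple subclass, so assertEqual compares the values by position, and a different field order on either side makes the comparison fail.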
Nevertheless, another way is to use the asDict() method of pyspark.sql.Row
and compare the rows as dictionaries, which matches values by field name rather than by position:
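from pyspark.sql import Row
actual = get_data(df)  # get_data and df come from the job under test
expected = Row(total=4, unique_ids=2)
# Compare by field name, independent of field order
self.assertEqual(actual.asDict(), expected.asDict())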
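If several tests need this comparison, it may be convenient to wrap it in a small helper. The following is only a sketch: RowAssertionsMixin and assertRowEqual are hypothetical names, not part of PySpark. It relies only on the objects having an asDict() method, and passes recursive=True (a parameter of Row.asDict) so that nested rows are converted to dictionaries as well.

```python
class RowAssertionsMixin:
    """Hypothetical helper, not part of PySpark.

    Intended to be mixed into a unittest.TestCase subclass,
    which provides self.assertEqual.
    """

    def assertRowEqual(self, actual, expected):
        # Convert both rows to dictionaries so that values are matched
        # by field name instead of by position in the underlying tuple.
        # recursive=True also converts nested Row values to dicts.
        self.assertEqual(
            actual.asDict(recursive=True),
            expected.asDict(recursive=True),
        )
```

In a test class that inherits from both RowAssertionsMixin and unittest.TestCase, the assertion from the question would then read self.assertRowEqual(actual, expected). On failure, assertEqual reports the dictionary difference, which tends to be easier to read than a tuple mismatch.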
Source: https://stackoverflow.com/questions/49519475/check-if-two-pyspark-rows-are-equal