I have the following DataFrame:
spark.sql(""" SELECT id, color, cnt FROM ( VALUES ('A','green', 5), ('A','yellow', 4),