Suppose I am using Apache Spark to read a dataset like this:
City | Region | Population
A    | A1     | 150000
A    | A2     | 50000
B    | B1     | 250000
C