I want to make my own dataset when doing translation in NLP. For example, x = ["It is an apple"] y = ["It is a pear"]. How show I make a dataset which can fi