I have a dataset of university degree types in which the degrees are recorded very inconsistently. In total there are about 12,654 unique degree types, which will ta