Pretraining a language model on a small custom corpus
Question: I am curious whether transfer learning can be used for text generation, by taking a pre-trained model and re-training (or further pre-training) it on a specific kind of text. For example, given a pre-trained BERT model and a small corpus of medical (or any other domain-specific) text, build a language model that can generate medical text. The assumption is that you do not have a large amount of medical text, which is why you have to use transfer learning. Putting it as a pipeline, I would describe this as: Using a pre-trained BERT