I have a 300 GB text file that contains genomics data with over 250k records. There are some records with bad data and our genomics program 'Popoolution' allows us to comm
The simplest solution is to use a stream-oriented editor such as sed, which processes the file one line at a time and so never needs to hold all 300 GB in memory. All you need is one or more regular expressions that identify all (and only) the bad records. Since you haven't provided any details on how to identify the bad records, this is as specific as an answer can be.
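To make the idea concrete, here is a minimal sketch of the same line-at-a-time filtering, written in Python rather than sed. It does roughly what `sed '/PATTERN/d' input > output` does. Everything specific in it is an assumption, not something from your question: the pattern `BAD_RECORD`, the helper names `drop_bad_records` and `filter_file`, and above all the idea that each record sits on a single line. If your records span several lines, the matching logic would need to change.

```python
import re
from typing import Iterable, Iterator

# Hypothetical pattern: treats any record containing the standalone token
# "BAD" as a bad record. Replace it with whatever actually marks bad data.
BAD_RECORD = re.compile(r"\bBAD\b")


def drop_bad_records(
    lines: Iterable[str], bad: "re.Pattern[str]" = BAD_RECORD
) -> Iterator[str]:
    """Yield only the lines that do not match the bad-record pattern.

    Assumes one record per line. Lines are handled one at a time, so memory
    use is bounded by the longest line, not by the size of the file.
    """
    for line in lines:
        if not bad.search(line):
            yield line


def filter_file(
    src_path: str, dst_path: str, bad: "re.Pattern[str]" = BAD_RECORD
) -> None:
    """Copy src_path to dst_path, leaving out the lines that match `bad`."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.writelines(drop_bad_records(src, bad))
```

Calling `filter_file("input.txt", "cleaned.txt")` (both file names are placeholders) writes a filtered copy and leaves the original alone, which is worth the extra disk space on a file this size until you have checked the result.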