Need to transpose a LARGE CSV file in Perl [closed]

Question

The CSV data file is 3.2 GB in total, with who knows how many rows and columns (assume very large). The file holds genomic SNP data for a population of individuals, so it contains IDs such as TD102230 and genotypes such as A/A and A/T.

I tried the Text::CSV and Array::Transpose modules but couldn't get it to work (the computing cluster froze). Is there a specific module that would do this? I am new to Perl (not much experience with low-level programming; I have mostly used R and MATLAB), so detailed explanations are especially welcome!

Answer 1:

As a direct answer: read the file line by line, parse each line with Text::CSV, and push each value onto an array that corresponds to its original column. Then output each array with join (or similar) to get the transposed representation of the original. Disposing of each array right after joining it will also help with the memory problem.
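A minimal sketch of this approach follows, assuming the input is rectangular (every row has the same number of fields) and that all values fit in memory once parsed. The subroutine name `transpose_csv` and the command-line wrapper are illustrative, not part of any module; only Text::CSV's `new`, `getline`, `print`, `eof` and `error_diag` methods are used.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Transpose CSV from $in_fh to $out_fh by collecting each original column
# into its own array, then writing each array out as one row.
# Assumptions: every row has the same number of fields, and all values fit
# in memory at once (peak usage is reached after the last row is read).
sub transpose_csv {
    my ($in_fh, $out_fh) = @_;
    my $reader = Text::CSV->new({ binary => 1 });
    my $writer = Text::CSV->new({ binary => 1, eol => "\n" });

    my @columns;    # $columns[$j] is an array ref holding original column $j
    while (my $row = $reader->getline($in_fh)) {
        for my $j (0 .. $#$row) {
            push @{ $columns[$j] }, $row->[$j];
        }
    }
    $reader->eof or $reader->error_diag();    # report a parse error, if any

    # Original column $j becomes output row $j; release it once written.
    for my $j (0 .. $#columns) {
        $writer->print($out_fh, $columns[$j]) or die "Write failed\n";
        undef $columns[$j];
    }
}

# Run as a script: perl transpose.pl input.csv > transposed.csv
if (@ARGV) {
    my $path = shift @ARGV;
    open my $in, '<', $path or die "Cannot open $path: $!\n";
    transpose_csv($in, \*STDOUT);
    close $in;
}
```

Writing rows with `$writer->print` rather than a plain `join ','` keeps any field containing a comma or quote correctly quoted; for IDs and genotypes like A/A both give the same result. Releasing each array after it is written only frees memory during output: the peak is reached once the whole file has been read, and because every Perl scalar carries its own overhead, that peak is usually well above the file's size on disk.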

Writing the values to external files instead of arrays, and joining them with OS facilities, is another way around the memory requirements.

You should also think about why you need this. Is there really no better way to solve the real task at hand? Transposing by itself serves no real purpose.

Answer 2:

Break the task down into several steps to save memory:

1. Read a line and write its fields into a file named after the line number, one line per field.
2. Repeat step 1 until the input CSV file is exhausted.
3. Use paste to merge all the output files into one big file.
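The three steps can be sketched in Perl as follows. This is a minimal illustration, assuming a Unix-like system with `paste` on the PATH and fields that never contain commas, quotes or newlines, so a plain `split` stands in for Text::CSV here. The subroutine names `split_rows` and `paste_files` are hypothetical, and the per-row files go into a temporary directory created with File::Temp rather than the current directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Steps 1-2: write the fields of input row N to "$dir/N", one field per line.
# Assumption: fields never contain commas, quotes or newlines (true for IDs
# such as TD102230 and genotypes such as A/A), so a plain split is enough.
sub split_rows {
    my ($in_fh, $dir) = @_;
    my @files;
    my $n = 0;
    while (my $line = <$in_fh>) {
        $line =~ s/\r?\n\z//;                       # strip the line ending
        my @fields = split /,/, $line, -1;          # -1 keeps trailing empty fields
        my $path = sprintf '%s/%09d', $dir, ++$n;   # zero-padded, so a glob lists files in row order
        open my $out, '>', $path or die "Cannot write $path: $!\n";
        print {$out} "$_\n" for @fields;
        close $out or die "Cannot close $path: $!\n";
        push @files, $path;
    }
    return @files;
}

# Step 3: paste joins line K of every file (in argument order) into output row K.
sub paste_files {
    my ($out_fh, @files) = @_;
    open my $ph, '-|', 'paste', '-d', ',', @files
        or die "Cannot run paste: $!\n";
    print {$out_fh} $_ while <$ph>;
    close $ph or die "paste failed (status $?)\n";
}

# Run as a script: perl split_paste.pl input.csv > transposed.csv
if (@ARGV) {
    my $path = shift @ARGV;
    open my $in, '<', $path or die "Cannot open $path: $!\n";
    my $dir = tempdir(CLEANUP => 1);    # removed when the script exits
    my @files = split_rows($in, $dir);
    close $in;
    paste_files(\*STDOUT, @files);
}
```

Only one input line is held in memory at a time; the cost is one temporary file per original row. Because `paste` receives every file as an argument and keeps them all open at once, a very large number of rows can exceed the open-file limit (`ulimit -n`) or the maximum argument length. In that case, paste the files in batches and then paste the batch outputs together.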

Source: https://stackoverflow.com/questions/11832625/need-to-transpose-a-large-csv-file-in-perl

Tags

perl

csv

large-files

transpose