Question
I am exporting a Cloud Bigtable table to Cloud Storage by following this guide: https://cloud.google.com/bigtable/docs/exporting-sequence-files#exporting_sequence_files_2
The table is about 300 GB, and the Dataflow pipeline fails with this error:
An OutOfMemoryException occurred. Consider specifying higher memory instances in PipelineOptions.
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)...
The error suggests increasing the memory of the instance type used for the Dataflow job. I also received this warning:
Worker machine type has insufficient disk (25 GB) to support this type of Dataflow job. Please increase the disk size given by the diskSizeGb/disk_size_gb execution parameter.
I re-checked the command for running the pipeline (https://github.com/googleapis/cloud-bigtable-client/tree/master/bigtable-dataflow-parent/bigtable-beam-import) and looked for a command-line option to set a custom instance type or PD size for the workers, but couldn't find one.
By default, the instance type is n1-standard-1 and the PD size is 25 GB.
Is there a parameter I can pass at job creation to avoid this error? If so, which one?
Answer 1:
I found the parameters for setting a custom PD size and instance type. They are:
--diskSizeGb=[Disk_size_in_GBs] --workerMachineType=[GCP_VM_machine_type]
In my case I used:
--diskSizeGb=100 --workerMachineType=n1-highmem-4
These parameters are pipeline options for execution-time settings, defined in DataflowPipelineWorkerPoolOptions. More parameters are listed here: https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.html
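For context, here is a rough sketch of how the two flags might be appended to the export command from the guide linked in the question. The jar file name, project, instance, table, and bucket values are hypothetical placeholders, and the base flags are written as I recall them from that guide, so check them against the version you are running. The script only prints the command (a dry run) rather than launching a job.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders (not from the original post); replace with your own values.
JAR="bigtable-beam-import-shaded.jar"
PROJECT_ID="my-project"
INSTANCE_ID="my-bigtable-instance"
TABLE_ID="my-table"
BUCKET="my-bucket"

# Build the command as an array so each flag stays a separate argument.
cmd=(
  java -jar "$JAR" export
  --runner=dataflow
  --project="$PROJECT_ID"
  --bigtableInstanceId="$INSTANCE_ID"
  --bigtableTableId="$TABLE_ID"
  --destinationPath="gs://$BUCKET/export"
  --tempLocation="gs://$BUCKET/temp"
  --maxNumWorkers=30
  # The two flags from this answer:
  --diskSizeGb=100
  --workerMachineType=n1-highmem-4
)

# Dry run: print the assembled command for review.
printf '%s ' "${cmd[@]}"
echo

# To actually launch the job, run the array instead:
# "${cmd[@]}"
```

Building the command as an array keeps the worker flags easy to change in one place; other flags required by the guide (for example a zone or region) may still need to be added.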
However, because I had set --maxNumWorkers to 30 for autoscaling, I ran into quota issues. Hitting a quota prevents the job from autoscaling, so it runs more slowly, but it does not produce errors.
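To see why quotas can come into play, here is a back-of-the-envelope calculation of what the job would need if it scaled to all 30 workers. The answer does not say which quota was hit, so this is only an illustration; it assumes one disk of the requested size per worker and uses the fact that n1-highmem-4 has 4 vCPUs.

```shell
#!/usr/bin/env bash
set -euo pipefail

MAX_WORKERS=30          # --maxNumWorkers from the answer
VCPUS_PER_WORKER=4      # n1-highmem-4 has 4 vCPUs
DISK_GB_PER_WORKER=100  # --diskSizeGb from the answer (assumes one disk per worker)

# Resources needed if autoscaling reaches the maximum worker count.
total_vcpus=$(( MAX_WORKERS * VCPUS_PER_WORKER ))
total_disk_gb=$(( MAX_WORKERS * DISK_GB_PER_WORKER ))

echo "vCPUs needed at full scale: ${total_vcpus}"
echo "Persistent disk needed at full scale: ${total_disk_gb} GB"
```

This works out to 120 vCPUs and 3000 GB of persistent disk. If the project's regional quota is below either figure, autoscaling stops short of 30 workers. The current limits can be inspected with `gcloud compute regions describe` followed by the region name.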
Source: https://stackoverflow.com/questions/56467089/facing-outofmemoryexception-while-exporting-bigtable-tables-to-google-cloud-stor