How to convert a 500GB SQL table into Apache Parquet?

暖寄归人 2021-02-05 21:56

Perhaps this is well documented, but I am quite confused about how to do this (there are many Apache tools).

When I create an SQL table, I create the table using the fo

2 Answers
  •  你的背包
    2021-02-05 22:08

    Apache Spark can be used to do this:

    1. Load your table from MySQL via JDBC.
    2. Save it as a Parquet file.
    

    Example:

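    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Load the table over JDBC (the MySQL JDBC driver must be on Spark's classpath).
    df = spark.read.jdbc("YOUR_MYSQL_JDBC_CONN_STRING", "YOUR_TABLE", properties={"user": "YOUR_USER", "password": "YOUR_PASSWORD"})
    # Write the DataFrame out in Parquet format.
    df.write.parquet("YOUR_HDFS_FILE")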
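
For a table as large as 500GB, a plain JDBC read like the one above pulls everything through a single connection, which may be slow. One possible way to parallelize it is to pass a list of WHERE clauses through the `predicates` argument of `spark.read.jdbc`, so that each clause becomes one Spark partition. The sketch below is only an illustration: it assumes the table has an integer key column (called `id` here), and the helper names `build_range_predicates` and `export_table_to_parquet` are hypothetical, not part of Spark.

```python
def build_range_predicates(column, lower, upper, num_partitions):
    """Split the inclusive range [lower, upper] into contiguous WHERE clauses.

    Assumption: `column` is an integer column. Rows where it is NULL match
    none of the clauses and would be skipped.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if upper < lower:
        raise ValueError("upper must be >= lower")

    total = upper - lower + 1
    step = -(-total // num_partitions)  # ceiling division

    predicates = []
    start = lower
    while start <= upper:
        end = min(start + step - 1, upper)
        predicates.append(f"{column} >= {start} AND {column} <= {end}")
        start = end + 1
    return predicates


def export_table_to_parquet(spark, jdbc_url, table, output_path, properties, predicates):
    """Read `table` over JDBC with one partition per predicate, then write Parquet."""
    df = spark.read.jdbc(jdbc_url, table, predicates=predicates, properties=properties)
    df.write.parquet(output_path)
```

With a `spark` session created as in the example above, this could be called roughly as `export_table_to_parquet(spark, "YOUR_MYSQL_JDBC_CONN_STRING", "YOUR_TABLE", "YOUR_HDFS_FILE", {"user": "YOUR_USER", "password": "YOUR_PASSWORD"}, build_range_predicates("id", 1, 1000000, 32))`. The bounds and the partition count are placeholders to adapt to the actual key range and to how many concurrent connections the database can handle.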
    
