Perhaps this is well documented, but I am very confused about how to do this (there are many Apache tools).
When I create an SQL table, I create the table using the fo
Apache Spark can be used to do this:
1. Load your table from MySQL via JDBC.
2. Save it as a Parquet file.
Example:
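from pyspark.sql import SparkSession
# Reuse the active session, or start a new one
spark = SparkSession.builder.getOrCreate()
# 1. Load the table from MySQL over JDBC (the MySQL JDBC driver jar must be on Spark's classpath)
df = spark.read.jdbc("YOUR_MYSQL_JDBC_CONN_STRING", "YOUR_TABLE", properties={"user": "YOUR_USER", "password": "YOUR_PASSWORD"})
# 2. Save it as a Parquet file
df.write.parquet("YOUR_HDFS_FILE")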
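If you want to reuse these two steps for several tables, they could be wrapped in small helpers along the following lines. This is only a sketch: the function names `mysql_jdbc_url` and `mysql_table_to_parquet` are made up for illustration, the driver class assumes MySQL Connector/J 8.x is on Spark's classpath, and `mode("overwrite")` is an assumption about how you want existing output handled.

```python
def mysql_jdbc_url(host, port, database):
    """Build a MySQL JDBC URL of the form jdbc:mysql://host:port/database."""
    return f"jdbc:mysql://{host}:{port}/{database}"


def mysql_table_to_parquet(spark, url, table, user, password, out_path):
    """Read one MySQL table over JDBC and write it to out_path as Parquet.

    `spark` is an existing SparkSession, e.g. SparkSession.builder.getOrCreate().
    """
    props = {
        "user": user,
        "password": password,
        # Assumption: MySQL Connector/J 8.x; older 5.x jars use com.mysql.jdbc.Driver.
        "driver": "com.mysql.cj.jdbc.Driver",
    }
    # Step 1: load the table from MySQL via JDBC.
    df = spark.read.jdbc(url, table, properties=props)
    # Step 2: save it as Parquet, replacing any existing output at out_path.
    df.write.mode("overwrite").parquet(out_path)
    return df
```

With a running session this would be called as, for example, `mysql_table_to_parquet(spark, mysql_jdbc_url("YOUR_HOST", 3306, "YOUR_DB"), "YOUR_TABLE", "YOUR_USER", "YOUR_PASSWORD", "YOUR_HDFS_FILE")`, where the host, port, and database name are placeholders.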