I have a class ImageInputFormat in Hadoop which reads images from HDFS. How do I use my InputFormat in Spark?
Here is my ImageInputFormat:
Will all the images be stored in a hadoopRDD?
Yes, everything loaded into Spark is represented as RDDs.
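To load the data through a custom InputFormat, you can use `SparkContext.newAPIHadoopFile`. A minimal sketch, assuming `ImageInputFormat` extends the new-API `FileInputFormat[Text, BytesWritable]` — the key/value classes here are assumptions, so match them to whatever your InputFormat actually emits:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.io.{BytesWritable, Text}

object ImageLoader {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ImageLoader"))

    // newAPIHadoopFile takes the path, the InputFormat class, and the
    // key/value classes; the result is an RDD[(Text, BytesWritable)].
    val images = sc.newAPIHadoopFile(
      "hdfs:///path/to/images",          // hypothetical path
      classOf[ImageInputFormat],
      classOf[Text],
      classOf[BytesWritable])

    println(s"loaded ${images.count()} images")
    sc.stop()
  }
}
```

If your InputFormat is written against the old `org.apache.hadoop.mapred` API, use `sc.hadoopFile` with the same arguments instead.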
Can I set the RDD capacity so that when memory is full, the rest of the data is stored on disk?
The default storage level in Spark is StorageLevel.MEMORY_ONLY, which does not spill to disk. Use MEMORY_AND_DISK if you want partitions that don't fit in memory to be written to disk, or MEMORY_ONLY_SER, which is more space-efficient. Please refer to the Spark documentation > Scala programming guide > RDD persistence.
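A minimal sketch of choosing a storage level; the `imagesRDD` name is hypothetical and stands for whatever RDD you loaded:

```scala
import org.apache.spark.storage.StorageLevel

// MEMORY_AND_DISK keeps as many partitions in memory as fit and
// spills the remainder to disk instead of recomputing them.
imagesRDD.persist(StorageLevel.MEMORY_AND_DISK)

// Alternatively, MEMORY_ONLY_SER stores partitions as serialized
// byte arrays: more space-efficient, at the cost of extra CPU to
// deserialize on each access.
// imagesRDD.persist(StorageLevel.MEMORY_ONLY_SER)
```

Note that `persist` only marks the RDD; the data is actually cached the first time an action (e.g. `count`) computes it.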
Will performance suffer if the data is too big?
Yes. As the data size grows beyond available memory, performance degrades, since partitions that no longer fit must be spilled to disk or recomputed.