Spark-Xml: Array within an Array in Dataframe to generate XML

走远了吗. submitted on 2019-12-11 14:56:14

Question


I have a requirement to generate XML with the structure below:

<parent>
    <name>parent</name>
    <childs>
        <child>
            <name>child1</name>
        </child>
        <child>
            <name>child1</name>
            <grandchilds>
                <grandchild>
                    <name>grand1</name>
                </grandchild>
                <grandchild>
                    <name>grand2</name>
                </grandchild>
                <grandchild>
                    <name>grand3</name>
                </grandchild>
            </grandchilds>
        </child>
        <child>
            <name>child1</name>
        </child>
    </childs>
</parent>

As you can see, a parent has one or more child nodes, and a child node may have one or more grandchild nodes.

https://github.com/databricks/spark-xml#conversion-from-dataframe-to-xml

From the spark-xml documentation, I understand that when there is a nested array structure, the DataFrame should look like this:

+------------------------------------+
|                                   a|
+------------------------------------+
|[WrappedArray(aa), WrappedArray(bb)]|
+------------------------------------+
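The spark-xml README linked above pairs this DataFrame (a column `a` of type array<array<string>>) with the XML it produces. In that output, each inner array becomes its own `<a>` element and each inner value is wrapped in an `<item>` element. The stdlib-only Java sketch below mimics only that one rule for a `List<List<String>>` value so the resulting shape is easy to see. The class `ArrayInArraySketch` and its method `render` are hypothetical, and this is not spark-xml's writer.

```java
import java.util.List;

// Hypothetical, stdlib-only mimic of the "array within an array" example in the
// spark-xml README. It is NOT spark-xml's writer and does no XML escaping.
final class ArrayInArraySketch {

    // Each inner list becomes one <field> element. Each inner value is wrapped
    // in an <item> element, matching the README example.
    static String render(String field, List<List<String>> value) {
        StringBuilder xml = new StringBuilder();
        for (List<String> inner : value) {
            xml.append('<').append(field).append('>');
            for (String item : inner) {
                xml.append("<item>").append(item).append("</item>");
            }
            xml.append("</").append(field).append('>');
        }
        return xml.toString();
    }
}
```

With the README's data, `render("a", ...)` yields `<a><item>aa</item></a><a><item>bb</item></a>` (without indentation). The elements in this array-of-arrays output are anonymous `<item>`s. The target XML above instead needs named `<child>` and `<grandchild>` elements. As I understand spark-xml, those come from arrays of structs rather than from arrays of arrays.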

Can you please help me with a small example of how to build the flattened DataFrame for my desired XML? I am working on Spark 2.x with spark-xml 0.4.5 (the latest version).


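My schema:

import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// parent: name + childs { child: array<struct> }
StructType categoryMapSchema = new StructType(new StructField[]{
    new StructField("name", DataTypes.StringType, true, Metadata.empty()),
    // <childs> wrapper: a struct holding the repeated <child> elements
    new StructField("childs", new StructType(new StructField[]{
        new StructField("child",
            DataTypes.createArrayType(new StructType(new StructField[]{
                new StructField("name", DataTypes.StringType, true, Metadata.empty()),
                // <grandchilds> wrapper: a struct holding the repeated <grandchild> elements
                new StructField("grandchilds", new StructType(new StructField[]{
                    new StructField("grandchild",
                        DataTypes.createArrayType(new StructType(new StructField[]{
                            new StructField("name", DataTypes.StringType, true, Metadata.empty())
                        })), true, Metadata.empty())
                }), true, Metadata.empty())
            })), true, Metadata.empty())
    }), true, Metadata.empty())
});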
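This schema can be read level by level:

- A parent row holds a name plus a `childs` struct.
- The `childs` struct holds a single array field, `child`.
- Each element of `child` is itself a struct holding a name plus a `grandchilds` struct.
- The `grandchilds` struct wraps its own array, `grandchild`.

The stdlib-only Java sketch below spells out that positional nesting. Plain `Object[]` values stand in for `Row`; `RowFactory.create(Object...)` takes the same positional values. `java.util.List` stands in for the array fields. The classes `Parent`, `Child`, `Grandchild` and the method `toRowValues` are hypothetical. This is a sketch of the shape under those assumptions, not a drop-in Spark implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain classes plus a stdlib-only sketch of the positional
// layout that categoryMapSchema implies. Object[] stands in for a Spark Row
// (same positional fields as RowFactory.create(...)). List stands in for an
// ArrayType value.
final class RowLayoutSketch {

    static final class Grandchild {
        final String name;

        Grandchild(String name) {
            this.name = name;
        }
    }

    static final class Child {
        final String name;
        final List<Grandchild> grandchilds;

        Child(String name, List<Grandchild> grandchilds) {
            this.name = name;
            this.grandchilds = grandchilds;
        }
    }

    static final class Parent {
        final String name;
        final List<Child> childs;

        Parent(String name, List<Child> childs) {
            this.name = name;
            this.childs = childs;
        }
    }

    // Parent row: [name, childs]. childs is a struct whose only field
    // ("child") is the array of child rows.
    static Object[] toRowValues(Parent parent) {
        List<Object[]> childRows = new ArrayList<>();
        for (Child child : parent.childs) {
            childRows.add(toRowValues(child));
        }
        Object[] childsStruct = {childRows};
        return new Object[] {parent.name, childsStruct};
    }

    // Child row: [name, grandchilds]. grandchilds is a struct whose only
    // field ("grandchild") is the array of grandchild rows, each just [name].
    static Object[] toRowValues(Child child) {
        List<Object[]> grandchildRows = new ArrayList<>();
        for (Grandchild grandchild : child.grandchilds) {
            grandchildRows.add(new Object[] {grandchild.name});
        }
        Object[] grandchildsStruct = {grandchildRows};
        return new Object[] {child.name, grandchildsStruct};
    }
}
```

In Spark, each `Object[]` here would instead be a `Row` built with `RowFactory.create(...)`. Each `List` should be a `java.util.List` of those `Row`s. The parent row would then take the form `RowFactory.create(name, RowFactory.create(childRows))`: one wrapper row per `childs`/`grandchilds` struct, each holding a whole array.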

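My row RDD data (not the actual code, but something like this):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// mapAttributes: an RDD of my parent objects
final JavaRDD<Row> rowRdd = mapAttributes
    .map(parent -> RowFactory.create(
        parent.getParentName(),
        // childs: a struct whose single field is meant to hold the child array
        RowFactory.create(RowFactory.create((Object) parent.getChild()))));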

With what I have tried so far, I end up with a WrappedArray inside the parent WrappedArray, which does not work.
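A WrappedArray inside a WrappedArray is the array-of-arrays case, which the spark-xml README writes with `<item>` wrappers. As I understand spark-xml's writer, two other rules apply to the target layout:

- A struct field becomes one element that wraps the struct's fields.
- An array-of-structs field becomes one element per entry, named after the field, with no extra wrapper.

The stdlib-only Java sketch below applies just those two rules to `Object[]`/`List` stand-ins for rows. The outer tag plays the role of spark-xml's `rowTag` option. `NestedXmlSketch`, `Field` and `write` are hypothetical. The sketch skips `null` fields, but how spark-xml writes nulls depends on its `nullValue` option. It is not spark-xml's code.

```java
import java.util.List;

// Hypothetical, stdlib-only model of two writing rules (not spark-xml's code):
//  - a struct field becomes one element that wraps the struct's fields;
//  - an array-of-structs field becomes one element per entry, named after the field.
// Object[] stands in for a Row and List for an array value. Null fields are
// skipped and text is not XML-escaped.
final class NestedXmlSketch {

    // Minimal schema node: a string leaf, a struct, or an array of structs.
    static final class Field {
        final String name;
        final Field[] children; // null means a string leaf
        final boolean array;    // true means the value is a List of structs

        private Field(String name, Field[] children, boolean array) {
            this.name = name;
            this.children = children;
            this.array = array;
        }

        static Field string(String name) { return new Field(name, null, false); }

        static Field struct(String name, Field... children) { return new Field(name, children, false); }

        static Field arrayOf(String name, Field... children) { return new Field(name, children, true); }
    }

    // Writes one row's fields, in schema order, inside a <rowTag> element.
    static String write(String rowTag, Field[] fields, Object[] row) {
        StringBuilder xml = new StringBuilder();
        open(xml, rowTag);
        writeFields(xml, fields, row);
        close(xml, rowTag);
        return xml.toString();
    }

    private static void writeFields(StringBuilder xml, Field[] fields, Object[] row) {
        for (int i = 0; i < fields.length; i++) {
            writeField(xml, fields[i], row[i]);
        }
    }

    private static void writeField(StringBuilder xml, Field field, Object value) {
        if (value == null) {
            return; // this sketch simply omits null fields
        }
        if (field.children == null) {
            open(xml, field.name).append(value);
            close(xml, field.name);
        } else if (field.array) {
            // One element per array entry, named after the field itself.
            for (Object entry : (List<?>) value) {
                open(xml, field.name);
                writeFields(xml, field.children, (Object[]) entry);
                close(xml, field.name);
            }
        } else {
            // A struct: one wrapper element around its fields.
            open(xml, field.name);
            writeFields(xml, field.children, (Object[]) value);
            close(xml, field.name);
        }
    }

    private static StringBuilder open(StringBuilder xml, String name) {
        return xml.append('<').append(name).append('>');
    }

    private static void close(StringBuilder xml, String name) {
        xml.append("</").append(name).append('>');
    }
}
```

With the row tag set to `parent` and a schema mirroring `categoryMapSchema`, these two rules reproduce the `<childs>`/`<child>` and `<grandchilds>`/`<grandchild>` layout from the question, minus indentation. This happens when the `childs` and `grandchilds` values are each a single struct wrapping an array of structs, not an array nested directly inside another array.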

Source: https://stackoverflow.com/questions/50007809/spark-xml-array-within-an-array-in-dataframe-to-generate-xml
