I'm interested in advice/pseudocode/explanation rather than an actual implementation.
something like that.
UPD: and concatenate the final list in order to get the final XPath. I don't think attributes will be a problem.
I did exactly the same thing last week to convert my XML into a Solr-compliant format.
Since you wanted pseudocode, this is how I accomplished that.
// You can skip the reference to parent and child.
1_ Initialize a custom node object: NodeObjectVO {String nodeName, String path, List attr, NodeObjectVO parent, List child}
2_ Create an empty list
3_ Create a DOM representation of the XML and iterate through the nodes. For each node, get the corresponding information. All the information, such as the node name and the attribute names and values, should be readily available from the DOM object. (You need to check the DOM NodeType; the code should ignore processing instructions and plain text nodes.)
// Code bloat warning. 4_ The only tricky part is getting the path. I created an iterative utility method that builds the XPath string from a node by walking up its parents and prepending each name: while (node != null) { path = "/" + node.nodeName + path; node = node.parent; }
(You can also achieve this by maintaining a global path variable that keeps track of the parent path for each iteration.)
5_ In the setAttributes(List) setter, I append all the available attributes to the object's path. (One path with all available attributes, not a list of paths with each possible combination of attributes. You might want to do it some other way.)
6_ Add the NodeObjectVO to the list.
7_ Now we have a flat (not hierarchical) list of custom node objects that have all the information I need.
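
The steps above can be sketched in Java roughly as follows. This is only an illustration under some assumptions, not the code I actually used: the class names XmlFlattener and NodeInfo are made up for the example, the parent/child references are left out, and writing each attribute as an [@name='value'] predicate is just one possible way of "appending attributes to the path". Attribute values containing a single quote are not escaped here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class XmlFlattener {

    /** Hypothetical flat record (step 1), without parent/child references. */
    static final class NodeInfo {
        final String nodeName;
        final String path;

        NodeInfo(String nodeName, String path) {
            this.nodeName = nodeName;
            this.path = path;
        }
    }

    /** Step 4: walk up the parent chain, prepending each element name. */
    static String pathOf(Node node) {
        StringBuilder path = new StringBuilder();
        // The root element's parent is the Document node, which ends the loop.
        for (Node n = node; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            path.insert(0, "/" + n.getNodeName());
        }
        return path.toString();
    }

    /** Step 5: one path carrying all attributes (assumed predicate format). */
    static String withAttributes(String path, NamedNodeMap attrs) {
        StringBuilder sb = new StringBuilder(path);
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            sb.append("[@").append(a.getNodeName())
              .append("='").append(a.getNodeValue()).append("']");
        }
        return sb.toString();
    }

    /** Steps 3 and 6: visit element nodes only and add each one to the list. */
    static void collect(Node node, List<NodeInfo> out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // ignore text nodes, processing instructions, comments
        }
        String path = withAttributes(pathOf(node), node.getAttributes());
        out.add(new NodeInfo(node.getNodeName(), path));
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    /** Steps 2 and 7: parse the XML and return the flat list. */
    static List<NodeInfo> flatten(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<NodeInfo> out = new ArrayList<>();
            collect(doc.getDocumentElement(), out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

Calling XmlFlattener.flatten(xml) returns the elements in document order, each with its own path. Note that pathOf re-walks the ancestors for every node, which is the cost the bloat warning in step 4 refers to.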
(Note: As I mentioned, I maintain the parent-child relationship; you should probably skip that part. There is a risk of code bloat, especially in getparentpath. For small XML this was not a problem, but it is a concern for large XML.)
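
For large XML, the idea of keeping a running path variable instead of walking up the parents could look something like the sketch below. Note that this swaps the DOM for the streaming StAX reader (javax.xml.stream), which is my assumption about what would suit big files, not what I did myself. The class name RunningPathCollector is hypothetical, and attributes are left out to keep it short.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

final class RunningPathCollector {

    /** Returns one path per element, in document order. */
    static List<String> paths(String xml) {
        Deque<String> open = new ArrayDeque<>(); // names of currently open elements
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // getLocalName() drops any namespace prefix.
                    open.addLast(reader.getLocalName());
                    out.add("/" + String.join("/", open));
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    open.removeLast(); // leaving the element shortens the running path
                }
                // Text, comments and processing instructions are ignored.
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return out;
    }
}
```

Here the only state kept between nodes is the stack of open element names, so no parent or child references are needed and the whole document never has to sit in memory as a tree.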