I am confused as to how the Trie implementation saves space & stores data in most compact form!
If you look at the tree below. When you store a character at any nod
Space is saved when you've lots of words to be represented by the tree. Because many words share the same path in the tree; the more words you've, more space you would save.
But there is a better data structure if you want to save space. Trie doesn't save space as much as directed acyclic word graph (DAWG) does, because it shares common node throughout the structure, whereas trie doesn't share nodes. The wiki entry explains this much detail, so have a look at it.
Here is the difference (graphically) between Trie and DAWG:
The strings "tap", "taps", "top", and "tops" stored in a Trie (left) and a DAWG (right), EOW stands for End-of-word.
The tree on the left side is Trie, and the tree on the right is DAWG. Compare them and see how DAWG saves space effciently. Trie has duplicate nodes that represent same letter/subword, while DAWG has exactly one node for each letter/subword.