Visualizing set hierarchies as color coded graphs

情深已故 2021-02-02 12:21

I have been reading quite a bit about graphing libraries for Java and JavaScript lately, but I haven't found a good way to do what I want to do.

Essentially I have a hiera

4 answers
  • 2021-02-02 12:45

    Have you considered a 2-dimensional grid:

    • Put the set number on one axis
    • Put the unique elements found in all sets on the other axis
    • Color each cell where the row's set contains the column's element
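    A minimal sketch of that grid in Python, assuming the sets are held in a plain dict mapping a set name to its elements. The helper names `incidence_grid` and `render` and the sample data are hypothetical, not from the question. It builds the 0/1 incidence matrix and prints it as text, with `#` standing in for a colored cell:

```python
# Hypothetical example data: set name -> elements.
sets = {
    "S1": {"A", "B", "C"},
    "S2": {"B", "C", "D"},
    "S3": {"X"},
}

def incidence_grid(sets):
    """Return (row labels, column labels, 0/1 matrix): sets are rows, elements are columns."""
    rows = list(sets)
    cols = sorted(set().union(*sets.values()))
    grid = [[1 if e in sets[s] else 0 for e in cols] for s in rows]
    return rows, cols, grid

def render(rows, cols, grid):
    """Plain-text rendering: '#' marks a colored cell, '.' an empty one."""
    lines = ["    " + " ".join(cols)]
    for name, row in zip(rows, grid):
        lines.append(f"{name:<4}" + " ".join("#" if cell else "." for cell in row))
    return "\n".join(lines)

rows, cols, grid = incidence_grid(sets)
print(render(rows, cols, grid))
```

    A real front end would draw the same matrix with colored cells instead of characters; the ordering of `rows` and `cols` is what the rest of this answer is about.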

    While this visualization method would normally be inferior to some of the more complicated ones mentioned so far, it has the virtue of actually being possible when you have thousands of elements and thousands of sets.

    The trick will be to order the rows and columns so that the most information ends up together in a way that is useful to the user. My instinct says that the problem you're trying to solve is to make the colored cells as "bloblike" as possible: if each group of adjacent colored cells is called an "area", you want the fewest distinct areas, with the fewest holes in them.
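    One way to score how "bloblike" a given ordering is, offered here as an assumption rather than as the answer's own metric, is to count the 4-connected areas of colored cells with a flood fill; a lower count is better. This sketch (the `count_areas` name is hypothetical) does not measure holes:

```python
from collections import deque

def count_areas(grid):
    """Count 4-connected areas of colored (non-zero) cells in a 0/1 grid."""
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    seen = [[False] * n_cols for _ in range(n_rows)]
    areas = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] and not seen[r][c]:
                areas += 1
                # Breadth-first flood fill from this unvisited colored cell.
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < n_rows and 0 <= nx < n_cols
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return areas

# Hypothetical sets S1={A,B}, S2={C}, S3={A,B}; columns are A, B, C.
ordering_1 = [[1, 1, 0], [0, 0, 1], [1, 1, 0]]  # rows S1, S2, S3
ordering_2 = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]  # rows S1, S3, S2
print(count_areas(ordering_1), count_areas(ordering_2))  # 3 2
```

    Putting the two identical sets next to each other merges their cells into one area, which is exactly the effect the row ordering should aim for.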

    Finding such an ordering is a very complicated problem in its own right, but it can be at least partially solved by computing a closeness score for each set against every other set. What you're looking for are "islands" of closeness, so start with the pair of most similar sets, add them to the graph, and treat them as a region. Recalculate the closeness scores with the region replacing the pair it contains (averaging in some way?). Find the next closest pair of items (each item being a region or a set). If that pair is within a certain closeness threshold of any existing region in the graph, attach it to one side of that region; otherwise, create a new, separate region (again removing the pair's closeness values and recomputing them for the region itself). Eventually, all sets will be added to regions, and all regions will be joined. Joining two regions has four possibilities (flipping may be required), so which sides to attach can be decided by the closeness of the sets on the four edges of the two regions.

    While this may never give the optimal configuration, it should come up with something that has few regions compared to a random distribution.
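    The region-building process above can be sketched as a greedy agglomerative ordering of the rows. This is a simplified sketch under several assumptions: closeness is Jaccard similarity, region closeness is the average over all pairs of member sets (one possible answer to "averaging in some way?"), the closest pair is always merged (the threshold for starting a separate region is omitted), and only the row order is computed. All function names and the data are hypothetical:

```python
from itertools import combinations

def jaccard(a, b):
    """Closeness in [0, 1]: shared elements / all elements of either set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def region_closeness(r1, r2, sets):
    """Average closeness over every pair of member sets (average linkage)."""
    scores = [jaccard(sets[x], sets[y]) for x in r1 for y in r2]
    return sum(scores) / len(scores)

def join(r1, r2, sets):
    """Join two ordered regions end to end, trying the four orientations and
    keeping the one whose two touching sets are most alike."""
    candidates = [r1 + r2, r1 + r2[::-1], r1[::-1] + r2, r1[::-1] + r2[::-1]]
    seam = len(r1)  # index of the first set taken from r2
    return max(candidates,
               key=lambda order: jaccard(sets[order[seam - 1]], sets[order[seam]]))

def order_sets(sets):
    """Greedy row ordering: repeatedly merge the two closest regions until one remains."""
    regions = [[name] for name in sets]
    while len(regions) > 1:
        i, j = max(combinations(range(len(regions)), 2),
                   key=lambda p: region_closeness(regions[p[0]], regions[p[1]], sets))
        merged = join(regions[i], regions[j], sets)
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
    return regions[0] if regions else []

# Hypothetical example: two unrelated families of sets.
sets = {"S1": {"A", "B", "C"}, "S2": {"X", "Y"},
        "S3": {"A", "B", "D"}, "S4": {"X", "Y", "Z"}}
print(order_sets(sets))  # ['S2', 'S4', 'S1', 'S3']
```

    The pairwise search makes each merge quadratic in the number of regions, so for thousands of sets a cached closeness matrix or a library clustering routine would be the next step.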

    Finally, some dynamic reordering might be useful: let the user select an interesting set or element and use it as the seed for a completely rearranged graph, calculating each addition based on closeness to that element (and subsequently to the region formed after it is combined with another element), rather than picking the closest pair overall.
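    A sketch of the seeded variant, assuming the user picks a set (an element seed could start from the sets containing it) and that "closeness to the region" means Jaccard similarity against the union of the region's elements. The `seeded_order` name and the data are hypothetical:

```python
def jaccard(a, b):
    """Closeness in [0, 1]: shared elements / all elements of either set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def seeded_order(sets, seed):
    """Order the sets outward from a user-chosen seed set."""
    order = [seed]
    region = set(sets[seed])  # elements covered by the region so far
    remaining = [name for name in sets if name != seed]
    while remaining:
        # Pick the set closest to the whole region, not just to the last set added.
        best = max(remaining, key=lambda name: jaccard(region, sets[name]))
        order.append(best)
        region |= sets[best]
        remaining.remove(best)
    return order

sets = {"S1": {"A", "B", "C"}, "S2": {"X", "Y"},
        "S3": {"A", "B", "D"}, "S4": {"X", "Y", "Z"}}
print(seeded_order(sets, "S4"))  # ['S4', 'S2', 'S1', 'S3']
```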

    Here is a diagram of the result of applying the above process to the example data in your question:

    [Image: Sets and Elements]

    Deciding how to order the columns is complex, but you can get reasonable results by moving columns next to each other whenever such a move won't disturb the colored block area of any already-added segment.
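    The column-moving procedure above is hard to sketch briefly. A simpler heuristic, swapped in here as an assumption, sorts each element column by the first row (in the chosen set order) that contains it, then by how many sets contain it, then by name. It stacks each set's new elements to the right of those already placed and tends to push colored cells toward the diagonal. The `order_columns` name and the data are hypothetical:

```python
def order_columns(sets, row_order):
    """Order element columns by (first row containing it, -frequency, name)."""
    first_row = {}
    frequency = {}
    for r, name in enumerate(row_order):
        for element in sets[name]:
            first_row.setdefault(element, r)
            frequency[element] = frequency.get(element, 0) + 1
    return sorted(first_row, key=lambda e: (first_row[e], -frequency[e], e))

sets = {"S1": {"A", "B", "C"}, "S2": {"X", "Y"},
        "S3": {"A", "B", "D"}, "S4": {"X", "Y", "Z"}}
print(order_columns(sets, ["S1", "S3", "S2", "S4"]))
# ['A', 'B', 'C', 'D', 'X', 'Y', 'Z']
```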

    Additional thoughts:

    • Set closeness depends not just on how many elements two sets have in common, but also on how many they do not share. If two pairs of sets each share 3 elements, but one pair has 5 non-shared elements and the other has 3, then the pair with 3 non-shared elements is the closer match.
    • After adding a set to the graph, there is an opportunity to reorder the elements. Stacking the elements as far left as possible is a good start for the first placement. After that, stacking the most common elements leftmost seems good. Beyond that, it breaks down. I wonder whether getting the colored cells as close as possible to the diagonal (from top left to bottom right) would also be a useful approach; this reminds me a little of the Design Structure Matrix, though that only shows one-way dependencies rather than two-way relationships.
    • When a colored blob consists of sets that are completely disjoint from all other sets (like the set containing X in your example), it can be moved to a separate graph.
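    The first and last notes can be made concrete with a short sketch. Jaccard similarity (shared elements divided by all elements of either set) is one standard score with the property described in the first note, used here as an assumption. The hypothetical `disjoint_groups` helper collects sets linked directly or transitively by shared elements, so each group can be drawn as its own grid:

```python
def jaccard(a, b):
    """Shared elements divided by all elements of either set; 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# The first note's example: both pairs share 3 elements, with 5 vs. 3 non-shared.
pair_5 = ({"a", "b", "c", "p", "q", "r"}, {"a", "b", "c", "s", "t"})
pair_3 = ({"a", "b", "c", "p", "q"}, {"a", "b", "c", "s"})
print(jaccard(*pair_5), jaccard(*pair_3))  # 0.375 0.5

def disjoint_groups(sets):
    """Split sets into groups that share no elements with any other group."""
    groups = []
    unvisited = set(sets)
    for start in sets:  # dict insertion order keeps the output deterministic
        if start not in unvisited:
            continue
        unvisited.discard(start)
        group, stack = [], [start]
        while stack:
            name = stack.pop()
            group.append(name)
            linked = [other for other in unvisited if sets[name] & sets[other]]
            for other in linked:
                unvisited.discard(other)
                stack.append(other)
        groups.append(sorted(group))
    return groups

# Hypothetical data: S5 shares nothing with the others.
sets = {"S1": {"A", "B", "C"}, "S2": {"X", "Y"}, "S3": {"A", "B", "D"},
        "S4": {"X", "Y", "Z"}, "S5": {"Q"}}
print(disjoint_groups(sets))  # [['S1', 'S3'], ['S2', 'S4'], ['S5']]
```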
  • 2021-02-02 12:56

    I don't have a solution for getting your data into the proper format, but take a look at sigma.js, an MIT-licensed JavaScript library for drawing graphs. I haven't looked at the data format it accepts, but it may be worth a look.

  • 2021-02-02 13:02

    Yes, this is a fairly well-studied problem. What you are describing is called a hypergraph. Each element can be represented as a vertex in a graph, and the sets are the hyperedges. The problem then becomes that of visualizing hypergraphs.


    Unfortunately, there isn't a perfect, general solution, since even the simplest graphs can have complex visualizations.

    If your sets are relatively small (fewer than 5 elements), you can use a regular graph drawing library like Graphviz. To do this, connect every pair of vertices within each set and give each set's edges a different color. This yields a result similar to this:

    [Image]
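    A minimal sketch of that approach, assuming the sets are in a dict and using a hypothetical `to_dot` helper with a small hand-picked color palette. It writes Graphviz DOT text in which each set's elements form a clique drawn in that set's color; every element is also declared as a node so that single-element sets still appear:

```python
from itertools import combinations

# Hypothetical palette; colors repeat if there are more sets than entries.
PALETTE = ["red", "blue", "darkgreen", "orange", "purple", "brown"]

def to_dot(sets):
    """Return Graphviz DOT text for the clique expansion of the sets."""
    lines = ["graph sets {"]
    for element in sorted(set().union(*sets.values())):
        lines.append(f'  "{element}";')
    for i, (name, members) in enumerate(sets.items()):
        color = PALETTE[i % len(PALETTE)]
        lines.append(f"  // set {name}")
        # One edge per pair of members; parallel edges appear when sets overlap.
        for u, v in combinations(sorted(members), 2):
            lines.append(f'  "{u}" -- "{v}" [color="{color}"];')
    lines.append("}")
    return "\n".join(lines)

sets = {"S1": {"A", "B", "C"}, "S2": {"C", "D"}, "S3": {"E"}}
dot = to_dot(sets)
print(dot)
```

    Saving the output as `sets.dot` and running `dot -Tsvg sets.dot -o sets.svg` should render it. The edge count grows quadratically with set size, which is why this only works well for small sets; element names containing double quotes would also need escaping.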

  • 2021-02-02 13:05

    There are many approaches to this problem, but personally I'd draw a sort of Venn chart using dynamically generated SVG with a tool like Raphael JS, and color it the way I want. Raphael also has APIs such as Set that can help you present detailed information about the elements and their relations. The SVG to Code converter will also likely help you understand how to generate the SVG elements.

    Alternatively, you could use a Venn chart tool:

    [Image: Venn chart sample]

    which seems easy to adapt to this scenario. There's also Flotr2, which can create bubble charts:

    [Image: Flotr2 bubble chart]

    or even CanvasXpress.

    [Image: CanvasXpress diagrams]

    With a little more tweaking, any of the latter tools should get the job done.
