Eiffel: Evolutionary Flow Map for Influence Graph Visualization

Authors

Peking University

Yucheng Huang
Tong Yang

Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences

Lei Shi
Yue Su
Deyun Wang

Yahoo Labs

Yifan Hu

Arizona State University

Hanghang Tong

University of Notre Dame

Chaoli Wang

Academy of Arts & Design, Tsinghua University

Shuo Liang

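Abstract

Visualizing evolutionary influence graphs is important for many real-life tasks, such as citation analysis and social influence analysis. The main challenges are how to summarize large-scale, complex, and time-varying influence graphs, and how to design effective visual metaphors and dynamic representations that illustrate influence patterns over time. In this work, we present Eiffel, an integrated visual analytics system that applies triple summarization to evolutionary influence graphs along the node, relation, and time dimensions. In numerical experiments, Eiffel's summarization results outperform traditional clustering algorithms with respect to the influence-flow-based objective. In addition, a flow map representation is proposed and adapted to influence graph summarization. It supports two modes of evolutionary visualization (flip-book and movie) to speed up the analysis of influence graph dynamics. We conducted two controlled user experiments to evaluate the technique on influence graph summarization and on visualization, respectively. We also showcase the system in the evolutionary influence analysis of two typical scenarios: the citation influence of scientific papers and the social influence of emerging online events. The evaluation results demonstrate the value of Eiffel for the visual analysis of evolutionary influence graphs.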

Introduction

Influence

Cyber-Influence (social networking)
Social Influence (opinion leaders)
Physical-Influence (butterfly effect)

Diffuse ideas like epidemic
Induce friends to behave similarly
Develop community through paper co-authorship, citation, co-citations, shared topics, etc.

Influence graph of academic paper citations
Entity: papers (authors)
Relationship: citations
Usage: understand the development of topics from a landmark paper
Challenge: size and complexity of the influence graph for understanding

Social influence graph of message forwarding
Entity: tweets (users)
Relationship: re-tweets (comments)
Usage:

Understand the nature of propagation
Amplify/contain the impact for social campaigns
Challenge: visualize the dynamics of influence graph

Related Work

Information Propagation Visualization

Whisper
G+ Ripples

Research Problem

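Nodes: the influencer (influence source) and the propagators
Influence links: reversed citation or re-tweet relationships
Maximal influence graph: influencer + propagators + influence links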
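
The definition above can be sketched in a few lines. This is a minimal illustration, not the system's actual implementation: it assumes citations are given as `(citing, cited)` pairs, reverses each one into an influence link, and collects everything reachable from the influencer with a breadth-first search. The function name `maximal_influence_graph` and the toy paper IDs are hypothetical.

```python
from collections import defaultdict, deque


def maximal_influence_graph(citations, influencer):
    """Return (nodes, links) reachable from `influencer`.

    citations: iterable of (citing, cited) pairs (assumed input format).
    An influence link is the reversed citation: cited -> citing.
    """
    out = defaultdict(list)
    for citing, cited in citations:
        out[cited].append(citing)  # influence flows from cited to citing

    seen = {influencer}
    queue = deque([influencer])
    links = []
    while queue:
        u = queue.popleft()
        for v in out[u]:
            links.append((u, v))  # every link leaving a reached node is kept
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen, links


# Toy data: B and C cite A, D cites B and C, E cites an unrelated paper X.
citations = [("B", "A"), ("C", "A"), ("D", "B"), ("D", "C"), ("E", "X")]
nodes, links = maximal_influence_graph(citations, "A")
print(sorted(nodes))
print(links)
```

Papers that never cite the influencer, directly or indirectly (here E and X), stay outside the graph.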

Static influence graph
User requirements:

Compact visual summary ($\approx$ clustering)
Focus on influence flows ($\neq$ community detection)
Overview task

Evolutionary influence graph
Edges carry timestamps
User requirement: understand the temporal dynamics of the influence graph
Issues:

Online vs. Offline summarization
Smooth and staged visualization

Graph Summarization

Graph Summarization Framework
Triple Summarization Pipeline

Node (objective function)
Edge (visual clutter)
Time (staged visuals)

Node Summarization

SymNMF algorithm (node clustering)

similarity matrix:
$$A^{G}=\frac{A A^{T}+A^{T} A}{2}$$

SymNMF:
$$\min_{H \geq 0}\left\|A^{G}-H H^{T}\right\|_{F}^{2}$$

Rationale:

Similarity is defined by the number of commonly cited/citing papers (co-citation)
Proved to maximize the objective function
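
As a rough sketch of the two formulas above, the code below builds the similarity matrix $A^{G}$ from a small adjacency matrix and then factorizes it with a damped multiplicative update, a commonly used rule for symmetric NMF. The choice of update rule, the damping factor `beta = 0.5`, the toy matrix, and all function names are assumptions made for illustration; the notes do not say which solver Eiffel uses.

```python
import random


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def similarity(A):
    """A^G = (A A^T + A^T A) / 2 for a square 0/1 adjacency matrix A."""
    At = transpose(A)
    P, Q = matmul(A, At), matmul(At, A)
    n = len(A)
    return [[(P[i][j] + Q[i][j]) / 2 for j in range(n)] for i in range(n)]


def frobenius_loss(S, H):
    """Squared Frobenius norm of S - H H^T."""
    R = matmul(H, transpose(H))
    n = len(S)
    return sum((S[i][j] - R[i][j]) ** 2 for i in range(n) for j in range(n))


def symnmf(S, k, iters=200, beta=0.5, seed=0):
    """Approximate S by H H^T with H >= 0 (n x k), via multiplicative updates."""
    rng = random.Random(seed)
    n = len(S)
    H = [[rng.random() for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        SH = matmul(S, H)
        HHtH = matmul(H, matmul(transpose(H), H))
        H = [[H[i][c] * (1 - beta + beta * SH[i][c] / (HHtH[i][c] + 1e-12))
              for c in range(k)] for i in range(n)]
    return H


# Toy graph: A[i][j] = 1 means node i influences node j (assumed orientation).
A = [[0, 1, 1, 1, 0, 0],
     [0, 0, 0, 0, 1, 0],
     [0, 0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]]
S = similarity(A)
H = symnmf(S, k=2)
clusters = [max(range(2), key=lambda c: row[c]) for row in H]  # argmax per row
print(clusters, frobenius_loss(S, H))
```

Each node is assigned to the column of `H` with the largest entry in its row. The number of clusters `k` must be chosen in advance.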

Comparison with graph clustering, k-means
Findings:

Graph < 1000 nodes, SymNMF has the best trade-off
Large graphs, all methods fail on content consistency

Edge Summarization

Algorithm

Connected Top-n Graph
Maximum Weighted Spanning Tree (MWST)
Maximal Padded MWST

Findings:

Maximal padded MWST preserves a higher percent of overall flow rate
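
A minimal sketch of the spanning-tree idea follows. It treats the summarized flows as undirected weighted edges, builds a maximum weighted spanning tree with Kruskal's algorithm, and then pads the tree with the heaviest leftover edges up to an edge budget. The padding rule, the `budget` parameter, and the function names are assumptions for illustration; the notes do not define "maximal padded MWST" precisely, and the real algorithm may differ.

```python
def max_spanning_tree(n, edges):
    """Kruskal on edges (u, v, w) sorted by descending weight.

    Returns (tree_edges, leftover_edges), both in descending weight order.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree, rest = [], []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
        else:
            rest.append((u, v, w))  # would close a cycle
    return tree, rest


def padded_mwst(n, edges, budget):
    """MWST plus the heaviest non-tree edges, up to `budget` edges in total."""
    tree, rest = max_spanning_tree(n, edges)
    return tree + rest[:max(0, budget - len(tree))]


def preserved_flow(kept, edges):
    """Fraction of the total flow weight retained by the kept edges."""
    return sum(w for _, _, w in kept) / sum(w for _, _, w in edges)


edges = [(0, 1, 5), (0, 2, 3), (1, 2, 4), (2, 3, 2), (1, 3, 1)]
tree, _ = max_spanning_tree(4, edges)
padded = padded_mwst(4, edges, budget=4)
print(preserved_flow(tree, edges), preserved_flow(padded, edges))
```

On this toy input the plain tree keeps 11 of 15 units of flow and the padded version keeps 14 of 15, which matches the direction of the finding above.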

Temporal Summarization

Objective & algorithm

Divide the timeline into $L$ time frames for large influence links
$$r_{seg}\left(\xi^{(g)}\right)=\frac{\sum_{e_{ij} \in \xi,\, t_{ij} \in W_{g}} a_{ij}}{\sqrt{\left|\pi_{S(\xi)}\right|\left|\pi_{D(\xi)}\right|}} \cdot \frac{\sum_{e_{ij} \in \xi,\, t_{ij} \in W_{g}} a_{ij}}{\left|W_{g}\right|} \cdot\left|W_{g}\right|^{q}$$ (flow segmentation rate)
$$\max \sum_{i=1}^{l} \sum_{g=1}^{L} r_{seg}\left(\xi_{i}^{(g)}\right)$$ (sum of flow segmentation rates)
An iterative optimization
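
To make the objective concrete, the sketch below evaluates the flow segmentation rate for one flow and sums it over a given set of time frames. The reading of the symbols is an assumption: $a_{ij}$ is taken as the edge weight, $|\pi_{S(\xi)}|$ and $|\pi_{D(\xi)}|$ as the sizes of the source and destination clusters, $|W_g|$ as the length of a half-open time window, and $q$ as a tunable exponent. The iterative optimization itself is not shown; only the quantity it would maximize.

```python
import math


def seg_rate(flow_edges, window, n_src, n_dst, q):
    """Flow segmentation rate of one flow in one time window.

    flow_edges: list of (timestamp, weight) pairs for the flow.
    window: half-open interval (start, end); its length plays the role of |W_g|.
    n_src, n_dst: sizes of the source and destination clusters (assumed meaning).
    """
    start, end = window
    length = end - start
    total = sum(a for t, a in flow_edges if start <= t < end)
    return (total / math.sqrt(n_src * n_dst)) * (total / length) * length ** q


def objective(flows, boundaries, q):
    """Sum of segmentation rates over all flows and all consecutive windows."""
    windows = list(zip(boundaries, boundaries[1:]))
    return sum(seg_rate(f["edges"], w, f["n_src"], f["n_dst"], q)
               for f in flows for w in windows)


flow = {"edges": [(1, 1.0), (2, 1.0), (5, 2.0)], "n_src": 4, "n_dst": 1}
print(objective([flow], [1, 3, 6], q=1.0))  # two time frames
print(objective([flow], [1, 6], q=1.0))     # one time frame
```

A search over candidate `boundaries` (for example, moving one cut point at a time and keeping the change when the objective improves) would be one way to realize an iterative optimization, though the notes do not specify the procedure.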

Evolutionary Flow Map

Visual Design

Interface

Eiffel Flow Map
Flow map visualization (bundled edges and non-tree edges)

Layout algorithm

Over the flow map layout

Non-tree graphs vs. Star tree
Multiple branches vs. Two

Evolutionary Visualization: Flip-book Mode

Evolutionary Visualization: Movie Mode

Evaluation

Case Study

Citation Influence Analysis
Data (#paper/#citation)

AMiner V8: (2.4M/10.5M)
CiteseerX: (26.1M/63.0M)
18010 papers@37 venues

Topic Evolution

Influencer: Jigsaw@VAST’07
Three stages

Social Influence Analysis

Experiments

Design: user understanding of

Eiffel summarization
Eiffel visualization
Results on objective/subjective metrics
Eiffel outperforms a Google Scholar-like UI in summarizing medium-sized graphs (~1000 nodes)

Discussion & Future work

Limitations

Fixed number of clusters in the summarization
Exhaustive search for the maximal influence graph
Single-source influence graph
Granularity of evolutions

Future work

Hierarchical summarization
Topic-based influence graph through semantic analysis of citations
Other applications: code influence in software, etc.

The data quality issue (#citation / #paper)

CiteseerX: 63.0M / 26.1M = 2.4
AMiner V8: 10.5M / 2.4M = 4.4
Google Scholar (GS): ?? (>4.4)

Future work

Directly work on Google Scholar data for summarization
Challenge: large-scale crawling of GS data is prohibited

Summary

Contribution

Problem: visualization of large-scale, time-varying influence graphs
Triple summarization of dynamic influence graphs: Node-Edge-Time
Visualization by evolutionary flow map
- Improved flow map layout
- Flip-book & movie dynamic visualization
Evaluation and implementation
- Case studies
- User experiments

Source: oschina

Link: https://my.oschina.net/u/4335112/blog/4464135