Authors
Peking University
- Yucheng Huang
- Tong Yang
Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences
- Lei Shi
- Yue Su
- Deyun Wang
Yahoo Labs
- Yifan Hu
Arizona State University
- Hanghang Tong
University of Notre Dame
- Chaoli Wang
Academy of Arts & Design, Tsinghua University
- Shuo Liang
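Abstract
The visualization of evolutionary influence graphs is important for many real-life tasks, such as citation analysis and social influence analysis. The main challenges are how to summarize large-scale, complex, and time-varying influence graphs, and how to design effective visual metaphors and dynamic representations that illustrate influence patterns over time. In this work, we present Eiffel, an integrated visual analytics system that applies triple summarization to evolutionary influence graphs along the nodal, relational, and temporal dimensions. In numerical experiments, Eiffel's summarization results outperform those of traditional clustering algorithms with respect to an influence-flow-based objective. In addition, a flow map representation is proposed and adapted to influence graph summarization; it supports two modes of evolutionary visualization (flip-book and movie) to speed up the analysis of influence graph dynamics. We conducted two controlled user experiments to evaluate the technique on influence graph summarization and on visualization, respectively. We also showcase the system in the evolutionary influence analysis of two typical scenarios: the citation influence of scientific papers and the social influence of emerging online events. The evaluation results demonstrate the value of Eiffel in the visual analysis of evolutionary influence graphs.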
Introduction
Influence
- Cyber-Influence (social networking)
- Social Influence (opinion leaders)
- Physical-Influence (butterfly effect)
- Diffuse ideas like epidemic
- Induce friends to behave similarly
- Develop community through paper co-authorship, citation, co-citations, shared topics, etc.
Influence graph of academic paper citations
Entity: papers (authors)
Relationship: citations
Usage: understand the development of topics from a landmark paper
Challenge: size and complexity of the influence graph for understanding
Social influence graph of message forwarding
Entity: tweets (users)
Relationship: re-tweets (comments)
Usage:
- Understand the nature of propagation
- Amplify/contain the impact for social campaigns
Challenge: visualize the dynamics of influence graph
Related Work
Information Propagation Visualization
- Whisper
- G+ Ripples
Research Problem
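Nodes: the influencer (influence source) and propagators
Influence links: reversed citation or re-tweet relationships
Maximal influence graph: influencer + propagators + influence links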
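The definitions above can be sketched as a reachability search: reverse every citation so that influence flows from the cited paper to the citing one, then collect everything reachable from the influencer. This is a minimal illustration, not Eiffel's implementation; the function name `maximal_influence_graph` and the toy paper IDs are hypothetical.

```python
from collections import deque

def maximal_influence_graph(citations, influencer):
    """citations: iterable of (citing, cited) pairs.
    Returns (nodes, links): the influencer plus its propagators, and the
    influence links (cited -> citing) reachable from the influencer."""
    # Reverse each citation: influence flows from the cited paper to the citing one.
    influence = {}
    for citing, cited in citations:
        influence.setdefault(cited, []).append(citing)

    nodes = {influencer}
    links = []
    queue = deque([influencer])
    while queue:
        src = queue.popleft()          # each node is expanded exactly once
        for dst in influence.get(src, []):
            links.append((src, dst))
            if dst not in nodes:
                nodes.add(dst)
                queue.append(dst)
    return nodes, links

# Toy data: B and C cite A, D cites B and C, E cites an unrelated paper X.
citations = [("B", "A"), ("C", "A"), ("D", "B"), ("D", "C"), ("E", "X")]
nodes, links = maximal_influence_graph(citations, "A")
print(sorted(nodes))  # ['A', 'B', 'C', 'D']
print(sorted(links))  # [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D')]
```

Paper E is excluded because it is not reachable from A through reversed citations. The same search applies to re-tweet data by treating (re-tweet, original tweet) as the input pairs.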
Static influence graph
User requirements:
- Compact visual summary ($\approx$ clustering)
- Focus on influence flows ($\neq$ community detection)
- Overview task
Evolutionary influence graph
Edges carry timestamps
User requirement: understand the temporal dynamics of the influence graph
Issues:
- Online vs. Offline summarization
- Smooth and staged visualization
Graph Summarization
Graph Summarization Framework
Triple Summarization Pipeline
- Node (objective function)
- Edge (visual clutter)
- Time (staged visuals)
Node Summarization
SymNMF algorithm (node clustering)
similarity matrix:
$$A^{G}=\frac{A A^{T}+A^{T} A}{2}$$
SymNMF:
$$\min _{H \geq 0}\left\|A^{G}-H H^{T}\right\|_{F}^{2}$$
Rationale:
- Similarity is defined by the number of commonly cited/citing papers (co-citation)
- Proven to maximize the objective function
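As a rough illustration of the two formulas above, the sketch below builds $A^{G}$ from a small adjacency matrix and then fits $H$ with a damped multiplicative update. The update rule, the parameter names (`iters`, `beta`, `eps`, `seed`), and the convention that `A[i][j] = 1` means paper `i` influences paper `j` are all assumptions made for this example; the solver used in Eiffel may differ.

```python
import random

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def similarity(A):
    """A^G = (A A^T + A^T A) / 2: counts shared out- and in-neighbors."""
    At = transpose(A)
    AAt, AtA = matmul(A, At), matmul(At, A)
    n = len(A)
    return [[(AAt[i][j] + AtA[i][j]) / 2 for j in range(n)] for i in range(n)]

def loss(S, H):
    """Squared Frobenius norm of S - H H^T."""
    R = matmul(H, transpose(H))
    n = len(S)
    return sum((S[i][j] - R[i][j]) ** 2 for i in range(n) for j in range(n))

def symnmf(S, k, iters=500, beta=0.5, seed=0, eps=1e-9):
    """Assumed solver: H <- H * (1 - beta + beta * (S H) / (H H^T H)).
    H stays non-negative because S and the initial H are non-negative."""
    rng = random.Random(seed)
    n = len(S)
    H = [[rng.random() for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        SH = matmul(S, H)
        HHtH = matmul(H, matmul(transpose(H), H))
        H = [[H[i][c] * (1 - beta + beta * SH[i][c] / (HHtH[i][c] + eps))
              for c in range(k)] for i in range(n)]
    return H

# Toy graph: paper 0 influences 1 and 2; papers 1 and 2 both influence 3 and 4.
A = [[0, 1, 1, 0, 0],
     [0, 0, 0, 1, 1],
     [0, 0, 0, 1, 1],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0]]
S = similarity(A)
H = symnmf(S, k=2)
# Each node is assigned to the column of H with the largest entry.
clusters = [max(range(2), key=lambda c: row[c]) for row in H]
print(S[1][2], S[3][4], S[0][1])  # 1.5 1.0 0.0
print(clusters)
```

Papers 1 and 2 get a high similarity because they share both the paper that influences them and the papers they influence, whereas paper 0 shares no neighbors with either. That is the sense in which the summary tracks influence flows rather than direct adjacency.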
Comparison with graph clustering, k-means
Findings:
- Graph < 1000 nodes, SymNMF has the best trade-off
- Large graphs, all methods fail on content consistency
Edge Summarization
Algorithm
- Connected Top-n Graph
- Maximum Weighted Spanning Tree (MWST)
- Maximal Padded MWST
Findings:
- Maximal padded MWST preserves a higher percent of overall flow rate
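The comparison above can be sketched on a toy graph. The code builds a maximum weighted spanning tree with Kruskal's algorithm, pads it with the heaviest remaining edges up to an edge budget, and reports the share of total flow that is kept. Two assumptions are worth flagging: edges are treated as undirected for the spanning tree, and "padding" is read here as adding the heaviest non-tree edges, which may not match the paper's exact definition. The names `padded_mwst`, `budget`, and `flow_rate` are hypothetical.

```python
def max_spanning_tree(n, edges):
    """edges: list of (u, v, weight) over nodes 0..n-1.
    Kruskal's algorithm, visiting edges from heaviest to lightest."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:                       # edge joins two components
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

def padded_mwst(n, edges, budget):
    """Assumed padding rule: add the heaviest non-tree edges until
    the summary holds `budget` edges in total."""
    tree = max_spanning_tree(n, edges)
    rest = sorted((e for e in edges if e not in tree),
                  key=lambda e: e[2], reverse=True)
    return tree + rest[:max(0, budget - len(tree))]

def flow_rate(kept, edges):
    """Fraction of the total edge weight preserved by the summary."""
    return sum(w for _, _, w in kept) / sum(w for _, _, w in edges)

edges = [(0, 1, 10), (0, 2, 8), (1, 2, 7), (1, 3, 5), (2, 3, 4), (0, 3, 1)]
tree = max_spanning_tree(4, edges)
padded = padded_mwst(4, edges, budget=4)
print(tree)                                # [(0, 1, 10), (0, 2, 8), (1, 3, 5)]
print(round(flow_rate(tree, edges), 3))    # 0.657
print(round(flow_rate(padded, edges), 3))  # 0.857
```

With one extra edge, the padded summary keeps 30 of 35 flow units instead of 23, at the cost of one non-tree edge that the flow map has to draw.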
Temporal Summarization
Objective & algorithm
- Divide the timeline into $L$ time frames for large influence links
$$r_{seg}\left(\xi^{(g)}\right)=\frac{\sum_{e_{ij} \in \xi,\, t_{ij} \in W_{g}} a_{ij}}{\sqrt{\left|\pi_{S(\xi)}\right|\left|\pi_{D(\xi)}\right|}} \cdot \frac{\sum_{e_{ij} \in \xi,\, t_{ij} \in W_{g}} a_{ij}}{\left|W_{g}\right|} \cdot\left|W_{g}\right|^{q}$$ (flow segmentation rate)
$$\max \sum_{i=1}^{l} \sum_{g=1}^{L} r_{seg}\left(\xi_{i}^{(g)}\right)$$ (sum of flow segmentation rates)
- An iterative optimization
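To make the objective concrete, the sketch below evaluates the flow segmentation rate for one summarized flow and sums it over a candidate split of the timeline. It reads the symbols as follows, which is an interpretation and not something the notes state: $a_{ij}$ is an edge weight, $t_{ij}$ its timestamp, $|\pi_{S(\xi)}|$ and $|\pi_{D(\xi)}|$ are the sizes of the flow's source and destination clusters, $|W_g|$ is the length of time frame $g$, and $q$ is a tuning exponent. Frames are treated as half-open intervals, and all function and parameter names are hypothetical.

```python
from math import sqrt

def flow_segmentation_rate(edges, window, n_src, n_dst, q):
    """edges: list of (a_ij, t_ij) pairs belonging to one flow.
    window: (start, end) half-open time frame W_g, with |W_g| = end - start.
    n_src, n_dst: assumed sizes of the source and destination clusters."""
    start, end = window
    length = end - start
    flow = sum(a for a, t in edges if start <= t < end)
    return flow / sqrt(n_src * n_dst) * (flow / length) * length ** q

def segmentation_objective(flows, boundaries, q):
    """flows: list of (edges, n_src, n_dst).
    boundaries: [t_0, t_1, ..., t_L] delimiting L consecutive frames."""
    frames = list(zip(boundaries, boundaries[1:]))
    return sum(flow_segmentation_rate(edges, frame, n_src, n_dst, q)
               for edges, n_src, n_dst in flows
               for frame in frames)

# One flow from a single source into a cluster of 9 papers,
# with unit-weight edges timestamped by year.
edges = [(1, 2001), (1, 2002), (1, 2003), (1, 2007)]
r = flow_segmentation_rate(edges, (2001, 2005), n_src=1, n_dst=9, q=0.5)
print(r)  # 1.5

# Compare two candidate splits of the same timeline.
coarse = segmentation_objective([(edges, 1, 9)], [2001, 2009], q=0.5)
split = segmentation_objective([(edges, 1, 9)], [2001, 2005, 2009], q=0.5)
print(coarse, split)
```

An iterative optimizer would move the frame boundaries to raise this sum; the sketch only scores a given split and does not search for the best one.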
Evolutionary Flow Map
Visual Design
Interface
Eiffel Flow Map
Flow map visualization (bundled edges and non-tree edges)
Layout algorithm
Over the flow map layout
- Non-tree graphs vs. Star tree
- Multiple branches vs. Two
Evolutionary Visualization: Flip-book Mode
Evolutionary Visualization: Movie Mode
Evaluation
Case Study
Citation Influence Analysis
Data (#papers / #citations)
- AMiner V8: (2.4M/10.5M)
- CiteseerX: (26.1M/63.0M)
- 18010 papers@37 venues
Topic Evolution
- Influencer: Jigsaw@VAST’07
- Three stages
Social Influence Analysis
Experiments
Design: user understanding of
- Eiffel summarization
- Eiffel visualization
Result on objective/subjective metrics
- Eiffel > Google Scholar-like UI in summarization of medium-sized graphs (~1000 nodes)
Discussion & Future work
Limitations
- Fixed number of clusters in the summarization
- Exhaustive search for the maximal influence graph
- Single-source influence graph
- Granularity of evolutions
Future work
- Hierarchical summarization
- Topic-based influence graph through semantic analysis of citations
- Other applications: code influence in software, etc.
The data quality issue (#citations / #papers)
- CiteseerX: 63.0M / 26.1M = 2.4
- AMiner V8: 10.5M / 2.4M = 4.4
- Google Scholar (GS): ?? (>4.4)
Future work
- Directly work on Google Scholar data for summarization
- Challenge: large-scale crawling of GS data is prohibited
Summary
Contribution
- Problem: visualization of large-scale, time-varying influence graphs
- Triple summarization of dynamic influence graphs: Node-Edge-Time
- Visualization by evolutionary flow map
- Improved flow map layout
- Flip-book & movie dynamic visualization
- Evaluation and implementation
- Case studies
- User experiments
Source: oschina
Link: https://my.oschina.net/u/4335112/blog/4464135