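Symbolic data is usually aggregated from large data sets to hide entry-specific details and to reduce huge amounts of data (such as big data) to analyzable quantities. It provides an overview where general trends matter more than individual details. Symbolic data comes in many forms, such as intervals, histograms, categories and modal multi-valued objects, and can also be regarded as a distribution. Currently, the de facto visualization approach for symbolic data is zoomstars, which has many limitations. The biggest is that the default distributions (histograms) are not supported in 2D, because an additional dimension is required.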
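This paper studies the visualization of symbolic data, analyzes the challenges arising from its complex structure, and proposes several improvements to zoomstars that enable histograms to be visualized in 2D by using a quantile or an equivalent interval approach. It also proposes several improvements for categorical and modal variables so that the presented categories are indicated more clearly.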
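Depending on the data type and the desired goal, the paper recommends different zoomstar-based visualization approaches. In addition, it proposes a method called shape encoding that visualizes the whole data set in a comprehensive table-like graph. These visualizations and their usefulness are verified on three symbolic data sets in the exploratory data mining phase, where they are used to identify trends, similar objects and important features, and to detect outliers and discrepancies in the data.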
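The abstract only names the quantile approach for drawing histograms in 2D. As a rough illustration, and not the authors' implementation, the sketch below computes quantile values of one histogram-valued variable. It assumes values are spread uniformly inside each bin (a piecewise-linear CDF); the function name `histogram_quantiles` and the default quartile levels are hypothetical choices. The resulting values could then be drawn on a single zoomstar axis, for example as nested ranges [min, max] and [Q1, Q3] with the median marked.

```python
from bisect import bisect_left
from itertools import accumulate


def histogram_quantiles(edges, weights, levels=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return the value at each quantile level of a histogram-valued variable.

    edges   -- bin boundaries in increasing order (len(weights) + 1 values)
    weights -- probability or count of each bin (normalised internally)
    levels  -- quantile levels in [0, 1]

    Assumption (hypothetical, for illustration): values are spread uniformly
    inside each bin, so the cumulative distribution is piecewise linear
    between consecutive bin edges.
    """
    if len(edges) != len(weights) + 1:
        raise ValueError("edges must have exactly one more entry than weights")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Cumulative probability at every bin edge; pin the last one to exactly 1.0
    # so floating-point rounding cannot push level 1.0 past the final edge.
    cdf = [0.0] + [c / total for c in accumulate(weights)]
    cdf[-1] = 1.0

    values = []
    for p in levels:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"quantile level {p} outside [0, 1]")
        i = bisect_left(cdf, p)  # first edge whose cumulative probability >= p
        if i == 0:
            values.append(float(edges[0]))  # p == 0: lower edge of the histogram
            continue
        lo_c, hi_c = cdf[i - 1], cdf[i]  # lo_c < p <= hi_c, so hi_c > lo_c
        frac = (p - lo_c) / (hi_c - lo_c)
        values.append(edges[i - 1] + frac * (edges[i] - edges[i - 1]))
    return values


# Hypothetical example: one histogram-valued variable with three bins.
q = histogram_quantiles([0, 10, 20, 30], [2, 5, 3])
print([round(v, 2) for v in q])  # [0.0, 11.0, 16.0, 21.67, 30.0]
```

Because every histogram is reduced to the same fixed set of quantile levels, histograms with different bin boundaries become directly comparable on a shared axis, which is what makes a 2D drawing possible without an extra dimension.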
Keywords:
Data visualization, Symbolic data, Zoomstar, Shape encoding, Exploratory data analysis
Full paper information
Improving symbolic data visualization for pattern recognition and knowledge discovery
By: Kadri Umbleja, Manabu Ichino, Hiroyuki Yaguchi
Abstract:
This paper examines the visualization of symbolic data and considers the challenges arising from its complex structure. Symbolic data is usually aggregated from large data sets and used to hide entry-specific details and to transform huge amounts of data (like big data) into analyzable quantities. It is also used to offer an overview in places where general trends are more important than individual details. Symbolic data comes in many forms, like intervals, histograms, categories and modal multi-valued objects. Symbolic data can also be considered as a distribution. Currently, the de facto visualization approach for symbolic data is zoomstars, which has many limitations. The biggest limitation is that the default distributions (histograms) are not supported in 2D, as an additional dimension is required. This paper proposes several new improvements for zoomstars which would enable it to visualize histograms in 2D by using a quantile or an equivalent interval approach. In addition, several improvements for categorical and modal variables are proposed for a clearer indication of presented categories. Recommendations for different approaches to zoomstars are offered depending on the data type and the desired goal. Furthermore, an alternative approach that allows visualizing the whole data set in a comprehensive table-like graph, called shape encoding, is proposed. These visualizations and their usefulness are verified with three symbolic data sets in the exploratory data mining phase to identify trends, similar objects and important features, and to detect outliers and discrepancies in the data.
Keywords: Data visualization, Symbolic data, Zoomstar, Shape encoding, Exploratory data analysis
Link: https://www.sciencedirect.com/science/article/pii/S2468502X19300014
Submission information:
Elsevier link (including First Online Articles): https://www.journals.elsevier.com/visual-informatics
Submit your paper:
https://www.editorialmanager.com/VISINF/default.aspx
Tel: (86-571) 88206681-519
E-mail: lujinzhi@cad.zju.edu.cn
LinkedIn: Visual Informatics
WeChat official account: visinf
Source: CSDN
Author: VISINF
Link: https://blog.csdn.net/VISINF/article/details/103988180