Patterns and Sequences: Interactive Exploration of Clickstream Data

with Yang Wang, Mira Dontcheva, Matthew Hoffman, Seth Walker, Alan Wilson

Modern web clickstream data consists of long, high-dimensional sequences of multivariate events, making it difficult to analyze. Following the overarching principle that the visual interface should provide information about the dataset at multiple levels of granularity and allow users to easily navigate across these levels, we identify four levels of granularity in clickstream analysis: patterns, segments, sequences and events. We present an analytic pipeline consisting of three stages: pattern mining, pattern pruning and coordinated exploration between patterns and sequences. Based on this approach, we discuss properties of maximal sequential patterns, propose methods to reduce the number of patterns and describe design considerations for visualizing the extracted sequential patterns and the corresponding raw sequences. We demonstrate the viability of our approach through an analysis scenario and discuss the strengths and limitations of the methods based on user feedback.

Papers

Patterns and Sequences: Interactive Exploration of Clickstreams to Understand Common Visitor Paths
PDF (TVCG 2016)

Mining, Pruning and Visualizing Frequent Patterns for Temporal Event Sequence Analysis
PDF (Event Workshop 2016)

Supplemental Materials

Algorithms for sorting sequences and searching events
Prototypes and design ideas from early iterations

Related Project

CoreFlow
Real-Time Web Analytics