imMens is designed to work on modern graphics cards. Machines with an integrated graphics card, such as the MacBook Air, may not provide a satisfactory interactive experience. We recommend a machine with a dedicated graphics card (e.g., a MacBook Pro) and a browser that supports WebGL (e.g., Google Chrome).
– 4.5 million user checkins on Brightkite
– 35.6 million flight delays in the U.S. from 1989 to 2008
– 10K to 1B synthetic data points visualized as scatterplot matrices (SPLOM)
Source code is available on GitHub.
Data analysts must make sense of increasingly large data sets, sometimes with billions or more records. We present methods for interactive visualization of big data, following the principle that perceptual and interactive scalability should be limited by the chosen resolution of the visualized data, not the number of records. We first describe a design space of scalable visual summaries that use data reduction methods (such as binned aggregation or sampling) to visualize a variety of data types. We then contribute methods for interactive querying (e.g., brushing & linking) among binned plots through a combination of multivariate data tiles and parallel query processing. We implement our techniques in imMens, a browser-based visual analysis system that uses WebGL for data processing and rendering on the GPU. In benchmarks, imMens sustains 50 frames per second brushing & linking among dozens of visualizations, with invariant performance on data sizes ranging from thousands to billions of records.
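The combination of binned aggregation and brushing over a precomputed data tile can be sketched in plain Python. This is a minimal CPU sketch under stated assumptions, not imMens' WebGL implementation: the record format, the (key, lo, hi, num_bins) dimension tuples, and the names bin_index, build_tile, and brush are hypothetical, and the tile is a 2D count table, the simplest case of a multivariate data tile.

```python
def bin_index(value, lo, hi, num_bins):
    """Map a value in [lo, hi] to one of num_bins equal-width bins (clamped at the edges)."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    i = int((value - lo) / (hi - lo) * num_bins)
    return min(max(i, 0), num_bins - 1)


def build_tile(records, dim_a, dim_b):
    """Precompute a 2D tile: counts over the joint bins of two dimensions.

    dim_a and dim_b are hypothetical (key, lo, hi, num_bins) tuples.
    """
    key_a, lo_a, hi_a, n_a = dim_a
    key_b, lo_b, hi_b, n_b = dim_b
    tile = [[0] * n_b for _ in range(n_a)]
    for r in records:  # one pass over the records, done once up front
        i = bin_index(r[key_a], lo_a, hi_a, n_a)
        j = bin_index(r[key_b], lo_b, hi_b, n_b)
        tile[i][j] += 1
    return tile


def brush(tile, a_start, a_end):
    """Linked histogram over dimension b for records whose a-bin is in [a_start, a_end].

    Only bins are touched, so the cost depends on the tile size, not the record count.
    """
    n_b = len(tile[0])
    return [sum(tile[i][j] for i in range(a_start, a_end + 1)) for j in range(n_b)]


# Hypothetical flight-like records: departure hour and delay in minutes.
records = [
    {"hour": 1, "delay": 5}, {"hour": 2, "delay": 45},
    {"hour": 13, "delay": 50}, {"hour": 14, "delay": 10},
    {"hour": 23, "delay": 55},
]
tile = build_tile(records, ("hour", 0, 24, 4), ("delay", 0, 60, 2))
print(brush(tile, 2, 3))  # prints [1, 2]: delay histogram for hours 12 to 24
```

Once the tile exists, each brush interaction is a sum over at most num_bins_a x num_bins_b cells, which is why interactive cost can stay fixed as the number of records grows; imMens performs this kind of summation in parallel on the GPU rather than in a Python loop.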
Using Google Fusion Tables (left) and imMens (right) to visualize a dataset of 4M Brightkite user checkins. Fusion Tables’ symbol map visualizes a sample of the data, while imMens’ heatmap shows the density of checkins via binned aggregation. Compared to the heatmap, the sampled map misses important structures such as interstate highway travel and Hurricane Ike, yet still suffers from over-plotting in dense regions. Moreover, imMens supports real-time brushing and linking among various dimensions of the dataset.
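Why sampling can miss sparse structures is easy to quantify. A uniform sample of k records drawn without replacement from N contains none of a structure's m records with probability C(N-m, k) / C(N, k), which equals the product of (N-k-i) / (N-i) for i = 0..m-1. The sketch below uses that product form to avoid huge binomial coefficients. N = 4,000,000 follows the caption; the structure size m = 20 and sample size k = 100,000 are hypothetical, not Fusion Tables' actual sampling parameters.

```python
def prob_sample_misses(n_total, n_cluster, n_sample):
    """Probability that a uniform sample (without replacement) contains none of n_cluster records.

    Equals C(N - m, k) / C(N, k); computed as a product over the m cluster
    records so the numbers stay small.
    """
    if not (0 <= n_sample <= n_total and 0 <= n_cluster <= n_total):
        raise ValueError("sizes must satisfy 0 <= k, m <= N")
    p = 1.0
    for i in range(n_cluster):
        p *= (n_total - n_sample - i) / (n_total - i)
    return p


# Hypothetical: a 20-record structure among 4M checkins, 100K-record sample.
p = prob_sample_misses(4_000_000, 20, 100_000)
print(f"{p:.3f}")  # prints 0.603
```

Under these assumptions, the structure vanishes from the sample entirely about 60% of the time. A binned aggregate, by contrast, always reports a count of 20 for the bin holding those records, so the structure stays visible at any data size.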
Multiple coordinated views of Brightkite user checkins in North America. Cyan lines in the heatmap indicate data tile boundaries. Each visualization region is annotated by its backing data dimensions and indices.
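When a binned plot such as the heatmap is split into data tiles, each bin maps to one tile plus an offset within it, and a view only needs the tiles that its visible bin range overlaps. The sketch below assumes fixed-size rectangular 2D tiles; the tile dimensions and the names locate_bin and tiles_for_viewport are hypothetical and not taken from imMens.

```python
def locate_bin(bx, by, tile_w, tile_h):
    """Return ((tile_col, tile_row), (local_x, local_y)) for global bin (bx, by)."""
    return (bx // tile_w, by // tile_h), (bx % tile_w, by % tile_h)


def tiles_for_viewport(x0, y0, x1, y1, tile_w, tile_h):
    """Tiles overlapping the inclusive bin range [x0, x1] x [y0, y1], row by row."""
    return [
        (col, row)
        for row in range(y0 // tile_h, y1 // tile_h + 1)
        for col in range(x0 // tile_w, x1 // tile_w + 1)
    ]


# Hypothetical 256 x 256-bin tiles.
print(locate_bin(300, 40, 256, 256))  # prints ((1, 0), (44, 40))
print(len(tiles_for_viewport(200, 0, 600, 300, 256, 256)))  # prints 6
```

In this scheme, the cyan tile boundaries in the figure correspond to bin coordinates that are multiples of the tile size; panning or zooming changes which tiles are fetched, while each tile's contents stay precomputed.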