Zhicheng Liu

Research on Human-Data Interaction


Color perception and language

Recently some colleagues shared an interesting documentary video (embedded above, from 3:00 onwards) about how the Himba perceive colors, and how language plays an important role in shaping color perception. It seems intuitive to us that visual perception, especially pre-attentive processing, is hard-wired in the brain. That is, endowed with the same neurological machinery, every one of us perceives colors in the same way. In short, your red is my red, and what I see is what you see.

This view is sometimes characterized as universalism. An opposing view, relativism, claims that our higher-level conceptual systems shape our perception. An extreme form of relativism argues that since color categories can vary across languages without constraints, human perception of color can differ culturally in arbitrary ways. Relativism can be traced back to the Sapir-Whorf hypothesis.

Which view is correct? The empirical evidence in the video seems to suggest that relativism wins. The universalist stance, however, is not to be dismissed so easily. It is reasonable to assume that natural language evolved later than the ability to perceive color (although it may be argued that this ability continues to develop with the development of language). Why does a language have the specific color categories it does? Where do these categories come from? In Basic Color Terms (1969), Berlin and Kay identified 11 basic color terms that are universal. In English, these terms correspond to the color categories of black, white, red, yellow, green, blue, brown, purple, pink, orange and gray. Of course, not all cultures have color categories for all 11 basic terms. Some cultures have only two or three color terms. Nevertheless, there seems to be a strict sequential order in which basic color terms are added as languages evolve. When a language has only two basic color terms, they are black and white. When a language has three basic color terms, the third one will be red. Berlin and Kay eventually proposed the following sequence:

black, white
red
yellow, blue, green
brown
purple, pink, orange, gray
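This implicational sequence lends itself to a quick computational check. Below is a rough sketch of the idea that a language’s basic color vocabulary must fill the stages in order. The encoding is my own illustration, not from Berlin and Kay, and the exact stage structure in their proposal is more nuanced:

```python
# A rough encoding of Berlin and Kay's implicational sequence. The stage
# groupings here are a simplification for illustration only.
STAGES = [
    {"black", "white"},
    {"red"},
    {"yellow", "blue", "green"},
    {"brown"},
    {"purple", "pink", "orange", "gray"},
]

def follows_hierarchy(terms):
    """Return True if a set of basic color terms is consistent with the
    sequence: a stage may only be entered once all earlier stages are
    completely filled."""
    terms = set(terms)
    remaining = set(terms)
    for i, stage in enumerate(STAGES):
        remaining -= stage
        if not remaining:
            # Every term is accounted for; all earlier stages must be full.
            return all(s <= terms for s in STAGES[:i])
    return False  # contains a term outside the basic 11

print(follows_hierarchy({"black", "white", "red"}))   # True: a valid 3-term language
print(follows_hierarchy({"black", "white", "blue"}))  # False: blue before red
```

The check allows partial membership in the current stage, consistent with a language having, say, only yellow out of the yellow/blue/green stage.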

It seems then that some universal factor (most probably neurological) determines the evolution of linguistic categories for color. Berlin and Kay’s work has recently been challenged, and the whole universalism/relativism debate continues. Much new evidence has arisen, which can be interpreted to support either universalism (e.g. a series of studies by Eleanor Rosch) or relativism (e.g. the work by Debi Roberson). No final conclusion has been reached yet. The real relationship between color perception and language is probably not just a simple matter of one influencing the other. It is possible that universalism and relativism are each partly correct and partly wrong. As D’Andrade (1995) put it nicely:

“The argument here is not that of extreme cultural constructionism, which argues that everything is determined by culture, nor that of psychological reductionism, which argues that all culture is determined by the nature of the human psyche. The position taken here is that of interactionism, which hypothesizes that culture and psychology mutually affect each other. The problem, as Richard Shweder (1993:500) says, is to determine ‘how culture and psyche make each other up.’ “

Syntactic Structures, linguistic theory and visualization theory

I have always believed that the study of visualizations and the study of (natural) languages can mutually inform each other, and that by examining the development of linguistic theory, we can learn something useful for pushing forward theoretical research on visualization.

Noam Chomsky’s Syntactic Structures is undoubtedly a classic piece of linguistic theory research. The philosophy it embodies was the Zeitgeist of its time: the 1960s were the golden age of the “cognitive revolution”, and a central belief was that cognition and intelligence were to be explained as the manipulation and transformation of symbols. Chomsky’s approach is now characterized as “generative grammar”, in contrast to the previously dominant approach called “structuralism”. Before structuralism, as Wikipedia tells us, linguistic theory was all about how languages originate and change historically.

The development of linguistic theory in terms of paradigm shifts, to me, is not a continuous effort to refine existing theories. Instead it reflects a changing notion of which theoretical questions are important, or simply: theories of what? Historical linguistics is undoubtedly concerned with the theoretical question of how natural languages evolve. Structuralists such as Saussure seemed less concerned with such questions. Saussure believed that language can be analyzed as a formal system of differential elements such as phonemes, morphemes, lexical categories, noun phrases, verb phrases, and sentence types.
Chomsky’s approach builds upon structuralism, but he is more interested in a theory that explains the generative mechanisms of language. The structuralist approach, to him, is inadequate because it cannot produce the full range of diverse sentences in English. Chomsky’s theory, however, focuses exclusively on syntax and dismisses the problem of meaning. In response, the emerging field of cognitive semantics became more interested in theoretical questions about semantics: what is meaning? How does meaning arise?

Visualization theory research is much less advanced than linguistic theory research, but I seem to notice some similar patterns in the development of visualization theory. The pinnacle of visualization theory research so far is represented by Bertin’s Semiology of Graphics, Mackinlay’s APT framework and Wilkinson’s Grammar of Graphics. Bertin’s work, to me, is inherently structuralist. The APT framework builds upon Bertin’s structuralist work and incorporates generative mechanisms. These two approaches, however, do take into consideration the meaning and expressiveness of visualizations. Historical analysis and cognitive analysis in visualization theory have been lacking so far (our lab’s work on Distributed Cognition and mental models may be characterized as cognitive).

I do not believe, however, that more recent theoretical paradigms are always better than outdated ones – there is simply no such thing as “absolutely good or better”. These paradigms are concerned with different theoretical questions, which may be deemed important or unimportant at different times. From a utilitarian perspective, the structuralist and generative paradigms in linguistics do seem to be the most useful approaches: they can be directly applied to build intelligent machines that automatically produce syntactically correct sentences. Similarly, we have seen how the APT framework and the Grammar of Graphics led to sophisticated visual analytics systems like Tableau.

Activity Theory: Levels of Activity

A core concept of activity theory is the irreducible triad called mediation. To understand mediation, it makes sense to focus on each constituent part (human, visualization, goal) without losing track of a global view of the irreducible dynamics between them.

To me, the concept of levels of activity is inherently about understanding human motivation, aka the “object” component in the Vygotskyian triangle:

The original three levels of activity are activity, action and operation. Let’s say we want to accomplish the activity of visually analyzing a dataset. The motive behind such an activity is to gain understanding of the dataset. This activity has to be carried out in a number of steps or phases – a chain of actions including loading the dataset, picking a visual overview, filtering, highlighting, etc. Each of these actions can be understood as driven by a more specific goal. For example, to load the dataset is to construct a visual model, and to filter is to show something conditionally. For an action to be carried out in the real world, we must take into account the resources available in the environment and their affordances and constraints. For example, to find a visual item representing a data case, we can scan across the screen if the target visual item is labeled, coordinate eye scanning with mouse-over if the label is only available as a tooltip, or type a search query and click a button if searching is supported. These physical realizations of actions are called operations, and they must be understood in terms of the conditions given at the moment of acting.

These three levels are a rough sketch of the multiple tiers in human motivation and activity. Applying these concepts to the problem of insight provenance in visual analytics, Gotz and Zhou derived a four-level structure: Tasks, Sub-tasks, Actions, and Events. The notion of different levels of activity with different semantic richness is especially useful when we want to relate users’ intentions to low-level operations such as mouse clicks.
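To make the layering concrete, here is a minimal sketch of grouping low-level events into semantically richer actions. The event names and patterns are my own illustrative inventions, not Gotz and Zhou’s actual taxonomy, and real provenance systems infer actions with far more context than simple pattern matching:

```python
# Hypothetical event log from a visual analysis session: each entry is a
# low-level UI event (the least semantically rich level).
event_log = [
    "mouse_down", "mouse_move", "mouse_up",  # a drag
    "key_press", "key_press", "enter",       # typing a query
]

# Illustrative mapping from event patterns to higher-level actions.
ACTION_PATTERNS = {
    ("mouse_down", "mouse_move", "mouse_up"): "brush_selection",
    ("key_press", "key_press", "enter"): "search_query",
}

def events_to_actions(events):
    """Greedily match event subsequences against known action patterns,
    lifting the log from the event level to the action level."""
    actions, i = [], 0
    while i < len(events):
        for pattern, action in ACTION_PATTERNS.items():
            if tuple(events[i:i + len(pattern)]) == pattern:
                actions.append(action)
                i += len(pattern)
                break
        else:
            i += 1  # unmatched event; skip it
    return actions

print(events_to_actions(event_log))  # ['brush_selection', 'search_query']
```

The same move could be repeated upward: sequences of actions grouped into sub-tasks, and sub-tasks into tasks, each level carrying more of the user’s intent.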

I have also tried to apply the notion of levels of activity in some of my research and writing. The InfoVis’10 paper tried to integrate some activity theory elements / Vygotskian psychology with distributed cognition. I also believe theoretical frameworks such as activity theory are most useful not for giving design guidelines, but for asking interesting questions. For example, if we focus our attention at the level of activity instead of that of action or operation, we may think about different types of visualization-oriented activity. In addition, activity theory argues that actions and activities are usually consciously planned, while operations are performed subconsciously and without deliberation. Whether conscious or subconscious, we must have adequate knowledge (in Vygotskian terms, the internalization of activities that originally happen externally) in order to perform competently. If we consider knowledge as internal representations, what are the implications for understanding the various kinds of internal representations?

Activity Theory: Mediation

Activity theory, like distributed cognition, is a theoretical framework that is familiar to many of us and has been used in HCI research. The more I read about activity theory, the more I feel it addresses very similar issues as distributed cognition, but from a different perspective with different emphases. Distributed cognition has its roots in anthropology (Hutchins is an anthropologist, and was a student of Roy D’Andrade). Activity theory can be traced back to Lev Vygotsky, the father of cultural-historical psychology.

There are several key concepts in activity theory. In this post I focus on mediation and its implications for understanding visualization as a cognitive and cultural artifact.

No one will deny that visualization is essentially a tool for accomplishing something: the purpose of visualization is insight, not pretty pictures. It thus makes sense to claim that the utility and role of visualization cannot be discussed abstractly, independent of context, but can only be evaluated and understood with respect to human user(s) and a specific purpose.

This observation is in line with the basic tenet of activity theory, which argues that there is an irreducible tension between cultural tools (visualization) and agents’ active uses of them. Sounds reasonable and perhaps a little too obvious? In actual research practice, however, we are often more than happy to focus either on visualizations alone or on human agents alone. The current disciplinary setup (InfoVis vs. Cog Sci) makes such approaches natural, but adherents of activity theory will argue that these efforts are essentially misguided. Focusing on visualization alone implies that human action is causally determined by the environment, while focusing on humans alone risks attributing too much cognitive ability to humans in isolation.

The unit of analysis is thus not visualization, or human, but an irreducible whole (sounds similar to DCog?) called mediation. The Vygotsky triangle (figure below) depicts such a unit of analysis:

And here’s my adaptation for InfoVis:

With this tri-dimensional, non-linear theoretical construct, how do we go about systematically analyzing it? I believe this is an important problem that future InfoVis/Visual Analytics research needs to tackle. One way to proceed is to think metaphorically in terms of conventional data analysis techniques: If we have this multifaceted and multidimensional dataset called mediation, we’ll need some tools to look at it from different angles, at different levels of abstraction, and perhaps slice and dice it! Coming up next, I’ll discuss the concept of three levels of activity, which provides a different angle to look at mediation. Eventually, we may want to introduce and synthesize conceptual and methodological findings from a broader range of disciplines to fully handle this complex beast called mediation.

The Paradox of “Human-Centeredness”

Having been in a PhD program called “Human-Centered Computing” and working on visualization for more than 4 years now, I keep stumbling across the question of what human-centered design is. Is it about understanding the users? Is it about designing simple, intuitive interfaces? Is it about empowering users and supporting their needs?

All these are reasonable, yet potentially incompatible goals. Take Tableau for example: it is a direct spin-off of a CS PhD thesis at Stanford. As far as I know, no contextual inquiry or ethnographic studies were conducted before the system design, and no “rigorous” evaluation was conducted after the system implementation. The system is built on an algebraic formalism that few can understand. The interface, to any first-time user, isn’t really simple or intuitive. For example, what is a measure and what is a dimension? These are technical concepts from OLAP databases known to few. It can also take some time to get a grip on the system’s incremental, dynamic way of updating visualizations. Yet the system is one of the very few InfoVis/Visual Analytics products that have done well commercially.
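For readers unfamiliar with the measure/dimension distinction: roughly, dimensions are the fields you group by and measures are the fields you aggregate. The sketch below is my own crude heuristic for illustration, not Tableau’s actual classification rules, which are more nuanced and user-overridable:

```python
# Toy data table: a list of records.
rows = [
    {"region": "East", "product": "Pen", "sales": 120.0},
    {"region": "West", "product": "Pen", "sales": 95.5},
    {"region": "East", "product": "Ink", "sales": 40.0},
]

def classify_fields(rows):
    """Crude heuristic: numeric fields become measures (things you
    aggregate), non-numeric fields become dimensions (things you
    group by)."""
    dimensions, measures = [], []
    for field, value in rows[0].items():
        if isinstance(value, (int, float)):
            measures.append(field)
        else:
            dimensions.append(field)
    return dimensions, measures

dims, meas = classify_fields(rows)
print(dims, meas)  # ['region', 'product'] ['sales']
```

In Tableau the split feeds into a table algebra that generates the visualization layout; the point here is only the conceptual divide that first-time users must absorb.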

There are tons of other visualization systems that might be judged more user-friendly and intuitive – just pick up a recent year’s VisWeek proceedings. I don’t believe that Tableau’s success is purely due to factors that have nothing to do with the system itself (e.g. marketing). Rather, Tableau seems to be more flexible and powerful than many other systems, which are often limited to a particular problem domain or task. When you empower users to do the things they want to do, they are motivated to learn, and they learn astonishingly well. Of course good design is still essential to make a complex, advanced technology easy to learn and use, but good design is not equivalent to a simplistic interface.

The fact that humans can change (dramatically) when they are motivated might contribute to the tension between engineering and science in design/HCI. In a recent interesting blog post, it is argued that engineering is “about solving problems by rearranging the stuff of the world to make new things”, and science is “about understanding the origins, nature, and behavior of the universe and all it contains”. Understanding how users work with command line interfaces can result in incremental refinement of the existing interface, but may not bring about the paradigm of the graphical user interface. Every exciting and ground-breaking technology seems to challenge people’s existing ways of thinking and doing things, and to bring about profound social and cultural changes. (From a cultural model perspective, this means the creation of models-in-the-world that are later internalized as models-in-the-mind.)

What is the role of science or theory in HCI/Visualization then? Many seem to believe that science should inform and serve as the foundation of engineering, and it is true that without theories such as Shannon’s information theory, the Turing machine or relational algebra, the digital age would never have come. If an HCI/visualization science is about how users perceive, understand and interact with computer systems, it will have its value. I doubt, however, that it will ever predict the next big thing. Science can only build on data from the current world, but humans can always change, and the future is ever more exciting.

Replies to a fictitious skeptic

Q: All these terms (mental models, schemas, coordination, etc.) are pretty fuzzy and indefinite. I believe that to understand how InfoVis works we must understand how the brain works.

A: First, understanding a phenomenon often involves multiple levels of description and explanation (e.g. the famous AI researcher David Marr proposed three levels of analysis for information processing problems). I do not mean to completely dismiss the relevance of neuroscience for InfoVis (I seemed to do so in a previous post). Rather I just believe that a conceptual level description about the nature of representation and interaction in InfoVis can provide a more direct and relevant account. Ultimately internal representations have to be grounded at the neuro-physiological level, but neurological level explanations need to be constrained by conceptual and behavioral evidence too.

Second, what is meant by “how the brain works”? The undertone of this phrase is that the brain can be understood as a machine, in isolation from the body and the environment. This view is increasingly questioned and criticized. Of course one can argue this is just a matter of opinion and assumption. I believe in a holistic approach to understanding InfoVis processes: enabled by brains, coupled with bodies, embedded in a socio-cultural environment mediated by language, visual images and cultural practices.

Q: Theories of mental models, distributed cognition etc. have been around for a while, and I have seen no direct application or immediately usable results. Why should we care?

A: It might just be that these concepts or theories are not fully mature yet, or that they can only be useful when combined with domain-specific investigations. We InfoVis researchers can appreciate an incremental result in an interaction technique or a layout algorithm, but we do not have patience for the potentially slow development of theories. I believe we should play an active role in further developing existing theories that sound promising and relevant to InfoVis. We seem comfortable enough considering InfoVis an applied area and leaving theoretical questions to psychologists and cognitive scientists, but as Newell and Card nicely put it, “nothing drives basic science better than a good applied problem”.

Q: What about perception? Isn’t it the most important aspect of InfoVis?

A: Perception is important, but perception is not a process where we take a snapshot of the visualization and then decode every bit of information inside. Empirical studies have shown how astonishingly little information we actually attend to and are aware of (e.g. the invisible gorilla). Perception, to me, is the active sampling and foraging of information, and hence must be understood in the context of a dynamic, continuous loop of interaction.

There are of course qualitative aspects of visual sensation to be understood. Understanding how we perceive color difference or size difference, for example, is helpful, but when users are not certain whether two colors are really that different, they can often just mouse over to get the exact values represented by the colors and compare them. Even in InfoVis, the foraging of information is often not achieved purely through non-textual visual means. Hence perception must also be understood in conjunction with action, so that we know how they complement each other.

[Have better questions? Don’t agree with the replies? Let me know!]

Model vs. Schema

I got an email inquiry asking why I chose to look at mental models instead of schemata in this year’s InfoVis paper. What are schemata? Like the term “mental model”, the meaning of “schema” as used to refer to people’s internal mental representations has been ambiguous. To some, mental model is just another name for schema. To others, schemata are unitary and bounded representations of an object or event (e.g. a “dog” schema, or a “shopping” schema), and models are compositions of schemata. The Development of Cognitive Anthropology (D’Andrade 1995) makes the following distinction:

“Every schema serves as a simple model in the sense that it is a representation of some object or event. For example, seeing a grocery store clerk hand a bag of apples to a shopper and accept money, the commercial transaction schema, …, would serve as a probable model for what has been seen. However, many models are not schemas themselves, although they are composed of schemas. Models are not schemas when the collection of elements is too large and complex to hold in short-term memory.” (p.151-152)

According to the definitions here, models seem to be inherently distributed. When they cannot be held in working memory, they are bound to “spill over” to include the environment. My paper’s focus on mental models is largely because a distributed cognition perspective has been integrated into existing work on mental models and cultural models. Mental model is thus an immediately usable concept for understanding interaction in InfoVis.

The question of mental model vs. schema, however, is very interesting and important, and will likely not be answered fully by the distinction above. MacEachren’s How Maps Work (1995) discusses three types of knowledge schemata in Chapter 4: propositional schemata, image schemata and event schemata (scripts and plans). The chapter also includes a nice discussion of how the relevant schemata are acquired developmentally, selected for processing visual input, and used in interpreting maps. His definition of schema seems to align well with the distinction made by D’Andrade.

In Culture in Mind (1996), Shore outlines a different interpretation. He notes that the terms “model” and “schema” are often used interchangeably to refer to organizations at different levels of abstraction, and makes a distinction between abstract global models and more concrete, particular instantiations of those models. Schemas, to him, are more general and abstract; examples are the “image schemata” in MacEachren’s and Lakoff’s work. Models, per his definition, are more concrete and specific, and can exist both as public artifacts “in the world” and as cognitive constructs “in the mind”. He spends 10+ pages discussing different kinds of models from a structural as well as a functional perspective. This distinction is only part of his grander theory of cultural models, which is both fascinating and hard to understand.

Coming up next: Replies to a fictitious skeptic

Cognitive approaches for InfoVis

How can cognitive science inform InfoVis, and what cognitive research work is relevant for InfoVis? Cognition is such a complex and multifaceted phenomenon that there are diverse approaches to studying it. Perhaps not every approach is relevant for InfoVis. Some, for example, look at the neurological basis of cognition, mapping different brain areas to different cognitive functions. Knowing that the visual cortex is in charge of vision, or that the dorsal/ventral streams specialize in different functions, may be useful for understanding and treating abnormal behaviors due to brain lesions, but it doesn’t really tell us much about how and why InfoVis works.

In the research agenda outlined in Illuminating the Path, two strands of cognitive research are identified as having the potential to inform InfoVis/Visual Analytics. One is “laboratory-based psychology studies that establish the basic bottlenecks in human abilities to perceive, attend and process information”. I interpret this description to refer to work such as the “magic number 7±2” for human working memory capacity, pre-attentive perception, confirmation biases and the limits of attention. Many of these research findings are well known to InfoVis researchers and are widely used. The second strand is “higher-order, embodied, enactive, and distributed models such as those proposed by Gibson and Varela et al.” The relevant work here is seldom talked about, or even known, by InfoVis researchers.

In Illuminating the Path, there is really not much elaboration on the second strand. To the best of my knowledge, here’s a list of relevant work that may fit into the category (feel free to suggest more; I haven’t read all of them, of course):

  • distributed cognition
    important figures: Edwin Hutchins, David Kirsh and maybe Andy Clark, who proposed the extended mind hypothesis (difference between DCog and extended mind, anyone?)
    important books/papers: Cognition in the Wild, Supersizing the mind, Being there

With all the buzzwords such as “situated”, “distributed” or “embodied”, it seems that this list aims for a complete departure from the good old information-processing approaches to cognition. I would argue that many of these works are not as radical as they sound. In fact, many useful findings that may be classified as “traditional cognitive science” can be synthesized with these approaches. In future posts I’ll focus on some of the books I’ve read and discuss my interpretations of them.

Coming up next: different kinds of internal representations


Future blogging direction

During this year’s VisWeek (truly great conferences, I must say), some friends expressed interest in my work on theoretical / cognitive approaches to InfoVis. They complained, however, that my papers were no easy read – which was a bit surprising to me. I’ve tried very hard to make the arguments understandable, and you would believe me if you saw earlier drafts of my papers. Ji Soo suggested that I start blogging about the relevant topics and materials at an introductory level, and I think it’s a great idea. So in the future I’ll periodically post about papers/books that have influenced my thinking, important theoretical concepts and questions, and my own take on cognitive issues in InfoVis. I’m by no means an authority in these areas – in fact, when talking about some of the related theories and work, I’ll probably offer inaccurate interpretations. But I hope this blog will be a starting point for lively and continuous discussions on cognition, visualization, analytics and interaction. Stay tuned!

Telling stories using visualizations vs. Analyzing data using visualizations: what’s the difference?

1. Telling stories using visualization – visual primacy

It is almost impossible to talk about InfoVis without mentioning Tufte. His four books beautifully illustrate how to explain and present complex data using visual means. These books are certainly for designers – more specifically, infographics designers. The approach can be summarized as: tell good stories with superb visual design. In most cases, the designers are very familiar with the data to be visualized, and they have a set of messages to convey to the graphics’ readers. To “transmit” the messages effectively, special attention is paid to the choice of visual metaphor, color, layout and even border thickness. Interactive features are great add-ons that make static graphics more appealing, and they allow readers to explore additional information that could not be included in an overview. In all cases, it seems fair to use an information-processing metaphor to describe the design process: there is an intended information flow from the designers to the readers, with the visualization as the medium or conduit. The success of the design is measured by how effectively the intended message is received, and the readers are cast primarily as passive information processors, their minds often assumed to be tabula rasa.

I would argue that much of the InfoVis research is modeled after this approach. This is especially true if we consider that many representation or interaction techniques are accompanied by illustrations showing “novel” insights revealed by clever design: “look! here’s a piece of information that’s more evidently presented!”. Whether this is something users already know, or want to know, is often overlooked.

2. Analyzing data using visualization – interaction primacy

Analyzing data using visualization (or “visual analytics”, in current buzzwords) is a very different activity, and I make a bold (well, sort of) claim that interaction, not visualization, should be the focal point of design. This doesn’t mean visual properties are unimportant; it means that visual design is not sufficient and should be subsumed under the larger scope of interaction design. Here, in the context of visual analytics, designers are not primarily infographics designers but interaction designers. They are not delivering visual displays but visual analytic tools. Often the designers have no access to end users’ data; even when they do, they are usually not sure what useful insights may exist in the data. There is clearly a less definitive message to convey, unless the designers are building a very specialized domain-specific tool. Insights, or messages, must be “stumbled upon” and discovered by the users. There are hence two challenges that set interaction design apart from graphics design. First, users cannot be cast as passive information receivers, but must be treated as active information seekers. Good visual designs do afford easier pick-up of information, but a more important issue here is semantic distance: “how well does the interface support me in asking questions about the data?” Second, and consequently, users’ visual reasoning ability is a more important factor than visual properties in determining the outcome of the analysis. Traditional InfoVis research tends to overlook these issues, and I think they are an important part of the visual analytics agenda.

Copyright © 2006 - 2015 Zhicheng Liu