Visualization Analysis & Design - Excerpts

Published

May 20, 2023

In this blog post, I have written excerpts from the book Visualization Analysis & Design by Tamara Munzner.

Preface

Book recommendations:

Envisioning Information (Tufte)
Information Visualization: Perception for Design (Ware)
Visual Thinking for Design (Ware)
Data Visualization (Ward)
Data Visualization (Telea)

Structure: What’s in This Book

Chapter 1: high-level introduction to an analysis framework of breaking down vis design to what-why-how questions that have data-task-idiom answers
Chapter 2: addresses the what question with answers about data abstraction
Chapter 3: addresses the why question with task abstractions
Chapter 4: extends the analysis framework to two additional levels: the domain situation level on top and the algorithm level on the bottom
Chapter 5: the principles of marks and channels for encoding information
Chapter 6: eight rules of thumb for design
Chapter 7: how to visually encode data by arranging space for tables
Chapter 8: for spatial data
Chapter 9: for networks
Chapter 10: choices for mapping color and other channels in visual encoding
Chapter 11: ways to manipulate and change a view
Chapter 12: ways to facet data between multiple views
Chapter 13: how to reduce the amount of data shown in each view
Chapter 14: embedding information about a focus set within the context of overview data
Chapter 15: six case studies

Accompanying web page

Chapter 1: What’s Vis, and Why Do It?

Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.

Vis usage can be analyzed in terms of why the user needs it, what data is shown, and how the idiom is designed.

1.2 Why Have a Human in the Loop?

Vis allows people to analyze data when they don’t know exactly what questions they need to ask in advance.

If a fully automatic solution has been deemed to be acceptable, then there is no need for human judgment, and thus no need for you to design a vis tool.

The outcome of designing vis tools targeted at specific real-world domain problems is often a much crisper understanding of the user’s task, in addition to the tool itself.

1.3 Why Have a Computer in the Loop?

By enlisting computation, you can build tools that allow people to explore or present large datasets that would be completely unfeasible to draw by hand, thus opening up the possibility of seeing how datasets change over time.

As a designer, you can think about what aspects of hand-drawn diagrams are important in order to automatically create drawings that retain the hand-drawn spirit.

1.4 Why Use an External Representation?

Vis allows people to offload internal cognition and memory usage to the perceptual system, using carefully designed images as a form of external representation, sometimes called external memory.

1.5 Why Depend on Vision?

The visual system provides a very high-bandwidth channel to our brains. A significant amount of visual information processing occurs in parallel at the preconscious level.

Sound is poorly suited for providing overviews of large information spaces compared with vision. We experience the perceptual channel of sound as a sequential stream, rather than as a simultaneous experience where what we hear over a long period of time is automatically merged together.

1.6 Why Show the Data in Detail?

Statistical characterization of datasets is a very powerful approach but it has the intrinsic limitation of losing information through summarization.

Anscombe’s Quartet illustrates how datasets that have identical descriptive statistics can have very different structures that are immediately obvious when the dataset is shown graphically.
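As a quick check of this point, the sketch below computes a few descriptive statistics for the four datasets of the quartet. The data values are Anscombe's published ones; the helper names (`pearson`, `summarize`) are my own, and the choice of statistics (means, variance of x, Pearson correlation) is just one reasonable subset.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Descriptive statistics rounded to two decimals."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "var_x": round(variance(xs), 2),
        "r": round(pearson(xs, ys), 2),
    }


SUMMARY = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in SUMMARY.items():
    print(name, stats)
```

All four rows of the printed summary agree to two decimals, which is exactly why the summary alone cannot tell the datasets apart; plotting the (x, y) pairs is what reveals the different structures.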

1.7 Why Use Interactivity?

When datasets are large enough, the limitations of both people and displays preclude just showing everything at once.

1.8 Why is the Vis Idiom Design Space Huge?

idiom: a distinct approach to creating and manipulating visual representations.

1.9 Why Focus on Tasks?

A tool that serves well for one task can be poorly suited for another, for exactly the same dataset.

Reframing the users’ task from domain-specific form into abstract form allows you to consider the similarities and differences between what people need across many real-world usage contexts.

1.10 Why Focus on Effectiveness?

The goals of the designer are not met if the result is beautiful but not effective.

Any depiction of data is an abstraction where choices are made about which aspects to emphasize.

1.11 Why Are Most Designs Ineffective?

The vast majority of the possibilities in the design space will be ineffective for any specific usage context.

In addressing design problems, it’s not a very useful goal to optimize, that is, to find the very best choice. A more appropriate goal when you design is to satisfice: to find one of the many possible good solutions rather than one of the even larger number of bad ones.

Progressively smaller search spaces:

Space of possible solutions
Space of solutions known to the designer
Space of solutions you actively consider
Space of solutions you investigate in detail
Selected solution

The problem of a small consideration space is the higher probability of only considering OK or poor solutions and missing a good one.

One way to ensure that more than one possibility is considered is to explicitly generate multiple ideas in parallel.

1.12 Why Is Validation Difficult?

How do you know it works? How do you argue that one design is better or worse than another for the intended users? What does better mean? Do users get something done faster? Do they have more fun doing it? Can they work more effectively? What does effectively mean? How do you measure insight or engagement? What is the design better than? Is it better than another vis system? Is it better than doing the same things manually, without visual support? Is it better than doing the same things completely automatically? And what sort of thing does it do better? How do you decide what sort of task the users should do when testing the system? And who is the user? An expert who has done this task for decades, or a novice who needs the task to be explained before they begin? Are they familiar with how the system works from using it for a long time, or are they seeing it for the first time? Are the users limited by the speed of their own thought process, or their ability to move the mouse, or simply the speed of the computer in drawing each picture?

How do you decide what sort of benchmark data you should use when testing the system? Can you characterize what classes of data the system is suitable for? How might you measure the quality of an image generated by a vis tool? How well do any of the automatically computed quantitative metrics of quality match up with human judgments? Does the complexity of the algorithm depend on the number of data items to show or the number of pixels to draw? Is there a trade-off between computer speed and computer memory usage?

1.13 Why Are There Resource Limitations?

Three different kinds of limitations:

Computational capacity
Human perceptual and cognitive capacity
Display capacity

scalability: design systems to handle large amounts of data gracefully.

Designing systems that gracefully handle larger datasets that do not fit into core memory requires significantly more complex algorithms.

Human memory for things that are not directly visible is notoriously limited.

change blindness: when even very large changes are not noticed if we are attending to something else in our view.

information density: a measure of the amount of information encoded versus the amount of unused space.

There is a trade-off between the benefits of showing as much as possible at once (to minimize the need for navigation and exploration) and the costs of showing too much at once (where the user is overwhelmed by visual clutter).

1.14 Why Analyze?

Analyzing existing systems is a good stepping stone to designing new ones.

High-level framework for analyzing vis use according to three questions:

what data the user sees (data)
why the user intends to use a vis tool (task)
how the visual encoding and interaction idioms are constructed in terms of design choices (idiom)

One of these analysis trios is called an instance.

Complex vis tool usage often requires analysis in terms of a sequence of instances that are chained together (for example, sorting followed by finding outliers).

Chapter 2: What: Data Abstraction

2.1 The Big Picture

What?
- Datasets
  - Data Types
    - Items
    - Attributes
    - Links
    - Positions
    - Grids
  - Data and Dataset Types
    - Tables
    - Networks & Trees
    - Fields
    - Geometry
    - Clusters, Sets, Lists
  - Dataset Availability
    - Static
    - Dynamic
- Attributes
  - Attribute Types
    - Categorical
    - Ordered
      - Ordinal
      - Quantitative
  - Ordering Direction
    - Sequential
    - Diverging
    - Cyclic

2.2 Why Do Data Semantics and Types Matter?

Many aspects of vis design are driven by the kind of data that you have at your disposal.

semantics: the real-world meaning of the data.

type: the structural or mathematical interpretation of the data.

2.3 Data Types

Five basic data types discussed in this book:

Items
- Individual entity that is discrete (such as a row in a simple table or a node in a network)
Attributes
- Some specific property that can be measured, observed, or logged
Links
- A relationship between items, typically within a network
Positions
- spatial data
Grids

2.4 Dataset Types

dataset: any collection of information that is the target of analysis

Dataset types:

tables
- items
- attributes
networks
- items (nodes)
- links
- attributes
fields
- grids
- positions
- attributes
geometry
- items
- positions
clusters, sets and lists
- items

2.4.1 Tables

flat table: each row represents an item of data, each column is an attribute of the dataset

cell: fully specified by the combination of a row and a column (item and attribute) and contains a value for that pair.

multidimensional table: more complex structure for indexing into a cell, with multiple keys

2.4.2 Networks and Trees

networks: well suited for specifying that there is some kind of relationship (link) between two or more items (nodes)

A synonym for networks is graphs.

A synonym for node is vertex.

A synonym for link is edge.

2.4.2.1 Trees

Trees: networks with hierarchical structure. Each child node has only one parent node pointing to it.
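A minimal sketch of this definition, assuming the network is given as a list of node names plus directed (parent, child) links. The function name `is_tree` and the representation are my own choices, and the check also requires a single root from which every node can be reached, which the one-parent rule alone does not guarantee.

```python
from collections import defaultdict


def is_tree(nodes, links):
    """Return True if the (parent, child) links form a single rooted tree."""
    parents = defaultdict(list)
    children = defaultdict(list)
    for parent, child in links:
        parents[child].append(parent)
        children[parent].append(child)

    # Exactly one node may lack a parent: the root.
    roots = [n for n in nodes if not parents[n]]
    if len(roots) != 1:
        return False

    # Every other node must have exactly one parent pointing to it.
    if any(len(parents[n]) > 1 for n in nodes):
        return False

    # Every node must be reachable from the root (rules out detached cycles).
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children[node])
    return seen == set(nodes)


print(is_tree(["a", "b", "c"], [("a", "b"), ("a", "c")]))
print(is_tree(["a", "b", "c"], [("a", "b"), ("a", "c"), ("b", "c")]))
```

The second call adds a link that gives node c two parents, so the network is no longer a tree.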

2.4.3 Fields

Contains attribute values associated with cells. Each cell in a field contains measurements or calculations from a continuous domain.

sampling: how frequently to take the measurements (of continuous data).

interpolation: how to show values in between the sampled points in a way that does not mislead. Interpolating appropriately between the measurements allows you to reconstruct a new view of the data from an arbitrary viewpoint that’s faithful to what you measured.
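To make sampling and interpolation concrete, here is a small sketch for a one-dimensional field. It assumes linear interpolation, which is only one of many possible schemes and is not prescribed by the book; the function name `interpolate_linear` and the sample values are hypothetical.

```python
from bisect import bisect_right


def interpolate_linear(positions, values, x):
    """Estimate the field value at x from samples at sorted positions."""
    if not positions[0] <= x <= positions[-1]:
        raise ValueError("x lies outside the sampled range")
    # Index of the last sample position that is <= x.
    i = bisect_right(positions, x) - 1
    if i == len(positions) - 1:
        return values[-1]
    x0, x1 = positions[i], positions[i + 1]
    t = (x - x0) / (x1 - x0)
    return (1 - t) * values[i] + t * values[i + 1]


# Hypothetical samples of a continuous quantity at three positions.
sample_positions = [0.0, 1.0, 2.0]
sample_values = [10.0, 20.0, 40.0]
print(interpolate_linear(sample_positions, sample_values, 0.5))
print(interpolate_linear(sample_positions, sample_values, 1.5))
```

Whether a linear estimate is faithful depends on how densely the phenomenon was sampled; with sparse samples, any interpolation scheme can mislead.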

discrete: data where a finite number of individual items exist where interpolation between them is not a meaningful concept.

Technically all data stored within a computer is discrete rather than continuous; however, the interesting question is whether the underlying semantics of the bits that are stored represents samples of a continuous phenomenon or intrinsically discrete data.

2.4.3.1 Spatial Fields

Cell structure of the field is based on sampling at spatial positions.

A synonym for nonspatial data is abstract data.

scientific visualization (scivis): concerned with situations where spatial position is given with the dataset. A central concern in scivis is handling continuous data appropriately within the mathematical framework of signal processing.

information visualization (infovis): concerned with situations where the use of space in a visual encoding is chosen by the designer. A central concern of infovis is determining whether the chosen idiom is suitable for the combination of data and task, leading to the use of methods from human-computer interaction and design.

2.4.3.2 Grid Types

When a field contains data created by sampling at completely regular intervals, the cells form a uniform grid.

grid geometry: location in space.

grid topology: how each cell connects with its neighboring cells.

rectilinear grid: supports nonuniform sampling, allowing efficient storage of information that has high complexity in some areas and low complexity in others, at the cost of storing some information about the geometric location of each row.

structured grid: allows curvilinear shapes, where the geometric location of each cell needs to be specified.

unstructured grid: provides complete flexibility, but the topological information about how cells connect to each other must be stored explicitly in addition to their spatial positions.
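The storage trade-off between the first two grid types can be sketched as follows, for a 2D case. The class names and fields are hypothetical: a uniform grid only needs an origin and a spacing, because every position can be computed, whereas a rectilinear grid has to store the coordinates along each axis.

```python
from dataclasses import dataclass


@dataclass
class UniformGrid:
    """Regular sampling: positions are computed, not stored."""
    origin: tuple
    spacing: tuple

    def position(self, i, j):
        return (self.origin[0] + i * self.spacing[0],
                self.origin[1] + j * self.spacing[1])


@dataclass
class RectilinearGrid:
    """Nonuniform sampling: one stored coordinate list per axis."""
    xs: list
    ys: list

    def position(self, i, j):
        return (self.xs[i], self.ys[j])


uniform = UniformGrid(origin=(0.0, 0.0), spacing=(0.5, 2.0))
rectilinear = RectilinearGrid(xs=[0.0, 0.1, 1.0], ys=[0.0, 5.0])
print(uniform.position(2, 3))
print(rectilinear.position(2, 1))
```

Structured and unstructured grids would need even more stored information (a position per cell, and for unstructured grids the connectivity as well), which this sketch leaves out.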

2.4.4 Geometry

Specifies information about the shape of items with explicit spatial positions. Geometry datasets do not necessarily have attributes.

Geometric data is sometimes shown alone, particularly when shape understanding is the primary task. In other cases, it is the backdrop against which additional information is overlaid.

2.4.5 Other Combinations

set: unordered group of items.

list: a group of items with a specified ordering.

cluster: a grouping based on attribute similarity.

path: an ordered set of segments formed by links connecting nodes.

compound network: a network with an associated tree (all the nodes in the network are the leaves of the tree, and interior nodes in the tree provide a hierarchical structure for the nodes that is different from network links between them).

data abstraction: describing the what part of an analysis instance that pertains to data.

2.4.6 Dataset Availability

static file (offline): the entire dataset is available all at once.

dynamic streams (online): the dataset information trickles in over the course of the vis session.

2.5 Attribute Types

The major distinction is between categorical versus ordered.

Ordered type contains further differentiation between ordinal versus quantitative.

Ordered data might range sequentially from a minimum to a maximum value, or it might diverge in both directions from a zero point in the middle of a range, or the values may wrap around in a cycle.

Attributes may have a hierarchical structure.

2.5.1 Categorical

Does not have implicit ordering, but it often has hierarchical structure.

A synonym for categorical is nominal.

Any arbitrary external ordering can be imposed upon categorical data but these orderings are not implicit in the attribute itself.

2.5.2 Ordered: Ordinal and Quantitative

ordered data: does have an implicit ordering.

ordinal data: we cannot do full-fledged arithmetic with, but there is a well defined ordering (shirt sizes, rankings).

quantitative data: a subset of ordered data. A measurement of magnitude that supports arithmetic comparison (height, weight, temperature, stock price, etc). Both integers and real numbers are quantitative data.

2.5.2.1 Sequential versus Diverging

sequential: a homogeneous range from a minimum to a maximum value.

diverging: two sequences pointing in opposite directions that meet at a common zero point.

2.5.2.2 Cyclic

cyclic: where the values wrap around back to a starting point rather than continuing to increase indefinitely.
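The three ordering directions differ in how distance between values should be interpreted. The sketch below is one way to express that; the function names are my own, and the hour-of-day example is an assumed illustration of a cyclic attribute.

```python
def cyclic_distance(a, b, period):
    """Shortest distance between two values on a cycle of the given period."""
    d = abs(a - b) % period
    return min(d, period - d)


def diverging_parts(value, zero_point=0.0):
    """Split a diverging value into a direction and a magnitude."""
    offset = value - zero_point
    if offset > 0:
        direction = "above"
    elif offset < 0:
        direction = "below"
    else:
        direction = "at zero"
    return direction, abs(offset)


# Hour of day is cyclic: 23:00 and 01:00 are two hours apart, not 22.
print(cyclic_distance(23, 1, period=24))
# A diverging value is read relative to the zero point, in either direction.
print(diverging_parts(-3.0))
```

For a sequential attribute, the plain difference between two values is already the meaningful distance, so no helper is needed.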

2.5.3 Hierarchical Attributes

The attribute of time can be aggregated hierarchically from days up to weeks, months and years.

The geographic attribute of a postal code can be aggregated up to the level of cities or states or entire countries.
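Hierarchical aggregation of a time attribute can be sketched with the standard library alone. The function name `aggregate_dates` and the sample dates are hypothetical; weeks use the ISO week numbering, which is an assumption on my part.

```python
from collections import Counter
from datetime import date


def aggregate_dates(days, level):
    """Count items per week, month, or year."""
    def key(d):
        if level == "year":
            return (d.year,)
        if level == "month":
            return (d.year, d.month)
        if level == "week":
            iso = d.isocalendar()
            return (iso[0], iso[1])  # ISO year and ISO week number
        raise ValueError(f"unknown level: {level}")

    return Counter(key(d) for d in days)


# Hypothetical item dates.
days = [date(2023, 5, 20), date(2023, 5, 21), date(2023, 6, 1), date(2024, 1, 1)]
print(aggregate_dates(days, "month"))
print(aggregate_dates(days, "year"))
```

Note that weeks do not nest cleanly inside months, so the hierarchy is not a strict tree at every level; days roll up to weeks or to months, and months roll up to years.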

2.6 Semantics

Knowing the type of an attribute does not tell us about its semantics.

2.6.1 Key versus Value Semantics

key attribute: acts as an index that is used to look up value attributes.

A synonym for key attribute is independent attribute or dimension.

A synonym for value attribute is dependent attribute or measure.

2.6.1.1 Flat Tables

flat table: has only one key, where each item corresponds to a row in the table and any number of value attributes. Keys may be categorical or ordinal attributes, but quantitative attributes are typically unsuitable as keys because there is nothing to prevent them from having the same values for multiple items.

2.6.1.2 Multidimensional Tables

where multiple keys are required to look up an item. The combination of all keys must be unique for each item, even though an individual key attribute may contain duplicates.
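The uniqueness requirement on keys can be checked directly. In this sketch the rows, attribute names (`gene`, `time`, `level`) and the helper `is_valid_key` are all hypothetical; the point is only that a single attribute with duplicates fails as a key while the combination succeeds.

```python
def is_valid_key(rows, key_attrs):
    """True if the combination of key attributes is unique for each item."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in key_attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical multidimensional table with two keys and one value attribute.
rows = [
    {"gene": "g1", "time": 0, "level": 0.2},
    {"gene": "g1", "time": 1, "level": 0.7},
    {"gene": "g2", "time": 0, "level": 0.4},
]

print(is_valid_key(rows, ["gene"]))          # individual key has duplicates
print(is_valid_key(rows, ["gene", "time"]))  # combination is unique

# With a valid key, the table becomes an index from keys to values.
index = {(r["gene"], r["time"]): r["level"] for r in rows}
print(index[("g1", 1)])
```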

2.6.1.3 Fields

In spatial fields, spatial position acts as a quantitative key.

multivariate structure of fields depends on the number of value attributes.

multidimensional structure of fields depends on the number of keys.

a scalar field has one attribute per cell.

a vector field has two or more attributes per cell.

a tensor field has many attributes per cell.

2.6.1.4 Scalar Fields

are univariate, with a single attribute at each point in space.

2.6.1.5 Vector Fields

are multivariate with a list of multiple attributes at each point. The dimensionality of the field determines the number of components in the direction vector.

2.6.1.6 Tensor Fields

have an array of attributes at each point, representing a more complex multivariate mathematical structure than the list of numbers in a vector. The full information at each point in a tensor field cannot be represented by just an arrow and would require a more complex shape such as an ellipsoid.

2.6.1.7 Field Semantics

Categorization of spatial fields requires knowledge of the attribute semantics and cannot be determined from type information alone.

2.6.2 Temporal Semantics

temporal attribute: any kind of information that relates to time.

Temporal analysis tasks often involve finding or verifying periodicity either at a predetermined scale or at some scale not known in advance.

A temporal key attribute is usually considered to have a quantitative type, although it’s possible to consider it as ordinal data if the duration between events is not interesting.

2.6.2.1 Time-Varying Data

when time is one of the key attributes, as opposed to when the temporal attribute is a value rather than a key.

The question of whether time has key or value semantics requires external knowledge about the nature of the dataset and cannot be determined purely from type information.

time-series dataset: an ordered sequence of time-value pairs. A special case of tables where time is the key.

dynamic can mean a dataset has time-varying semantics or a dataset has stream type.

Chapter 3: Why: Task Abstraction

Why?
- Actions
  - Analyze
    - Consume
      - Discover
      - Present
      - Enjoy
    - Produce
      - Annotate
      - Record
      - Derive
  - Search
    - Lookup (target known, location known)
    - Browse (target unknown, location known)
    - Locate (target known, location unknown)
    - Explore (target and location unknown)
  - Query
    - Identify
    - Compare
    - Summarize
- Targets
  - All Data
    - Trends
    - Outliers
    - Features
  - Attributes
    - One
      - Distribution
    - Many
      - Dependency
      - Correlation
      - Similarity
  - Network Data
    - Topology
      - Paths
  - Spatial Data
    - Shape

3.1 The Big Picture

Discovery may involve generating or verifying a hypothesis
Search can be classified according to whether the identity and location of targets are known or not
- both are known with lookup
- the target is known but its location is not for locate
- the location is known but the target is not for browse
- neither the target nor the location are known for explore
Queries can have three scopes:
- identify one target
- compare some targets
- summarize all targets
For all kinds of data, targets include trends and outliers
For one attribute, the target can be:
- one value,
- the extremes of minimum and maximum values or
- the distribution of all values across the entire attribute
For multiple attributes the target can be:
- dependencies
- correlations or
- similarities between them

3.2 Why Analyze Tasks Abstractly?

Transforming task descriptions from domain-specific language into abstract form allows you to reason about similarities and differences between them.

If you don’t do this kind of translation then everything just appears to be different. The apparent difference is misleading: there are lots of similarities in what people want to do once you strip away the surface language differences.

The analysis framework has verbs describing actions and nouns describing targets.

It is often useful to consider only one of the user’s goals at a time, in order to more easily consider the question of how a particular idiom supports that goal. To describe complex activities, you can specify a chained sequence of tasks, where the output of one becomes the input to the next.

Task abstraction can and should guide the data abstraction.

3.3 Who: Designer or User

On the specific side, tools are narrow: the designer has built many choices into the design of the tool itself in a way that the user cannot override.

On the general side, tools are flexible and users have many choices to make.

Specialized vis tools are designed for specific contexts with a narrow range of data configurations, especially those created through a problem-driven process.

3.4 Actions

Three levels of actions that define user goals:

how the vis is being used to analyze (consume or produce data)
what kind of search is involved (whether target and location are known)
what kind of query (identify one target, compare targets, or summarize all targets)

3.4.1 Analyze

Two possible goals of people who want to analyze data: consume or actively produce new information.

Consume information that has already been generated as data stored in a format amenable to computation
- Discover something new
- Present something that the user already understands
- Enjoy a vis to indulge their casual interests in a topic

3.4.1.1 Discover

Using a vis to find new knowledge that was not previously known, by the serendipitous observation of unexpected phenomena or motivated by existing theories, models, hypotheses or hunches.

generate a new hypothesis: finding completely new things

verify or disconfirm an existing hypothesis.

The discover goal is often discussed as the classic motivation for sophisticated interactive idioms, because the vis designer doesn’t know in advance what the user will need to see.

(discover = explore, present = explain).

Why the vis is being used doesn’t dictate how the vis idiom is designed to achieve those goals.

3.4.1.2 Present

The use of vis for the succinct communication of information, for telling a story with data, or guiding an audience through a series of cognitive operations.

The crucial point about the present goal is that vis is being used by somebody to communicate something specific and already understood to an audience. The knowledge communicated is already known to the presenter in advance. The output of a discover session becomes the input to a present session.

The decision about why is separable from how the idiom is designed: presentation can be supported through a wide variety of idiom design choices.

3.4.1.3 Enjoy

Casual encounters with vis.

A vis tool may have been intended by the designer for the goal of discovery with a particular audience, but it might be used for pure enjoyment by a different group of people.

3.4.2 Produce

The intent of the user is to generate new material.

There are three kinds of produce goals:

annotate
record
derive

3.4.2.1 Annotate

the addition of graphical or textual annotations associated with one or more preexisting visualization elements, typically as a manual action by the user. Annotation for data items could be thought of as a new attribute for them.

3.4.2.2 Record

Saves or captures visualization elements as persistent artifacts (screenshots, lists of bookmarked elements or locations, parameter settings, interaction logs, or annotations). An annotation made by a user can subsequently be recorded.

3.4.2.3 Derive

Produce new data elements based on existing data elements. There is a strong relationship between the form of the data (the attribute and dataset types) and what kinds of vis idioms are effective at displaying it.

Don’t just draw what you’re given; decide what the right thing to show is, create it with a series of transformations from the original dataset, and draw that.

A synonym for derive is transform.

derived attributes extend the dataset beyond the original set of attributes that it contains.
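A derived attribute is simply a new column computed from existing ones. The sketch below assumes a table stored as a list of dictionaries; the helper `derive_attribute` and the city temperature values are made up for illustration.

```python
def derive_attribute(rows, name, fn):
    """Return new rows with one extra attribute computed from existing ones."""
    return [{**row, name: fn(row)} for row in rows]


# Hypothetical original dataset with two quantitative attributes.
cities = [
    {"city": "A", "high": 25.0, "low": 14.0},
    {"city": "B", "high": 18.0, "low": 15.5},
]

# Derive the daily temperature range as a third quantitative attribute.
with_range = derive_attribute(cities, "range", lambda r: r["high"] - r["low"])
print(with_range)
```

The original rows are left untouched, so the derived dataset extends the original rather than replacing it.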

3.4.3 Search

The classification of search into four alternatives is broken down according to whether the identity and location of the search target are already known. The verb find is often used as a synonym in descriptions of search tasks, implying a successful outcome.

3.4.3.1 Lookup

Users already know both what they’re looking for and where it is.

3.4.3.2 Locate

To find a known target at an unknown location.

3.4.3.3 Browse

When users don’t know exactly what they’re looking for but they do have a location in mind of where to look for it.

3.4.3.4 Explore

When users don’t know what they’re looking for and are not even sure of the location.
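The four search alternatives form a two-by-two classification, which can be written down as a small function. The name `classify_search` and the boolean parameters are my own framing of the definitions above.

```python
def classify_search(target_known: bool, location_known: bool) -> str:
    """Map what the user already knows to one of the four search actions."""
    if target_known:
        return "lookup" if location_known else "locate"
    return "browse" if location_known else "explore"


for target_known in (True, False):
    for location_known in (True, False):
        print(target_known, location_known,
              classify_search(target_known, location_known))
```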

3.4.4 Query

Once a target or set of targets for a search has been found, a low-level user goal is to query these targets at one of three scopes: identify (single target), compare (multiple targets) or summarize (all targets).

3.4.4.1 Identify

If a search returns known targets either by lookup or locate then identify returns their characteristics.

If a search returns targets matching particular characteristics either by browse or explore, then identify returns specific references.

3.4.4.2 Compare

Comparison tasks are typically more difficult than identify tasks and require more sophisticated idioms to support the user.

3.4.4.3 Summarize

A synonym for summarize is overview: to provide a comprehensive view of everything (verb) and a summary display of everything (noun).

3.5 Targets

Target: some aspect of the data that is of interest to the user.

Targets are nouns whereas actions are verbs.

Three high-level targets are very broadly relevant for all kinds of data:

a trend: high-level characterization of a pattern in the data. (A synonym for trend is pattern)
outliers: data that don’t fit the trend. Synonyms for outliers are anomalies, novelties, deviants and surprises.
features: definition dependent on the task, any particular structures of interest

The lowest-level target for an attribute is to find an individual value. Another target is to find the extremes (min/max across a range). Another target is the distribution of all values for an attribute.
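For a single attribute, the three targets correspond to three increasingly broad questions. A minimal sketch, with a hypothetical list of ratings standing in for one attribute column and a helper name of my own:

```python
from collections import Counter


def attribute_targets(values):
    """Compute the extremes and the distribution of one attribute."""
    return {
        "extremes": (min(values), max(values)),
        "distribution": Counter(values),
    }


# Hypothetical values of one attribute across six items.
ratings = [3, 1, 2, 3, 3, 5]

individual_value = ratings[1]          # one value for one item
targets = attribute_targets(ratings)   # extremes and full distribution
print(individual_value)
print(targets["extremes"])
print(targets["distribution"])
```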

Some targets encompass the scope of multiple attributes:

dependency: the values for the first attribute directly depend on those of the second.
correlation: a tendency for the values of the second attribute to be tied to those of the first.
similarity: a quantitative measurement calculated on all values of two attributes, allowing attributes to be ranked with respect to how similar, or different, they are from each other.

Network and spatial data targets:

topology: the structure of interconnections in a network.
path: a sequence of one or more links that connects two nodes.
shape: of spatial data.

3.6 How: A Preview

Encode
- Arrange
  - Express
  - Separate
  - Order
  - Align
  - Use (spatial data)
- Map
  - Color
  - Size, Angle, Curvature, …
  - Shape
  - Motion
Manipulate
- Change
- Select
- Navigate
Facet
- Juxtapose
- Partition
- Superimpose
Reduce
- Filter
- Aggregate
- Embed

The rest of this book defines, describes and discusses these choices in depth.

The Strahler number is a measure of node importance. Very central nodes have large Strahler numbers, whereas peripheral nodes have low values.
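For intuition, here is a sketch of the classic Strahler number on a rooted tree: a leaf gets 1, and a parent gets the largest child value, plus one if that largest value occurs in at least two children. This is the tree definition only; the measure can be extended to general networks, which this sketch does not attempt. The dictionary representation and node names are hypothetical.

```python
def strahler(children, node):
    """Strahler number of a node in a tree given as {node: [child, ...]}."""
    kids = children.get(node, [])
    if not kids:
        return 1  # leaves are the most peripheral nodes
    ranks = sorted((strahler(children, k) for k in kids), reverse=True)
    # Two or more children sharing the top rank push the parent one higher.
    if len(ranks) >= 2 and ranks[0] == ranks[1]:
        return ranks[0] + 1
    return ranks[0]


# Hypothetical trees.
lopsided = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
balanced = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e", "f"]}
print(strahler(lopsided, "root"))
print(strahler(balanced, "r"))
```

Nodes near the leaves keep low values, while nodes that sit above several well-branched subtrees accumulate higher ones, which matches the reading of the number as a measure of importance.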

Chapter 4: Analysis: Four Levels for Validation

4.1 The Big Picture

Four nested levels of design:

Domain situation
- Task and data abstraction
  - Visual encoding and interaction idiom
    - Algorithm

4.2 Why Validate?

The vis design space is huge, and most designs are ineffective.

4.3 Four Levels of Design

Domain situation: where you consider the details of a particular application domain for vis
- What-why abstraction level (Data-task): where you map those domain-specific problems and data into forms that are independent of the domain
  - How level (visual encoding/interaction idiom): specify the approach to visual encoding and interaction
    - Algorithm level: instantiate idioms computationally

The four levels are nested: the output from an upstream level above is input to the downstream level below. A block is the outcome of the design process at that level. Choosing the wrong block at an upstream level inevitably cascades to all downstream levels.

Vis design is usually a highly iterative refinement process, where a better understanding of the blocks at one level will feed back and forward into refining the blocks at the other levels.

4.3.1 Domain Situation

domain situation: a group of target users, their domain interest, their questions, and their data.

domain: a particular field of interest of the target users of a vis tool.

Situation blocks are identified.

The outcome of the design process is an understanding that the designer reaches about the needs of the user. The outcome of identifying a situation block is a detailed set of questions asked about or actions carried out by the target users, about a possibly heterogeneous collection of data that’s also understood in detail.

Methods include: interviews, observations, or careful research about target users within a specific domain.

Working closely with a specific target audience to iteratively refine a design is called user-centered design or human-centered design.

What users say they do when reflecting on their past behavior gives you an incomplete picture compared with what they actually do if you observe them.

4.3.2 Task and Data Abstraction

Abstracting into the domain-independent vocabulary allows you to realize how domain situation blocks that are described using very different language might have similar reasons why the user needs the vis tool and what data it shows.

Task blocks are identified by the designer as being suitable for a particular domain situation block, just as the situation blocks themselves are identified at the level above.

Abstract data blocks are designed.

The data abstraction level requires you to consider whether and how the same dataset provided by a user should be transformed into another form.

Your goal is to determine which data type would support a visual representation of it that addresses the user’s problem.

Explicitly considering the choices made in abstracting from domain-specific to generic tasks and data can be very useful in the vis design process.

4.3.3 Visual Encoding and Interaction Idiom

idiom: a distinct way to create and manipulate the visual representation of the abstract data block that you chose at the previous level, guided by the abstract tasks that you also identified at that level.

the visual encoding idiom controls exactly what users see.

the interaction idiom controls how users change what they see.

Idiom blocks are designed.

The nested model emphasizes identifying task abstractions and deciding on data abstractions in the previous level exactly so that you can use them to rule out many of the options as being a bad match for the goals of the users. You should make decisions about good and bad matches based on understanding human abilities, especially in terms of visual perception and memory.

4.3.4 Algorithm

algorithm: a detailed procedure that allows a computer to automatically carry out the desired goal.

Algorithm blocks are designed.

4.4 Angles of Attack

With problem-driven work, you start at the top domain situation level and work your way down through abstraction, idiom, and algorithm decisions.

In technique-driven work, you work at one of the bottom two levels, idiom or algorithm design, where your goal is to invent new idioms that better support existing abstractions, or new algorithms that better support existing idioms.

4.5 Threats to Validity

threats to validity: different fundamental reasons why you might have made the wrong choices.

Wrong problem: You (designer) misunderstood their (target users) needs.
Wrong abstraction: You’re showing them the wrong thing.
Wrong idiom: The way you show it doesn’t work.
Wrong algorithm: Your code is too slow.

4.6 Validation Approaches

4.6.1 Domain Validation

The primary threat is that the problem is mischaracterized: the target users do not in fact have the problems that the designer asserts would benefit from vis tool support.

field study: a study in which the investigator observes how people act in real-world settings, rather than bringing them into a laboratory setting. Field studies for domain situation assessment often involve gathering qualitative data through semi-structured interviews.

One downstream form of validation is adoption rates of the vis tool.

4.6.2 Abstraction Validation

The threat at this level is that the identified task abstraction blocks and designed data abstraction blocks do not solve the characterized problems of the target audience. The key aspect of validation against this threat is that the system must be tested by target users doing their own work, rather than doing an abstract task specified by the designers of the vis system.

4.6.3 Idiom Validation

The threat at this level is that the chosen idioms are not effective at communicating the desired abstraction to the person using the system. One immediate validation approach is to carefully justify the design of the idiom with respect to known perceptual and cognitive principles.

A downstream approach to validate against this threat is to carry out a lab study: a controlled experiment in a laboratory setting.

4.6.4 Algorithm Validation

The primary threat at this level is that the algorithm is suboptimal in terms of time or memory performance, either relative to a theoretical minimum or in comparison with previously proposed algorithms.

An immediate form of validation is to analyze the computational complexity of the algorithm, using the standard approaches from the computer science literature.

The downstream form of validation is to measure the wall-clock time and memory performance of the implemented algorithm.
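As a rough sketch of what such a measurement could look like in Python, the helper below uses the standard library's time.perf_counter and tracemalloc. The measure helper and the layout_positions stand-in algorithm are hypothetical names of my own, not from the book.

```python
import time
import tracemalloc


def measure(func, *args):
    """Run func once and return (result, elapsed_seconds, peak_bytes).

    Note: tracemalloc adds overhead, so for careful timing you would
    likely measure time and memory in separate runs.
    """
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


def layout_positions(n):
    """Hypothetical stand-in for a vis algorithm: n evenly spaced
    positions in the range [0, 1]."""
    return [i / max(n - 1, 1) for i in range(n)]


result, elapsed, peak = measure(layout_positions, 10_000)
print(f"{elapsed:.6f} s, peak {peak} bytes")
```

A single run is noisy; repeating the measurement across several dataset sizes gives a more useful picture and can be compared against the complexity analysis.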

4.6.5 Mismatches

A common problem in weak vis projects is a mismatch between the level at which the benefit is claimed (for example, visual encoding idiom) and the validation methodologies chosen (for example, wall-clock timings of the algorithm).
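One way to picture the mismatch check is as a lookup from validation method to the level it addresses. The sketch below is my own illustration: the method labels are shorthand for the approaches mentioned in the sections above, and find_mismatches is a hypothetical helper.

```python
# Which nested-model level each validation method addresses,
# following sections 4.6.1 to 4.6.4. The labels are my own shorthand.
VALIDATION_LEVEL = {
    "field study": "domain",
    "adoption rates": "domain",
    "target users doing their own work": "abstraction",
    "perceptual justification": "idiom",
    "lab study": "idiom",
    "complexity analysis": "algorithm",
    "wall-clock timing": "algorithm",
}


def find_mismatches(claimed_level, methods):
    """Return the methods that validate a different level than the
    level at which the benefit is claimed."""
    return [m for m in methods if VALIDATION_LEVEL[m] != claimed_level]


# A project claiming a better visual encoding idiom, validated partly
# by algorithm timings:
mismatched = find_mismatches("idiom", ["lab study", "wall-clock timing"])
```

Here the timing result says nothing about whether the idiom communicates well, so it is flagged as a mismatch.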

Chapter 5 Marks and Channels

5.1 The Big Picture

Marks are basic geometric elements that depict items or links, and channels control their appearance. Channels that perceptually convey magnitude information are a good match for ordered data, and those that convey identity information are a good match for categorical data.

Magnitude Channels: Ordered Attributes (Most effective to least):
- Position on common scale
- Position on unaligned scale
- Length (1D size)
- Tilt/angle
- Area (2D size)
- Depth (3D position)
- Color luminance
- Color saturation
- Curvature
- Volume (3D size)

Identity Channels: Categorical Attributes
- Spatial Region
- Color hue
- Motion
- Shape
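The two rankings can be read as a lookup: match the attribute type to the channel type, then prefer channels higher in the list. Below is a deliberately naive sketch of that idea. The assign_channels function is my own hypothetical helper, and it ignores interactions between channels (for example, it does not model that a scatterplot uses position for two attributes at once).

```python
# Channel rankings as listed above, most effective first.
MAGNITUDE_CHANNELS = [
    "position on common scale",
    "position on unaligned scale",
    "length",
    "tilt/angle",
    "area",
    "depth",
    "color luminance",
    "color saturation",
    "curvature",
    "volume",
]
IDENTITY_CHANNELS = ["spatial region", "color hue", "motion", "shape"]


def assign_channels(attributes):
    """Greedy assignment: each (name, kind) pair, taken in priority
    order, gets the most effective unused channel of the matching type.
    kind must be "ordered" or "categorical"."""
    remaining = {
        "ordered": list(MAGNITUDE_CHANNELS),
        "categorical": list(IDENTITY_CHANNELS),
    }
    assignment = {}
    for name, kind in attributes:
        if kind not in remaining:
            raise ValueError(f"unknown attribute type: {kind!r}")
        if not remaining[kind]:
            raise ValueError(f"no {kind} channels left for {name!r}")
        assignment[name] = remaining[kind].pop(0)
    return assignment


# Hypothetical attributes, listed from most to least important.
example = assign_channels(
    [("price", "ordered"), ("region", "categorical"), ("volume", "ordered")]
)
```

The point of the sketch is only that the most important attributes should get the most effective channels; real designs need more judgment than a greedy pass.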

5.2 Why Marks and Channels?

The core of the design space of visual encodings can be described as an orthogonal combination of two aspects: graphical elements called marks and visual channels to control their appearance.

5.3 Defining Marks and Channels

mark: a basic graphical element in an image

Points (0D)
Lines (1D)
Areas (2D)
Volumes (3D)

channel: a way to control the appearance of marks, independent of the dimensionality of the geometric primitive.

Position
- Horizontal
- Vertical
- Both
Shape
Size
- Length
- Area
- Volume
Color
Tilt (or Angle)

A single quantitative attribute can be encoded with vertical spatial position. Bar charts do this, and use the horizontal spatial position channel for the categorical attribute.

Scatterplots encode two quantitative attributes using point marks and both vertical and horizontal spatial position. A third attribute, this one categorical, is encoded by adding color to the scatterplot. Adding the visual channel of size encodes a fourth attribute, also quantitative.
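The four-attribute scatterplot can be sketched with nothing but the standard library by emitting SVG circles. Everything below is my own illustration: the data, the HUES mapping, and the scatter_svg function are hypothetical. Scaling the radius by a square root, so that circle area tracks the value, is a common convention and an assumption on my part rather than something stated in these notes.

```python
import math

# Hypothetical rows: (x, y, category, magnitude). Invented data.
points = [
    (1.0, 2.0, "a", 10.0),
    (2.0, 3.5, "b", 40.0),
    (3.0, 1.0, "a", 90.0),
]

# Identity channel: one color per category.
HUES = {"a": "steelblue", "b": "darkorange"}


def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:
        return (out_lo + out_hi) / 2
    return out_lo + (value - lo) / (hi - lo) * (out_hi - out_lo)


def scatter_svg(rows, width=200, height=200, pad=20, max_radius=12.0):
    """Encode four attributes on point marks: horizontal position,
    vertical position, color, and size. Assumes positive magnitudes."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    max_size = max(r[3] for r in rows)
    circles = []
    for x, y, category, size in rows:
        cx = scale(x, min(xs), max(xs), pad, width - pad)
        # SVG y grows downward, so the output range is flipped.
        cy = scale(y, min(ys), max(ys), height - pad, pad)
        # Square root so that circle area is proportional to the value.
        radius = max_radius * math.sqrt(size / max_size)
        circles.append(
            f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="{radius:.1f}" '
            f'fill="{HUES[category]}"/>'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">' + "".join(circles) + "</svg>"
    )


svg = scatter_svg(points)
```

Writing svg to a file with an .svg extension and opening it in a browser shows the three points, each carrying all four attributes at once.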

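Higher-dimensional mark types usually have built-in constraints (on size and shape) that arise from the way that they are defined. An area mark is typically not size coded or shape coded, because its size and shape are already fixed by what it depicts. A line mark that encodes a value with its length can still be size coded in the other dimension by changing its width. A point mark can be both size coded and shape coded.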