Telepresence: Advancing The Future
An Ontario Telepresence Project White Paper
Table of Contents
The Threshold of Frustration
Myth of the "Super Appliance".
Telepresence: Using Human Centered Design
Positioning for the Future:
The Threshold of Frustration
If current commercial products are any indicator of our
ability to design and deploy usable products and services,
then technologists as a whole have little credibility as
designers of future applications. Consider the three graphs
in Figure 1 below. Graph "A" is a
stylized depiction of the growth of technology (exponential
growth over time). In "B" we show the growth of the
promised functionality (also exponential). It is worth
marking a few points on graph "B". Today we are living
in a world of the telephone, photocopier and VCR. Few of us
know how to fully exploit the functionality incorporated into
these seemingly "simple" tools. In fact, these
devices are typical of technology-centred designs:
needlessly complex, arcane, and with user interfaces that are
independent of and incompatible with one another.
Let us now examine graph "C". Human beings are
not developing more neurons, more hours in the day or more
capacity to learn. In fact, we have a limited ability to
absorb new things. In adult life, new skills often come at
the expense of older skills. For instance, to play a musical
instrument well, one has to use the time and energy allocated
to practicing other skills, such as horseback riding.
We call this fundamental limit the "Threshold of
Frustration." Technological wizardry and functionality
hidden above the threshold of frustration is inaccessible
(and thus invisible) to users. With the VCR, telephone, and
photocopier clearly exceeding the threshold, what credibility
do designers have to deliver promised functionality like
multimedia, virtual reality or computer-supported
collaborative work (CSCW)?
Figure 1: Threshold of Frustration
The only way to apply the technology of graph
"A" and deliver the functionality of graph
"B", is to design for the innate cognitive, motor
and social skills of ordinary users. In the emerging
knowledge-based economy, only those products and services
which have the user as the focus of the design and
implementation will be successful and sustainable in the long
Myth of the "Super Appliance".
Many vendors are seeking to capitalize on the convergence
of information, telecommunications and consumer electronics.
If current trends are any indication, the main approach taken
by these vendors is one of concentrating functionality into
what could be called "super appliances". By this,
we mean televisions which are also entertainment systems,
shopping centres and on-line video stores or computers that
are also answering machines, telephones, video editing
suites, electronic books, all rolled into one.
The applications that are being designed to run on these
"super appliances" are designed and optimized with
limited interaction between them. The model used in their
conception is technology-based, as shown in Figure 2.
Figure 2: Traditional Collaboration Model
This model categorizes collaborative technologies
according to two dimensions: time and place, where each
dimension has two values (same/different). While each tool in
the matrix runs on the same workstation, no tool is aware of
the state of the user, the natural transitions necessary from
tool to tool nor the state of any other tool. Thus, the user
is forced to adapt his/her work flow to the tools available, rather
than the tools adapting to the changing parameters of work.
While this model is in wide use in the CSCW community and
is useful in establishing a taxonomy of tools and systems, we
observe that it is inherently technology-based and
thus inappropriate for use in the design and implementation
of systems to successfully support work groups.
Telepresence: Using Human Centered Design
Our experience in the Ontario Telepresence Project was to
take precisely the opposite approach. Rather than starting
with the "workstation" as the focus of our
implementation, we chose to employ sociologists and
psychologists to first study the users in their environment,
and analyze the nature of the work and the social
interactions that were central to the culture of the
organization. Only when the social ecology of the
workplace was understood, did we begin our design of
applications to support selected activities.
When we designed applications for various internal and
external field study sites, we chose to use a large number of
simple and specialized devices distributed in space and
located where users needed them. Although these devices were
separate, they worked in concert because they were
networked over the same computer-controlled audio/visual
The net result was a system architecture and deployment
methodology that was strong (due to the specificity of
individual devices) yet general (due to the overall
functionality offered by the networked family, as a whole).
We pioneered this approach, and we feel that its adoption
or incorporation into design will have a major impact
on the usability and success of future systems.
A key motivation driving our work is a desire to make the
transitions between the various tasks and activities of the
workplace "seamless." By this we mean
effortless, natural transitions between people, tasks, times
and places. Seamlessness operates in a number of dimensions:
- between foreground and background tasks;
- in moving attention between person and task
in collaborative work;
- between local and distant environments;
- between computer-mediated human-human
communication and human-computer communication;
- bridging between the artifacts (such as documents) in
our computer-based information space and those
in our physical space.
We can illustrate how these notions fit together through
the introduction of an alternate model for workgroup
interaction described below:
A New Workgroup Model
We introduce a complementary model that helps in
clarifying the shared work at a distance from a human
Figure 3: New Workgroup Model
Here, the rows are human-human communication and human-computer
communication. The columns are foreground activities
and background activities. Foreground activities are
tasks which are intentional, i.e., they require a human to
activate them. Speaking on the telephone and typing into a
computer are two examples.
"Background" tasks take place in the periphery,
i.e., "behind" those in the foreground. Examples
include being aware of someone in the next office typing, or
the light in your kitchen going on automatically when you
enter it, as opposed to manually flicking the switch (a
foreground intentional act).
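The two-by-two model can be written down as a small data structure. The following is a minimal sketch, not part of the project's software: the names `Party`, `Attention` and `quadrant` are illustrative assumptions, and the layout (human-human on the top row, foreground in the left column) is assumed from the order in which the rows and columns are listed above.

```python
from enum import Enum


class Party(Enum):
    """Rows of the model: who is communicating."""
    HUMAN_HUMAN = "human-human"
    HUMAN_COMPUTER = "human-computer"


class Attention(Enum):
    """Columns of the model: how the activity is attended to."""
    FOREGROUND = "foreground"  # intentional, activated by the user
    BACKGROUND = "background"  # peripheral, needs no explicit action


def quadrant(party: Party, attention: Attention) -> str:
    """Name the cell of the model (assumed layout: human-human on top,
    foreground on the left)."""
    row = "top" if party is Party.HUMAN_HUMAN else "bottom"
    col = "left" if attention is Attention.FOREGROUND else "right"
    return f"{row} {col}"


# The four examples given in the text, placed in the model.
EXAMPLES = {
    "speaking on the telephone": (Party.HUMAN_HUMAN, Attention.FOREGROUND),
    "typing into a computer": (Party.HUMAN_COMPUTER, Attention.FOREGROUND),
    "hearing someone typing next door": (Party.HUMAN_HUMAN, Attention.BACKGROUND),
    "kitchen light turning on by itself": (Party.HUMAN_COMPUTER, Attention.BACKGROUND),
}


def place_examples() -> dict:
    """Map each example activity to the quadrant it falls in."""
    return {name: quadrant(*cell) for name, cell in EXAMPLES.items()}
```

Calling `place_examples()` puts both foreground examples in the left column and both background examples in the right column.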
From this simple model, we can derive some valuable
observations. First, while tools exist to serve activities in
the left column, this is much less true of the
right column: nearly all work in technology-mediated
human-human and human-computer interaction falls in the left
column. It is our fundamental belief that the real
"sweet spots" in distributed work effectiveness lie
on the right hand side of the model.
One could argue that in supporting human-human
interaction, telephones and video conferencing do a fairly
good job. One can hold fairly rich conversations, see each
other, judge moods, etc. So why is there still such a sense
of distance between people, despite such technology? We
believe this is because such technologies
do not share some of the key affordances that occur naturally
when people work in close physical proximity. Regardless of
the fidelity of the video phone, I still have no sense of who
is in when I call. I can't "bump into" people in
the hall, know who is available and who is busy, or take
advantage of synergistic opportunities when just the right
combination of people happen to be at the water cooler at a
particular time. Yet, in shared physical space, all of these
are commonly available almost effortlessly in the background,
due to our "peripheral awareness."
Reflecting the Social Ecology of the Workplace
Based on this observation, we propose to develop and
evaluate a means of sharing the periphery, the background
social ecology, through the use of appropriate technological
tools and prostheses. Moreover, we will explore means to
support seamless integration of these background tools with
existing (and new) foreground applications. Through this
approach, we believe we will achieve a significant
improvement in the sense of co-presence, or
"telepresence" operating over wide geographic
One example of such a technology is referred to in the
upper right corner of figure 4, the Portholes system
originally developed by Xerox PARC / Rank Xerox EuroPARC
(this system has since been commercialized by Telepresence Systems Inc.
as ProRata). Portholes is a system which takes video
"snapshots" of members of a community every 5
minutes, and circulates them to the computer screens of the
members of that same community, as shown in the figure below.
Hence, all members have an increased awareness of who is in,
what they are doing and whether they might be available.
Participants give permission to be viewed by selecting
their door state: if they close their office door, no image is
allowed. The snapshots also provide a means of combating the all too
human tendency towards "out of sight, out of mind."
All members of the community have a visual presence,
regardless of actual geographical location.
Figure 4: Portholes
Portholes is an excellent example of a background
"awareness server," of which there are many others
currently under experimentation.
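The behaviour described for Portholes (periodic snapshots, circulated to every member, suppressed when a door is closed) can be sketched in a few lines. This is an illustration under stated assumptions, not the Portholes or ProRata implementation: `Member`, `collect_snapshots` and `distribute` are hypothetical names, and a snapshot is represented by a plain `bytes` value standing in for a captured video still.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The text states that snapshots are taken every 5 minutes.
SNAPSHOT_INTERVAL_S = 5 * 60


@dataclass
class Member:
    name: str
    door_open: bool = True                # closed door: no image is allowed
    latest_frame: Optional[bytes] = None  # stand-in for a captured still


def collect_snapshots(members: List[Member]) -> Dict[str, Optional[bytes]]:
    """Build one round of the community view, honouring each door state."""
    return {m.name: (m.latest_frame if m.door_open else None) for m in members}


def distribute(members: List[Member]) -> Dict[str, Dict[str, Optional[bytes]]]:
    """Give every member the same community view, wherever they are located."""
    view = collect_snapshots(members)
    return {m.name: dict(view) for m in members}
```

In a running system, `distribute` would be called once per `SNAPSHOT_INTERVAL_S`. A member with a closed door still appears in the view, but with no image.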
Likewise, along the Human-Computer interaction dimension,
there are also background technologies. The example cited in
the bottom right quadrant of the figure is "smart
house" technology. These are technologies such as those
which turn down the heat on weekends, automatically water
your plants, close blinds, turn on lights, etc., under
computer control. Among others, Telepresence Systems, a
spin-off company from the Ontario Telepresence Project, will
seek to uncover new tools to facilitate background
communication among users providing them with all-important
contextual information for their business transactions.
Current Videoconferencing Systems are only a Start
Current market offerings of videoconferencing systems
provide excellent value in that they can provide many of the
benefits of face-to-face meetings. As the costs of these
systems plummet, video communication will become more
common in business and government offices. However, it is
important to note that by itself, videoconferencing only
provides users with a tool to support human-to-human,
foreground communications. The other quadrants of the model
described above are not addressed by this technology.
Moreover, even in simple conference-room to
conference-room communications, there is room for refinement
in design and implementation. For example, consider for a
moment how we use a conference room. A conference room has
walls, furniture and audio-visual aids that enable people
using the room to interact in a special way. The presenter
stands at the front of the room while he/she presents. The
other participants sit around the table or in chairs along
the walls. The presenter and the audience members establish
eye-contact with one another, interrupt (or stay silent) as
is socially appropriate, make quiet aside remarks to one
another, etc. The presenter moves around the front of the
room writing on a whiteboard, transparency or flip chart,
pacing, gesturing in the air, etc. The presenter will often
sit down at the conference table while an audience member
stands up and becomes the presenter.
Other than in custom-built rooms, current
videoconferencing systems have not yet capitalized on this
rich milieu for interaction. For example, if a formal
presentation is being given, there is no distinction between
"front" and "back" of the room (see
below), no way to use whiteboards or other large shared
drawing surfaces, and no ability to establish eye-contact or have
an aside conversation with an individual remote person.
Working with our industrial partners, we sought to explore
how to use these common artifacts of the conference room in
meetings with remote participants. We have built some
experimental conference rooms which seek to adapt technology
to the social ecology of conference rooms. Among the many
innovations we have introduced is the notion of
"back-to-front" video conferencing, depicted in
Figure 5. Remote attendees sit at the "back" of
the room, each on a different monitor. Eye contact is
maintained between the presenter and the remote attendees
through one or more cameras associated with the monitors. If
a remote participant should wish to make a presentation to
the group, he/she can be switched to the large front monitor
for the duration of the presentation, the same social action
as if he/she were physically present.
Figure 5: Back-to-Front Videoconferencing
Seamless Movement Between the Human, Computer, the Foreground
Our belief is that the real power of this model comes not
from merely populating the individual quadrants, but by
providing the means to seamlessly make transitions from
quadrant to quadrant, as illustrated by the arrows in the
version of the model below:
Figure 6: Seamless Movement Between Cells of
Let us illustrate this point with an example that
relates to a problem familiar to many: trying to
arrange a conference call among a number of
colleagues, all of whom are busy, hard to reach, and
at different sites.
1) Using our tools, the user would glance at their
   Portholes window to determine whether the people
   appeared to be available. If so, they would use
   Portholes to contact them and the problem is
   solved: we would have made a transition from the
   top right to the top left quadrant (via the bottom
   left, when interacting with Portholes).
2) However, what if the more typical case were true
   and nobody appeared to be available? In this case
   we instruct an "agent" on our machine to let us
   know when the parties are available. This is done
   by simply selecting the appropriate people by
   pointing at their Portholes images, and selecting
   an operator, such as "set up video conference when
   available."
3) Moving to the bottom right quadrant, in the
   background, while you resume other work, the agent
   "looks" at the incoming Portholes images, scanning
   for any changes. Through simple image processing
   it can detect comings and goings in the remote
   offices. When all parties appear to be available,
   the agent initiates a foreground dialogue with the
   user, suggesting that now might be an opportune
   time for the meeting.
4) If the user agrees, they initiate the meeting, and
   the conversation begins. In a seamless manner, one
   has moved counter-clockwise from the top right to
   the top left quadrant. High value and
   functionality is obtained with minimal complexity
   for the user. A prosthesis which makes up for many
   of the problems of distance is provided.
Our belief is that this is just one example of many,
and that the architecture which we are pursuing
affords exploring such synergies in an effective and
Positioning for the Future:
This model reveals some important technological
characteristics, as well as aspects of usage. This is
illustrated in the following refinement of the basic figure.
Figure 7: Refinement of the Model which
Telecommunications Network Implications
What we have added here are labels that characterize the
bandwidth of the two columns. We observe that activities in
the left column are high bandwidth, but bursty, whereas those
in the right are relatively low bandwidth, but persistent.
For example, a video phone call is high bandwidth, but we may
only make 5 calls a day. On the other hand, distributing
Portholes or ProRata images is persistent, running constantly
in the background. However, the bandwidth required to
distribute images is relatively low. Viewed in the context of
seamlessly moving from quadrant to quadrant, what we have is
a means of capturing the notion of "bandwidth on
demand." Furthermore, the model which emerges from this
approach is in many ways richer than those commonly used, such
as video on demand.
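The contrast between bursty and persistent traffic can be made concrete with a little arithmetic. The sketch below is illustrative only: the figure of 5 calls a day and the 5-minute snapshot interval come from the text, while the call length, video bit rate, image size and community size are assumed values chosen purely for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def video_call_load(calls_per_day: int, minutes_per_call: float,
                    kbit_per_s: float) -> tuple:
    """Return (peak, daily average) bandwidth in kbit/s for foreground calls."""
    active_s = calls_per_day * minutes_per_call * 60
    return kbit_per_s, kbit_per_s * active_s / SECONDS_PER_DAY


def snapshot_load(members: int, kbytes_per_image: float,
                  interval_s: float) -> float:
    """Return the steady bandwidth in kbit/s needed to receive one image
    per community member per interval."""
    return members * kbytes_per_image * 8 / interval_s


# From the text: 5 calls a day, snapshots every 5 minutes.
# Assumed for illustration: 10-minute calls at 384 kbit/s,
# 10 kB images, a community of 20 members.
call_peak, call_average = video_call_load(5, 10, 384)
background = snapshot_load(20, 10, 5 * 60)
```

Under these assumed figures, calls peak at 384 kbit/s but average only about 13 kbit/s over the day, while the snapshots need about 5 kbit/s continuously. A network sized for either figure alone would serve the other pattern poorly.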
From the telecommunications perspective, what we have is a
usage-based model which argues strongly that a
traditional telephony model (i.e., foreground calls, video
conferencing, etc.) is not adequate to support telepresence
(including telework, distance learning, etc.). What is
equally important is the point that this observation has
emerged from a methodology based on placing the emphasis on use, not
technology - a methodology which will carry on through the