Abstract
One of the main challenges with embodying a conversational agent is annotating how and when motions can be played and composed together in real time, without any visual artifact. The inherent problem is to do so, for a large number of motions, without introducing mistakes in the annotation.
To our knowledge, there is no automatic method that can process animations and automatically label actions and the compatibility between them. In practice, a state machine, where clips are the actions, is created manually by setting connections between the states, with timing parameters for these connections. Authoring this state machine for a large number of motions leads to a visual overflow and increases the number of possible mistakes. As a consequence, conversational agent embodiments are left with little variation and quickly become repetitive. In this paper, we address this problem with a compact taxonomy of chit-chat behaviors, which
we can utilize to simplify and partially automate the graph authoring process. We measured the time
required to label the actions of an embodiment using our simple interface, compared to the standard state machine interface in Unreal Engine, and found that our approach is 7 times faster. We believe that our labeling approach could be a path to automated labeling: once a subset of motions is labeled (using our interface), we could learn a predictor that attributes labels to new clips, allowing virtual agent embodiments to truly scale.