Creative Sense-Making:

An Enactive Cognitive Theory, Method, and Analytical Framework for Co-Creative AI

Nicholas Davis, PhD

Abstract

Creative sense-making (CSM) is a theoretical framework that applies the enactive frameworks of sense-making and participatory sense-making (PSM) to creativity and co-creation. CSM proposes that creativity, and co-creation in particular, are sense-making activities where meaning is built dynamically through autonomous interaction with the environment and other agents. The framework is distinguished from typical accounts of sense-making and PSM because it involves the production of a creative artifact, a piece of artificial media, whether it be ephemeral or concrete. CSM seeks to model the interaction dynamics of co-creation to learn more about the sense-making and PSM patterns present during co-creation. The framework utilizes the Enactive Model of Creativity to develop a cognitively rooted interaction coding scheme. The coding procedure is applied continuously via manual video coding of a recorded co-creative session or automatically by encoding the coding scheme into a co-creative system. The codes, when summed, produce a CSM curve that visually depicts trends and patterns in the interaction dynamics of a co-creative session. These curves can then be analyzed using statistical modeling techniques, such as stock market technical analysis. The turn-taking rhythm can also be modeled and visualized with the approach. The CSM can provide a shared methodology, theory, and analytical framework with which to conduct research in co-creative AI.

Keywords: Co-Creative AI, Participatory Cognition, Enactive Cognition, Sense-making, Creativity, Human-AI Interaction

Introduction

Co-creative artificial intelligence (AI) is a field of study that focuses on designing and studying co-creative AI systems that contribute to a shared creative product with the users (Davis, 2013). The field of computational creativity can be segmented into three categories of systems (Davis, Karimi, Guzdial): autonomous computational creativity systems that generate creative products, creativity support tools that accelerate the user’s existing creativity, and co-creative systems that collaborate with the user on a shared creative product. In co-creative contexts, the interaction can be ephemeral, such as dance, or concrete, like design. Turn-taking can emerge between the user and AI where ideas successively build upon each other in a collaborative improvisation. The user and AI can take on different roles during the co-creation, such as definer (e.g. coming up with new ideas), refiner (e.g. elaborating on present ideas), and evaluator (e.g. reflecting on the creative product and offering critiques).

In co-creation, the creative process progresses dynamically with shared meaning gradually and interactively built through a social interplay. There is a social interaction dynamic that has a degree of autonomy that influences the co-production of the shared artifact. The sense-making of the individual can be applied to the interaction itself or to the content of the interaction (e.g. the creative output). Making sense of the interaction regulates the interaction dynamics and entails attributes such as modifying the rhythm of turn-taking, adjusting turn length, and providing feedback. Making sense of the creative output is about building meaning based on what the other collaborator contributed. In this context, goals are not predetermined, but rather emerge through a mutual and interactive process. The enactive concept of directive is more appropriate, which provides some structure and constraints for the creative activity without imposing a scripted plan. The co-creative process would then be characterized by the emergence of directives that guide the creative process along certain trajectories that can be modified in the moment by either collaborator.

The argument put forth by the creative sense-making framework (CSM) is that co-creation is a form of sense-making in which meaning is interactively constructed through the autonomous interaction of an agent with the environment and other agents within that environment, e.g. a co-creative AI. During co-creation, participants engage in participatory sense-making, where their interactions form a co-regulated interaction coupling (e.g. a structural correspondence between successive turns). The coupling is co-regulated because the autonomy of both parties is respected and each contributes to the flow of the interaction. An example in the domain of drawing would be one collaborator drawing a face and another collaborator adding ears and a hat. The turns are coupled because they have some semantic relationship with each other. Coupling can also come from similarity between turns, such as one person mimicking another. PSM investigates interaction couplings, feedback, coordination, and the rhythm of turn-taking, including periods of meaning expression and pauses.

Creative sense-making applies the ideas of sense-making and participatory sense-making to creativity and co-creation. Creative sense-making differs from typical sense-making contexts because there is a shared creative product that can be ephemeral, such as a dance, or concrete, like an artwork. The actions taken by participants are reflected in the artifact and serve as further feedback for subsequent actions. These actions can be quantified to study the sense-making processes of co-creation. CSM demonstrates how it is possible to deduce cognitively rooted modes of interaction and code interactions through time to arrive at a curve describing the interaction dynamics of co-creativity.

The CSM coding scheme and theory is based on the Enactive Model of Creativity (EMC). The EMC describes two main cognitive modes: clamped (e.g. fluidly executing actions) and unclamped (e.g. exploring the environment or interactively building a better mental model). It proposes that creativity is a fluctuation of clamped and unclamped cognition. The CSM’s analytical technique quantifies those fluctuations to define interaction patterns and trends that describe the sense-making and PSM processes present in co-creation. With this data, a model of co-creation can be derived that describes the temporally extended interaction dynamics of sense-making present in co-creation. This is a domain independent model, and it can be used to compare the interactions of co-creative systems in different creative domains.

This paper begins by providing some background on the cognitive theory of enaction and its frameworks of sense-making and participatory sense-making. Then, the Enactive Model of Creativity is described to arrive at the cognitively based interaction coding scheme. Next, the coding scheme is described, and the technique for how to apply it is explained, including a manual and automatic application procedure. The Codix video coding platform is described, which enables the manual CSM code application process. The AI Drawing Partner is described, which is a co-creative drawing environment that automatically codes the CSM’s interaction mode codes. Next, the analysis process for the CSM curves is described, including stock market technical analysis and turn-taking rhythm analysis. A discussion follows that examines how the CSM is a domain independent theoretical, methodological, and analytical framework to advance co-creative AI.

The primary contribution of this article is expanding the existing CSM framework into a full theoretical, methodological, and analytical framework to study human-AI co-creation. The secondary contribution is a novel analytical framework that can aid in the analysis and interpretation of results in co-creative AI research. The final contribution is the introduction to the research community of two freely available research platforms for conducting analysis with the CSM: 1) Codix, a video analytics and coding platform to manually apply the CSM coding procedure, and 2) AI Drawing Partner, a quantified co-creative drawing system that automatically applies the CSM coding procedure.

Related Work

Theories of Co-Creation

Within computational creativity research, Guzdial and Riedl introduced a framework to represent turn-based co-creative systems, emphasizing the importance of understanding the start and end conditions, the actions each partner can take, the nature of the AI agent, the role of the user, and the turn-taking dynamics (Guzdial et al., 2019). This framework provides a structured way to analyze and design co-creative interactions, highlighting the nuances of different collaborative styles within the interaction. In a turn-taking system, the framework helps quantitatively define aspects of the system, which can then be used to evaluate similar co-creative systems based on these features systematically.

Kantosalo et al. (2020) extend this idea by introducing a layered model for describing interactions with co-creative AI systems: interaction modalities, styles, and strategies. Modalities refer to the channels of communication, styles describe the structure and behavior of interactions, and strategies govern the system's decision-making process. This layered approach allows for a more nuanced understanding of the complex interplay between humans and AI in co-creative processes. For example, a system might use visual and auditory modalities to communicate with the user, adopt a collaborative style that encourages back-and-forth interaction (turn-taking), and employ a strategy that prioritizes novelty and surprise in its creative output.

Wu et al. (2021) further extend this interaction modality understanding by introducing the "Human-AI Co-Creation Model," a circular process encompassing six phases: perceiving, thinking, expressing, collaborating, building, and testing. This model emphasizes that AI can augment human capabilities in each phase, leading to an enhanced and more efficient creative process. In the "perceiving" phase, the AI can analyze vast datasets to provide insights that might not be readily apparent to humans, while in the "expressing" phase, AI tools can help users generate and iterate on creative outputs more rapidly. The emphasis on collaboration in Wu et al.'s model aligns with the core tenet of co-creative AI, where humans and AI actively contribute to the creative process, fostering a sense of shared agency and ownership over the final artifact, whether concrete or ephemeral.

Rezwana (2022) further enriched this understanding by focusing on the technical systems underpinning human-AI co-creation in their Co-Creative Framework for Interaction Design (COFI). This framework emphasizes the role of the interactive system's design, the algorithms in use, and the level of automation in shaping how collaboration manifests. A tool offering basic suggestions might encourage a turn-taking style, while a complex, manipulative environment could allow for task-divided or even simultaneous collaboration. COFI also provides a structured approach to analyzing and designing co-creative interactions by examining various aspects of the interaction between humans and AI, such as the types of contributions made by each party, the communication channels used, and the overall creative process. By considering these features, COFI offers a comprehensive framework for understanding and designing effective co-creative AI systems.

While these frameworks offer valuable insights into co-creative AI, they primarily focus on the technical and interaction design aspects. On the other hand, the Creative Sense-Making (CSM) framework delves deeper into the cognitive and social dynamics of co-creation, emphasizing the importance of shared meaning-making and the emergence of directives through interaction. CSM's unique contribution lies in its ability to quantify the fluctuations between clamped and unclamped cognition, providing a nuanced understanding of the interaction patterns and trends in co-creative processes. This focus on the underlying cognitive processes and the dynamic interplay between humans and AI sets CSM apart from other frameworks, positioning it as a valuable tool for understanding and enhancing the co-creative experience.

Enactivism

Enactivism stands in contrast to traditional cognitivist views of cognition as information processing systems that sense, plan, and act according to environmental input. In an enactive account, the sensing and acting occur in tandem through interaction with the environment. Perception is for action and guided through action. Perceptual sensing is an activity itself guided by affordances in the environment that represent the action potentials of objects. Cognition is said to be embodied and embedded in a dynamic environment that is leveraged to support cognition, making cognition extended to the environment as well. The enactive account cuts across the brain-body-world divide by viewing cognition as a sense-making activity in which interaction with the environment and the feedback from those interactions inform an intelligent and embodied perceptual process.

Enactivism rests on five pillars: autonomy, the capacity of an agent to maintain its existence under precarious conditions through interaction and exchange with the environment; sense-making, or imbuing the environment with meaning through interaction to continue the autonomous identity of the cognizer; emergence of meaning through interaction; embodiment of the agent, such that actions are afforded and constrained based on body configuration; and experience, in which an interaction flows and has a trajectory and history of interactions.

There are three strands of enactivism present in the literature: autopoietic enactivism, sensorimotor enactivism, and radical enactivism. Autopoietic enactivism argues for a continuity of life to mind and investigates the biodynamics of an organism as it interacts with its environment to continue its autonomous existence. These researchers seek to define the cognitive structures that emerge as a result of the unique biological imperatives of the cognizer. Central to this strand of enactivism is the concept of autonomy, which is the notion of an organism’s ability to survive under precarious conditions through building meaning by interacting and exchanging resources with the environment. The organism can also modify the environment to change the available actions it can take on the environment, making it adaptive. Cognitive agents, according to this account, are autonomous and adaptive.

Sensorimotor enactivism investigates the interrelationship between perception and action. These theorists view perception as an activity in which the cognizer reaches out into the world to perceive elements of that world through sensorimotor feedback loops. They see perception as guided by sensorimotor feedback loops that form through interaction with the environment, and these sensorimotor loops constrain and inform cognition. These sensorimotor feedback loops, when activated through time, form sensorimotor contingencies, which are defined as “patterns of dependence obtaining between perception and exploratory activity” (Ward et al., 2017) that structure the manner in which perceptual activity is carried out.

Radical enactivism, or radically enactive cognition (REC), argues against internal representations, and instead proposes that cognition is an embodied activity that is carried out without a mental model. “REC takes up the general enactivist project of rejecting cognitivism in favour of analysing minds in terms of dynamic patterns of adaptive environmental interactions” (Ward et al., 2017).

Applying enaction as a lens for co-creation involves investigating the dynamics of interaction between a user and an AI, such as the social interplay and shared meaning construction present during co-creation. In their work, The Five Pillars of Enaction as a Theoretical Framework for Co-Creative AI, Davis et al. (2024) define enactive co-creative AI as follows:

“At least one human and one agent collaborating on a shared creative product where the autonomy of the user and agent is maintained and meaning is built through interaction, coordination, communication, and feedback. The agent and user engage in sense-making (regulating interaction with the environment) and participatory sense-making (regulating a social sense-making process) to understand each other’s creative intentions and enact or bring forth meaning in the environment. Both the user and agent are embodied, with interactions constrained and afforded by their bodies. The agent engages in improvised interaction to yield emergent interaction dynamics. The agent remembers its experience, storing the interaction history and utilizing that to inform the creative trajectory of the interaction.” (Davis et al., 2024)

This definition of co-creative AI emphasizes the dynamic nature of meaning construction through feedback and coordination in a social sense-making process, embodied in a dynamic environment, with rich affordances that guide interaction. Importantly, it recognizes the lived experience of a co-creative session and how a nuanced interaction history is developed through time. Enactivism, along with its constituent frameworks of sense-making and participatory sense-making, can be used to study, describe, and quantify co-creative AI.

Sense-Making

Sense-making relates to the ability to interact with the world to build meaning relative to continuing one’s autonomous identity. The interaction with the world ‘casts a web of significance’ () and imbues the environment with meaning through interaction. For example, De Jaegher (2013) writes: “Actions and their consequences constantly shape the underlying processes and modulate autonomy such that intentions, goals, norms, and significance in general change as a result. The significant world of the cognizer is therefore not pre-given but largely enacted, shaped as part of its autonomous activity.” (De Jaegher, 2013). De Jaegher goes on to summarize and define sense-making as: “a cognizer's adaptive regulation of its states and interactions with the world, with respect to the implications for the continuation of its own autonomous identity. In other words, sense-making is concerned acting and interacting, and the concern comes directly from the sense-maker's self-organization under precarious circumstances.” (De Jaegher, 2013). Similarly, Thompson and Stapleton (2009) define sense-making in relation to autonomy: “Sense-making is the interactional and relational side of autonomy…Sense-making is behaviour or conduct in relation to environmental significance and valence, which the organism itself enacts or brings forth on the basis of its autonomy” (Thompson & Stapleton, 2009). Sense-making can be applied to co-creation by investigating meaning construction based on the autonomous activities of the user and AI.

Participatory Sense-Making (PSM)

PSM is a cognitive framework developed by De Jaegher and Di Paolo (2007) within the autopoietic enactive tradition that situates meaning construction as a social activity based on the dynamics of interaction and exchange between multiple agents. These interactions form a ‘co-regulated coupling’ where the autonomy of each interactor is respected and each contributes to the flow of the interaction. The co-regulated coupling develops an independent autonomy that influences each of the contributors, i.e. the dynamics of interaction. The cognitive agent can regulate both the interaction dynamics as well as the content of what is contributed to the interaction. When the agent regulates both interaction dynamics and the content of contributions, participatory sense-making emerges. In this approach, meaning is co-constructed through interaction with other agents and the environment in a way that respects the autonomy of each interactor.

PSM can be applied to co-creation to describe the dynamics of interaction, in particular interaction couplings and shared meaning construction. In co-creation, turn-taking can emerge where each turn can be related to what came before. This relationship can be semantic, visual, or structural correspondences between the content of the turns. PSM emerges in co-creation when multiple successive turns are related to each other and the collaborators are building upon each other’s contributions in a collaborative improvisation, as well as maintaining the dynamics of interaction through feedback and coordination.

Enactive Model of Creativity (EMC)

The EMC is a model for understanding how cognition and perception change dynamically during the creative process (). It proposes the creative process unfolds by discovering and defining directives that guide and constrain interactions. These directives influence what affordances are perceptually available in the environment, which in turn constrains the possible actions that can be executed in the environment. The EMC proposes perceptual logic as a cognitive mechanism that selects relevant affordances from all available affordances. Cognition clamps to a perceptual logic during execution and unclamps during exploration.

Clamped cognition is knowing what to do, when to do it, and executing actions with minimal thought or effort (e.g. walking down a sidewalk). In the model, this is an equilibrium with the awareness rectangle centered in the middle of the cognitive continuum. This is referred to as everyday cognition and relates to the time spent performing activities that do not require extensive thought, such as routines or habits. It would feature a well defined perceptual logic, where the individual can intelligently ‘read’ the environment to detect key pieces of information that inform the task at hand. Here, the individual is in a sensorimotor interaction coupling with the system in the environment, such that it is performing predictably and minimal higher order thinking is involved. The individual is not relying on a mental model of the situation, but rather executing embodied and situated actions in a predictable yet dynamic environment.

Clamped cognition corresponds to the state of flow identified in the psychological literature, which is an optimal state of cognition where skill and challenge are balanced, the individual can focus purely on creative expression, and time ceases to flow linearly (Csit.). An artist, for example, would be in a state of clamped cognition while they were painting in a state of flow. Cognition would be clamped to a perceptual logic that guided perception and action based on perceived affordances in the artwork. To a skilled artist, i.e. one who has developed expert perceptual logic, the artwork itself guides the execution of actions because of the artistic affordances present when a skilled artist looks at their own work in progress. Perceptual logic exists at the local, regional, and global level in an artwork. Changing some local details could perturb the regional or global balance and require correction, thus affording another change. When perceptual logic is violated, cognition can unclamp to explore ways to find balance in the artwork.

Unclamped cognition is interacting with the world to update one’s mental model or explore the environment (e.g. recovering from tripping on the sidewalk). During an unclamp event, the individual may intentionally interact with the environment to update their mental model of the situation at hand. For example, after tripping on the sidewalk, the individual may inspect the upcoming regions of the sidewalk to ensure there are no other cracks or bumps. This inspection could be used to update the mental model of the sidewalk as either smooth or bumpy. Once the mental model has been updated, a new perceptual logic is put into place and the affordances of the environment change dynamically according to the new perceptual logic, i.e. the bumps of sidewalks become more salient in the perceptual process. The mental model is no longer actively used at this point once the perceptual logic is in place and cognition becomes clamped.

Within the unclamped category, there are two ways cognition can unclamp: 1) Functional unclamp, which is regulating interaction with the environment (e.g. exploring the environment from new angles, inspecting details of the environment), and 2) Interactional unclamp, which is disengaging from the interaction, possibly to think, reflect, and update the mental model. Functional unclamping changes the sensory data available to the individual through embodied explorations to gain new perspectives, e.g. moving the head or body to get a different viewpoint. It can also involve perturbing the system in the environment experimentally to determine the relationship between actions and their causal effects in the system in the environment, e.g. stepping slowly on ice and studying the feedback from each step, so as to not fall in a body of water. Interactional unclamp is a metacognitive activity where cognition becomes about cognition. It can occur during periods of pausing and hesitation where the focus of cognition becomes inner processes rather than the sensory data coming into the human system.

In the artist example, an unclamped state would be stepping back from an artwork to get a better view of the whole piece. This would be an example of an interactional unclamp event. The artist could also engage in a functional unclamp event, which could be simulating brush strokes above the canvas to help visualize what they would look like if they were painted. This is an embodied activity where the artist attempts to feel what the brush stroke would feel like if it were painted on the canvas, through both proprioceptive feedback and visual feedback. The artist would then progress through states of clamped cognition, where they were fluidly painting on the canvas, and unclamped cognition, where they were engaged in metacognitive activities, such as evaluating, reflecting, and simulating different outcomes. This cycle of clamped and unclamped cognition is a sense-making process exhibited by the autonomous activity of the individual and characterizes the core of creative sense-making.

Creative Sense-Making Coding Technique

The CSM coding technique maps the cognitive modes of the EMC to the domain of human-AI co-creation, as shown in Table 1.

The CSM coding procedure entails applying a code to every second of interaction in a co-creative session. This can be accomplished with two approaches: 1) manually coding a co-creative session with a specialized video coding platform that enables the continuous sampling of the researcher’s applied code through time, and 2) encoding the four modes of interaction shown in Table X into a co-creative system to automatically code the user and agent’s interactions through time.

In the analytical framework, the values for the interaction mode code are recorded in their raw format (as shown in Figure X left) as well as summed through time to arrive at the CSM curve (as shown in Figure X right). The CSM curve visually illustrates the dynamics of interaction between a user and AI in a co-creative session. The data the CSM produces can be used to count the overall number of code applications, model the interaction dynamics with the CSM curve, statistically model the CSM curve to detect trends in the dataset, and model turn-taking.

Figure X: The raw coded values (interaction dynamic data) are on the left. The summed values are on the right, which is the CSM curve. The data was captured during a five-minute co-creative drawing session with the AI Drawing Partner, which is described in Section X.
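The summation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in Codix or the AI Drawing Partner: the function names are hypothetical, and the sample values (restricted here to 1, 0, and -1 for simplicity) are made-up data rather than codes taken from the coding scheme table.

```python
from collections import Counter
from itertools import accumulate


def csm_curve(codes):
    """Return the running sum of per-second interaction mode codes."""
    return list(accumulate(codes))


def code_counts(codes):
    """Count how many seconds were coded with each value."""
    return Counter(codes)


# Hypothetical ten-second excerpt: one code per second of interaction.
raw_codes = [1, 1, 0, -1, -1, 1, 1, 1, 0, -1]

curve = csm_curve(raw_codes)     # [1, 2, 2, 1, 0, 1, 2, 3, 3, 2]
counts = code_counts(raw_codes)  # counts[1] is 5, counts[0] is 2, counts[-1] is 3
```

Under this reading, a rising stretch of the curve corresponds to a run of positive codes, a falling stretch to a run of negative codes, and a flat stretch to zero-valued codes.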

Manually Coding CSM: Codix Video Coding Application

Manually coding the video from a co-creative AI session using the CSM technique involves applying continuous codes to every second of a video. A researcher reviews the video and rapidly switches between codes as it plays, while the system records and graphs those codes in real time. In practice, this can mean designing a video coding application that has a video upload feature and a range slider UI element the user can manipulate to change the value of the code being applied. The system then samples that range slider every second as the video plays. In this manner, the raw interaction mode values are recorded and then summed to produce the CSM curve, which is used for analysis.
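The once-per-second sampling of the slider can be illustrated with a short Python sketch. It assumes the coder's slider movements are available as a list of (time in seconds, value) pairs and that the slider holds its last value until it is moved again. The function name, the event format, and the default starting value are assumptions made for illustration only.

```python
def sample_slider(slider_events, duration_s, initial=0):
    """Sample a slider once per second of video playback.

    slider_events: iterable of (time_s, value) pairs recording when the
    coder moved the slider and to which value (hypothetical format).
    Returns one value per second, holding the most recent slider value.
    """
    events = sorted(slider_events)
    samples = []
    value = initial
    next_event = 0
    for second in range(duration_s):
        # Apply every slider change that happened up to this second.
        while next_event < len(events) and events[next_event][0] <= second:
            value = events[next_event][1]
            next_event += 1
        samples.append(value)
    return samples


# The coder sets the slider to 1 at 0 s, -1 at 3 s, and 0 at 5 s.
codes = sample_slider([(0, 1), (3, -1), (5, 0)], duration_s=7)
# codes == [1, 1, 1, -1, -1, 0, 0]
```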

For example, the freely available video analytics platform called Codix, as shown in Figure X, facilitates coding with the CSM approach. In this software application, the user uploads a video of a co-creative session, and plays the video to begin the coding process. The user selects one of the interaction mode codes from a slider that ranges from 1 to -1. The corresponding code is highlighted in the coding scheme to indicate what code is selected. The coding scheme is editable to enable users to add their own behavioral markers for the creative domain they are studying. As the video plays, the range slider is continuously sampled and the value is added dynamically to the graph below the video. Pausing the video stops the data collection process. The video can be slowed down to enable a more thorough analysis. The user watches the video and uses the arrow keys to control the slider in real time to code one of the participant’s interactions at a time. The user can pause and scrub the video to facilitate their analysis, and the system scrubs to that point in time in the dataset to enable re-coding video snippets.

Figure X: The Codix video coding application user interface. The video is at the top left. Below the video is the graph of the coded values. The code applicator slider, which is utilized by the user to select which code is currently being applied, is to the right of the graph. The editable coding scheme is located at the top right, with highlighted values for the current code. Below the code graph, the user can add an event that appears on the interactive event timeline to the right.
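The re-coding behavior, where scrubbing back to an earlier point lets the coder replace previously recorded values, could be modeled by keying each sample on its video second. The sketch below is a hypothetical data structure and not Codix's internal design. The class and method names are invented, and filling never-sampled seconds with 0 is an assumption.

```python
class CodeTrack:
    """Per-second code storage that supports re-coding after scrubbing."""

    def __init__(self):
        self._samples = {}  # video second -> code value

    def record(self, second, value):
        # Recording at an already-coded second overwrites the old value,
        # which is what re-coding a video snippet amounts to here.
        self._samples[second] = value

    def as_list(self, fill=0):
        """Return codes for seconds 0..last, using `fill` for any gaps."""
        if not self._samples:
            return []
        last = max(self._samples)
        return [self._samples.get(s, fill) for s in range(last + 1)]


track = CodeTrack()
for second, value in enumerate([1, 1, -1]):
    track.record(second, value)

# The coder scrubs back to second 1 and applies a different code.
track.record(1, 0)
# track.as_list() == [1, 0, -1]
```

Keying samples on the video second rather than appending to a list means that pausing (no new samples) and scrubbing (overwriting old samples) need no special handling.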

There is also an event-based coding panel where the user can add in events describing key qualitative data points that occurred in the co-creation, noting their duration, timestamp, label, and category. The categories can be edited to be relevant to the target creative domain in the settings panel, with insertions and deletions possible through the interface. After the initial coding process, the user can turn coding off in the settings menu, and rewatch the video for an event-based analysis that will not disrupt the interaction mode code dataset. When creating an event, the timestamp information is auto populated when the user pauses the video. The user then adds a label and optionally adds a duration, note describing the event, and chooses a category descriptive of the event. This category can be a category like those found in Grounded Theory, or a qualitative theme, such as those found in Thematic Analysis. The events are then visualized in an interactive timeline that displays the associated data when the user hovers over the event. The events can be downloaded as a JPG of the event timeline or a CSV of the event data.
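As a rough illustration of the event data described above, the following Python sketch defines an event record with a timestamp, label, category, optional duration, and note, and serializes a list of events to CSV text. The field names and column order are assumptions. The actual Codix export format is not specified here.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class Event:
    """One qualitative event on the timeline (hypothetical field names)."""
    timestamp_s: float                  # auto-populated when the video is paused
    label: str
    category: str = ""                  # e.g. a grounded-theory category or theme
    duration_s: Optional[float] = None  # optional
    note: str = ""                      # optional free-text description


def events_to_csv(events):
    """Serialize events to CSV text, ordered by timestamp."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Event)])
    writer.writeheader()
    for event in sorted(events, key=lambda e: e.timestamp_s):
        writer.writerow(asdict(event))
    return buffer.getvalue()


events = [
    Event(42.0, "User pauses to reflect", category="Evaluation", duration_s=6.0),
    Event(12.5, "Agent mimics user line", category="Coupling"),
]
csv_text = events_to_csv(events)
```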

The system performs a count of the overall codes applied, and calculates and displays the CSM curve. Each of the datasets is downloadable individually or collectively by pressing the save icon. The save feature provides a report combining the visualizations used in the interface, as shown in Figure X, and CSV files for the coded values and the CSM curve.

Figure X: The report generated by the Codix video analytics platform for a co-creative drawing session with the AI Drawing Partner. The elements include the sessionID and filename, interaction mode code graph (top-left), CSM curve (bottom-left), code count chart (top-right), and timeline with qualitative events (bottom-right).

Automatically Coding CSM: AI Drawing Partner

The second approach to performing the CSM coding procedure is to encode the interaction modes into a co-creative system so that it automatically codes the interactions of the user and the agent. One example of this type of quantified co-creative system is the AI Drawing Partner. The AI Drawing Partner is both a co-creative drawing agent that analyzes the user’s contributions and feedback and contributes its own drawn content and images to a shared canvas, and a research platform that models and visualizes the interaction dynamics of the co-creation occurring on the platform.

Figure X: AI Drawing Partner user interface. The settings and system functions are in the top panel. The panel on the left is the paint menu where the user can select layers, line styles, fill, smudge, and erase. The chat panel is at the bottom left, and it features a dropdown selector for objects in the system’s database, and a chat input for natural language input. The voting buttons are on the right. The virtual character is in the top right.

The AI Drawing Partner, as shown in Figure X, can mimic the user’s abstract line input (e.g. scale, rotate, translate, sketch over the user’s line input), recognize the user’s sketched objects, draw semantically similar objects, learn new objects based on user demonstration, generate images from a text prompt, and stylize the user’s sketch. There are positive and negative feedback buttons to communicate the user’s artistic preferences to the AI system. This feedback influences what drawing algorithms are selected to respond to the user. The AI communicates to the user through a speech bubble animation, which the AI uses to explain its actions, provide instructions to the user, and ideate sketches. After a session is complete, the user can access the quantified metrics of the co-creative session, as shown in Figure X, including cognitive dynamics, interaction dynamics, collaboration dynamics, and domain dynamics.

Figure X: The statistics panel of the AI Drawing Partner user interface.

The AI Drawing Partner uses the CSM to code interactions of the user and agent throughout the course of a co-creative session. Figure X shows the quantified results from a co-creative drawing session with the AI Drawing Partner. The left side shows the raw coded values and their summed counts at the end of the session. This data can be used to group participants into different user types based on how often each code occurs. The combined user and agent profile could be used to characterize different types of co-creative sessions. The CSM curve is on the right side of Figure X. This is the running cumulative sum of the interaction mode code values. It visually depicts the interaction dynamics of a co-creative session. Features of each curve, such as its slope and r-squared value, can be computed to characterize the session. Further analysis can characterize upward and downward trends.

Figure X: The cognitive dynamics from the AI Drawing Partner user interface. The interaction mode graph and counts are on the left. The creative sense-making curve and features are on the right. The session data is from the co-creation depicted in the user interface description in Figure X.

When the CSM curve in Figure 2 (right) rises, the collaborator is either communicating or manipulating the interface. When the curve falls, the collaborator is drawing. Flat parts of the curve indicate waiting. The CSM curve can be used to identify interaction patterns and trends.
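The code counts, the cumulative-sum curve, and the slope and r-squared features described in this section can be sketched as follows. This is an illustrative sketch rather than the AI Drawing Partner's code. The numeric values in `CODE_VALUES` are assumptions chosen only to match the signs described above (positive for communicating and manipulating the interface, negative for drawing, zero for waiting), and the sample sequence is invented.

```python
from collections import Counter
from itertools import accumulate

# Assumed code values: only the signs follow the text; magnitudes are hypothetical.
CODE_VALUES = {"communicate": 1, "manipulate": 1, "wait": 0, "draw": -1}


def csm_curve(codes):
    """Running cumulative sum of the interaction mode code values."""
    return list(accumulate(CODE_VALUES[code] for code in codes))


def linear_fit(ys):
    """Least-squares slope, intercept, and r-squared of ys against sample index."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r-squared is undefined for a perfectly flat curve; 0.0 is used as a convention.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 0.0
    return slope, intercept, r_squared


# Invented sequence of coded samples, one per sampling interval.
codes = ["communicate", "manipulate", "wait", "draw", "draw", "draw"]
counts = Counter(codes)                    # per-code totals for the session
curve = csm_curve(codes)                   # rises, stays flat, then falls
slope, intercept, r_squared = linear_fit(curve)
print(counts, curve, slope, r_squared)
```

A negative slope in this sketch means that drawing dominated the sampled interval, while a low r-squared suggests the session alternated between modes rather than following one steady trend.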

Classifying Interaction Trends & Traces

The CSM curve is a continuous time-series dataset, which enables statistical modeling. One approach is to apply stock market technical analysis to the creative sense-making curves. This technique detects buy, sell, and hold signals. These signals are translated, respectively, to regulating (communicating and manipulating the interface), executing fluid drawing actions, and waiting. These trends are summed through time and visualized to compare the interaction dynamics of different co-creative sessions. For example, in Figure X, the interaction trends are summed for an abstract drawing session and a representational drawing session. The first author conducted two five-minute drawing sessions with the AI Drawing Partner with different creative intentions, namely abstract and representational. In practice, the representational session consisted of requesting the AI to draw objects and incorporating them into the drawing. The abstract session was more reactive to what the AI was drawing at the moment.

The representational session features periods of regulation interspersed with drawing, whereas the abstract session is mostly cycles of drawing and waiting. The regulation in the representational session comes from the user requesting the system to draw sketches, teaching the system new objects, and providing feedback. These trends can be summed to arrive at the total amount of time spent engaged in each type of activity. This can contribute to a co-creative profile of the user, where different data groups can be clustered.
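The trend classification and the per-trend time totals can be illustrated with a small sketch. The text does not name the specific technical analysis indicator used, so the momentum rule below (the change in the curve over a trailing window) is a stand-in chosen for simplicity, not the author's method. The function names, the `window` and `threshold` parameters, and the sample curve are all hypothetical.

```python
from collections import Counter

# Mapping from trading signals to interaction trends, as described in the text.
SIGNAL_TO_TREND = {"buy": "regulate", "sell": "draw", "hold": "wait"}


def momentum_signals(curve, window=2, threshold=0):
    """Label each sample from index `window` onward as buy, sell, or hold.

    Momentum is the change in the curve over the trailing window. This is an
    assumed indicator, not necessarily the one used in the paper.
    """
    signals = []
    for i in range(window, len(curve)):
        momentum = curve[i] - curve[i - window]
        if momentum > threshold:
            signals.append("buy")
        elif momentum < -threshold:
            signals.append("sell")
        else:
            signals.append("hold")
    return signals


def time_per_trend(signals, sample_period_s=1.0):
    """Sum the time spent in each interaction trend."""
    totals = Counter()
    for signal in signals:
        totals[SIGNAL_TO_TREND[signal]] += sample_period_s
    return dict(totals)


# Invented CSM curve: a rise (regulation), a plateau (waiting), a fall (drawing).
curve = [0, 1, 2, 3, 3, 3, 3, 2, 1, 0, -1]
signals = momentum_signals(curve, window=2)
totals = time_per_trend(signals, sample_period_s=0.5)
print(signals)
print(totals)
```

A larger window smooths out brief fluctuations at the cost of reacting later to a change in trend. The resulting totals per trend are the kind of values that could feed a co-creative profile of the user.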

Figure X: The time-series analysis applied to the CSM curves of the user in two co-creative drawing sessions the first author conducted with the system: 1) representational (e.g. including objects and potentially a theme), and 2) abstract (e.g. no objects requested). Each trend is summed through time and visualized on the chart.

Turn Taking Rhythm

During co-creation, a rhythm is established within each turn and between turns. Within a turn, time is spent waiting for the collaborator to take their turn, pausing and hesitating to think, drawing, communicating, and manipulating the interface. With a quantified approach, these factors can be recorded and visualized, as shown in Figure X. There are two turn-taking datasets presented in Figure X: abstract and representational, which are the same sessions from Figure X. In Figure X, we see there were more turns in the abstract session than in the representational session. The representational session had two longer turns, which could account for some of the difference in the number of turns between the sessions. In the abstract session, the AI draws, then the user sometimes communicates to provide feedback, and then the user pauses and draws. In the representational session, however, the turns are more intermixed, with communication events happening at irregular intervals, which were most likely the user requesting the system to draw sketches.

Figure X: Turn rhythm trends for abstract and representational co-creative drawing sessions conducted by the first author with the AI Drawing Partner.

With this type of visualization, it is possible to begin to understand the nuances of turn-taking, such as the ordering of events within a turn, the relationship between turns, and general features such as longest and shortest turn. Both the AI and the user are represented in this format, as waiting for the user is drawing for the AI. The visualization depicts the relative length of drawing time for the user and AI. This data can be paired with qualitative data to arrive at a more complete picture of what happened during a particular turn, or patterns among turns. The turn trend data can serve as a guide for the qualitative analysis to identify key points in time that might be of interest to investigate further. For example, what happened during turn 2 and turn 4 of the representational session to make them so long? There was communication in both turns, so it is known from the diagram that the user communicated to the AI, but it is not clear what that communication was from the visualization. This could be remedied by examining the video recording or screen capture of the co-creative session and going to the point in time where the turn occurred. To aid this process, the co-creative system could provide a timestamped record of all turns.
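A timestamped record of turns of the kind suggested above could be derived from the coded interaction modes. The sketch below is one possible approach under a stated assumption: a new turn begins each time the user's mode switches to waiting, that is, when the AI starts drawing. The text does not define turn boundaries this precisely, so this rule, the `Turn` class, the mode names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One turn with its timestamps and the ordered modes inside it."""
    start_s: float
    end_s: float = 0.0
    segments: list = field(default_factory=list)  # (mode, duration_s) pairs

    @property
    def duration_s(self):
        return self.end_s - self.start_s


def split_into_turns(segments, session_end_s):
    """Group time-ordered (start_s, mode) segments into turns.

    Assumption: a turn starts whenever the user's mode becomes "wait".
    """
    turns = []
    for index, (start_s, mode) in enumerate(segments):
        is_last = index + 1 == len(segments)
        next_start_s = session_end_s if is_last else segments[index + 1][0]
        if mode == "wait" or not turns:
            turns.append(Turn(start_s=start_s))
        turns[-1].segments.append((mode, next_start_s - start_s))
        turns[-1].end_s = next_start_s
    return turns


# Invented user-side record: (start time in seconds, interaction mode).
segments = [
    (0.0, "wait"), (4.0, "pause"), (6.0, "draw"),
    (15.0, "wait"), (18.0, "communicate"), (25.0, "pause"), (27.0, "draw"),
]
turns = split_into_turns(segments, session_end_s=40.0)
longest = max(turns, key=lambda turn: turn.duration_s)
shortest = min(turns, key=lambda turn: turn.duration_s)
for number, turn in enumerate(turns, start=1):
    print(number, turn.start_s, turn.end_s, turn.segments)
```

Because each turn stores its start and end times, an analyst could jump directly to the matching point in the video recording to inspect what was communicated during an unusually long turn.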

Discussion

When sense-making and participatory sense-making are applied to a creative pursuit, creative sense-making emerges, where directives that guide and constrain interaction emerge and meaning is gradually constructed through interaction with the environment and others within that environment. CSM can be characterized by fluctuations of clamped and unclamped cognition. These fluctuations can be quantified to characterize CSM processes. This can be accomplished through a cognitive interaction coding scheme based on the Enactive Model of Creativity’s modes of cognition mapped to modes of interaction in co-creative AI. The coding scheme can be applied manually through coding the video recording of a co-creative session or automatically by enabling a co-creative system to automatically detect the modes of interaction. Once quantified, interaction dynamics can be statistically modeled and analyzed with techniques such as stock market technical analysis.

The cognitive analysis of the CSM continuously codes the cognitive and interaction modes of the user and agent. This produces the creative sense-making curve, which visually depicts interaction trends and patterns. Analyzing the trends in this curve can help characterize and describe co-creation and compare co-creative behavior between experimental sessions. A more fine-grained analysis of the turns and the turn-taking rhythm can further clarify the nature of co-creation between a human and an AI.

CSM enables a theoretically grounded methodology with which co-creativity researchers can have common representations of co-creation, data collection formats, modeling techniques, and visualization techniques. The interaction mode code counts provide a profile of the user and agent in a co-creative session. The CSM curve characterizes the dynamics of interactions. The stock market technical analysis demonstrates interaction trends. The turn-taking rhythm analysis reveals temporal relationships within and between turns. These techniques are domain independent and can be applied to any co-creative domain, as exemplified in the list of CSM coding applications in (Davis, 2024).

A quantified approach to human-AI co-creation can advance the field of co-creative AI by providing a domain independent model of co-creation that can be compared between and across creative domains. The CSM curve represents a common model and visualization of interaction dynamics that can be used to compare the interaction dynamics of co-creative systems in different creative domains and with vastly different creative outputs. The artifacts co-created during interaction can be characterized with this CSM curve, such that certain styles might have distinct looking curves.

Conclusions

Creative sense-making was described as a theoretical framework for describing co-creation and quantifying the sense-making and participatory sense-making occurring during co-creation, i.e. the interaction dynamics of co-creation. The framework includes a methodology to code interaction modes continuously through time to arrive at a CSM curve that visually depicts the interaction trends of a co-creative session. This curve can be further analyzed using statistical methods, such as stock market technical analysis. The turn-taking rhythm can be coded and visualized to provide a model of turn behavior and inter-turn relationships. The CSM coding procedure can be applied manually using a video coding platform, such as Codix, the web application described in this paper, which supports the CSM coding procedure on videos of co-creative interactions. The CSM coding procedure can also be encoded into a co-creative system to enable the automatic coding of interactions through time. Both techniques produce data that can be compared between studies and across creative domains. The CSM can provide a shared methodology and theoretical framework to advance the field of co-creative AI.

References

Andy Clark, & David Chalmers. (1998). The extended mind. Analysis, 58(1), 7–19.

Nicholas Davis. (2013). Human-computer co-creativity: Blending human and computational creativity. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE).

Nicholas Davis. (2024). Creative Sense-Making. In Artificial Intelligence, Co-Creation and Creativity: The New Frontier for Innovation (pp. 45–XX). Springer.

Nicholas Davis, et al. (2024). The Five Pillars of Enaction as a Theoretical Framework for Co-Creative AI. In Proceedings of ACM Creativity & Cognition.

Hanne De Jaegher. (2013). Embodiment and sense-making in autism. Frontiers in Integrative Neuroscience, 7, 15.

Hanne De Jaegher, & Ezequiel Di Paolo. (2007). Participatory sense-making: An enactive approach to social cognition. Phenomenology and the Cognitive Sciences, 6(4), 485–507.

James J. Gibson. (1979). The ecological approach to visual perception. Houghton Mifflin.

Matthew Guzdial, & Mark Riedl. (2019). Toward interaction frameworks for studying co-creative AI. In Proceedings of the International Conference on Computational Creativity (ICCC).

Anna Kantosalo, et al. (2020). Modelling modes, styles, and strategies in human-AI co-creativity. In Proceedings of the International Conference on Computational Creativity (ICCC).

Maurice Merleau-Ponty. (1962). Phenomenology of perception. Routledge.

Alva Noë. (2004). Action in perception. MIT Press.

Ezequiel Di Paolo, Elena Clare Cuffari, & Hanne De Jaegher. (2018). Linguistic bodies: The continuity between life and language. MIT Press.

Mihaly Csikszentmihalyi. (1990). Flow: The psychology of optimal experience. Harper & Row.

Evan Thompson, & Moggi Stapleton. (2009). Making sense of sense-making: Reflections on enactive and extended mind theories. Topoi, 28(1), 23–30.

Francisco Varela, Evan Thompson, & Eleanor Rosch. (1991). The embodied mind: Cognitive science and human experience. MIT Press.

Dave Ward, Deborah Silverman, & Mario Villalobos. (2017). Introduction: The varieties of enactivism. Topoi, 36(3), 365–375.

Tong Wu, et al. (2021). Human-AI co-creation model for creative collaboration. In Proceedings of the International Conference on Human-Computer Interaction.

Jeba Rezwana. (2022). Designing co-creative systems: The COFI framework for interaction design. In Proceedings of the International Conference on Computational Creativity (ICCC).
