Science Advances 19 Aug 2020:
Vol. 6, no. 34, eabb4159
Humans learn from their own trial-and-error experience and from observing others. However, it remains unknown how brain circuits compute expected values when direct learning and social learning coexist in uncertain environments. Using a multiplayer reward learning paradigm in which 185 participants (39 of them scanned) interacted in real time, we observed that individuals succumbed to the group when confronted with dissenting information, whereas observing confirming information increased their confidence. Leveraging computational modeling and functional magnetic resonance imaging, we tracked direct valuation through experience and vicarious valuation through observation, as well as their dissociable but interacting neural representations in the ventromedial prefrontal cortex and the anterior cingulate cortex, respectively. Their functional coupling with the right temporoparietal junction, which represented instantaneous social information, instantiated a hitherto uncharacterized social prediction error, rather than a reward prediction error, in the putamen. These findings suggest that an integrated network involving the brain’s reward hub and social hub supports social influence in human decision-making.
Human decision-making is affected by direct experiential learning and social observational learning. This holds for big and small decisions alike: In addition to our own experience and expectation, we care about what our family and friends think of which major we choose in college, and we also monitor other people’s choices at the lunch counter to obtain some guidance for our own menu selection, a phenomenon known as social influence. Classic behavioral studies have established a systematic experimental paradigm for assessing social influence (1), and neuroimaging studies have recently attempted to unravel its neurobiological underpinnings (2, 3). However, social influence and subsequent social learning (4) have rarely been investigated in conjunction with direct learning.
Direct learning has been characterized in detail with reinforcement learning (RL) (5), which describes action selection as a function of valuation that is updated through a reward prediction error (RPE) acting as a teaching signal (5, 6). While social learning has been modeled with similar mechanisms insofar as it simulates the vicarious valuation processes of observed others (7, 8), most studies involved only one observed individual, and neither the paradigms nor the corresponding computational models have adequately addressed the aggregation of information from multiple social partners.
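As a concrete illustration of the RL account of direct learning, the following Python sketch implements a standard Rescorla-Wagner value update driven by an RPE, together with a softmax choice rule. It is a generic textbook sketch, not the authors' model: the learning rate `alpha`, the inverse temperature `beta`, the 1/0 reward coding, and the function names are assumptions made for illustration.

```python
import math


def rpe_update(values, choice, reward, alpha):
    """Rescorla-Wagner update of the chosen option's value.

    Assumed reward coding: 1 for a win, 0 for no win (illustrative only).
    Returns the updated values and the reward prediction error (RPE).
    """
    rpe = reward - values[choice]          # teaching signal: outcome minus expectation
    updated = list(values)                 # copy so the caller's list is untouched
    updated[choice] += alpha * rpe         # move the chosen value toward the outcome
    return updated, rpe


def softmax(values, beta):
    """Choice probabilities from values; beta is the inverse temperature."""
    m = max(values)                        # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: two fractals start at 0.5; the first one is chosen and rewarded.
values, rpe = rpe_update([0.5, 0.5], choice=0, reward=1, alpha=0.2)
probs = softmax(values, beta=5.0)
```

After this single rewarded trial the RPE is 0.5, the chosen value rises to about 0.6, the unchosen value stays at 0.5, and the softmax now favors the first fractal.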
Despite the computational distinction between direct learning (with experiential reward) and social learning (with vicarious reward), neuroimaging studies remain equivocal about the brain networks involved: Are the neural circuits recruited for social learning similar to those for direct learning? In direct learning, a plethora of human functional magnetic resonance imaging (fMRI) studies have implicated a network involving the ventromedial prefrontal cortex (vmPFC), which represents individuals’ own valuation (9), and the ventral striatum (VS)/nucleus accumbens (NAcc), which encodes the RPE (6). These findings mirror neurophysiological recordings in nonhuman primates showing the involvement of the orbitofrontal cortex and the striatum in direct reward experience (10). Turning to social learning, evidence from human neuroimaging studies has suggested similar neuronal patterns for experience-derived and observation-derived valuation, showing that the vmPFC processes value irrespective of whether it is delivered to oneself or to others (7, 11). However, recent studies in both humans (12, 13) and nonhuman primates (14) have suggested cortical contributions from the anterior cingulate cortex (ACC), which specifically tracks rewards allocated to others. Although these findings suggest that direct learning and social learning are, in part, instantiated in dissociable brain networks, very few studies have investigated how these brain networks interact when direct learning and social learning coexist in an uncertain environment (15), and none of them involved groups larger than two individuals.
Here, we investigate the interaction of direct learning and social learning at the behavioral, computational, and neural levels. We hypothesize that individuals’ direct valuation is computed via RL and has its neural underpinnings in the interplay between the vmPFC and the NAcc, whereas individuals’ vicarious valuation is updated by observing their social partners’ performance and is encoded in the ACC. Furthermore, we hypothesize that instantaneous social information is represented in the right temporoparietal junction (rTPJ), which encodes others’ intentions necessary for choices in social contexts (12, 16, 17). To test these hypotheses, we designed a multistage group decision-making task in which instantaneous social influence was directly measured as a response to the revelation of the group’s decision in real time. By further providing reward outcomes to all individuals, we enabled participants to learn directly from their own experience and vicariously from observing others. Our computational model updates direct and vicarious values separately, but the two jointly predict individuals’ decisions. Using model-based fMRI analyses, we investigate crucial decision variables derived from the model, and through connectivity analyses, we demonstrate how the brain regions involved in direct and social learning interact and integrate social information into individuals’ valuation and action selection. In addition, confidence was measured both before and after participants received social information, as confidence may modulate individuals’ choices during decision-making (3, 18).
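The idea of separately updated direct and vicarious values that jointly predict choices can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the authors' model: we assume both value tables use the same Rescorla-Wagner rule, that the vicarious value of each option moves toward the mean outcome of the observed players who chose it, and that the two values are combined as a weighted sum (weights `beta_dir` and `beta_vic`) inside a single softmax. All class, parameter, and argument names are illustrative.

```python
import math


def rw(value, outcome, alpha):
    """Rescorla-Wagner step: value moves toward outcome by alpha times the prediction error."""
    return value + alpha * (outcome - value)


class DirectVicariousLearner:
    """Hypothetical learner with separate direct and vicarious value tables for two options."""

    def __init__(self, alpha_dir, alpha_vic, beta_dir, beta_vic):
        self.alpha_dir, self.alpha_vic = alpha_dir, alpha_vic
        self.beta_dir, self.beta_vic = beta_dir, beta_vic
        self.v_dir = [0.5, 0.5]   # experience-derived values (assumed initial value 0.5)
        self.v_vic = [0.5, 0.5]   # observation-derived values (assumed initial value 0.5)

    def choice_probs(self):
        """Softmax over a weighted sum of direct and vicarious values."""
        dv = [self.beta_dir * d + self.beta_vic * v
              for d, v in zip(self.v_dir, self.v_vic)]
        m = max(dv)               # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in dv]
        total = sum(exps)
        return [e / total for e in exps]

    def update(self, own_choice, own_outcome, others):
        """own_outcome is 1 or 0; others is a list of (choice, outcome) pairs for observed players."""
        # Direct learning: only the participant's own chosen option is updated.
        self.v_dir[own_choice] = rw(self.v_dir[own_choice], own_outcome, self.alpha_dir)
        # Vicarious learning (assumed rule): each option chosen by at least one observed
        # player moves toward the mean outcome those players received.
        for option in (0, 1):
            observed = [o for c, o in others if c == option]
            if observed:
                mean_outcome = sum(observed) / len(observed)
                self.v_vic[option] = rw(self.v_vic[option], mean_outcome, self.alpha_vic)


# Usage: the participant chose option 0 and won; two others chose option 0 and won,
# and two chose option 1 and lost.
learner = DirectVicariousLearner(alpha_dir=0.3, alpha_vic=0.3, beta_dir=3.0, beta_vic=3.0)
learner.update(own_choice=0, own_outcome=1, others=[(0, 1), (1, 0), (1, 0), (0, 1)])
p = learner.choice_probs()
```

Keeping the two tables separate is what allows each value signal to be examined on its own, for example against different brain regions; the weighted-sum combination is only one simple choice among several possible integration rules.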
Our data and model suggest that instantaneous social information alters both choice and confidence. After the outcome is received, experience-derived values and observation-derived values make comparable contributions to future decisions but are distinctly encoded in the vmPFC and the ACC, respectively. We further identify an interaction between two brain networks that separately process reward information and social information, and their functional coupling substantiates an RPE and a social prediction error (SPE) as teaching signals for direct learning and social learning, respectively.
Participants (N = 185) performed the social influence task in groups of five; 39 of them were scanned with fMRI. The task used a multiphase design, enabling us to tease apart each crucial behavior under social influence (Fig. 1A). On each trial, participants first made an initial choice (Choice 1) between two abstract fractals with complementary reward probabilities (70 and 30%), followed by a first postdecision bet (Bet 1, an incentivized confidence rating from 1 to 3) (19). They then uncovered the other players’ first decisions one at a time in the order of their own preference (i.e., participants decided whose choice to see first and whose second, after which the remaining two choices were revealed), and then had the opportunity to adjust their choice (Choice 2) and bet (Bet 2). The outcome of the final choice was then multiplied by the final bet to determine the payoff on that trial (e.g., 3 × 20 = 60 cents). Participants’ actual choices were communicated in real time to every other participant via intranet connections, thus maintaining high ecological validity. The core of this paradigm was a probabilistic reversal learning (PRL) task (fig. S1B) (20). This PRL implementation required participants to learn and continuously relearn action-outcome associations, thereby creating enough uncertainty that group decisions were likely to be taken into account both for behavioral adjustments in the second decisions (before outcome delivery; referred to as instantaneous social influence) and, through observing others’ performance, for making future decisions on the next trial (after outcome delivery; referred to as social learning), together with participants’ own valuation process (referred to as direct learning). These dynamically evolving group decisions also allowed us to parametrically test the effect of group consensus, moving beyond designs that used only one social partner or an averaged group opinion (2, 12).
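The trial mechanics can be illustrated with a small simulation sketch. The complementary 70/30 reward probabilities, the 1 to 3 bet, the bet-times-outcome payoff (3 × 20 = 60 cents), and the four other players come from the task description; the fixed block length for reversals, the 1/0 outcome coding, and all function names are hypothetical assumptions, since the actual reversal schedule is not given here. Because this section only shows the payoff for a winning trial, the sketch computes only that case. The consensus helper simply counts how many of the other players' first choices agree with the participant's Choice 1.

```python
import random

REWARD_PROBS = (0.7, 0.3)   # better and worse fractal (complementary probabilities)
STAKE_CENTS = 20            # per-bet-unit amount from the example 3 x 20 = 60 cents


def better_option_schedule(n_trials, block_len):
    """Hypothetical reversal schedule: the better fractal flips every block_len trials."""
    return [(t // block_len) % 2 for t in range(n_trials)]


def sample_outcome(choice, better_option, rng):
    """Return 1 if the chosen fractal pays out on this trial, else 0."""
    p = REWARD_PROBS[0] if choice == better_option else REWARD_PROBS[1]
    return 1 if rng.random() < p else 0


def winning_payoff_cents(bet):
    """Payoff on a rewarded trial: the final bet (1 to 3) times the stake."""
    if bet not in (1, 2, 3):
        raise ValueError("bet must be 1, 2, or 3")
    return bet * STAKE_CENTS


def group_consensus(own_choice, others_choices):
    """Count how many of the other players agree and disagree with one's Choice 1."""
    agree = sum(1 for c in others_choices if c == own_choice)
    return agree, len(others_choices) - agree


# Usage
rng = random.Random(1)
schedule = better_option_schedule(n_trials=6, block_len=2)   # [0, 0, 1, 1, 0, 0]
outcome = sample_outcome(choice=0, better_option=schedule[0], rng=rng)
print(winning_payoff_cents(3))                # 60
print(group_consensus(0, [0, 1, 0, 0]))       # (3, 1)
```

Counting agreeing versus disagreeing partners, rather than averaging the group's choices, keeps the graded consensus level available as a parametric variable.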
Although participants could obtain the full action-outcome association from their own outcome on each single trial, across trials they may acquire additional valuation information by observing others, given the multiple reversals in the PRL paradigm. In addition, participants were aware that there was neither cooperation nor competition among them (see Materials and Methods).