How Unbecoming of You: Gender Biases in Perceptions of Ridesharing Performance


RSM MSc BIM CCDC – Group 11

This blog post reviews the research paper “How Unbecoming of You: Gender Biases in Perceptions of Ridesharing Performance” by Greenwood, Adjerid, and Angst (2017). The post concludes with a business case on Uber, which relates closely to the paper’s topic of gender biases in perceptions of ridesharing performance.

Paper Review

The main objective of this research is to unravel biases that arise when consumers post a review online. More specifically, the researchers focus on gender biases that might occur on ridesharing platforms. While aspects of the rating process play a role, the characteristics of the rater and the ratee have been found to affect the willingness to transact, that is, the ex-ante evaluation of quality: how an individual assesses a product or service before actually consuming it. These ex-ante quality perceptions were examined against post-transaction perceptions of quality. A few papers have previously addressed gender as a factor that can affect service quality evaluation. However, this paper delivers novel and valuable insights by considering gender as a plausible factor affecting users’ post-consumption evaluations.

The researchers measured the perception of quality both before and after the service and developed three hypotheses to test, namely:

  • (H1) “Female gender status will correlate with lower ex ante perceived quality of service, as compared with men, all else equal.”
  • (H2) “Female drivers will be penalized to a greater degree, as compared with male drivers, for performance shortfalls, all else equal.”
  • (H3) “Female drivers will be penalized to a greater degree, as compared with male drivers, for performance shortfalls when performing highly gendered tasks, all else equal.”

To test the hypotheses, the paper uses an experiment with a 2 (gender) x 2 (race) x 2 (historical quality) x 2 (experience quality) between-subjects research design. The researchers informed the participants that they represented a new ridesharing service called Agile Rides. Agile Rides was said to be in the process of being launched, and participants’ assistance was required to understand what makes a good rider experience, bringing the experiment closer to a real-world setting.
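To make the factorial design concrete, the sketch below (our illustration, not the authors’ experimental materials) enumerates the 16 cells of the design and randomly assigns a participant to one of them; the factor labels are assumed for illustration.

```python
import itertools
import random

# Assumed labels purely for illustration; the paper manipulates gender, race,
# historical quality and experience quality between subjects.
factors = {
    "gender": ["male", "female"],
    "race": ["white", "non-white"],
    "historical_quality": ["high", "low"],
    "experience_quality": ["high", "low"],
}

# Every combination of factor levels is one experimental condition (16 in total);
# each participant is exposed to exactly one condition (between-subjects).
conditions = [dict(zip(factors, combo)) for combo in itertools.product(*factors.values())]

def assign_condition(participant_id: int) -> dict:
    """Randomly assign a participant to a single condition."""
    rng = random.Random(participant_id)  # seeded per participant for reproducibility
    return rng.choice(conditions)

print(len(conditions))        # 16
print(assign_condition(42))   # e.g. {'gender': 'female', 'race': 'white', ...}
```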

One of the main findings of the paper is that, when historical quality information is available, gender bias does not penalize women drivers before the service is rendered in a ridesharing context (H1). However, if the service provided by a woman is of lower quality, worse ratings accrue for female drivers than for male drivers with similar performance (H2). Furthermore, when tasks were considered highly gendered (either feminine or masculine), these penalties were intensified for female drivers relative to male drivers with the same performance (H3).

Strengths & Weaknesses

Although there is no question regarding the relevance of the paper, it has both strengths and weaknesses. First, one strength of the paper is that the gender and quality manipulations are extensively tested in pre-studies. This allows the researchers to make accurate comparisons of perceived quality before and after the experiment has taken place. Second, the paper has strong practical implications for services that rely on online rating systems. These services can now identify which steps to take in order to limit the extent to which gender bias affects the perceived quality of the services offered.

However, one weakness of the paper is that the researchers did not account for participants’ previous ridesharing experiences. These experiences, whether positive or negative, could have influenced their quality perceptions. One suggestion would be to ask respondents about their previous ridesharing experiences. By doing so, the researchers could compare the responses of respondents with positive, negative, or no previous ridesharing experience, in order to find out whether prior experience yields different results. Second, participants were asked to imagine a hypothetical situation, which creates the risk that riders’ behaviour in real life could differ from what they indicated. According to Ajzen et al. (2004), bias exists in hypothetical situations because individuals imagine that they will act according to social norms and expectations, which is not always the case in real life. The results in a real-world setting could therefore differ from the results found in this research. A possible solution to this problem is to use Virtual Reality (VR) when measuring participants’ quality perceptions. Instead of relying only on text to imagine a situation, participants could then also experience it visually. Situations closer to real-life settings can be created and bias can be reduced.

Business Case: Uber

An example of a company that outsources driver evaluation to customer ratings and has experienced gender bias in its evaluation system is Uber.

Generally, after a ride, a passenger is asked through the Uber app to rate the driver anonymously on a “1- to 5-star scale” (Rosenblat et al., 2016, p. 3). By leveraging anonymous consumer-sourced ratings, Uber outsources driver evaluation to consumers. Nevertheless, as consumers enter their ratings into the system, the algorithms also record the consumers’ implicit biases. A case study on Uber reveals that driver ratings are highly likely to be biased by factors such as race, ethnicity and gender (Rosenblat et al., 2016). Laws such as the Equality Act 2010 protect against direct discrimination, but there is currently no law that addresses indirect bias, such as that generated by consumer-sourced ratings. The authors of the Uber case study therefore propose the following ten interventions to limit bias in consumer-sourced ratings (Rosenblat et al., 2016).

  • First, it is important to track consumer-sourced ratings, which enables the identification of potential patterns of bias against drivers.
  • Second, it is equally important to disclose the identified patterns to the public in order to propel solutions within Uber. 
  • Third, ratings should be validated in conjunction with behavioural data. For example, if a driver receives a low rating, the speed at which the driver drove should also be assessed to verify whether the low rating reflects actual performance.
  • Fourth, each rating should be weighted differently to account for raters who are statistically identified as potentially biased (a minimal sketch of the third and fourth interventions follows the full list below).
  • Fifth, Uber should increase the feedback requirements for consumers who provide low ratings, for example by asking them to elaborate on the dimensions they were dissatisfied with.
  • Sixth, consumer-sourced ratings could be kept for internal use only, rather than being used for driver evaluation.
  • Seventh, Uber can also increase in-person assessments of low-rated drivers. 
  • The eighth suggestion is to open the platform fully to both drivers and consumers. With an open policy, both parties would be able to join the platform, get to know each other, and select or approve each other’s requests.

The last two interventions concern the legal aspects of ride-sharing platforms:

  • Ninth, a plausible solution could be to turn self-employed drivers into employees protected by labour law.
  • Finally, the authors suggest that legal bodies “lower the pleading requirements for claims” brought against ride-sharing platforms (Rosenblat et al., 2016, p. 16).
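To make the third and fourth interventions more tangible, here is a minimal, purely hypothetical sketch of how a low rating could be validated against behavioural data and how ratings from statistically flagged raters could be down-weighted. All field names, thresholds and weights are our assumptions, not Uber’s or the case study’s actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Rating:
    rater_id: str
    driver_id: str
    stars: int              # 1- to 5-star rating given by the passenger
    avg_speed_kmh: float    # behavioural data recorded during the ride (assumed)
    speed_limit_kmh: float

def is_validated(r: Rating) -> bool:
    """Intervention 3 (sketch): treat a low rating as validated only if the
    behavioural data also shows a performance issue, here speeding."""
    if r.stars >= 3:
        return True
    return r.avg_speed_kmh > 1.1 * r.speed_limit_kmh

def rater_weight(bias_gap_stars: float) -> float:
    """Intervention 4 (sketch): shrink the weight of raters whose past ratings
    show a statistical gap between female and male drivers of equal performance."""
    return max(0.2, 1.0 - 0.5 * abs(bias_gap_stars))

def driver_score(ratings: list[Rating], bias_gaps: dict[str, float]) -> float:
    """Weighted average of validated ratings for one driver."""
    validated = [r for r in ratings if is_validated(r)]
    if not validated:
        return float("nan")
    weights = [rater_weight(bias_gaps.get(r.rater_id, 0.0)) for r in validated]
    return sum(w * r.stars for w, r in zip(weights, validated)) / sum(weights)
```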

In conclusion, the paper presents novel findings that serve to inform ridesharing platforms, such as Uber, about biases in their evaluation systems. Furthermore, this blog post provides ridesharing platforms with ten interventions to limit possible biases in their consumer-sourced ratings.

References

Ajzen, I., Brown, T. C., & Carvajal, F. (2004). Explaining the discrepancy between intentions and actions: The case of hypothetical bias in contingent valuation. Personality and Social Psychology Bulletin, 30(9), 1108-1121.

Greenwood, B. N., Adjerid, I., & Angst, C. M. (2017). How Unbecoming of You: Gender Biases in Perceptions of Ridesharing Performance.

Orwellian Social Credit system: myth or reality?


Black Mirror

Everyone who has been keeping up with Netflix’s offering has heard of Black Mirror, the series about dystopian worlds becoming reality. In one of the episodes, writer Charlie Brooker depicts a world in which every citizen has a social score that others can rate whenever they come into contact with the person in question. At a certain point in the episode, the main character is denied access to a flight because of her low social score. A scary thought, but, as it turns out, very much a reality. The Chinese government intends to implement a system similar to the one portrayed in Black Mirror, in which people are assigned a social credit score. The main difference is that this score is attributed by the government through big data, not by fellow ‘victims’. The social credit system is set to kick off in the coming year, 2020. While the consequences of a poor social score in the Orwellian Black Mirror episode are extreme, some of those scenes will turn out to be real consequences in China. For example, by the end of 2018 more than five million Chinese citizens had already been denied access to high-speed rail tickets after being placed on a blacklist for debt (Needham, 2019). Other implications for citizens once the system starts include being unable to find a job in the civil service, journalism or legal fields, or having one’s children denied access to expensive private schools (Botsman, 2017).

Sesame Credit

What if I told you that this social system has already been a reality for over four years? That’s right. Alibaba, the Chinese multinational giant in e-commerce and other sectors, assigns its customers a social credit score, commonly referred to as Sesame Credit (Jefferson, 2018). Alibaba is known to have close ties to the Chinese government, and Sesame Credit is partly a trial version of the social credit system about to be introduced (Financial Times, 2017).

So what is Sesame Credit and how does it work? Alibaba collects a ton of data on its customers. Given that it is active in insurance, loans, e-commerce and even dating, it evidently has a lot to analyze. The credit system uses data on more than 300 million people and 37 million businesses (Alibaba Group, 2015). On top of this, Alibaba’s ties with the Chinese government provide it with access to official identities, financial records and even messages from WeChat, the Chinese alternative to WhatsApp (Huang, 2017). All this data is gathered by Alibaba and then analyzed to arrive at a Sesame Credit score, which can be interpreted as an indication of someone’s trustworthiness. While the exact algorithm used to determine a person’s Sesame Credit is unknown, it is known that the heaps of collected data feed into ratings in five categories: credit history, fulfillment capacity (the ability to live up to contract terms), personal characteristics, behavior and preferences, and interpersonal relationships (i.e. one’s friends). Combined, these make up a person’s very own Sesame Credit score.
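Since the exact algorithm is not public, the snippet below is only an illustrative sketch of the general idea: five category sub-scores combined into a single trustworthiness score. The weights and example sub-scores are invented; only the five categories and the widely reported 350-950 score range come from public descriptions of Sesame Credit.

```python
# Purely illustrative sketch: the real Sesame Credit algorithm is not public,
# so these weights are invented. It only shows the idea of combining the
# five category sub-scores into one trustworthiness score.

CATEGORY_WEIGHTS = {
    "credit_history": 0.35,
    "fulfillment_capacity": 0.25,
    "personal_characteristics": 0.15,
    "behavior_and_preferences": 0.15,
    "interpersonal_relationships": 0.10,
}

SCORE_MIN, SCORE_MAX = 350, 950  # widely reported Sesame Credit range

def sesame_like_score(subscores: dict[str, float]) -> float:
    """Combine category sub-scores (each between 0 and 1) into a single score."""
    weighted = sum(CATEGORY_WEIGHTS[c] * subscores.get(c, 0.0) for c in CATEGORY_WEIGHTS)
    return SCORE_MIN + weighted * (SCORE_MAX - SCORE_MIN)

print(sesame_like_score({
    "credit_history": 0.9,
    "fulfillment_capacity": 0.8,
    "personal_characteristics": 0.7,
    "behavior_and_preferences": 0.6,
    "interpersonal_relationships": 0.5,
}))  # ~806
```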

Applications

Now, what to do with your Sesame Credit is a natural next question. A main difference between Sesame Credit and the approaching Social Credit system of the Chinese government is that Sesame Credit is about rewarding trustworthy people rather than punishing those who do not have a high rating. The credit score has rewarded customers, for example, when applying for a loan with Ant Financial, a subsidiary of Alibaba, or when booking a night at a hotel, where the merit of a high score is not having to pay a deposit up front thanks to the high trustworthiness. Baihe.com, a Chinese dating site, has even started to allow users to add their Sesame Credit to their profile as a way to provide better dating opportunities (Hatton, 2015). These are just some of the more obvious applications of Sesame Credit and the way it creates value for people with a high rating.

A remaining question is the value for Alibaba itself. Beyond being a nice perk to hand out to customers, what does Alibaba gain from the credit system? The reason Sesame Credit, or any social credit system in China, is so useful to these entities is the way it steers desired behavior. When people know that every move is monitored and therefore counts, they will start to behave more desirably in order to retain their high Sesame Credit and, consequently, the rewards that come with it.

Basically, Sesame Credit seems to be a win-win for Alibaba and for those people who are, in the broadest definition of the word, decent. By evoking good behavior so that the Sesame Credit becomes an accurate reflection of people’s conduct, Alibaba boosts the average trustworthiness of its customers while providing its model citizens with proper rewards. Naturally, there are questions about the ethics of monitoring and analyzing every step customers take and attaching a trustworthiness tag to a human being, but all ethical issues aside, the business model seems to benefit both the user and the company. It works, and given that Sesame Credit was a trial indirectly executed by the Chinese government, we can look forward to the implementation of ‘the real deal’, the actual social credit system that will go live in 2020.

Conclusion

In conclusion, the Black Mirror episode Nosedive has opened the eyes of many Westerners to the social credit system that is about to be introduced nationwide in China. Despite this relatively recent revelation, Sesame Credit, a predecessor of the Chinese government’s much larger system, has been up and running for four years already and is deemed a success. Customers are assigned a trustworthiness score and in return gain access to many advantages, such as discounts or not having to pay deposits at hotels. Whether this system will also work in a punishing fashion has yet to be seen, but it will most certainly be an interesting development to keep an eye on.

References

Alibaba Group (2015) Ant Financial Unveils China’s First Credit-Scoring System Using Online Data. Available at: https://www.alibabagroup.com/en/news/article?news=p150128 (Accessed: 7 March 2019).

Botsman, R. (2017) Big data meets Big Brother as China moves to rate its citizens. Available at: https://www.wired.co.uk/article/chinese-government-social-credit-score-privacy-invasion (Accessed: 7 March 2019).

Financial Times (2017) China changes tack on ‘social credit’ scheme plan. Available at: https://www.ft.com/content/f772a9ce-60c4-11e7-91a7-502f7ee26895?mc_cid=9068154611 (Accessed: 7 March 2019).

Hatton, C. (2015) China ‘social credit’: Beijing sets up huge system. Available at: https://www.bbc.com/news/world-asia-china-34592186 (Accessed: 7 March 2019).

Huang, P. (2017) WeChat Confirms: It Shares Just About All Private Data With the Chinese Regime. Available at: https://www.theepochtimes.com/wechat-confirms-it-gives-just-about-all-private-user-data-to-the-chinese-regime_2296960.html (Accessed: 7 March 2019).

Jefferson, E. (2018) No, China isn’t Black Mirror – social credit scores are more complex and sinister than that. Available at: https://www.newstatesman.com/world/asia/2018/04/no-china-isn-t-black-mirror-social-credit-scores-are-more-complex-and-sinister (Accessed: 7 March 2019).

Needham, K. (2019) China: Big Data watches millions during Chinese New Year. Available at: https://www.smh.com.au/world/asia/millions-are-on-the-move-in-china-and-big-data-is-watching-20190204-p50vlf.html (Accessed: 7 March 2019).

Competing for Attention: An Empirical Study of Online Reviewers’ Strategic Behavior


This is a review of the paper “Competing for Attention: An Empirical Study of Online Reviewers’ Strategic Behavior” written by Shen, Hu & Ulmer (2015).

Introduction
In 2007, a study by Deloitte found that 62% of consumers read consumer-written online product reviews, and among these consumers, 82% stated that their purchase decisions were directly influenced by online reviews. Shen, Hu & Ulmer (2015) argue that these percentages would be higher if the study were replicated today, as consumers increasingly rely on the online opinions and experiences shared by other consumers when deciding what product to purchase. As such, it is important for companies to understand what incentivizes online reviewers to actually write reviews and what the effects of those incentives are on the content of their reviews (Shen et al., 2015).

The authors argue that there is a large body of literature on online product reviews, but that this existing literature has failed to look at how online reviewers are incentivized to write reviews (Shen et al., 2015). It includes studies such as that by Basuroy et al. (2003), who looked at numerical aspects of reviews, and that by Godes & Silva (2012), who looked at the evolution of review ratings. However, the authors note that a large part of existing research simply assumes that online reviews are written for the same motives that offline consumers have when they engage in word of mouth (Dichter, 1966).

With this gap in mind, the authors draw on literature from other contexts, such as the motivation for voluntary contributions to open source software and to firm-hosted online forums. Building on this literature, the authors propose that gaining online reputation and attention from other consumers is an important motivation for reviewers’ contributions to review systems (Shen et al., 2015). To explore this, the paper “empirically investigates how incentives such as reputation and attention affect online reviewers’ behaviours” (Shen et al., 2015, p. 684).

The Methodology
To conduct this empirical investigation, the authors use real-life data on online reviews of books and electronics, gathered from Amazon and Barnes & Noble (Shen et al., 2015). The data was collected on a daily basis and allows for comparisons both across product categories and across different review systems (Shen et al., 2015). Amazon and Barnes & Noble were selected because they are the two largest online book retailers and have two distinctly different review environments (Shen et al., 2015). Whereas Amazon ranks reviewers based on their contributions, allowing reviewers to build up a reputation and consistently gain future attention, Barnes & Noble offers nothing of the kind (Shen et al., 2015).

The authors gathered a sample that includes all books released between September and October 2010, resulting in 1,751 books with 10,195 reviews (Shen et al., 2015, p. 685). Additionally, the authors randomly selected 500 electronic products on Amazon to allow for a cross-category comparison with the findings from the analysis of the book reviews, which helps them generalize their results (Shen et al., 2015).

Based on this data, the authors examine reviewer behaviour at two levels, namely the product level and the review rating level.

At the product level, the authors study how popularity (determined by the sales volume of the product) and crowdedness (measured by the number of pre-existing reviews for the product) affect a reviewer’s decision on whether to write a review for a product (Shen et al., 2015). Additionally, the model controls for the number of potential reviewers (to account for the possibility that an increasing number of daily reviews is due to an increasing number of potential reviewers over time) and for the effect of time, to account for the possibility that reviewers lose interest in writing reviews for products that have been out for a while (Shen et al., 2015). The resulting model for the product level is shown below:

[Figure: product-level model specification from Shen et al. (2015); original image not reproduced here.]
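As a rough stand-in for the figure, the sketch below shows how a product-level count model of this kind could be estimated in Python: the daily number of new reviews as a function of popularity, crowdedness, the pool of potential reviewers and time since release. The Poisson specification, file name and column names are our assumptions, not the authors’ exact model.

```python
# Rough sketch, not the authors' exact specification.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical product-day panel with one row per product per day.
panel = pd.read_csv("daily_reviews_panel.csv")

# new_reviews         : number of new reviews a product received on a given day
# popularity          : proxy for the product's sales volume
# crowdedness         : number of pre-existing reviews for the product
# potential_reviewers, days_since_release : the two controls described above
model = smf.poisson(
    "new_reviews ~ popularity + crowdedness + potential_reviewers + days_since_release",
    data=panel,
).fit()
print(model.summary())
```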

At the review rating level, the authors study how reputation status affects reviewers’ decisions on whether to differentiate themselves from the current consensus (Shen et al., 2015). They look at how far a target rating deviates from the average rating, indicating how differentiated the rating is (Shen et al., 2015).
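As an illustration of this idea (our simplification, not the authors’ exact variable), a rating’s differentiation can be expressed as its absolute deviation from the average of the ratings that preceded it:

```python
# Illustrative measure of how "differentiated" a rating is: the absolute
# deviation of a new rating from the average of the ratings posted before it.
def differentiation(new_rating: float, prior_ratings: list[float]) -> float:
    if not prior_ratings:
        return 0.0
    consensus = sum(prior_ratings) / len(prior_ratings)
    return abs(new_rating - consensus)

print(differentiation(1, [5, 4, 5, 4]))  # 3.5 -> a strongly differentiated rating
```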

Main Results
The main result of this study is that online reviewers behave differently when they have strong incentives to gain attention and enhance their online reputation (Shen et al., 2015). Regarding popularity, online reviewers tend to select popular books to review, as this allows them to receive more attention (Shen et al., 2015). As for crowdedness, fewer reviewers choose to review a book once its review segment becomes crowded, indicating that reviewers tend to avoid such spaces because they would have to compete for attention (Shen et al., 2015).

In addition, differences in the results between Amazon and Barnes & Noble indicate that in online review environments with a reviewer ranking system, reviewers are more strategic and post more differentiated ratings to capture attention and improve their online reputation (Shen et al., 2015). In turn, the reviewer ranking system intensifies the competition for attention among reviewers. Besides these main findings, the authors ran additional analyses to further understand online reviewers’ behaviours (Shen et al., 2015).

Running the same analyses on the electronic products dataset yielded consistent results. As such, the authors argue that their findings are robust (Shen et al., 2015).

Building on these results, the authors argue that a reviewer ranking system through which reviewers can build up their reputation creates opportunities for reviewers to monetize that online reputation by receiving free products, travel invitations and even job offers (Coster, 2006).

Strength & Managerial Implications
The main strength of this paper lies in its use of real-life data and in its practical implications for online review systems and for companies that make use of these systems.

As reviewers respond strategically to incentives such as a quantified online reputation, such incentives can be used to motivate reviewers consistently (Shen et al., 2015). An example of this is TripAdvisor’s reviewer profiles and contributor badges.

Additionally, as reviewers are more likely to write a review for popular but uncrowded products, companies can make use of this by sending review invitations to buyers of niche products and emphasizing the small number of existing reviews, for example by highlighting that number in the design of the website (Shen et al., 2015). As companies have their own specific goals, they may develop their own algorithms for selecting certain groups of reviewers to receive review invitations, rather than sending these invitations to every buyer, as is currently common practice (Shen et al., 2015). A rough sketch of such a selection rule follows below.
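This is a hypothetical sketch of the targeting idea, not a prescription: it picks products that are relatively popular yet still have few reviews; the thresholds, file name and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical product catalogue: product_id, monthly_sales, n_reviews.
products = pd.read_csv("products.csv")

# Target products that are popular (top sales quartile) but still uncrowded.
targets = products[
    (products["monthly_sales"] >= products["monthly_sales"].quantile(0.75))
    & (products["n_reviews"] <= 5)
]

# Buyers of these products would receive an invitation that highlights the
# small number of existing reviews ("Be one of the first to review!").
print(targets[["product_id", "monthly_sales", "n_reviews"]].head())
```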

Lastly, reviewers who consistently offer highly differentiated reviews should be monitored carefully by companies, as these reviewers might simply be trying to game the system rather than fulfilling the purpose of a review, which is to signal product quality (Shen et al., 2015). This can be done through the use of ranks, but also through other signals, such as “helpfulness” votes, or even by adjusting algorithms for such reviewers. A simple way to flag such reviewers is sketched below.
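The following hypothetical sketch flags reviewers whose ratings consistently deviate from the consensus that existed before they posted, which may indicate attention-seeking rather than informative reviewing. Column names and thresholds are invented.

```python
import pandas as pd

# Hypothetical review-level data: the rating given and the average of the
# ratings that already existed when it was posted.
reviews = pd.read_csv("reviews.csv")  # columns: reviewer_id, rating, consensus_before

reviews["deviation"] = (reviews["rating"] - reviews["consensus_before"]).abs()

# Flag reviewers with at least 10 reviews whose average deviation exceeds 2 stars.
flagged = (
    reviews.groupby("reviewer_id")["deviation"]
    .agg(mean_dev="mean", n_reviews="count")
    .query("mean_dev > 2 and n_reviews >= 10")
)
print(flagged)
```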

References

Basuroy, S., Chatterjee, S., & Ravid, S. A. (2003). How critical are critical reviews? The box office effects of film critics, star power, and budgets. Journal of Marketing, 67(4), 103-117.

Coster, H. (2006). The Secret Life of an Online Book Reviewer. Forbes, December 1.

Deloitte. (2007). “Most Consumers Read and Rely on Online Reviews; Companies Must Adjust,” Deloitte & Touche USA LLP.

Godes, D., & Silva, J. C. (2012). Sequential and temporal dynamics of online opinion. Marketing Science, 31(3), 448-473.

Shen, W., Hu, Y. J., & Ulmer, J. R. (2015). Competing for Attention: An Empirical Study of Online Reviewers’ Strategic Behavior. MIS Quarterly, 39(3), 683-696.

Group 10.