survivoR

622 episodes. 42 seasons. 1 package!

survivoR is a collection of data sets detailing events across all 42 seasons of US Survivor, including castaway information, vote history, immunity and reward challenge winners and jury votes.

Installation

Now on CRAN (v0.9.12).

install.packages("survivoR")

Or install the latest development version (v1.0) from GitHub. I’m constantly improving the data sets, so the GitHub version is likely to be slightly ahead of the CRAN release.

devtools::install_github("doehm/survivoR")

News

survivoR v1.0

Australian Survivor: Blood Vs Water

For episode by episode updates follow me on .

    Confessional counts

Survivor: 42

Dev version v1.0 includes episodes 1 to 11.

    Infographic

    Confessional counts

Confessional counts from myself, Carly Levitz and juststrategic

Dataset overview

Season summary

A table containing summary details of each season of Survivor, including the winner, runners-up and location.

season_summary
#> # A tibble: 42 x 22
#>    version version_season season_name   season location   country tribe_setup   
#>    <chr>   <chr>          <chr>          <dbl> <chr>      <chr>   <chr>         
#>  1 US      US01           Survivor: Bo~      1 Pulau Tig~ Malays~ Two tribes of~
#>  2 US      US02           Survivor: Th~      2 Herbert R~ Austra~ Two tribes of~
#>  3 US      US03           Survivor: Af~      3 Shaba Nat~ Kenya   Two tribes of~
#>  4 US      US04           Survivor: Ma~      4 Nuku Hiva~ Polyne~ Two tribes of~
#>  5 US      US05           Survivor: Th~      5 Ko Taruta~ Thaila~ Two tribes of~
#>  6 US      US06           Survivor: Th~      6 Rio Negro~ Brazil  Two tribes of~
#>  7 US      US07           Survivor: Pe~      7 Pearl Isl~ Panama  Two tribes of~
#>  8 US      US08           Survivor: Al~      8 Pearl Isl~ Panama  Three tribes ~
#>  9 US      US09           Survivor: Va~      9 Efate, Sh~ Vanuatu Two tribes of~
#> 10 US      US10           Survivor: Pa~     10 Koror, Pa~ Palau   A schoolyard ~
#> # ... with 32 more rows, and 15 more variables: full_name <chr>,
#> #   winner_id <chr>, winner <chr>, runner_ups <chr>, final_vote <chr>,
#> #   timeslot <chr>, premiered <date>, ended <date>, filming_started <date>,
#> #   filming_ended <date>, viewers_premier <dbl>, viewers_finale <dbl>,
#> #   viewers_reunion <dbl>, viewers_mean <dbl>, rank <dbl>

Castaways

This data set contains season and demographic information about each castaway. It is structured to view their results for each season. Castaways that have played in multiple seasons will feature more than once with the age and location representing that point in time. Castaways that re-entered the game will feature more than once in the same season as they technically have more than one boot order e.g. Natalie Anderson - Winners at War.

Each castaway has a unique castaway_id which links the individual across all data sets and seasons. It also links to the related IDs found on the vote_history, jury_votes and challenges data sets, including vote_id, voted_out_id and finalist_id.

castaways |> 
  filter(season == 41)
#> # A tibble: 18 x 17
#>    version version_season season_name  season full_name     castaway_id castaway
#>    <chr>   <chr>          <chr>         <dbl> <chr>         <chr>       <chr>   
#>  1 US      US41           Survivor: 41     41 Erika Casupa~ US0594      Erika   
#>  2 US      US41           Survivor: 41     41 Deshawn Radd~ US0601      Deshawn 
#>  3 US      US41           Survivor: 41     41 Xander Hasti~ US0597      Xander  
#>  4 US      US41           Survivor: 41     41 Heather Aldr~ US0593      Heather 
#>  5 US      US41           Survivor: 41     41 Ricard Foye   US0596      Ricard  
#>  6 US      US41           Survivor: 41     41 Danny McCray  US0599      Danny   
#>  7 US      US41           Survivor: 41     41 Liana Wallace US0608      Liana   
#>  8 US      US41           Survivor: 41     41 Shantel Smith US0606      Shan    
#>  9 US      US41           Survivor: 41     41 Evvie Jagoda  US0598      Evvie   
#> 10 US      US41           Survivor: 41     41 Naseer Mutta~ US0600      Naseer  
#> 11 US      US41           Survivor: 41     41 Tiffany Seely US0604      Tiffany 
#> 12 US      US41           Survivor: 41     41 Sydney Segal  US0605      Sydney  
#> 13 US      US41           Survivor: 41     41 Genie Chen    US0595      Genie   
#> 14 US      US41           Survivor: 41     41 Jairus Robin~ US0603      JD      
#> 15 US      US41           Survivor: 41     41 Brad Reese    US0602      Brad    
#> 16 US      US41           Survivor: 41     41 David Voce    US0607      Voce    
#> 17 US      US41           Survivor: 41     41 Sara Wilson   US0592      Sara    
#> 18 US      US41           Survivor: 41     41 Eric Abraham  US0591      Abraham 
#> # ... with 10 more variables: age <dbl>, city <chr>, state <chr>,
#> #   personality_type <chr>, episode <dbl>, day <dbl>, order <dbl>,
#> #   result <chr>, jury_status <chr>, original_tribe <chr>

Castaway details

A few castaways have changed their name from season to season or have been referred to by a different name during a season, e.g. Amber Mariano; in season 8, Survivor: All-Stars, there were Rob C and Rob M. That information has been retained here in the castaways data set.

castaway_details contains unique information for each castaway. It takes the full name from their most current season and their most verbose short name which is handy for labelling.

It also includes gender, date of birth, occupation, race and ethnicity data. If no source was found to determine a castaway's race and ethnicity, the data is kept as missing rather than making an assumption.

castaway_details
#> # A tibble: 626 x 11
#>    castaway_id full_name     short_name date_of_birth date_of_death gender race 
#>    <chr>       <chr>         <chr>      <date>        <date>        <chr>  <chr>
#>  1 US0001      Sonja Christ~ Sonja      1937-01-28    NA            Female <NA> 
#>  2 US0002      B.B. Andersen B.B.       1936-01-18    2013-10-29    Male   <NA> 
#>  3 US0003      Stacey Still~ Stacey     1972-08-11    NA            Female <NA> 
#>  4 US0004      Ramona Gray   Ramona     1971-01-20    NA            Female Black
#>  5 US0005      Dirk Been     Dirk       1976-06-15    NA            Male   <NA> 
#>  6 US0006      Joel Klug     Joel       1972-04-13    NA            Male   <NA> 
#>  7 US0007      Gretchen Cor~ Gretchen   1962-02-07    NA            Female <NA> 
#>  8 US0008      Greg Buis     Greg       1975-12-31    NA            Male   <NA> 
#>  9 US0009      Jenna Lewis   Jenna L.   1977-07-16    NA            Female <NA> 
#> 10 US0010      Gervase Pete~ Gervase    1969-11-02    NA            Male   Black
#> # ... with 616 more rows, and 4 more variables: ethnicity <chr>, poc <chr>,
#> #   occupation <chr>, personality_type <chr>
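
Because castaway_id is shared between the two tables, they can be combined with a join. The snippet below is a minimal sketch rather than part of the package documentation. It assumes dplyr is attached and that the columns match those printed above; the name cast_41 is only illustrative.

```r
library(survivoR)
library(dplyr)

# Add gender from castaway_details to the season 41 cast.
# Assumes castaway_id is unique in castaway_details, as described above.
cast_41 <- castaways |>
  filter(season == 41) |>
  left_join(
    castaway_details |> select(castaway_id, gender),
    by = "castaway_id"
  ) |>
  select(castaway_id, castaway, gender, result)

# Number of castaways by gender for the season
cast_41 |>
  count(gender)
```

Selecting only the key and the extra column before joining avoids duplicate full_name and personality_type columns in the result.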

Vote history

This data frame contains a complete history of votes cast across all seasons of Survivor. This allows you to see who voted for whom at which Tribal Council. It also includes details on who had individual immunity as well as who had their votes nullified by a hidden immunity idol. This details the key events for the season.

vh <- vote_history |> 
  filter(
    season == 41,
    episode == 9
  ) 
vh
#> # A tibble: 17 x 21
#>    version version_season season_name  season episode   day tribe_status tribe  
#>    <chr>   <chr>          <chr>         <dbl>   <dbl> <dbl> <chr>        <chr>  
#>  1 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#>  2 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#>  3 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#>  4 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#>  5 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#>  6 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#>  7 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#>  8 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#>  9 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#> 10 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#> 11 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#> 12 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#> 13 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#> 14 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#> 15 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#> 16 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#> 17 US      US41           Survivor: 41     41       9    17 Merged       Via Ka~
#> # ... with 13 more variables: castaway <chr>, immunity <chr>, vote <chr>,
#> #   vote_event <chr>, split_vote <chr>, nullified <lgl>, tie <lgl>,
#> #   voted_out <chr>, order <dbl>, vote_order <dbl>, castaway_id <chr>,
#> #   vote_id <chr>, voted_out_id <chr>
vh |> 
  count(vote)
#> # A tibble: 5 x 2
#>   vote        n
#>   <chr>   <int>
#> 1 Evvie       3
#> 2 Heather     2
#> 3 Liana       2
#> 4 Naseer      8
#> 5 <NA>        2

Events in the game such as fire challenges, rock draws, steal-a-vote advantages or countbacks in the early days often mean a vote wasn’t placed for an individual. Instead, a challenge may have been won or lost, or a castaway may have attended Tribal Council without casting a vote. These events are recorded in the vote field. I have included a function, clean_votes, for when you only need the votes cast for individuals. If the input data frame has the vote column it can simply be piped.

vh |> 
  clean_votes() |> 
  count(vote)
#> # A tibble: 4 x 2
#>   vote        n
#>   <chr>   <int>
#> 1 Evvie       3
#> 2 Heather     2
#> 3 Liana       2
#> 4 Naseer      8

Challenges

The challenge_results and challenge_description data sets supersede the challenges data set.

Challenge results

A nested tidy data frame of immunity and reward challenge results. The winners and winning tribe of the challenge are found by expanding the winners column. For individual immunity challenges the winning tribe is simply NA.

challenge_results |> 
  filter(season == 41)
#> # A tibble: 21 x 13
#>    version version_season season_name  season episode   day order episode_title 
#>    <chr>   <chr>          <chr>         <dbl>   <dbl> <dbl> <dbl> <chr>         
#>  1 US      US41           Survivor: 41     41       1     3     0 A New Era     
#>  2 US      US41           Survivor: 41     41       1     3     0 A New Era     
#>  3 US      US41           Survivor: 41     41       2     5     2 Juggling Chai~
#>  4 US      US41           Survivor: 41     41       3     7     3 My Million Do~
#>  5 US      US41           Survivor: 41     41       4     9     4 They Hate Me ~
#>  6 US      US41           Survivor: 41     41       4     9     4 They Hate Me ~
#>  7 US      US41           Survivor: 41     41       5    11     5 The Strategis~
#>  8 US      US41           Survivor: 41     41       6    13     5 Ready to Play~
#>  9 US      US41           Survivor: 41     41       7    14     6 There's Gonna~
#> 10 US      US41           Survivor: 41     41       7    14     6 There's Gonna~
#> # ... with 11 more rows, and 5 more variables: challenge_name <chr>,
#> #   challenge_type <chr>, outcome_type <chr>, challenge_id <chr>,
#> #   winners <list>

Typically in the merge, if a single person wins a reward they are allowed to bring others along with them. This is identified by the outcome_status column. If it states Chosen to participate it means they were chosen by the winner to participate in the reward.
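
Expanding the nested winners column can be done with tidyr. This is a sketch under assumptions: it assumes tidyr is installed, that winners is a list column of data frames as shown in the output above, and that outcome_status is one of the columns inside it. The name season_41_winners is only illustrative.

```r
library(survivoR)
library(dplyr)
library(tidyr)

# One row per winner (or chosen participant) per challenge in season 41.
# unnest() expands the list column into ordinary rows and columns.
season_41_winners <- challenge_results |>
  filter(season == 41) |>
  select(episode, challenge_type, outcome_type, winners) |>
  unnest(winners)

# How often each outcome status occurs (assumes outcome_status exists)
season_41_winners |>
  count(outcome_status)
```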

The day field on this data set represents the day of the tribal council rather than the day of the challenge. This is to more easily associate the reward challenge with the immunity challenge and result of the tribal council. It also helps for joining tables.

The challenge_id is the primary key for the challenge_description data set. The challenge_id will change as the data or descriptions change.

Challenge description

This data set contains descriptive binary fields for each challenge. Challenges can go by different names but where possible recurring challenges are kept consistent. While there are tweaks to the challenges, where the main components of the challenge are consistent, they share the same name.

The features of each challenge have been determined largely through string searches of key words that describe the challenge. It may not capture the full essence of the challenge but on the whole will provide a good basis for analysis. Since the description is simply a short paragraph or sentence it may not flag all appropriate features. If any descriptive features need altering please let me know in the issues.

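Features: puzzle, race, precision, endurance, strength, turn_based, balance, food, knowledge, memory, fire and water.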

challenge_description
#> # A tibble: 886 x 14
#>    challenge_id challenge_name     puzzle race  precision endurance strength
#>    <chr>        <chr>              <lgl>  <lgl> <lgl>     <lgl>     <lgl>   
#>  1 CC0053       Barrel of Monkeys  FALSE  TRUE  TRUE      FALSE     FALSE   
#>  2 CC0079       Blue Lagoon Bustle TRUE   TRUE  TRUE      FALSE     FALSE   
#>  3 CC0114       By the Numbers     FALSE  TRUE  FALSE     FALSE     FALSE   
#>  4 CC0138       Choose Your Weapon FALSE  TRUE  TRUE      FALSE     FALSE   
#>  5 CC0232       Flashback          FALSE  FALSE FALSE     TRUE      FALSE   
#>  6 CC0305       Home Stretch       TRUE   TRUE  TRUE      FALSE     FALSE   
#>  7 CC0334       Kenny Log-Ins      TRUE   TRUE  TRUE      FALSE     FALSE   
#>  8 CC0358       Log Jam            FALSE  TRUE  FALSE     TRUE      FALSE   
#>  9 CC0371       Marooning          FALSE  TRUE  FALSE     FALSE     FALSE   
#> 10 CC0408       O-Black Water      FALSE  TRUE  TRUE      FALSE     FALSE   
#> # ... with 876 more rows, and 7 more variables: turn_based <lgl>,
#> #   balance <lgl>, food <lgl>, knowledge <lgl>, memory <lgl>, fire <lgl>,
#> #   water <lgl>

challenge_description |> 
  summarise_if(is_logical, sum)
#> # A tibble: 1 x 12
#>   puzzle  race precision endurance strength turn_based balance  food knowledge
#>    <int> <int>     <int>     <int>    <int>      <int>   <int> <int>     <int>
#> 1    238   721       184       115       50        132     143    23        55
#> # ... with 3 more variables: memory <int>, fire <int>, water <int>
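
Since challenge_id is the key, the feature flags can be attached to the challenge results with a join. The following is a minimal sketch, assuming dplyr is attached and the column names match the output above; challenge_features is a hypothetical name used only for this example.

```r
library(survivoR)
library(dplyr)

# Attach the feature flags to each season 41 challenge.
# challenge_name is dropped from the right-hand side to avoid a duplicate column.
challenge_features <- challenge_results |>
  filter(season == 41) |>
  select(episode, challenge_id, challenge_name, challenge_type) |>
  left_join(
    challenge_description |> select(-challenge_name),
    by = "challenge_id"
  )

# Count puzzle and endurance challenges by challenge type
challenge_features |>
  group_by(challenge_type) |>
  summarise(
    n_challenges = n(),
    puzzles = sum(puzzle, na.rm = TRUE),
    endurance = sum(endurance, na.rm = TRUE)
  )
```

Because the challenge_id values can change between releases, it is safer to join on them at analysis time than to hard-code them.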

Jury votes

History of jury votes. It is more verbose than it needs to be, however having a 0-1 column indicating if a vote was placed or not makes it easier to summarise castaways that received no votes.

jury_votes |> 
  filter(season == 41)
#> # A tibble: 24 x 9
#>    version version_season season_name season castaway finalist  vote castaway_id
#>    <chr>   <chr>          <chr>        <dbl> <chr>    <chr>    <dbl> <chr>      
#>  1 US      US41           Survivor: ~     41 Heather  Deshawn      0 US0593     
#>  2 US      US41           Survivor: ~     41 Ricard   Deshawn      0 US0596     
#>  3 US      US41           Survivor: ~     41 Danny    Deshawn      1 US0599     
#>  4 US      US41           Survivor: ~     41 Liana    Deshawn      0 US0608     
#>  5 US      US41           Survivor: ~     41 Shan     Deshawn      0 US0606     
#>  6 US      US41           Survivor: ~     41 Evvie    Deshawn      0 US0598     
#>  7 US      US41           Survivor: ~     41 Naseer   Deshawn      0 US0600     
#>  8 US      US41           Survivor: ~     41 Tiffany  Deshawn      0 US0604     
#>  9 US      US41           Survivor: ~     41 Heather  Erika        1 US0593     
#> 10 US      US41           Survivor: ~     41 Ricard   Erika        1 US0596     
#> # ... with 14 more rows, and 1 more variable: finalist_id <chr>
jury_votes |> 
  filter(season == 41) |> 
  group_by(finalist) |> 
  summarise(votes = sum(vote))
#> # A tibble: 3 x 2
#>   finalist votes
#>   <chr>    <dbl>
#> 1 Deshawn      1
#> 2 Erika        7
#> 3 Xander       0

Advantages

Advantage Details

This dataset lists the hidden idols and advantages in the game for all seasons. It details where it was found, if there was a clue to the advantage, location and other advantage conditions. This maps to the advantage_movement table.

advantage_details |> 
  filter(season == 41)
#> # A tibble: 9 x 9
#>   version version_season season_name  season advantage_id advantage_type      
#>   <chr>   <chr>          <chr>         <dbl> <chr>        <chr>               
#> 1 US      US41           Survivor: 41     41 USEV4101     Extra vote          
#> 2 US      US41           Survivor: 41     41 USEV4102     Extra vote          
#> 3 US      US41           Survivor: 41     41 USEV4103     Extra vote          
#> 4 US      US41           Survivor: 41     41 USHI4101     Hidden immunity idol
#> 5 US      US41           Survivor: 41     41 USHI4102     Hidden immunity idol
#> 6 US      US41           Survivor: 41     41 USHI4103     Hidden immunity idol
#> 7 US      US41           Survivor: 41     41 USHI4104     Hidden immunity idol
#> 8 US      US41           Survivor: 41     41 USKP4101     Knowledge is power  
#> 9 US      US41           Survivor: 41     41 USVS4101     Steal a vote        
#> # ... with 3 more variables: clue_details <chr>, location_found <chr>,
#> #   conditions <chr>

Advantage Movement

The advantage_movement table tracks who found the advantage, who they may have handed it to and who they played it for. Each step is called an event. The sequence_id tracks the logical step of the advantage. For example in season 41, JD found an Extra Vote advantage. JD gave it to Shan in good faith who then voted him out keeping the Extra Vote. Shan gave it to Ricard in good faith who eventually gave it back before Shan played it for Naseer. That movement is recorded in this table.

advantage_movement |> 
  filter(advantage_id == "USEV4102")
#> # A tibble: 5 x 15
#>   version version_season season_name  season castaway castaway_id advantage_id
#>   <chr>   <chr>          <chr>         <dbl> <chr>    <chr>       <chr>       
#> 1 US      US41           Survivor: 41     41 JD       US0603      USEV4102    
#> 2 US      US41           Survivor: 41     41 Shan     US0606      USEV4102    
#> 3 US      US41           Survivor: 41     41 Ricard   US0596      USEV4102    
#> 4 US      US41           Survivor: 41     41 Shan     US0606      USEV4102    
#> 5 US      US41           Survivor: 41     41 Shan     US0606      USEV4102    
#> # ... with 8 more variables: sequence_id <dbl>, day <dbl>, episode <dbl>,
#> #   event <chr>, played_for <chr>, played_for_id <chr>, success <chr>,
#> #   votes_nullified <dbl>
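
The two advantage tables share advantage_id, so the path of an advantage can be read alongside its type. This is a sketch only, assuming dplyr is attached and the columns listed in the outputs above; ev_path is an illustrative name.

```r
library(survivoR)
library(dplyr)

# Follow one Extra Vote through the game, in sequence order,
# with the advantage type pulled in from advantage_details.
ev_path <- advantage_movement |>
  filter(advantage_id == "USEV4102") |>
  left_join(
    advantage_details |> select(advantage_id, advantage_type),
    by = "advantage_id"
  ) |>
  arrange(sequence_id) |>
  select(sequence_id, episode, castaway, event, played_for, advantage_type)

ev_path
```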

Confessionals

A dataset containing the number of confessionals for each castaway by season and episode. The data has been counted by contributors of the survivoR R package and consolidated with external sources. The aim is to establish consistency in confessional counts in the absence of official sources. Given the subjective nature of the counts and the potential for clerical error no single source is more valid than another. Therefore, it is reasonable to average across all sources.

confessionals |> 
  filter(season == 41) |> 
  group_by(castaway) |> 
  summarise(n_confessionals = sum(confessional_count))
#> # A tibble: 18 x 2
#>    castaway n_confessionals
#>    <chr>              <dbl>
#>  1 Abraham                2
#>  2 Brad                  14
#>  3 Danny                 29
#>  4 Deshawn               56
#>  5 Erika                 39
#>  6 Evvie                 35
#>  7 Genie                 11
#>  8 Heather               13
#>  9 JD                    22
#> 10 Liana                 34
#> 11 Naseer                11
#> 12 Ricard                36
#> 13 Sara                   4
#> 14 Shan                  51
#> 15 Sydney                15
#> 16 Tiffany               25
#> 17 Voce                   6
#> 18 Xander                56

Tribe mapping

A mapping of castaways to tribes for each day (day being the day of the Tribal Council). Each season by day holds a complete list of castaways still in the game and which tribe they are on. Moving through each day you can observe the changes in the tribe. For example, the first day (usually day 2) has all castaways mapped to their original tribe. The next day has the same minus the castaway just voted out. This is useful for observing changes in tribe make-up due to castaways being voted off the island, tribe swaps, or time spent on Redemption Island or the Edge of Extinction.

tribe_mapping |> 
  filter(season == 41)
#> # A tibble: 154 x 10
#>    version version_season season_name  season episode   day castaway_id castaway
#>    <chr>   <chr>          <chr>         <dbl>   <dbl> <dbl> <chr>       <chr>   
#>  1 US      US41           Survivor: 41     41       1     3 US0591      Abraham 
#>  2 US      US41           Survivor: 41     41       1     3 US0592      Sara    
#>  3 US      US41           Survivor: 41     41       1     3 US0593      Heather 
#>  4 US      US41           Survivor: 41     41       1     3 US0594      Erika   
#>  5 US      US41           Survivor: 41     41       1     3 US0595      Genie   
#>  6 US      US41           Survivor: 41     41       1     3 US0596      Ricard  
#>  7 US      US41           Survivor: 41     41       1     3 US0597      Xander  
#>  8 US      US41           Survivor: 41     41       1     3 US0598      Evvie   
#>  9 US      US41           Survivor: 41     41       1     3 US0599      Danny   
#> 10 US      US41           Survivor: 41     41       1     3 US0600      Naseer  
#> # ... with 144 more rows, and 2 more variables: tribe <chr>, tribe_status <chr>
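
To see how tribe sizes change through a season, the mapping can be counted by day and tribe. A minimal sketch, assuming dplyr is attached and the columns shown above; tribe_sizes and n_castaways are names chosen for this example.

```r
library(survivoR)
library(dplyr)

# Number of castaways on each tribe at every Tribal Council day in season 41
tribe_sizes <- tribe_mapping |>
  filter(season == 41) |>
  count(episode, day, tribe, name = "n_castaways")

tribe_sizes
```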

Boot Mapping

A mapping table for easily filtering to the set of castaways that are still in the game after a specified number of boots. It differs from the tribe mapping in that it is focused on the boot rather than the episode, which is often more useful. This table can be used to calculate how many people participated in certain challenges once mapped to challenge_results.

When someone quits the game or is medically evacuated it is considered a boot. This table tracks multiple boots per episode.

# filter to season 41 and when there are 6 people left
# 18 people in the season, therefore 12 boots

boot_mapping |> 
  filter(
    season == 41,
    order == 12
    )
#> # A tibble: 6 x 11
#>   version version_season season_name  season episode order castaway castaway_id
#>   <chr>   <chr>          <chr>         <dbl>   <dbl> <dbl> <chr>    <chr>      
#> 1 US      US41           Survivor: 41     41      12    12 Heather  US0593     
#> 2 US      US41           Survivor: 41     41      12    12 Erika    US0594     
#> 3 US      US41           Survivor: 41     41      12    12 Ricard   US0596     
#> 4 US      US41           Survivor: 41     41      12    12 Xander   US0597     
#> 5 US      US41           Survivor: 41     41      12    12 Danny    US0599     
#> 6 US      US41           Survivor: 41     41      12    12 Deshawn  US0601     
#> # ... with 3 more variables: tribe <chr>, tribe_status <chr>, in_the_game <lgl>

Viewers

A data frame containing the viewer information for every episode across all seasons. It also includes the rating and viewer share information for viewers aged 18 to 49 years of age.

viewers |> 
  filter(season == 41)
#> # A tibble: 14 x 12
#>    version version_season season_name  season episode_number_overall episode
#>    <chr>   <chr>          <chr>         <dbl>                  <dbl>   <dbl>
#>  1 US      US41           Survivor: 41     41                    597       1
#>  2 US      US41           Survivor: 41     41                    598       2
#>  3 US      US41           Survivor: 41     41                    599       3
#>  4 US      US41           Survivor: 41     41                    600       4
#>  5 US      US41           Survivor: 41     41                    601       5
#>  6 US      US41           Survivor: 41     41                    602       6
#>  7 US      US41           Survivor: 41     41                    603       7
#>  8 US      US41           Survivor: 41     41                    604       8
#>  9 US      US41           Survivor: 41     41                    605       9
#> 10 US      US41           Survivor: 41     41                    606      10
#> 11 US      US41           Survivor: 41     41                    607      11
#> 12 US      US41           Survivor: 41     41                    608      12
#> 13 US      US41           Survivor: 41     41                    609      13
#> 14 US      US41           Survivor: 41     41                    610      14
#> # ... with 6 more variables: episode_title <chr>, episode_date <date>,
#> #   viewers <dbl>, rating_18_49 <dbl>, share_18_49 <dbl>, imdb_rating <dbl>
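
Episode-level figures can be rolled up to a season average. The sketch below assumes dplyr is attached and that the viewers column holds the viewer count as printed above; season_viewers is an illustrative name.

```r
library(survivoR)
library(dplyr)

# Episodes and mean viewers per season.
# Inside summarise(), `viewers` refers to the column, not the data frame.
season_viewers <- viewers |>
  group_by(version_season, season) |>
  summarise(
    episodes = n(),
    mean_viewers = mean(viewers, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(season))

season_viewers
```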

Tribe colours

This data frame contains the tribe names and colours for each season, including the RGB values. These colours can be joined with the other data frames to customise colours for plots. Another option is to add tribal colours to ggplots with the scale functions.

tribe_colours
#> # A tibble: 150 x 7
#>    version version_season season_name     season tribe tribe_colour tribe_status
#>    <chr>   <chr>          <chr>            <dbl> <chr> <chr>        <chr>       
#>  1 US      US01           Survivor: Born~      1 Pago~ #FFFF05      Original    
#>  2 US      US01           Survivor: Born~      1 Ratt~ #7CFC00      Merged      
#>  3 US      US01           Survivor: Born~      1 Tagi  #FF9900      Original    
#>  4 US      US02           Survivor: The ~      2 Barr~ #FF6600      Merged      
#>  5 US      US02           Survivor: The ~      2 Kucha #32CCFF      Original    
#>  6 US      US02           Survivor: The ~      2 Ogak~ #A7FC00      Original    
#>  7 US      US03           Survivor: Afri~      3 Boran #FFD700      Original    
#>  8 US      US03           Survivor: Afri~      3 Moto~ #00A693      Merged      
#>  9 US      US03           Survivor: Afri~      3 Samb~ #E41A2A      Original    
#> 10 US      US04           Survivor: Marq~      4 Mara~ #DFFF00      Original    
#> # ... with 140 more rows
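
One way to use these colours directly is to build a named vector and pass it to ggplot2's scale_fill_manual(). This is a sketch under assumptions: it assumes dplyr and ggplot2 are attached, that tribe_colours covers the chosen season, and that original_tribe on castaways uses the same tribe names. The names tc, tribe_pal and p are only illustrative.

```r
library(survivoR)
library(dplyr)
library(ggplot2)

ssn <- 41

# Named vector of hex codes, keyed by tribe name
tc <- tribe_colours |>
  filter(season == ssn)
tribe_pal <- setNames(tc$tribe_colour, tc$tribe)

# Size of each original tribe, filled with that tribe's colour
p <- castaways |>
  filter(season == ssn) |>
  count(original_tribe) |>
  ggplot(aes(x = original_tribe, y = n, fill = original_tribe)) +
  geom_col() +
  scale_fill_manual(values = tribe_pal) +
  theme_minimal()

p
```

Names in the palette that do not appear in the plotted data, such as the merged tribe, are simply ignored by scale_fill_manual().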

Scale functions

Included are ggplot2 scale functions of the form scale_fill_survivor() and scale_fill_tribes() to add season and tribe colours to ggplot. The scale_fill_survivor() scale uses a colour palette extracted from the season logo and the scale_fill_tribes() scale uses the tribal colours of the specified season as a colour palette.

All that is required for the ‘survivor’ palettes is the desired season as input. If no season is provided it will default to season 40.

castaways |> 
  count(season, personality_type) |> 
  ggplot(aes(x = season, y = n, fill = personality_type)) +
  geom_bar(stat = "identity") +
  scale_fill_survivor(40) +
  theme_minimal()

Below are the palettes for all seasons.

To use the tribe scales, simply input the season number desired to use those tribe colours. If the fill or colour aesthetic is the tribe name, this needs to be passed to the scale function as scale_fill_tribes(season, tribe = tribe) (for now) where tribe is on the input data frame. If the fill or colour aesthetic is independent from the actual tribe names, like gender for example, tribe does not need to be specified and will simply use the tribe colours as a colour palette, such as the viewers line graph above.

library(survivoR)
library(dplyr)
library(stringr)
library(glue)
library(ggplot2)

ssn <- 35

# finalists of the season, labelled with their original tribe
labels <- castaways |>
  filter(
    season == ssn,
    str_detect(result, "Sole|unner")
  ) |>
  mutate(label = glue("{castaway} ({original_tribe})")) |>
  select(label, castaway)

# jury votes per finalist, split by the original tribe of each juror
votes <- jury_votes |>
  filter(season == ssn) |>
  left_join(
    castaways |>
      filter(season == ssn) |>
      select(castaway, original_tribe),
    by = "castaway"
  ) |>
  group_by(finalist, original_tribe) |>
  summarise(votes = sum(vote), .groups = "drop") |>
  left_join(labels, by = c("finalist" = "castaway"))

# the native pipe cannot pipe into a { } block or use `.`,
# so the data frame is stored and passed to ggplot() explicitly
ggplot(votes, aes(x = label, y = votes, fill = original_tribe)) +
  geom_bar(stat = "identity", width = 0.5) +
  scale_fill_tribes(ssn, tribe = votes$original_tribe) +
  theme_minimal() +
  labs(
    x = "Finalist (original tribe)",
    y = "Votes",
    fill = "Original\ntribe",
    title = "Votes received by each finalist"
  )

Issues

Given the variable nature of the game of Survivor and the changing of the rules, there are bound to be edge cases where the data is not quite right. Before logging an issue please install the GitHub version to see if it has already been corrected. If not, please log an issue and I will correct the datasets.

New features will be added, such as details on exiled castaways across the seasons. If you have a request for specific data let me know in the issues and I’ll see what I can do. Also, if you’d like to contribute by adding to existing datasets or contribute a new dataset, please contact me directly.

Showcase

Survivor Dashboard

Carly Levitz has developed a fantastic dashboard showcasing the data and allowing you to drill down into seasons, castaways, voting history and challenges.

Data viz

This looks at the number of immunity idols won and votes received for each winner.

Contributors

A big thank you to:

References

Data was almost entirely sourced from Wikipedia. Other data, such as the tribe colours, was manually recorded and entered by myself and contributors.

Torch graphic in hex: Fire Torch Vectors by Vecteezy