Getting Started with psych350data

Overview

The psych350data package provides all the datasets used in PSYC 350 labs at UNL. Every dataset is ready to use the moment you load the package — no file downloading, no manual data entry, and no SPSS license needed.

The package supports three workflows depending on what you need to do:

1. Use the raw R data directly: categorical variables are human-readable strings like "Film" or "Control", which work perfectly with ggplot2 and most R functions.
2. Prep for numeric analysis matching SPSS: prep_*() functions convert character categories to the same numeric codes used in the SPSS files (e.g., 1, 2, 3), so your R output matches SPSS output exactly. This is useful when working with functions from the psych350lab package that expect numeric grouping variables.
3. Export to SPSS: export_*_sav() functions create fully labeled .sav files with numeric codes, value labels, variable labels, and -99 for missing values. Use this to create answer key files or student data files for labs.

Installation

# Install from GitHub (one time only)
# install.packages("pak")
pak::pak("emmarshall/psych350data")

Loading the Package

library(psych350data)
library(dplyr)

Browsing Available Datasets

Use list_datasets() to see every dataset in the package:

list_datasets()
#>  [1] "superman"              "superman_smes"         "superman_movies"      
#>  [4] "superman_combined"     "hotones"               "hotones_sauces"       
#>  [7] "hotones_episodes"      "tip_jokes"             "mcu"                  
#> [10] "mock_jury"             "candy"                 "candy_simple"         
#> [13] "football"              "huskers"               "interpersonal_data"   
#> [16] "self_descriptive_data" "parent_child_data"     "hindsight_mg_data"    
#> [19] "hindsight_wg_data"     "cheese_data"           "lpd_data"

For detailed documentation on any dataset, use ? in the R console:

?superman
?hotones
?mock_jury
?football
?interpersonal_data

See the Dataset Reference for a complete guide to every dataset including sources and variable descriptions.

Workflow 1: Raw Data in R

Every dataset is a tibble that’s available immediately after loading the package. Categorical variables use descriptive character values that are easy to read and plot:

superman |>
  select(num, media, type, clark_grp, age_grp) |>
  head()
#> # A tibble: 6 × 5
#>     num media                   type    clark_grp     age_grp
#>   <int> <chr>                   <chr>   <chr>         <chr>  
#> 1     1 Superman                Film    6ft or taller Average
#> 2     2 Superman: The Movie     Film    6ft or taller Average
#> 3     3 Smallville              TV Show 6ft or taller Minimal
#> 4     4 Superman Returns        Film    6ft or taller Average
#> 5     5 Superman & the Mole Men Film    6ft or taller Big    
#> 6     6 Man of Steel            Film    6ft or taller Big

football |>
  count(group)
#> # A tibble: 3 × 2
#>   group                        n
#>   <chr>                    <int>
#> 1 Control                     25
#> 2 Football no concussion      25
#> 3 Football with concussion    25

Plotting with ggplot2

Because categorical variables are already human-readable strings, they work directly as axis labels and legend entries in ggplot2 — no extra formatting needed:

library(ggplot2)

# Box plot: group labels appear automatically
ggplot(football, aes(x = group, y = volume)) +
  geom_boxplot() +
  labs(x = "Group", y = "Brain Volume")

# The labels "Control", "Football no concussion", etc.
# appear on the x-axis without any extra work.

If you need to control the order of categories on the axis (ggplot2 arranges character values alphabetically by default), convert to a factor with explicit levels:

football |>
  mutate(
    group = factor(group, levels = c("Control",
                                     "Football no concussion",
                                     "Football with concussion"))
  ) |>
  ggplot(aes(x = group, y = volume)) +
  geom_boxplot()
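
Explicit levels matter most when alphabetical order differs from the order you want. The sketch below uses the age_grp labels from superman on a made-up toy vector (not the package data) to show how the level order changes:

```r
# Toy vector reusing the age_grp labels shown earlier (not the real data)
age_grp <- c("Minimal", "Average", "Big", "Average")

# Default: factor() sorts the levels alphabetically
default_levels <- levels(factor(age_grp))
default_levels
#> [1] "Average" "Big"     "Minimal"

# Explicit levels put the categories in their substantive order
age_ordered <- factor(age_grp, levels = c("Minimal", "Average", "Big"))
levels(age_ordered)
#> [1] "Minimal" "Average" "Big"
```

ggplot2 draws factor levels in the order they are defined, so the second version would place "Minimal" first on the axis.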

Handling Missing Values

Missing values in the raw R data are standard NA. Summary functions such as mean() return NA when any value is missing unless you pass na.rm = TRUE:

# NA values are excluded with na.rm = TRUE
mean(superman$rt_critics_score, na.rm = TRUE)
#> [1] 79.375
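
To see the effect of na.rm in isolation, here is a small sketch on a made-up vector (the values are illustrative, not taken from any dataset in the package):

```r
# Toy scores with one missing value (illustrative only)
scores <- c(80, NA, 90)

# Without na.rm, a single NA makes the result NA
mean_with_na <- mean(scores)
mean_with_na
#> [1] NA

# With na.rm = TRUE, the NA is dropped before averaging
mean_without_na <- mean(scores, na.rm = TRUE)
mean_without_na
#> [1] 85

# Count how many values are missing
n_missing <- sum(is.na(scores))
n_missing
#> [1] 1
```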

Running Analyses with Raw Data

For analyses like ANOVA or t-tests that need a factor grouping variable, wrap the character column in factor():

# One-way ANOVA using the character group variable
model <- aov(volume ~ factor(group), data = football)
summary(model)
#>               Df Sum Sq Mean Sq F value   Pr(>F)    
#> factor(group)  2  44.35  22.174   31.47 1.51e-10 ***
#> Residuals     72  50.73   0.705                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
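
In base R, aov() coerces a character predictor to a factor on its own, so wrapping the column in factor() mainly makes the conversion explicit. The sketch below checks this on simulated data; the toy data frame and its values are assumptions for illustration, not the football dataset:

```r
# Simulated stand-in for a three-group design (not the real football data)
set.seed(350)
toy <- data.frame(
  group = rep(c("Control", "Football no concussion",
                "Football with concussion"), each = 10),
  volume = c(rnorm(10, mean = 7), rnorm(10, mean = 6.5), rnorm(10, mean = 5.5)),
  stringsAsFactors = FALSE
)

# Fit the same model with and without an explicit factor() call
fit_chr <- aov(volume ~ group, data = toy)
fit_fct <- aov(volume ~ factor(group), data = toy)

# Pull the F statistic for the group effect out of each summary table
f_chr <- summary(fit_chr)[[1]][["F value"]][1]
f_fct <- summary(fit_fct)[[1]][["F value"]][1]

all.equal(f_chr, f_fct)
#> [1] TRUE
```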

Workflow 2: Prep for Numeric Analysis

If your analysis needs to produce output with numeric codes that match SPSS — for example, when comparing your R results to SPSS output in a lab, or when using functions from the psych350lab package — use prep_*() functions.

What `prep_*()` Does

Each prep_*() function replaces character categorical variables with the same numeric codes used in the SPSS .sav file. For example, in the superman dataset, type goes from "Film" / "TV Show" / "Serial" to 1 / 2 / 3, and age_grp goes from "Minimal" / "Average" / "Big" to 1 / 2 / 3.

superman_num <- prep_superman(superman)

superman_num |>
  select(num, media, type, clark_grp, age_grp) |>
  head()
#> # A tibble: 6 × 5
#>     num media                    type clark_grp age_grp
#>   <int> <chr>                   <dbl>     <dbl>   <dbl>
#> 1     1 Superman                    1         2       2
#> 2     2 Superman: The Movie         1         2       2
#> 3     3 Smallville                  2         2       1
#> 4     4 Superman Returns            1         2       2
#> 5     5 Superman & the Mole Men     1         2       3
#> 6     6 Man of Steel                1         2       3
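
Conceptually, this recoding is a lookup from label to code. The sketch below shows one way to do that in base R with a named vector. It only illustrates the idea and is not the package's actual implementation; the toy vector is made up:

```r
# Toy vector of type labels (illustrative, not the real superman data)
type <- c("Film", "Film", "TV Show", "Film", "Serial")

# Lookup table: label -> numeric code, following the mapping described above
type_codes <- c("Film" = 1, "TV Show" = 2, "Serial" = 3)

# Index the lookup table by label, then drop the names
type_num <- unname(type_codes[type])
type_num
#> [1] 1 1 2 1 3
```

Any label missing from the lookup table would come back as NA, which makes unexpected categories easy to spot.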

Using the Generic `prep_data()` Function

You can also use the generic prep_data() function with any dataset name as a string:

superman_num <- prep_data(superman, "superman")

superman_num |>
  select(num, media, type, age_grp) |>
  head()
#> # A tibble: 6 × 4
#>     num media                    type age_grp
#>   <int> <chr>                   <dbl>   <dbl>
#> 1     1 Superman                    1       2
#> 2     2 Superman: The Movie         1       2
#> 3     3 Smallville                  2       1
#> 4     4 Superman Returns            1       2
#> 5     5 Superman & the Mole Men     1       3
#> 6     6 Man of Steel                1       3
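
A generic wrapper of this kind can be pictured as a dispatcher that picks a dataset-specific function by name. The sketch below is purely hypothetical: prep_films(), prep_groups(), and prep_by_name() are invented for illustration and say nothing about how prep_data() is actually written.

```r
# Hypothetical dataset-specific prep functions (stand-ins, not package code)
prep_films <- function(data) {
  codes <- c("Film" = 1, "TV Show" = 2, "Serial" = 3)
  data$type <- unname(codes[data$type])
  data
}

prep_groups <- function(data) {
  codes <- c("Control" = 1, "Treatment" = 2)
  data$group <- unname(codes[data$group])
  data
}

# Hypothetical generic wrapper: choose the prep function from a name string
prep_by_name <- function(data, name) {
  prep_fun <- switch(
    name,
    films  = prep_films,
    groups = prep_groups,
    stop("No prep function registered for dataset: ", name)
  )
  prep_fun(data)
}

# Toy data frame to exercise the wrapper
films <- data.frame(
  num  = 1:3,
  type = c("Film", "TV Show", "Serial"),
  stringsAsFactors = FALSE
)

films_num <- prep_by_name(films, "films")
films_num$type
#> [1] 1 2 3
```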

All Prep Functions

Dedicated prep functions are available for the following datasets:

prep_superman(superman)
prep_superman_smes(superman_smes)
prep_superman_movies(superman_movies)
prep_hotones(hotones)
prep_hotones_sauces(hotones_sauces)
prep_hotones_episodes(hotones_episodes)
prep_mcu(mcu)
prep_mock_jury(mock_jury)
prep_tip_jokes(tip_jokes)
prep_candy(candy)
prep_candy_simple(candy_simple)
prep_football(football)
prep_huskers(huskers)
prep_interpersonal(interpersonal_data)
prep_self_descriptive(self_descriptive_data)
prep_parent_child(parent_child_data)
prep_hindsight_bg(hindsight_mg_data)
prep_hindsight_wg(hindsight_wg_data)
prep_cheese(cheese_data)
prep_lpd(lpd_data)

Note

Some datasets (like mock_jury, tip_jokes, interpersonal_data, and self_descriptive_data) already store their categorical variables as numeric codes. Their prep functions still exist for consistency, but they return the data unchanged.

Compatibility with psych350lab

The psych350lab package provides helper functions for PSYC 350 lab assignments. Many of these functions expect data that looks like an SPSS .sav file — that is, categorical/grouping variables stored as numbers (not character strings) and missing values coded as -99 rather than NA.

To make any psych350data dataset work with psych350lab functions, you need two steps:

Step 1: Convert categories to numeric codes with prep_*():

football_num <- prep_football(football)
football_num |> count(group)
#> # A tibble: 3 × 2
#>   group     n
#>   <dbl> <int>
#> 1     1    25
#> 2     2    25
#> 3     3    25
# group is now 1, 2, 3 instead of "Control", "Football no concussion", ...

Step 2: Replace NA with -99 (if your psych350lab function expects -99 for missing):

library(psych350lab)

# Replace NA with -99 across all numeric columns
football_spss <- football_num |>
  mutate(across(where(is.numeric), \(x) ifelse(is.na(x), -99, x)))

# Now the data looks exactly like it would in SPSS
# and is ready for psych350lab functions
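
Once NA becomes -99, R treats -99 as an ordinary number, so it helps to know how to reverse the step. The base R sketch below runs the round trip on a made-up data frame (the columns and values are assumptions for illustration):

```r
# Toy data frame with one missing value (illustrative only)
toy <- data.frame(
  id     = c("a", "b", "c"),
  group  = c(1, 2, 3),
  volume = c(5.1, NA, 4.3),
  stringsAsFactors = FALSE
)

# Identify the numeric columns; character columns are left alone
num_cols <- vapply(toy, is.numeric, logical(1))

# NA -> -99 in numeric columns only
toy_spss <- toy
toy_spss[num_cols] <- lapply(toy_spss[num_cols],
                             function(x) replace(x, is.na(x), -99))

# Caution: -99 now counts as real data in standard R functions
mean(toy_spss$volume)  # pulled far below the true mean by the -99

# -99 -> NA to get back to R-style missing values
toy_back <- toy_spss
toy_back[num_cols] <- lapply(toy_back[num_cols],
                             function(x) replace(x, x == -99, NA))

isTRUE(all.equal(toy_back, toy))
#> [1] TRUE
```

Keeping the NA version as your working copy and creating the -99 version only right before calling a function that needs it avoids accidentally averaging in the sentinel value.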

Tip

If you’re only running standard R functions like aov(), t.test(), or cor.test(), you do not need the -99 step — R handles NA natively. The -99 replacement is only needed for functions that specifically expect SPSS-style missing value coding.

Keeping Labels for Plotting After Prep

By default, prep_*() replaces the character values with numbers, and the original labels are lost. If you need both the numeric codes (for analysis) and the original labels (for plotting), use keep_labels = TRUE:

superman_both <- prep_superman(superman, keep_labels = TRUE)

superman_both |>
  select(num, type, type_label) |>
  head()
#> # A tibble: 6 × 3
#>     num  type type_label
#>   <int> <dbl> <chr>     
#> 1     1     1 Film      
#> 2     2     1 Film      
#> 3     3     2 TV Show   
#> 4     4     1 Film      
#> 5     5     1 Film      
#> 6     6     1 Film

This is useful when you want numeric codes for analysis but readable labels in ggplot2:

# Use the _label column for readable axis labels
superman_both |>
  ggplot(aes(x = type_label, y = rt_critics_score)) +
  geom_boxplot() +
  labs(x = "Media Type", y = "Rotten Tomatoes Critics Score")
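
If you already have numeric codes and no label column, base R's factor() can attach labels afterwards. This is a sketch on a made-up vector, assuming the 1 / 2 / 3 coding for type described earlier:

```r
# Toy vector of numeric type codes (illustrative only)
type_code <- c(1, 1, 2, 1, 3)

# Map codes back to readable labels for plotting
type_label <- factor(
  type_code,
  levels = c(1, 2, 3),
  labels = c("Film", "TV Show", "Serial")
)

as.character(type_label)
#> [1] "Film"    "Film"    "TV Show" "Film"    "Serial"
```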

Workflow 3: Export to SPSS

Use export_*_sav() functions to create .sav files for SPSS or JASP. The export functions handle everything automatically — you do not need to run a prep function first.

Each export produces a file with:

- Numeric codes for all categorical variables
- Value labels so SPSS displays category names (e.g., 1 = “Control”)
- Variable labels describing each column in SPSS Variable View
- -99 for missing values, registered as user-defined missing so SPSS excludes them automatically
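
For readers curious what these four properties look like in R terms, the sketch below builds a tiny labeled file by hand with the haven package. Using haven here is an assumption for illustration; the export functions may be implemented differently, and the toy columns and labels are made up.

```r
library(haven)

# Toy data: numeric codes with value labels, variable labels,
# and -99 registered as a user-defined missing value
toy <- tibble::tibble(
  group = labelled_spss(
    c(1, 2, -99),
    labels    = c(Control = 1, Treatment = 2),
    na_values = -99,
    label     = "Participant group"
  ),
  volume = labelled_spss(
    c(5.1, -99, 4.3),
    na_values = -99,
    label     = "Brain volume"
  )
)

# Write to a temporary .sav file
path <- tempfile(fileext = ".sav")
write_sav(toy, path)

# Default read: user-defined missing values come back as NA
back <- read_sav(path)
is.na(back$group[3])

# user_na = TRUE keeps the -99 codes visible
back_raw <- read_sav(path, user_na = TRUE)
as.numeric(back_raw$group)[3]
```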

Quick Export

# Export to current working directory
export_superman_sav()

# Export to a specific location
export_football_sav("~/Desktop/football_data.sav")

Exporting for Answer Keys vs. Student Data

A common workflow is to export the full dataset as an instructor answer key, and a subset of variables as the student version.

Full dataset (answer key):

export_superman_sav("superman_answer_key.sav")

Subset of variables (student version):

# Use select() to choose only the variables students need,
# then pipe to export_sav()
superman |>
  select(num, media, type, clark_height, rt_critics_score) |>
  export_sav(path = "superman_student.sav")

# The same pattern works for any other dataset
interpersonal_data |>
  select(age, gender, race, gcb, risc, lsas) |>
  export_sav(path = "interpersonal_student.sav")

Export All Datasets at Once

export_all_sav(dir = "~/Desktop/PSYC350_SPSS/")

This creates one .sav file per dataset in the specified folder.

See the Exporting to SPSS vignette for the full export guide, including how to audit exports and control missing value behavior.

Choosing the Right Workflow

| Situation | Recommended workflow |
|---|---|
| Exploring data, making plots with ggplot2 | Use the raw data as-is |
| Running standard R analyses (t-test, ANOVA, correlation) | Use the raw data with `factor()` for grouping variables |
| R analysis that needs to match SPSS numeric output | `prep_*()` |
| Using psych350lab functions that expect numeric groups and -99 | `prep_*()` then replace NA with -99 |
| Plotting after prepping (need readable axis labels) | `prep_*(..., keep_labels = TRUE)` |
| Creating a `.sav` file for SPSS or JASP | `export_*_sav()` |
| Creating a student data file with fewer variables | `select()` then `export_sav()` |
| Creating instructor answer key `.sav` files | `export_*_sav()` with full dataset |

R vs. SPSS: What’s Different?

| Aspect | Raw R data | After `prep_*()` | SPSS `.sav` export |
|---|---|---|---|
| Categorical values | Character strings (`"Film"`, `"Control"`) | Numeric codes (1, 2, 3) | Numeric codes with value labels |
| Missing values | `NA` | `NA` | `-99` with user-defined missing |
| Variable descriptions | `?dataset` help files | `?dataset` help files | Variable labels in Variable View |
| Best for | Plotting, exploration | Matching SPSS output in R | SPSS / JASP labs |
