Package 'bayesrules'

Title: Datasets and Supplemental Functions from Bayes Rules! Book
Description: Provides datasets and functions used for analysis and visualizations in the Bayes Rules! book (<https://www.bayesrulesbook.com>). The package contains a set of functions that summarize and plot Bayesian models from some conjugate families and another set of functions for evaluation of some Bayesian models.
Authors: Mine Dogucu [aut, cre], Alicia Johnson [aut], Miles Ott [aut]
Maintainer: Mine Dogucu <[email protected]>
License: GPL (>= 3)
Version: 0.0.3.9000
Built: 2024-11-03 05:38:08 UTC
Source: https://github.com/bayes-rules/bayesrules

Help Index


Chicago AirBnB Data

Description

The AirBnB data was collated by Trinh and Ameri as part of a course project at St Olaf College, and distributed with "Broadening Your Statistical Horizons" by Legler and Roback. This data set includes the prices and features for 1561 AirBnB listings in Chicago, collected in 2016.

Usage

airbnb

Format

A data frame with 1561 rows and 12 variables. Each row represents a single AirBnB listing.

price

the nightly price of the listing (in USD)

rating

the listing's average rating, on a scale from 1 to 5

reviews

number of user reviews the listing has

room_type

the type of listing (eg: Shared room)

accommodates

number of guests the listing accommodates

bedrooms

the number of bedrooms the listing has

minimum_stay

the minimum number of nights to stay in the listing

neighborhood

the neighborhood in which the listing is located

district

the broader district in which the listing is located

walk_score

the neighborhood's rating for walkability (0 - 100)

transit_score

the neighborhood's rating for access to public transit (0 - 100)

bike_score

the neighborhood's rating for bikeability (0 - 100)

Source

Ly Trinh and Pony Ameri (2018). Airbnb Price Determinants: A Multilevel Modeling Approach. Project for Statistics 316-Advanced Statistical Modeling, St. Olaf College. Julie Legler and Paul Roback (2019). Broadening Your Statistical Horizons: Generalized Linear Models and Multilevel Models. https://bookdown.org/roback/bookdown-bysh/. https://github.com/proback/BeyondMLR/blob/master/data/airbnb.csv/


Chicago AirBnB Data

Description

The AirBnB data was collated by Trinh and Ameri as part of a course project at St Olaf College, and distributed with "Broadening Your Statistical Horizons" by Legler and Roback. This data set, a subset of the airbnb data in the bayesrules package, includes the prices and features for 869 AirBnB listings in Chicago, collected in 2016.

Usage

airbnb_small

Format

A data frame with 869 rows and 12 variables. Each row represents a single AirBnB listing.

price

the nightly price of the listing (in USD)

rating

the listing's average rating, on a scale from 1 to 5

reviews

number of user reviews the listing has

room_type

the type of listing (eg: Shared room)

accommodates

number of guests the listing accommodates

bedrooms

the number of bedrooms the listing has

minimum_stay

the minimum number of nights to stay in the listing

neighborhood

the neighborhood in which the listing is located

district

the broader district in which the listing is located

walk_score

the neighborhood's rating for walkability (0 - 100)

transit_score

the neighborhood's rating for access to public transit (0 - 100)

bike_score

the neighborhood's rating for bikeability (0 - 100)

Source

Ly Trinh and Pony Ameri (2018). Airbnb Price Determinants: A Multilevel Modeling Approach. Project for Statistics 316-Advanced Statistical Modeling, St. Olaf College. Julie Legler and Paul Roback (2019). Broadening Your Statistical Horizons: Generalized Linear Models and Multilevel Models. https://bookdown.org/roback/bookdown-bysh/. https://github.com/proback/BeyondMLR/blob/master/data/airbnb.csv/


Bald Eagle Count Data

Description

Bald Eagle count data collected from the year 1981 to 2017, in late December, by birdwatchers in the Ontario, Canada area. The data was made available by the Bird Studies Canada website and distributed through the R for Data Science TidyTuesday project. A more complete data set with a larger selection of birds can be found in the bird_counts data in the bayesrules package.

Usage

bald_eagles

Format

A data frame with 37 rows and 5 variables. Each row represents Bald Eagle observations in the given year.

year

year of data collection

count

number of birds observed

hours

total person-hours of observation period

count_per_hour

count divided by hours

count_per_week

count_per_hour multiplied by 168 hours per week
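The two derived rate columns follow directly from count and hours. The sketch below reproduces that arithmetic on a small hypothetical data frame; the name eagles and all of its values are invented for illustration and are not taken from bald_eagles.

```r
# Hypothetical yearly tallies (made-up numbers, not from bald_eagles)
eagles <- data.frame(
  year  = c(2015, 2016, 2017),
  count = c(10, 12, 15),
  hours = c(200, 240, 250)
)

# Sightings per person-hour of observation
eagles$count_per_hour <- eagles$count / eagles$hours

# Rescale to a weekly rate: 24 hours * 7 days = 168 hours per week
eagles$count_per_week <- eagles$count_per_hour * 168

print(eagles)
```

Dividing by hours puts years with different amounts of observation effort on a comparable scale.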

Source

https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-06-18/bird_counts.csv.


WNBA Basketball Data

Description

The WNBA Basketball Data was scraped from https://www.basketball-reference.com/wnba/players/ and contains information on basketball players from the 2019 season.

Usage

basketball

Format

A data frame with 146 rows and 30 variables. Each row represents a single WNBA basketball player. The variables on each player are as follows.

player_name

first and last name

height

height in inches

weight

weight in pounds

year

year of the WNBA season

team

team that the WNBA player is a member of

age

age in years

games_played

number of games played by the player in that season

games_started

number of games the player started in that season

avg_minutes_played

average number of minutes played per game

avg_field_goals

average number of field goals per game played

avg_field_goal_attempts

average number of field goals attempted per game played

field_goal_pct

percent of field goals made throughout the season

avg_three_pointers

average number of three pointers per game played

avg_three_pointer_attempts

average number of three pointers attempted per game played

three_pointer_pct

percent of three pointers made throughout the season

avg_two_pointers

average number of two pointers made per game played

avg_two_pointer_attempts

average number of two pointers attempted per game played

two_pointer_pct

percent of two pointers made throughout the season

avg_free_throws

average number of free throws made per game played

avg_free_throw_attempts

average number of free throws attempted per game played

free_throw_pct

percent of free throws made throughout the season

avg_offensive_rb

average number of offensive rebounds per game played

avg_defensive_rb

average number of defensive rebounds per game played

avg_rb

average number of rebounds (both offensive and defensive) per game played

avg_assists

average number of assists per game played

avg_steals

average number of steals per game played

avg_blocks

average number of blocks per game played

avg_turnovers

average number of turnovers per game played

avg_personal_fouls

average number of personal fouls per game played. Note: after 5 fouls the player is not allowed to play in that game anymore

avg_points

average number of points made per game played

total_minutes

total number of minutes played throughout the season

starter

whether or not the player started in more than half of the games they played
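As a rough illustration of how the starter indicator relates to games_played and games_started, the sketch below applies the "more than half" rule to a hypothetical data frame. The players and their values are invented, and the storage type of starter in the packaged data (logical, factor, or otherwise) is an assumption here.

```r
# Hypothetical players (made-up values, not from basketball)
players <- data.frame(
  player_name   = c("Player A", "Player B", "Player C"),
  games_played  = c(30, 20, 10),
  games_started = c(28, 10, 0)
)

# A starter began strictly more than half of the games they played
players$starter <- players$games_started > players$games_played / 2

print(players)
```

Player B started exactly half of their games, so under a strict "more than half" reading they are not counted as a starter.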

Source

https://www.basketball-reference.com/


Bechdel Test for over 1500 movies

Description

A dataset containing data behind the story "The Dollar-And-Cents Case Against Hollywood's Exclusion of Women" https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/.

Usage

bechdel

Format

A data frame with 1794 rows and 3 variables:

year

The release year of the movie

title

The title of the movie

binary

Bechdel test result (PASS, FAIL)

Source

<https://github.com/fivethirtyeight/data/tree/master/bechdel/>


Big Word Club (BWC)

Description

Data on the effectiveness of a digital learning program designed by the Abdul Latif Jameel Poverty Action Lab (J-PAL) to address disparities in vocabulary levels among children from households with different income levels.

Usage

big_word_club

Format

A data frame with 818 student-level observations and 31 variables:

participant_id

unique student id

treat

control group (0) or treatment group (1)

age_months

age in months

female

whether student identifies as female

kindergarten

grade level, pre-school (0) or kindergarten (1)

teacher_id

unique teacher id

school_id

unique school id

private_school

whether school is private

title1

whether school has Title 1 status

free_reduced_lunch

percent of school that receive free / reduced lunch

state

school location

esl_observed

whether student has ESL status

special_ed_observed

whether student has special education status

new_student

whether student enrolled after program began

distracted_a1

student's distraction level during assessment 1 (0 = not distracted; 1 = mildly distracted; 2 = moderately distracted; 3 = extremely distracted)

distracted_a2

same as distracted_a1 but during assessment 2

distracted_ppvt

same as distracted_a1 but during standardized assessment

score_a1

student score on BWC assessment 1

invalid_a1

whether student's score on assessment 1 was invalid

score_a2

student score on BWC assessment 2

invalid_a2

whether student's score on assessment 2 was invalid

score_ppvt

student score on standardized assessment

score_ppvt_age

score_ppvt adjusted for age

invalid_ppvt

whether student's score on standardized assessment was invalid

t_logins_april

number of teacher logins onto BWC system in April

t_logins_total

number of teacher logins onto BWC system during entire study

t_weeks_used

number of weeks of the BWC program that the classroom has completed

t_words_learned

teacher response to the number of words students had learned through BWC (0 = almost none; 1 = 1 to 5; 2 = 6 to 10)

t_financial_struggle

teacher response to the number of their students that have families that experience financial struggle

t_misbehavior

teacher response to frequency that student misbehavior interferes with teaching (0 = never; 1 = rarely; 2 = occasionally; 3 = frequently)

t_years_experience

teacher's number of years of teaching experience

score_pct_change

percent change in scores before and after the program

Source

These data correspond to the following study: Ariel Kalil, Susan Mayer, Philip Oreopoulos (2020). Closing the word gap with Big Word Club: Evaluating the Impact of a Tech-Based Early Childhood Vocabulary Program. Data was obtained through the Inter-university Consortium for Political and Social Research (ICPSR) https://www.openicpsr.org/openicpsr/project/117330/version/V1/view/.


Capital Bikeshare Bike Ridership (Registered and Casual Riders)

Description

Data on ridership among registered members and casual users of the Capital Bikeshare service in Washington, D.C.

Usage

bike_users

Format

A data frame with 534 daily observations, 267 each for registered riders and casual riders, and 13 variables:

date

date of observation

season

fall, spring, summer, or winter

year

the year of the date

month

the month of the date

day_of_week

the day of the week

weekend

whether or not the date falls on a weekend (TRUE or FALSE)

holiday

whether or not the date falls on a holiday (yes or no)

temp_actual

raw temperature (degrees Fahrenheit)

temp_feel

what the temperature feels like (degrees Fahrenheit)

humidity

humidity level (percentage)

windspeed

wind speed (miles per hour)

weather_cat

weather category (categ1 = pleasant, categ2 = moderate, categ3 = severe)

user

rider type (casual or registered)

rides

number of bikeshare rides

Source

Fanaee-T, Hadi and Gama, Joao (2013). Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence. https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset/


Capital Bikeshare Bike Ridership

Description

Data on ridership among registered members of the Capital Bikeshare service in Washington, D.C.

Usage

bikes

Format

A data frame with 500 daily observations and 13 variables:

date

date of observation

season

fall, spring, summer, or winter

year

the year of the date

month

the month of the date

day_of_week

the day of the week

weekend

whether or not the date falls on a weekend (TRUE or FALSE)

holiday

whether or not the date falls on a holiday (yes or no)

temp_actual

raw temperature (degrees Fahrenheit)

temp_feel

what the temperature feels like (degrees Fahrenheit)

humidity

humidity level (percentage)

windspeed

wind speed (miles per hour)

weather_cat

weather category (categ1 = pleasant, categ2 = moderate, categ3 = severe)

rides

number of bikeshare rides

Source

Fanaee-T, Hadi and Gama, Joao (2013). Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence. https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset


Bird Counts Data

Description

Bird count data collected between the years 1921 and 2017, in late December, by birdwatchers in the Ontario, Canada area. The data was made available by the Bird Studies Canada website and distributed through the R for Data Science TidyTuesday project.

Usage

bird_counts

Format

A data frame with 18706 rows and 7 variables. Each row represents observations for the given bird species in the given year.

year

year of data collection

species

common name of observed bird species

species_latin

Latin (scientific) name of observed bird species

count

number of birds observed

hours

total person-hours of observation period

count_per_hour

count divided by hours

count_per_week

count_per_hour multiplied by 168 hours per week

Source

https://github.com/rfordatascience/tidytuesday/blob/master/data/2019/2019-06-18/bird_counts.csv/.


Book Banning Data

Description

The book banning data was collected by Fast and Hegland as part of a course project at St Olaf College, and distributed with "Broadening Your Statistical Horizons" by Legler and Roback. This data set includes the features and outcomes for 931 book challenges (i.e., requests to ban a book) made in the US between 2000 and 2010. Information on the books being challenged and the characteristics of these books were collected from the American Library Society. State-level demographic information and political leanings were obtained from the US Census Bureau and Cook Political Report, respectively. Due to an outlying large number of challenges, book challenges made in the state of Texas were omitted.

Usage

book_banning

Format

A data frame with 931 rows and 17 variables. Each row represents a single book challenge within the given state and date.

title

title of book being challenged

book_id

identifier for the book

author

author of the book

date

date of the challenge

year

year of the challenge

removed

whether or not the challenge was successful (the book was removed)

explicit

whether the book was challenged for sexually explicit material

antifamily

whether the book was challenged for anti-family material

occult

whether the book was challenged for occult material

language

whether the book was challenged for inappropriate language

lgbtq

whether the book was challenged for LGBTQ material

violent

whether the book was challenged for violent material

state

US state in which the challenge was made

political_value_index

Political Value Index of the state (negative = leans Republican, 0 = neutral, positive = leans Democrat)

median_income

median income in the state, relative to the average state median income

hs_grad_rate

high school graduation rate, in percent, relative to the average state high school graduation rate

college_grad_rate

college graduation rate, in percent, relative to the average state college graduation rate

Source

Shannon Fast and Thomas Hegland (2011). Book Challenges: A Statistical Examination. Project for Statistics 316-Advanced Statistical Modeling, St. Olaf College. Julie Legler and Paul Roback (2019). Broadening Your Statistical Horizons: Generalized Linear Models and Multilevel Models. https://bookdown.org/roback/bookdown-bysh/. https://github.com/proback/BeyondMLR/blob/master/data/bookbanningNoTex.csv/


Cherry Blossom Running Race

Description

A sub-sample of outcomes for the annual Cherry Blossom Ten Mile race in Washington, D.C. This sub-sample was taken from the complete Cherry data in the mdsr package.

Usage

cherry_blossom_sample

Format

A data frame with 252 Cherry Blossom outcomes and 7 variables:

runner

a unique identifier for the runner

age

age of the runner

net

time to complete the race, from starting line to finish line (minutes)

gun

time between the official start of the race and the finish line (minutes)

year

year of the race

previous

the number of previous years in which the subject ran in the race

Source

Data in the original Cherry data set were obtained from https://www.cherryblossom.org/post-race/race-results/.


Posterior Classification Summaries

Description

Given a set of observed data including a binary response variable y and an rstanreg model of y, this function returns summaries of the model's posterior classification quality. These summaries include a confusion matrix as well as estimates of the model's sensitivity, specificity, and overall accuracy.

Usage

classification_summary(model, data, cutoff = 0.5)

Arguments

model

an rstanreg model object with binary y

data

data frame including the variables in the model, both response y and predictors x

cutoff

probability cutoff to classify a new case as positive (0.5 is the default)

Value

a list

Examples

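library(bayesrules)
# Simulate a binary response y whose log-odds increase with x
x <- rnorm(20)
z <- 3*x
prob <- 1/(1+exp(-z))
y <- rbinom(20, 1, prob)
example_data <- data.frame(x = x, y = y)
# Fit a Bayesian logistic regression, then summarize its classifications
example_model <- rstanarm::stan_glm(y ~ x, data = example_data, family = binomial)
classification_summary(model = example_model, data = example_data, cutoff = 0.5)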
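
The summaries named in the Description can be illustrated without fitting a model. The base R sketch below assumes each case has already been reduced to a single predicted probability of y = 1, and that a case is classified as positive when that probability is at least the cutoff; both are assumptions made for illustration, and classification_summary() may handle these details differently. All values are hypothetical.

```r
# Hypothetical observed outcomes and predicted probabilities of y = 1
y      <- c(1, 0, 1, 1, 0, 0, 1, 0)
prob   <- c(0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3)
cutoff <- 0.5

# Classify a case as positive when its probability reaches the cutoff
# (the >= convention is an assumption of this sketch)
pred <- as.integer(prob >= cutoff)

# Confusion matrix: rows are observed classes, columns are predicted classes
confusion <- table(
  observed  = factor(y, levels = c(0, 1)),
  predicted = factor(pred, levels = c(0, 1))
)

true_pos <- confusion["1", "1"]
true_neg <- confusion["0", "0"]

sensitivity <- true_pos / sum(confusion["1", ])  # true positive rate
specificity <- true_neg / sum(confusion["0", ])  # true negative rate
accuracy    <- (true_pos + true_neg) / sum(confusion)

print(confusion)
print(c(sensitivity = sensitivity, specificity = specificity, accuracy = accuracy))
```

Lowering the cutoff classifies more cases as positive, which tends to raise sensitivity at the expense of specificity.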

Cross-Validated Posterior Classification Summaries

Description

Given a set of observed data including a binary response variable y and an rstanreg model of y, this function returns cross-validated estimates of the model's posterior classification quality: sensitivity, specificity, and overall accuracy. For hierarchical models of class lmerMod, the folds are composed of collections of groups, not individual observations.

Usage

classification_summary_cv(model, data, group, k, cutoff = 0.5)

Arguments

model

an rstanreg model object with binary y

data

data frame including the variables in the model, both response y (0 or 1) and predictors x

group

a character string representing the name of the factor grouping variable, i.e., the random effect (only used for hierarchical models)

k

the number of folds to use for cross validation

cutoff

probability cutoff to classify a new case as positive

Value

a list

Examples

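library(bayesrules)
# Simulate a binary response y whose log-odds increase with x
x <- rnorm(20)
z <- 3*x
prob <- 1/(1+exp(-z))
y <- rbinom(20, 1, prob)
example_data <- data.frame(x = x, y = y)
# Fit a Bayesian logistic regression, then cross-validate with k = 2 folds
example_model <- rstanarm::stan_glm(y ~ x, data = example_data, family = binomial)
classification_summary_cv(model = example_model, data = example_data, k = 2, cutoff = 0.5)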
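
The Description notes that, for hierarchical models, folds are built from whole groups rather than individual observations. The base R sketch below shows one way such a group-level fold assignment could be constructed; it is an illustration under assumed names (dat, fold_of_group) and simulated data, not the function's actual internals.

```r
set.seed(84735)  # arbitrary seed, for reproducibility only

# Hypothetical grouped data: 6 groups with 4 observations each
dat <- data.frame(
  group = rep(paste0("g", 1:6), each = 4),
  y     = rbinom(24, 1, 0.5)
)
k <- 3

# Randomly assign each group (not each row) to one of k folds
groups        <- unique(as.character(dat$group))
fold_of_group <- sample(rep(seq_len(k), length.out = length(groups)))
names(fold_of_group) <- groups

# Every row inherits the fold of its group
dat$fold <- unname(fold_of_group[as.character(dat$group)])

# Each group falls entirely within a single fold
print(table(dat$group, dat$fold))
```

Keeping a group's rows together means each held-out fold contains only groups the model did not see during training.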

Himalayan Climber Data

Description

A sub-sample of the Himalayan Database distributed through the R for Data Science TidyTuesday project. This dataset includes information on the results and conditions for various Himalayan climbing expeditions. Each row corresponds to a single member of a climbing expedition team.

Usage

climbers_sub

Format

A data frame with 2076 observations (1 per climber) and 22 variables:

expedition_id

unique expedition identifier

member_id

unique climber identifier

peak_id

unique identifier of the expedition's destination peak

peak_name

name of the expedition's destination peak

year

year of expedition

season

season of expedition (Autumn, Spring, Summer, Winter)

sex

climber gender identity which the database oversimplifies to a binary category

age

climber age

citizenship

climber citizenship

expedition_role

climber's role in the expedition (eg: Co-Leader)

hired

whether the climber was a hired member of the expedition

highpoint_metres

the destination peak's highpoint (metres)

success

whether the climber successfully reached the destination

solo

whether the climber was on a solo expedition

oxygen_used

whether the climber utilized supplemental oxygen

died

whether the climber died during the expedition

death_cause
death_height_metres
injured

whether the climber was injured on the expedition

injury_type
injury_height_metres
count

number of climbers in the expedition

height_metres

height of the peak in meters

first_ascent_year

the year of the first recorded summit of the peak (though not necessarily the actual first summit!)

Source

Original source: https://www.himalayandatabase.com/. Complete dataset distributed by: https://github.com/rfordatascience/tidytuesday/tree/master/data/2020/2020-09-22/.


Coffee Ratings Data

Description

A sub-set of data on coffee bean ratings / quality originally collected by James LeDoux (jmzledoux) and distributed through the R for Data Science TidyTuesday project.

Usage

coffee_ratings

Format

A data frame with 1339 batches of coffee beans and 27 variables on each batch.

owner

farm owner

farm_name

farm where beans were grown

country_of_origin

country where farm is

mill

where beans were processed

in_country_partner

country of coffee partner

altitude_low_meters

lowest altitude of the farm

altitude_high_meters

highest altitude of the farm

altitude_mean_meters

average altitude of the farm

number_of_bags

number of bags tested

bag_weight

weight of each tested bag

species

bean species

variety

bean variety

processing_method

how beans were processed

aroma

bean aroma grade

flavor

bean flavor grade

aftertaste

bean aftertaste grade

acidity

bean acidity grade

body

bean body grade

balance

bean balance grade

uniformity

bean uniformity grade

clean_cup

bean clean cup grade

sweetness

bean sweetness grade

moisture

bean moisture grade

category_one_defects

count of category one defects

category_two_defects

count of category two defects

color

bean color

total_cup_points

total bean rating (0 – 100)

Source

https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-07/coffee_ratings.csv.


Coffee Ratings Data

Description

A sub-set of data on coffee bean ratings / quality originally collected by James LeDoux (jmzledoux) and distributed through the R for Data Science TidyTuesday project. This is a simplified version of the coffee_ratings data.

Usage

coffee_ratings_small

Format

A data frame with 636 batches of coffee beans and 11 variables on each batch.

farm_name

farm where beans were grown

total_cup_points

total bean rating (0 – 100)

aroma

bean aroma grade

flavor

bean flavor grade

aftertaste

bean aftertaste grade

acidity

bean acidity grade

body

bean body grade

balance

bean balance grade

uniformity

bean uniformity grade

sweetness

bean sweetness grade

moisture

bean moisture grade

Source

https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-07/coffee_ratings.csv.


LGBTQ+ Rights Laws by State

Description

Data on the number of LGBTQ+ equality laws (as of 2019) and demographics in each U.S. state.

Usage

equality_index

Format

A data frame with 50 observations, one per state, and 6 variables:

state

state name

region

region in which the state falls

gop_2016

percent of the 2016 presidential election vote earned by the Republican ("GOP") candidate

laws

number of LGBTQ+ rights laws (as of 2019)

historical

political leaning of the state over time (gop = Republican, dem = Democrat, swing = swing state)

percent_urban

percent of state's residents that live in urban areas (by the 2010 census)

Source

Data on LGBTQ+ laws were obtained from Warbelow, Sarah, Courtnay Avant, and Colin Kutney (2020). 2019 State Equality Index. Washington, DC. Human Rights Campaign Foundation. https://assets2.hrc.org/files/assets/resources/HRC-SEI-2019-Report.pdf?_ga=2.148925686.1325740687.1594310864-1928808113.1594310864&_gac=1.213124768.1594312278.EAIaIQobChMI9dP2hMzA6gIVkcDACh21GgLEEAAYASAAEgJiJvD_BwE/. Data on urban residency obtained from https://www.icip.iastate.edu/tables/population/urban-pct-states/.


A collection of 150 news articles

Description

A dataset containing data behind the study "FakeNewsNet: A Data Repository with News Content, Social Context and Spatialtemporal Information for Studying Fake News on Social Media" https://arxiv.org/abs/1809.01286. The news articles in this dataset were posted to Facebook in September 2016, in the run-up to the U.S. presidential election.

Usage

fake_news

Format

A data frame with 150 rows and 6 variables:

title

The title of the news article

text

Text of the article

url

Hyperlink for the article

authors

Authors of the article

type

Binary variable indicating whether the article presents fake or real news (fake, real)

title_words

Number of words in the title

text_words

Number of words in the text

title_char

Number of characters in the title

text_char

Number of characters in the text

title_caps

Number of words that are all capital letters in the title

text_caps

Number of words that are all capital letters in the text

title_caps_percent

Percent of words that are all capital letters in the title

text_caps_percent

Percent of words that are all capital letters in the text

title_excl

Number of characters that are exclamation marks in the title

text_excl

Number of characters that are exclamation marks in the text

title_excl_percent

Percent of characters that are exclamation marks in the title

text_excl_percent

Percent of characters that are exclamation marks in the text

title_has_excl

Binary variable indicating whether the title of the article includes an exclamation point or not (TRUE, FALSE)

anger

Percent of words that are associated with anger

anticipation

Percent of words that are associated with anticipation

disgust

Percent of words that are associated with disgust

fear

Percent of words that are associated with fear

joy

Percent of words that are associated with joy

sadness

Percent of words that are associated with sadness

surprise

Percent of words that are associated with surprise

trust

Percent of words that are associated with trust

negative

Percent of words that have negative sentiment

positive

Percent of words that have positive sentiment

text_syllables

Number of syllables in text

text_syllables_per_word

Number of syllables per word in text
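
Several of the variables above are simple text features. The base R sketch below computes a few of them for a single hypothetical headline; the tokenization rules used here (splitting on whitespace, treating a word as all-capitals when it has at least one uppercase letter and no lowercase letters, and reporting percentages on a 0 to 100 scale) are assumptions and may not match how the packaged values were derived.

```r
# Hypothetical headline (not taken from fake_news)
title <- "BREAKING: You WON'T believe this!"

# Words: split on whitespace
words       <- strsplit(title, "\\s+")[[1]]
title_words <- length(words)

# All-capitals words: at least one uppercase letter and no lowercase letters
is_caps            <- grepl("[[:upper:]]", words) & !grepl("[[:lower:]]", words)
title_caps         <- sum(is_caps)
title_caps_percent <- 100 * title_caps / title_words

# Characters and exclamation marks
chars              <- strsplit(title, "")[[1]]
title_char         <- nchar(title)
title_excl         <- sum(chars == "!")
title_excl_percent <- 100 * title_excl / title_char
title_has_excl     <- title_excl > 0

print(c(words = title_words, caps = title_caps, caps_pct = title_caps_percent))
print(c(chars = title_char, excl = title_excl, excl_pct = title_excl_percent))
```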

Source

Shu, K., Mahudeswaran, D., Wang, S., Lee, D. and Liu, H. (2018) FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media


Football Brain Measurements

Description

Brain measurements for football and non-football players as provided in the Lock5 package.

Usage

football

Format

A data frame with 75 observations and 5 variables:

group

control = no football, fb_no_concuss = football player but no concussions, fb_concuss = football player with concussion history

years

Number of years a person played football

volume

Total hippocampus volume, in cubic centimeters

Source

Singh R, Meier T, Kuplicki R, Savitz J, et al., "Relationship of Collegiate Football Experience and Concussion With Hippocampal Volume and Cognitive Outcome," JAMA, 311(18), 2014


Hotel Bookings Data

Description

A random subset of the data on hotel bookings originally collected by Antonio, Almeida and Nunes (2019) and distributed through the R for Data Science TidyTuesday project.

Usage

hotel_bookings

Format

A data frame with 1000 hotel bookings and 32 variables on each booking.

hotel

"Resort Hotel" or "City Hotel"

is_canceled

whether the booking was cancelled

lead_time

number of days between booking and arrival

arrival_date_year

year of scheduled arrival

arrival_date_month

month of scheduled arrival

arrival_date_week_number

week of scheduled arrival

arrival_date_day_of_month

day of month of scheduled arrival

stays_in_weekend_nights

number of reserved weekend nights

stays_in_week_nights

number of reserved week nights

adults

number of adults in booking

children

number of children

babies

number of babies

meal

whether the booking includes breakfast (BB = bed & breakfast), breakfast and dinner (HB = half board), or breakfast, lunch, and dinner (FB = full board)

country

guest's country of origin

market_segment

market segment designation (eg: TA = travel agent, TO = tour operator)

distribution_channel

booking distribution channel (eg: TA = travel agent, TO = tour operator)

is_repeated_guest

whether or not booking was made by a repeated guest

previous_cancellations

guest's number of previous booking cancellations

previous_bookings_not_canceled

guest's number of previous bookings that weren't cancelled

reserved_room_type

code for type of room reserved by guest

assigned_room_type

code for type of room assigned by hotel

booking_changes

number of changes made to the booking

deposit_type

No Deposit, Non Refund, Refundable

agent

booking travel agency

company

booking company

days_in_waiting_list

number of days the guest waited for booking confirmation

customer_type

Contract, Group, Transient, Transient-party (a transient booking tied to another transient booking)

average_daily_rate

average hotel cost per day

required_car_parking_spaces

number of parking spaces the guest needed

total_of_special_requests

number of guest special requests

reservation_status

Canceled, Check-Out, No-Show

reservation_status_date

when the guest cancelled or checked out

Source

Nuno Antonio, Ana de Almeida, and Luis Nunes (2019). "Hotel booking demand datasets." Data in Brief (22): 41-49. https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-02-11/hotels.csv/.


Loon Count Data

Description

Loon count data collected from the year 2000 to 2017, in late December, by birdwatchers in the Ontario, Canada area. The data was made available by the Bird Studies Canada website and distributed through the R for Data Science TidyTuesday project. A more complete data set with a larger selection of birds can be found in the bird_counts data in the bayesrules package.

Usage

loons

Format

A data frame with 18 rows and 5 variables. Each row represents loon observations in the given year.

year

year of data collection

count

number of loons observed

hours

total person-hours of observation period

count_per_hour

count divided by hours

count_per_100

count_per_hour multiplied by 100 hours

Source

https://github.com/rfordatascience/tidytuesday/blob/master/data/2019/2019-06-18/bird_counts.csv.


Museum of Modern Art (MoMA) data

Description

The Museum of Modern Art data includes information about the individual artists included in the collection of the Museum of Modern Art in New York City. It does not include information about works for artist collectives or companies. The data was made available by MoMA itself and downloaded in December 2020.

Usage

moma

Format

A data frame with 10964 rows and 11 variables. Each row represents an individual artist in the MoMA collection.

artist

name

country

country of origin

birth

year of birth

death

year of death

alive

whether or not the artist was living at the time of data collection (December 2020)

genx

whether or not the artist is Gen X or younger, i.e., born in 1965 or later

gender

gender identity (as perceived by MoMA employees)

department

MoMA department in which the artist's works most frequently appear

count

number of the artist's works in the MoMA collection

year_acquired_min

first year MoMA acquired one of the artist's works

year_acquired_max

most recent year MoMA acquired one of the artist's works
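
The genx indicator is a threshold on birth year. A minimal sketch of that rule on hypothetical artists follows; the names and birth years are invented, and how missing birth years are treated in the packaged data is not stated here, so the sketch simply lets them propagate as NA.

```r
# Hypothetical artists (made-up values, not from moma)
artists <- data.frame(
  artist = c("Artist A", "Artist B", "Artist C", "Artist D"),
  birth  = c(1930, 1965, 1980, NA)
)

# Gen X or younger: born in 1965 or later (NA birth years stay NA)
artists$genx <- artists$birth >= 1965

print(artists)
```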

Source

https://github.com/MuseumofModernArt/collection/blob/master/Artworks.csv/.


Museum of Modern Art (MoMA) data sample

Description

A random sample of 100 artists represented in the Museum of Modern Art in New York City. The data was made available by MoMA itself and downloaded in December 2020. It does not include information about artist collectives or companies.

Usage

moma_sample

Format

A data frame with 100 rows and 10 variables. Each row represents an individual artist in the MoMA collection.

artist

name

country

country of origin

birth

year of birth

death

year of death

alive

whether or not the artist was living at the time of data collection (December 2020)

genx

whether or not the artist is Gen X or younger, i.e., born in 1965 or later

gender

gender identity (as perceived by MoMA employees)

count

number of the artist's works in the MoMA collection

year_acquired_min

first year MoMA acquired one of the artist's works

year_acquired_max

most recent year MoMA acquired one of the artist's works

Source

https://github.com/MuseumofModernArt/collection/blob/master/Artworks.csv/.


Posterior Classification Summaries for a Naive Bayes model

Description

Given a set of observed data including a categorical response variable y and a naiveBayes model of y, this function returns summaries of the model's posterior classification quality. These summaries include a confusion matrix as well as an estimate of the model's overall accuracy.

Usage

naive_classification_summary(model, data, y)

Arguments

model

a naiveBayes model object with categorical y

data

data frame including the variables in the model

y

a character string indicating the y variable in data

Value

a list containing a confusion matrix and an estimate of the model's overall accuracy

Examples

data(penguins_bayes, package = "bayesrules")
example_model <- e1071::naiveBayes(species ~ bill_length_mm, data = penguins_bayes)
naive_classification_summary(model = example_model, data = penguins_bayes, y = "species")
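
To make the two summaries concrete, the following base R sketch builds a confusion matrix and an overall accuracy estimate from observed and predicted classes. The class labels below are made-up illustrative values, and this is not the package's internal code.

```r
# Hypothetical observed and predicted classes (illustrative values only)
actual <- factor(c("Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo", "Adelie"))
predicted <- factor(
  c("Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Adelie", "Adelie"),
  levels = levels(actual)
)

# Confusion matrix: rows are observed classes, columns are predicted classes
confusion <- table(actual = actual, predicted = predicted)

# Overall accuracy: proportion of cases that fall on the diagonal
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```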

Cross-Validated Posterior Classification Summaries for a Naive Bayes model

Description

Given a set of observed data including a categorical response variable y and a naiveBayes model of y, this function returns a cross-validated confusion matrix by which to assess the model's posterior classification quality.

Usage

naive_classification_summary_cv(model, data, y, k = 10)

Arguments

model

a naiveBayes model object with categorical y

data

data frame including the variables in the model

y

a character string indicating the y variable in data

k

the number of folds to use for cross validation

Value

a list

Examples

data(penguins_bayes, package = "bayesrules")
example_model <- e1071::naiveBayes(species ~ bill_length_mm, data = penguins_bayes)
naive_classification_summary_cv(model = example_model, data = penguins_bayes, y = "species", k = 2)
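
The general k-fold idea can be sketched in base R as follows. This is an illustration under stated assumptions: it uses the built-in iris data and a deliberately simple stand-in classifier (nearest class mean of Petal.Length) rather than a naiveBayes model, and the helper predict_nearest_mean is hypothetical.

```r
set.seed(84735)
k <- 2
n <- nrow(iris)

# Randomly assign each row to one of k folds of near-equal size
fold <- sample(rep(seq_len(k), length.out = n))

# Stand-in classifier (NOT naive Bayes): predict the species whose
# training-set mean Petal.Length is closest to the test value
predict_nearest_mean <- function(train, test) {
  centers <- tapply(train$Petal.Length, train$Species, mean)
  nearest <- vapply(
    test$Petal.Length,
    function(v) unname(which.min(abs(v - centers))),
    integer(1)
  )
  factor(names(centers)[nearest], levels = levels(train$Species))
}

# Each fold is held out once; predictions come from a model trained on the rest
predicted <- factor(rep(NA_character_, n), levels = levels(iris$Species))
for (j in seq_len(k)) {
  held_out <- fold == j
  predicted[held_out] <- predict_nearest_mean(iris[!held_out, ], iris[held_out, ])
}

# Cross-validated confusion matrix and accuracy, pooled over all folds
cv_confusion <- table(actual = iris$Species, predicted = predicted)
cv_accuracy <- sum(diag(cv_confusion)) / sum(cv_confusion)
print(cv_confusion)
```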

Penguins Data

Description

Data on penguins in the Palmer Archipelago, originally collected by Gorman et al. and distributed through the penguins data in the palmerpenguins package. This version adds one variable to the original penguins data: above_average_weight.

Usage

penguins_bayes

Format

A data frame with 344 penguins and 9 variables on each.

species

species (Adelie, Chinstrap, Gentoo)

island

home island (Biscoe, Dream, Torgersen)

year

year of observation

bill_length_mm

length of bill (mm)

bill_depth_mm

depth of bill (mm)

flipper_length_mm

length of flipper (mm)

body_mass_g

body mass (g)

above_average_weight

whether or not the body mass exceeds 4200g (TRUE or FALSE)

sex

male or female

Source

Gorman KB, Williams TD, and Fraser WR (2014). Ecological sexual dimorphism and environmental variability within a community of antarctic penguins (Genus Pygoscelis). PLoS ONE, 9(3).


Plot a Beta Model for π

Description

Plots the probability density function (pdf) for a Beta(alpha, beta) model of variable π.

Usage

plot_beta(alpha, beta, mean = FALSE, mode = FALSE)

Arguments

alpha, beta

positive shape parameters of the Beta model

mean, mode

a logical value indicating whether to display the model mean and mode

Value

A density plot for the Beta model.

Examples

plot_beta(alpha = 1, beta = 12, mean = TRUE, mode = TRUE)

Plot a Beta-Binomial Bayesian Model

Description

Consider a Beta-Binomial Bayesian model for parameter π with a Beta(alpha, beta) prior on π and Binomial likelihood with n trials and y successes. Given information on the prior (alpha and beta) and data (y and n), this function produces a plot of any combination of the corresponding prior pdf, scaled likelihood function, and posterior pdf. All three are included by default.

Usage

plot_beta_binomial(
  alpha,
  beta,
  y = NULL,
  n = NULL,
  prior = TRUE,
  likelihood = TRUE,
  posterior = TRUE
)

Arguments

alpha, beta

positive shape parameters of the prior Beta model

y

observed number of successes

n

observed number of trials

prior

a logical value indicating whether the prior model should be plotted

likelihood

a logical value indicating whether the scaled likelihood should be plotted

posterior

a logical value indicating whether the posterior model should be plotted

Value

a ggplot

Examples

plot_beta_binomial(alpha = 1, beta = 13, y = 25, n = 50)
plot_beta_binomial(alpha = 1, beta = 13, y = 25, n = 50, posterior = FALSE)
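
The three curves can be evaluated directly in base R. The sketch below relies on the standard conjugate result that a Beta(alpha, beta) prior combined with y successes in n trials gives a Beta(alpha + y, beta + n - y) posterior. How the likelihood is scaled is an assumption made here for illustration (it is normalized to integrate to 1 over π); the package may scale it differently.

```r
alpha <- 1
beta <- 13
y <- 25
n <- 50

# Grid of candidate values for pi
pi_grid <- seq(0.001, 0.999, length.out = 999)

# Prior pdf
prior <- dbeta(pi_grid, alpha, beta)

# Likelihood as a function of pi, rescaled to integrate to 1 (assumed scaling)
lik_area <- integrate(function(p) dbinom(y, n, p), lower = 0, upper = 1)$value
scaled_likelihood <- dbinom(y, n, pi_grid) / lik_area

# Posterior pdf from the conjugate update
posterior <- dbeta(pi_grid, alpha + y, beta + n - y)

# Location of the posterior peak on the grid
posterior_peak <- pi_grid[which.max(posterior)]
print(posterior_peak)
```

The three vectors can be drawn on one set of axes, for example with matplot(pi_grid, cbind(prior, scaled_likelihood, posterior), type = "l").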

Plot a Beta Model with Credible Interval

Description

Plots the probability density function (pdf) for a Beta(alpha, beta) model of variable π with markings indicating a credible interval for π.

Usage

plot_beta_ci(alpha, beta, ci_level = 0.95)

Arguments

alpha, beta

positive shape parameters of the Beta model

ci_level

credible interval level

Value

A density plot for the Beta model

Examples

plot_beta_ci(alpha = 7, beta = 12, ci_level = 0.80)
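
One common way to obtain such an interval is to take equal-tailed quantiles of the Beta model. The sketch below assumes that construction (a middle 80% interval); whether plot_beta_ci uses exactly this interval type is an assumption, not something stated in this entry.

```r
alpha <- 7
beta <- 12
ci_level <- 0.80

# Probability left in each tail of an equal-tailed interval
tail_prob <- (1 - ci_level) / 2

# Lower and upper endpoints from the Beta quantile function
ci <- qbeta(c(tail_prob, 1 - tail_prob), alpha, beta)

# Check: probability mass between the endpoints should equal ci_level
coverage <- diff(pbeta(ci, alpha, beta))

print(ci)
print(coverage)
```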

Plot a Binomial Likelihood Function

Description

Plots the Binomial likelihood function for variable π given y observed successes in a series of n Binomial trials.

Usage

plot_binomial_likelihood(y, n, mle = FALSE)

Arguments

y

number of successes

n

number of trials

mle

a logical value indicating whether the maximum likelihood estimate of π, y/n, should be plotted

Value

a ggplot

Examples

plot_binomial_likelihood(y = 3, n = 10, mle = TRUE)
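
As a quick check of the relationship between the likelihood curve and the estimate y/n, the base R sketch below evaluates the Binomial likelihood over a grid of π values and locates its maximum. The grid resolution is an arbitrary choice.

```r
y <- 3
n <- 10

# Grid of candidate values for pi
pi_grid <- seq(0, 1, by = 0.001)

# Likelihood of each pi given y successes in n trials
likelihood <- dbinom(y, size = n, prob = pi_grid)

# Grid value that maximizes the likelihood, and the closed-form MLE
mle_grid <- pi_grid[which.max(likelihood)]
mle_exact <- y / n

print(c(grid = mle_grid, exact = mle_exact))
```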

Plot a Gamma Model for λ

Description

Plots the probability density function (pdf) for a Gamma(shape, rate) model of variable λ.

Usage

plot_gamma(shape, rate, mean = FALSE, mode = FALSE)

Arguments

shape

positive shape parameter of the Gamma model

rate

positive rate parameter of the Gamma model

mean, mode

a logical value indicating whether to display the model mean and mode

Value

A density plot for the Gamma model.

Examples

plot_gamma(shape = 2, rate = 11, mean = TRUE, mode = TRUE)

Plot a Gamma-Poisson Bayesian Model

Description

Consider a Gamma-Poisson Bayesian model for rate parameter λ with a Gamma(shape, rate) prior on λ and a Poisson likelihood for the data. Given information on the prior (shape and rate) and data (the sample size n and sum_y), this function produces a plot of any combination of the corresponding prior pdf, scaled likelihood function, and posterior pdf. All three are included by default.

Usage

plot_gamma_poisson(
  shape,
  rate,
  sum_y = NULL,
  n = NULL,
  prior = TRUE,
  likelihood = TRUE,
  posterior = TRUE
)

Arguments

shape

positive shape parameter of the Gamma prior

rate

positive rate parameter of the Gamma prior

sum_y

sum of observed data values for the Poisson likelihood

n

number of observations for the Poisson likelihood

prior

a logical value indicating whether the prior model should be plotted

likelihood

a logical value indicating whether the scaled likelihood should be plotted

posterior

a logical value indicating whether the posterior model should be plotted

Value

a ggplot

Examples

plot_gamma_poisson(shape = 100, rate = 20, sum_y = 39, n = 6)
plot_gamma_poisson(shape = 100, rate = 20, sum_y = 39, n = 6, posterior = FALSE)

Plot a Normal Model for μ

Description

Plots the probability density function (pdf) for a Normal(mean, sd^2) model of variable μ.

Usage

plot_normal(mean, sd)

Arguments

mean

mean parameter of the Normal model

sd

standard deviation parameter of the Normal model

Value

a ggplot

Examples

plot_normal(mean = 3.5, sd = 0.5)

Plot a Normal Likelihood Function

Description

Plots the Normal likelihood function for variable μ given a vector of Normal data y.

Usage

plot_normal_likelihood(y, sigma = NULL)

Arguments

y

vector of observed data

sigma

optional value for the assumed standard deviation of y. By default, this is set to the sample standard deviation of y.

Value

a ggplot of Normal likelihood

Examples

plot_normal_likelihood(y = rnorm(50, mean = 10, sd = 2), sigma = 1.5)
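
The likelihood of μ for a fixed sigma can be evaluated on a grid in base R. In this sketch the curve is rescaled to peak at 1, which is an illustrative choice and not necessarily the scaling used by the package; the grid bounds are also arbitrary.

```r
set.seed(84735)
y <- rnorm(50, mean = 10, sd = 2)
sigma <- 1.5

# Grid of candidate values for mu, spanning the observed data
mu_grid <- seq(min(y), max(y), length.out = 501)

# Log-likelihood of each mu: sum of Normal log-densities over the data
log_lik <- vapply(
  mu_grid,
  function(mu) sum(dnorm(y, mean = mu, sd = sigma, log = TRUE)),
  numeric(1)
)

# Rescale so the maximum is 1 (avoids numerical underflow)
likelihood <- exp(log_lik - max(log_lik))

# The likelihood peaks at (the grid point nearest to) the sample mean
mu_hat <- mu_grid[which.max(likelihood)]
print(c(mu_hat = mu_hat, sample_mean = mean(y)))
```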

Plot a Normal-Normal Bayesian model

Description

Consider a Normal-Normal Bayesian model for mean parameter μ with a N(mean, sd^2) prior on μ and a Normal likelihood for the data. Given information on the prior (mean and sd) and data (the sample size n, mean y_bar, and standard deviation sigma), this function produces a plot of any combination of the corresponding prior pdf, scaled likelihood function, and posterior pdf. All three are included by default.

Usage

plot_normal_normal(
  mean,
  sd,
  sigma = NULL,
  y_bar = NULL,
  n = NULL,
  prior = TRUE,
  likelihood = TRUE,
  posterior = TRUE
)

Arguments

mean

mean of the Normal prior

sd

standard deviation of the Normal prior

sigma

standard deviation of the data, or likelihood standard deviation

y_bar

sample mean of the data

n

sample size of the data

prior

a logical value indicating whether the prior model should be plotted

likelihood

a logical value indicating whether the scaled likelihood should be plotted

posterior

a logical value indicating whether posterior model should be plotted

Value

a ggplot

Examples

plot_normal_normal(mean = 0, sd = 3, sigma = 4, y_bar = 5, n = 3)
plot_normal_normal(mean = 0, sd = 3, sigma = 4, y_bar = 5, n = 3, posterior = FALSE)
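
The posterior in this model is also Normal, and its parameters follow from the standard precision-weighted formulas. The sketch below applies those formulas to the values used in the example above; the variable names are chosen here for illustration.

```r
prior_mean <- 0
prior_sd <- 3
sigma <- 4
y_bar <- 5
n <- 3

# Precision (inverse variance) contributed by the prior and by the data
prior_precision <- 1 / prior_sd^2
data_precision <- n / sigma^2

# Posterior variance: inverse of the total precision
post_var <- 1 / (prior_precision + data_precision)

# Posterior mean: precision-weighted average of prior mean and sample mean
post_mean <- post_var * (prior_precision * prior_mean + data_precision * y_bar)

print(c(post_mean = post_mean, post_sd = sqrt(post_var)))
```

Because the data precision (3/16) exceeds the prior precision (1/9), the posterior mean sits closer to y_bar than to the prior mean.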

Plot a Poisson Likelihood Function

Description

Plots the Poisson likelihood function for variable λ given a vector of Poisson counts y.

Usage

plot_poisson_likelihood(y, lambda_upper_bound = 10)

Arguments

y

vector of observed Poisson counts

lambda_upper_bound

upper bound for lambda values to display on x-axis

Value

a ggplot of Poisson likelihood

Examples

plot_poisson_likelihood(y = c(4, 2, 7), lambda_upper_bound = 10)
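
The likelihood being plotted is the product of Poisson probabilities of the observed counts, viewed as a function of λ. The base R sketch below evaluates it on a grid up to the upper bound and confirms that it peaks near the sample mean; the grid step is an arbitrary choice.

```r
y <- c(4, 2, 7)
lambda_upper_bound <- 10

# Grid of candidate values for lambda
lambda_grid <- seq(0.01, lambda_upper_bound, by = 0.01)

# Joint likelihood of the counts at each lambda
likelihood <- vapply(
  lambda_grid,
  function(l) prod(dpois(y, lambda = l)),
  numeric(1)
)

# The maximizing lambda is (approximately) the sample mean of y
lambda_hat <- lambda_grid[which.max(likelihood)]
print(c(lambda_hat = lambda_hat, sample_mean = mean(y)))
```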

Pop vs Soda vs Coke

Description

Results of a volunteer survey on how people around the U.S. refer to fizzy cola drinks. The options are "pop", "soda", "coke", or "other".

Usage

pop_vs_soda

Format

A data frame with 374250 observations, one per survey respondent, and 4 variables:

state

the U.S. state in which the respondent resides

region

region in which the state falls (as defined by the U.S. Census)

word_for_cola

how the respondent refers to fizzy cola drinks

pop

whether or not the respondent refers to fizzy cola drinks as "pop"

Source

The survey responses were obtained at https://popvssoda.com/ which is maintained by Alan McConchie.


Posterior Predictive Summaries

Description

Given a set of observed data including a quantitative response variable y and an rstanreg model of y, this function returns 4 measures of the model's posterior prediction quality. Median absolute prediction error (mae) measures the typical difference between the observed y values and their posterior predictive medians (stable = TRUE) or means (stable = FALSE). Scaled mae (mae_scaled) measures the typical number of absolute deviations (stable = TRUE) or standard deviations (stable = FALSE) that observed y values fall from their predictive medians (stable = TRUE) or means (stable = FALSE). The two remaining measures (within_50 and within_95 under the default settings) report the proportion of observed y values that fall within their posterior prediction intervals, the probability levels of which are set by the user through prob_inner and prob_outer.

Usage

prediction_summary(
  model,
  data,
  prob_inner = 0.5,
  prob_outer = 0.95,
  stable = FALSE
)

Arguments

model

an rstanreg model object with quantitative y

data

data frame including the variables in the model, both response y and predictors x

prob_inner

probability level of the inner (narrower) posterior predictive interval (a value between 0 and 1)

prob_outer

probability level of the outer (wider) posterior predictive interval (a value between 0 and 1)

stable

a logical value. If TRUE, errors are measured from the posterior predictive medians and scaled by absolute deviations; if FALSE, errors are measured from the posterior predictive means and scaled by standard deviations

Value

a tibble

Examples

example_data <- data.frame(x = sample(1:100, 20))
example_data$y <- example_data$x*3 + rnorm(20, 0, 5)
example_model <- rstanarm::stan_glm(y ~ x,  data = example_data)
prediction_summary(example_model, example_data, prob_inner = 0.6, prob_outer = 0.80, stable = TRUE)
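
To illustrate what the four measures capture, the sketch below computes them in base R from a matrix of posterior predictive draws, following the stable = TRUE definitions in the Description. The draws are simulated here as a stand-in; with a fitted rstanreg model they would typically come from rstanarm::posterior_predict(). This is not the package's implementation, and the use of mad() with its default scaling constant is an assumption.

```r
set.seed(84735)
n_obs <- 20
n_draws <- 1000
x <- sample(1:100, n_obs)
y <- 3 * x + rnorm(n_obs, 0, 5)

# Stand-in for posterior predictive draws: rows = draws, columns = observations
draws <- sapply(x, function(xi) rnorm(n_draws, mean = 3 * xi, sd = 5))

# Per-observation predictive median and median absolute deviation
# (mad() applies R's default 1.4826 consistency constant)
pred_median <- apply(draws, 2, median)
pred_mad <- apply(draws, 2, mad)
abs_error <- abs(y - pred_median)

# Median absolute error, raw and scaled
mae <- median(abs_error)
mae_scaled <- median(abs_error / pred_mad)

# Proportion of observed y inside the middle `prob` predictive interval
within_interval <- function(prob) {
  lower <- apply(draws, 2, quantile, probs = (1 - prob) / 2)
  upper <- apply(draws, 2, quantile, probs = 1 - (1 - prob) / 2)
  mean(y >= lower & y <= upper)
}
within_inner <- within_interval(0.60)
within_outer <- within_interval(0.80)

print(c(mae = mae, mae_scaled = mae_scaled,
        within_60 = within_inner, within_80 = within_outer))
```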

Cross-Validated Posterior Predictive Summaries

Description

Given a set of observed data including a quantitative response variable y and an rstanreg model of y, this function returns 4 cross-validated measures of the model's posterior prediction quality. Median absolute prediction error (mae) measures the typical difference between the observed y values and their posterior predictive medians (stable = TRUE) or means (stable = FALSE). Scaled mae (mae_scaled) measures the typical number of absolute deviations (stable = TRUE) or standard deviations (stable = FALSE) that observed y values fall from their predictive medians (stable = TRUE) or means (stable = FALSE). The two remaining measures (within_50 and within_95 under the default settings) report the proportion of observed y values that fall within their posterior prediction intervals, the probability levels of which are set by the user through prob_inner and prob_outer. For hierarchical models of class lmerMod, the folds are made up of collections of groups, not individual observations.

Usage

prediction_summary_cv(
  data,
  group,
  model,
  k,
  prob_inner = 0.5,
  prob_outer = 0.95
)

Arguments

data

data frame including the variables in the model, both response y and predictors x

group

a character string giving the name of the factor grouping variable, i.e., the random effect (only used for hierarchical models)

model

an rstanreg model object with quantitative y

k

the number of folds to use for cross validation

prob_inner

probability level of the inner (narrower) posterior predictive interval (a value between 0 and 1)

prob_outer

probability level of the outer (wider) posterior predictive interval (a value between 0 and 1)

Value

a list

Examples

example_data <- data.frame(x = sample(1:100, 20))
example_data$y <- example_data$x*3 + rnorm(20, 0, 5)
example_model <- rstanarm::stan_glm(y ~ x,  data = example_data)
prediction_summary_cv(model = example_model, data = example_data, k = 2)
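
For hierarchical models, the Description notes that folds are built from whole groups. One way to construct such folds is sketched below in base R; the grouped data frame is hypothetical, and this is not the package's internal code.

```r
set.seed(84735)

# Hypothetical grouped data: 6 groups with 4 observations each
grouped_data <- data.frame(
  group = rep(letters[1:6], each = 4),
  y = rnorm(24)
)
k <- 3

# Assign each GROUP (not each row) to one of k folds
groups <- unique(as.character(grouped_data$group))
group_fold <- setNames(sample(rep(seq_len(k), length.out = length(groups))), groups)

# Every row inherits the fold of its group
grouped_data$fold <- unname(group_fold[as.character(grouped_data$group)])

# Each group appears in exactly one fold
print(table(grouped_data$group, grouped_data$fold))
```

Keeping a group's observations together means the held-out fold contains only groups the model has not seen, which mirrors predicting for a new group.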

Cards Against Humanity's Pulse of the Nation Survey

Description

Cards Against Humanity's "Pulse of the Nation" project (https://thepulseofthenation.com/) conducted monthly polls into people's social and political views, as well as some silly things. This data includes responses to a subset of questions included in the poll conducted in September 2017.

Usage

pulse_of_the_nation

Format

A data frame with observations on 1000 survey respondents with 15 variables:

income

income in $1000s

age

age in years

party

political party affiliation

trump_approval

approval level of Donald Trump's job performance

education

maximum education level completed

robots

opinion of how likely their job is to be replaced by robots within 10 years

climate_change

belief in climate change

transformers

the number of Transformers films the respondent has seen

science_is_honest

opinion of whether scientists are generally honest and serve the public good

vaccines_are_safe

opinion of whether vaccines are safe and protect children from disease

books

number of books read in the past year

ghosts

whether or not they believe in ghosts

fed_sci_budget

respondent's estimate of the percentage of the federal budget that is spent on scientific research

earth_sun

belief about whether the earth is always farther away from the sun in winter than in summer (TRUE or FALSE)

wise_unwise

whether the respondent would rather be wise but unhappy, or unwise but happy

Source

https://thepulseofthenation.com/downloads/201709-CAH_PulseOfTheNation_Raw.csv


Sample Mode

Description

Calculate the sample mode of vector x.

Usage

sample_mode(x)

Arguments

x

vector of sample data

Value

sample mode

Examples

sample_mode(rbeta(100, 2, 7))
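
For continuous data, where exact values rarely repeat, a common way to estimate a sample mode is to find the peak of a kernel density estimate. The sketch below takes that approach with stats::density(); treating this as the method behind sample_mode is an assumption.

```r
set.seed(84735)
x <- rbeta(100, 2, 7)

# Kernel density estimate of the sample (default Gaussian kernel and bandwidth)
kde <- density(x)

# Mode estimate: the x value at which the estimated density is highest
mode_estimate <- kde$x[which.max(kde$y)]
print(mode_estimate)
```

For reference, the mode of a Beta(2, 7) model is (2 - 1) / (2 + 7 - 2) = 1/7, so the estimate should land in that vicinity.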

Spotify Song Data

Description

A sub-sample of the Spotify song data originally collected by Kaylin Pavlik (kaylinquest) and distributed through the R for Data Science TidyTuesday project.

Usage

spotify

Format

A data frame with 350 songs (or tracks) and 23 variables:

track_id

unique song identifier

title

song name

artist

song artist

popularity

song popularity from 0 (low) to 100 (high)

album_id

id of the album on which the song appears

album_name

name of the album on which the song appears

album_release_date

when the album was released

playlist_name

Spotify playlist on which the song appears

playlist_id

unique playlist identifier

genre

genre of the playlist

subgenre

subgenre of the playlist

danceability

a score from 0 (not danceable) to 100 (danceable) based on features such as tempo, rhythm, etc.

energy

a score from 0 (low energy) to 100 (high energy) based on features such as loudness, timbre, entropy, etc.

key

song key

loudness

song loudness (dB)

mode

0 (minor key) or 1 (major key)

speechiness

a score from 0 (non-speechy tracks) to 100 (speechy tracks)

acousticness

a score from 0 (not acoustic) to 100 (very acoustic)

instrumentalness

a score from 0 (not instrumental) to 100 (very instrumental)

liveness

a score from 0 (no live audience presence on the song) to 100 (strong live audience presence on the song)

valence

a score from 0 (the song is more negative, sad, angry) to 100 (the song is more positive, happy, euphoric)

tempo

song tempo (beats per minute)

duration_ms

song duration (ms)

Source

https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-01-21/spotify_songs.csv/.


Summarize a Beta Model for π

Description

Summarizes the expected value, variance, and mode of a Beta(alpha, beta) model for variable π.

Usage

summarize_beta(alpha, beta)

Arguments

alpha, beta

positive shape parameters of the Beta model

Value

a summary table

Examples

summarize_beta(alpha = 1, beta = 15)
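
The summaries have closed forms, which the base R sketch below evaluates for the same parameters as the example above. The helper beta_summary is hypothetical (it is not part of the package), and returning NA for the mode outside the stated condition is a simplification made here.

```r
# Hypothetical helper (not part of bayesrules): closed-form Beta summaries
beta_summary <- function(alpha, beta) {
  stopifnot(alpha > 0, beta > 0)
  total <- alpha + beta
  # Mode formula applies when alpha, beta >= 1 and alpha + beta > 2
  mode <- if (alpha >= 1 && beta >= 1 && total > 2) {
    (alpha - 1) / (total - 2)
  } else {
    NA_real_
  }
  variance <- alpha * beta / (total^2 * (total + 1))
  data.frame(mean = alpha / total, mode = mode, var = variance, sd = sqrt(variance))
}

beta_result <- beta_summary(alpha = 1, beta = 15)
print(beta_result)
```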

Summarize a Beta-Binomial Bayesian model

Description

Consider a Beta-Binomial Bayesian model for parameter π with a Beta(alpha, beta) prior on π and Binomial likelihood with n trials and y successes. Given information on the prior (alpha and beta) and data (y and n), this function summarizes the mean, mode, and variance of the prior and posterior Beta models of π.

Usage

summarize_beta_binomial(alpha, beta, y = NULL, n = NULL)

Arguments

alpha, beta

positive shape parameters of the prior Beta model

y

number of successes

n

number of trials

Value

a summary table

Examples

summarize_beta_binomial(alpha = 1, beta = 15, y = 25, n = 50)
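
Under the standard conjugate update, the posterior is Beta(alpha + y, beta + n - y). The sketch below applies that update to the example values and summarizes both models side by side; the helper beta_moments is hypothetical and assumes both shape parameters are at least 1 so that the mode formula applies.

```r
alpha <- 1
beta <- 15
y <- 25
n <- 50

# Conjugate update of the Beta shape parameters
post_alpha <- alpha + y
post_beta <- beta + n - y

# Hypothetical helper: mean, mode, and variance of a Beta(a, b) model
# (mode formula assumes a >= 1, b >= 1, and a + b > 2)
beta_moments <- function(a, b) {
  c(mean = a / (a + b),
    mode = (a - 1) / (a + b - 2),
    var  = a * b / ((a + b)^2 * (a + b + 1)))
}

bb_summary <- rbind(
  prior = beta_moments(alpha, beta),
  posterior = beta_moments(post_alpha, post_beta)
)
print(bb_summary)
```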

Summarize a Gamma Model for λ

Description

Summarizes the expected value, variance, and mode of a Gamma(shape, rate) model for variable λ.

Usage

summarize_gamma(shape, rate)

Arguments

shape

positive shape parameter of the Gamma model

rate

positive rate parameter of the Gamma model

Value

a summary table

Examples

summarize_gamma(shape = 1, rate = 15)
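
These summaries also have closed forms. The base R sketch below evaluates them for the example parameters; gamma_summary is a hypothetical helper, and leaving the mode as NA when shape < 1 is a simplification made in this sketch.

```r
# Hypothetical helper (not part of bayesrules): closed-form Gamma summaries
gamma_summary <- function(shape, rate) {
  stopifnot(shape > 0, rate > 0)
  # Mode formula applies when shape >= 1; left undefined otherwise in this sketch
  mode <- if (shape >= 1) (shape - 1) / rate else NA_real_
  variance <- shape / rate^2
  data.frame(mean = shape / rate, mode = mode, var = variance, sd = sqrt(variance))
}

gamma_result <- gamma_summary(shape = 1, rate = 15)
print(gamma_result)
```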

Summarize the Gamma-Poisson Model

Description

Consider a Gamma-Poisson Bayesian model for rate parameter λ with a Gamma(shape, rate) prior on λ and a Poisson likelihood for the data. Given information on the prior (shape and rate) and data (the sample size n and sum_y), this function summarizes the mean, mode, and variance of the prior and posterior Gamma models of λ.

Usage

summarize_gamma_poisson(shape, rate, sum_y = NULL, n = NULL)

Arguments

shape

positive shape parameter of the Gamma prior

rate

positive rate parameter of the Gamma prior

sum_y

sum of observed data values for the Poisson likelihood

n

number of observations for the Poisson likelihood

Value

data frame

Examples

summarize_gamma_poisson(shape = 3, rate = 4, sum_y = 7, n = 12)
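
Under the standard conjugate update, the posterior is Gamma(shape + sum_y, rate + n). The sketch below applies that update to the example values and compares prior and posterior summaries; gamma_moments is a hypothetical helper that assumes shape >= 1 for the mode formula.

```r
shape <- 3
rate <- 4
sum_y <- 7
n <- 12

# Conjugate update of the Gamma parameters
post_shape <- shape + sum_y
post_rate <- rate + n

# Hypothetical helper: mean, mode, and variance of a Gamma(s, r) model
# (mode formula assumes s >= 1)
gamma_moments <- function(s, r) {
  c(mean = s / r, mode = (s - 1) / r, var = s / r^2)
}

gp_summary <- rbind(
  prior = gamma_moments(shape, rate),
  posterior = gamma_moments(post_shape, post_rate)
)
print(gp_summary)
```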

Summarize a Normal-Normal Bayesian model

Description

Consider a Normal-Normal Bayesian model for mean parameter μ with a N(mean, sd^2) prior on μ and a Normal likelihood for the data. Given information on the prior (mean and sd) and data (the sample size n, mean y_bar, and standard deviation sigma), this function summarizes the mean, mode, and variance of the prior and posterior Normal models of μ.

Usage

summarize_normal_normal(mean, sd, sigma = NULL, y_bar = NULL, n = NULL)

Arguments

mean

mean of the Normal prior

sd

standard deviation of the Normal prior

sigma

standard deviation of the data, or likelihood standard deviation

y_bar

sample mean of the data

n

sample size of the data

Value

data frame

Examples

summarize_normal_normal(mean = 2.3, sd = 0.3, sigma = 5.1, y_bar = 128.5, n = 20)

Voice Pitch Data

Description

Voice pitch data collected by Winter and Grawunder (2012). In an experiment, subjects participated in role-playing dialog under various conditions, while researchers monitored voice pitch (Hz). The conditions spanned different scenarios (e.g., making an appointment, asking for a favor) and different attitudes to use in the scenario (polite or informal).

Usage

voices

Format

A data frame with 84 rows and 4 variables. Each row represents a single observation for the given subject.

subject

subject identifier

scenario

context of the dialog (encoded as A, B, ..., G)

attitude

whether the attitude to use in dialog was polite or informal

pitch

average voice pitch (Hz)

Source

Winter, B., & Grawunder, S. (2012). The Phonetic Profile of Korean Formal and Informal Speech Registers. Journal of Phonetics, 40, 808-815. https://bodo-winter.net/data_and_scripts/POP.csv. https://bodo-winter.net/tutorial/bw_LME_tutorial2.pdf.


Weather Data for 3 Australian Cities

Description

A sub-sample of daily weather information from the weatherAUS data in the rattle package for three Australian cities: Wollongong, Hobart, and Uluru.

Usage

weather_australia

Format

A data frame with 300 daily observations and 22 variables from 3 Australian weather stations:

location

one of three weather stations

mintemp

minimum temperature (degrees Celsius)

maxtemp

maximum temperature (degrees Celsius)

rainfall

rainfall (mm)

windgustdir

direction of strongest wind gust

windgustspeed

speed of strongest wind gust (km/h)

winddir9am

direction of wind gust at 9am

winddir3pm

direction of wind gust at 3pm

windspeed9am

wind speed at 9am (km/h)

windspeed3pm

wind speed at 3pm (km/h)

humidity9am

humidity level at 9am (percent)

humidity3pm

humidity level at 3pm (percent)

pressure9am

atmospheric pressure at 9am (hPa)

pressure3pm

atmospheric pressure at 3pm (hPa)

temp9am

temperature at 9am (degrees Celsius)

temp3pm

temperature at 3pm (degrees Celsius)

raintoday

whether or not it rained today (Yes or No)

risk_mm

the amount of rain today (mm)

raintomorrow

whether or not it rained the next day (Yes or No)

year

the year of the date

month

the month of the date

day_of_year

the day of the year

Source

Data in the original weatherAUS data set were obtained from https://www.bom.gov.au/climate/data/. Copyright Commonwealth of Australia 2010, Bureau of Meteorology.


Weather Data for Perth, Australia

Description

A sub-sample of daily weather information on Perth, Australia from the weatherAUS data in the rattle package.

Usage

weather_perth

Format

A data frame with 1000 daily observations and 21 variables:

mintemp

minimum temperature (degrees Celsius)

maxtemp

maximum temperature (degrees Celsius)

rainfall

rainfall (mm)

windgustdir

direction of strongest wind gust

windgustspeed

speed of strongest wind gust (km/h)

winddir9am

direction of wind gust at 9am

winddir3pm

direction of wind gust at 3pm

windspeed9am

wind speed at 9am (km/h)

windspeed3pm

wind speed at 3pm (km/h)

humidity9am

humidity level at 9am (percent)

humidity3pm

humidity level at 3pm (percent)

pressure9am

atmospheric pressure at 9am (hPa)

pressure3pm

atmospheric pressure at 3pm (hPa)

temp9am

temperature at 9am (degrees Celsius)

temp3pm

temperature at 3pm (degrees Celsius)

raintoday

whether or not it rained today (Yes or No)

risk_mm

the amount of rain today (mm)

raintomorrow

whether or not it rained the next day (Yes or No)

year

the year of the date

month

the month of the date

day_of_year

the day of the year

Source

Data in the original weatherAUS data set were obtained from https://www.bom.gov.au/climate/data/. Copyright Commonwealth of Australia 2010, Bureau of Meteorology.


Weather Data for 2 Australian Cities

Description

A sub-sample of daily weather information from the weatherAUS data in the rattle package for two Australian cities, Wollongong and Uluru. The weather_australia data in the bayesrules package combines this data with a third city, Hobart.

Usage

weather_WU

Format

A data frame with 200 daily observations and 22 variables from 2 Australian weather stations:

location

one of two weather stations

mintemp

minimum temperature (degrees Celsius)

maxtemp

maximum temperature (degrees Celsius)

rainfall

rainfall (mm)

windgustdir

direction of strongest wind gust

windgustspeed

speed of strongest wind gust (km/h)

winddir9am

direction of wind gust at 9am

winddir3pm

direction of wind gust at 3pm

windspeed9am

wind speed at 9am (km/h)

windspeed3pm

wind speed at 3pm (km/h)

humidity9am

humidity level at 9am (percent)

humidity3pm

humidity level at 3pm (percent)

pressure9am

atmospheric pressure at 9am (hPa)

pressure3pm

atmospheric pressure at 3pm (hPa)

temp9am

temperature at 9am (degrees Celsius)

temp3pm

temperature at 3pm (degrees Celsius)

raintoday

whether or not it rained today (Yes or No)

risk_mm

the amount of rain today (mm)

raintomorrow

whether or not it rained the next day (Yes or No)

year

the year of the date

month

the month of the date

day_of_year

the day of the year

Source

Data in the original weatherAUS data set were obtained from https://www.bom.gov.au/climate/data. Copyright Commonwealth of Australia 2010, Bureau of Meteorology.