Appendix B — Package References

C Packages & Functions Reference

This table lists the packages and commands used throughout the book, with a short description of what each command does and the chapter where it is first introduced.

| Package | Command | What it does | First introduced |
|---|---|---|---|
| base R | `<-` | Assigns values to objects for later use. | Introduction to R |
| base R | `c()` | Combines multiple values into a single vector. | Introduction to R |
| base R | `:` | Creates integer sequences (e.g., `1:10`). | Introduction to R |
| base R | `[]` | Indexes and subsets elements from vectors or data frames. | Introduction to R |
| base R | `$` | Accesses or creates columns within a data frame. | Introduction to R |
| base R | `sqrt()` | Computes square roots. | Introduction to R |
| base R | `log()` / `log10()` | Computes natural and base-10 logarithms. | Introduction to R |
| base R | `round()` | Rounds numeric values to a specified number of digits. | Introduction to R |
| base R | `class()` | Identifies the data type (class) of an object. | Introduction to R |
| base R | `length()` | Returns the number of elements in a vector. | Introduction to R |
| base R | `factor()` | Converts character data into categorical (factor) variables. | Introduction to R |
| base R | `levels()` | Displays the levels associated with a factor. | Introduction to R |
| base R | `data.frame()` | Combines vectors into a tabular data structure. | Introduction to R |
| base R | `head()` / `tail()` | Displays the first or last rows of a dataset. | Introduction to R |
| base R | `str()` | Displays the internal structure and data types of a dataset. | Introduction to R |
| base R | `summary()` | Produces descriptive summaries of variables or model results. | Introduction to R |
| base R | `table()` | Creates frequency tables for categorical data. | Introduction to R |
| base R | `nrow()` / `ncol()` | Returns the number of rows or columns in a dataset. | Introduction to R |
| base R | `colnames()` | Displays or modifies column names of a data frame. | Introduction to R |
| base R | `read.csv()` | Imports CSV files into R as data frames. | Introduction to R |
| base R | `getwd()` / `setwd()` | Gets or sets the current working directory. | Introduction to R |
| base R | `install.packages()` | Installs packages from CRAN. | Introduction to R |
| base R | `library()` | Loads an installed package into the current R session. | Introduction to R |
| base R | `?function_name` | Accesses built-in help documentation for a function. | Introduction to R |
| magrittr | `%>%` | Passes the result of one operation into the next. | Introduction to tidyverse |
| dplyr | `select()` | Chooses specific columns from a dataset. | Introduction to tidyverse |
| dplyr | `filter()` | Keeps rows that meet logical conditions. | Introduction to tidyverse |
| dplyr | `arrange()` | Orders rows based on column values. | Introduction to tidyverse |
| dplyr | `mutate()` | Creates or modifies columns. | Introduction to tidyverse |
| dplyr | `rename()` | Renames columns using `new_name = old_name`. | Introduction to tidyverse |
| dplyr | `distinct()` | Returns unique rows or value combinations. | Introduction to tidyverse |
| dplyr | `if_else()` | Creates values based on a binary condition. | Introduction to tidyverse |
| dplyr | `case_when()` | Applies multiple conditional rules. | Introduction to tidyverse |
| base R | `is.na()` | Identifies missing (`NA`) values. | Introduction to tidyverse |
| tidyr | `drop_na()` | Removes rows containing missing values. | Introduction to tidyverse |
| dplyr | `count()` | Counts observations by group. | Introduction to tidyverse |
| dplyr | `group_by()` | Groups data for grouped operations. | Introduction to tidyverse |
| dplyr | `summarise()` | Computes summary statistics for groups. | Introduction to tidyverse |
| dplyr | `n()` | Returns the group size within `summarise()`. | Introduction to tidyverse |
| base R | `sessionInfo()` | Displays information about the current R session, including loaded packages. | Introduction to tidyverse |
| dplyr | `inner_join()` | Performs a SQL-style inner join, keeping only rows that match in both datasets. | Comparing Two Groups |
| tidyr | `pivot_longer()` | Converts data from wide format to long format. | Comparing Two Groups |
| tidyr | `pivot_wider()` | Converts data from long format back to wide format. | Comparing Two Groups |
| base R | `rbind()` | Combines multiple data frames by binding rows together. | Comparing Two Groups |
| base R | `merge()` | Joins two data frames based on a shared key variable. | Comparing Two Groups |
| base R | `mean()` | Calculates the average of numeric values. | Comparing Two Groups |
| stats | `t.test()` | Tests whether two group means differ significantly. | Comparing Two Groups |
| stats | `cor()` | Computes correlation coefficients (Pearson by default). | Correlation Analysis |
| stats | `cor.test()` | Computes a correlation and tests its significance. | Correlation Analysis |
| base R | `pairs()` | Creates a scatterplot matrix. | Correlation Analysis |
| GGally | `ggpairs()` | Creates an enhanced scatterplot matrix with correlations. | Correlation Analysis |
| ppcor | `pcor.test()` | Computes partial correlations. | Correlation Analysis |
| base R | `ifelse()` | Recodes variables conditionally. | Correlation Analysis |
| base R | `set.seed()` | Sets the random seed so that randomly generated data is reproducible. | Comparing Multiple Means |
| stats | `rnorm()` | Generates random values from a normal distribution. | Comparing Multiple Means |
| stats | `aov()` | Fits ANOVA models. | Comparing Multiple Means |
| supernova | `supernova()` | Displays ANOVA results in structured tables. | Comparing Multiple Means |
| stats | `TukeyHSD()` | Performs post-hoc pairwise comparisons. | Comparing Multiple Means |
| base R | `plot()` | Plots an object; used here to visualize post-hoc comparison results. | Comparing Multiple Means |
| AICcmodavg | `aictab()` | Compares models using AIC (AICc by default). | Comparing Multiple Means |
| stats | `xtabs()` | Constructs contingency tables using a formula interface. | Analyzing Categorical Data |
| janitor | `tabyl()` | Creates clean contingency tables. | Analyzing Categorical Data |
| janitor | `adorn_percentages()` | Converts counts to proportions (by row, column, or overall). | Analyzing Categorical Data |
| janitor | `adorn_ns()` | Displays counts and percentages together. | Analyzing Categorical Data |
| janitor | `clean_names()` | Standardizes column names (snake_case by default). | Correlation Analysis |
| stats | `chisq.test()` | Performs chi-square tests of independence. | Analyzing Categorical Data |
| gmodels | `CrossTable()` | Produces detailed cross-tabulations. | Analyzing Categorical Data |
| pheatmap | `pheatmap()` | Draws heatmaps; used here to visualize residuals or contributions. | Analyzing Categorical Data |
| rcompanion | `cramerV()` | Measures association strength between categorical variables. | Analyzing Categorical Data |
| stats | `lm()` | Fits linear regression models. | Linear Regression |
| broom | `tidy()` | Tidies model coefficients into a data frame. | Linear Regression |
| broom | `glance()` | Extracts model-level statistics. | Linear Regression |
| lmtest | `bptest()` | Tests for heteroscedasticity (Breusch-Pagan test). | Linear Regression |
| stats | `AIC()` | Computes the Akaike information criterion (AIC) for comparing models. | Linear Regression |
| stats | `step()` | Performs stepwise model selection. | Linear Regression |
| stats | `glm()` | Fits generalized linear models, including logistic regression (binomial family). | Logistic Regression |
| base R | `exp()` | Exponentiates values; used here to convert log-odds to odds ratios. | Logistic Regression |
| caTools | `sample.split()` | Splits data into training/testing sets. | Logistic Regression |
| caret | `confusionMatrix()` | Evaluates classification performance. | Logistic Regression |
| pscl | `pR2()` | Computes pseudo R² values. | Logistic Regression |
| caret | `varImp()` | Assesses predictor importance. | Logistic Regression |
| car | `vif()` | Computes variance inflation factors to detect multicollinearity. | Logistic Regression |
| pROC | `roc()` | Builds ROC curves. | Logistic Regression |
| pROC | `auc()` | Computes area under the ROC curve (AUC). | Logistic Regression |
| knitr | `knitr::opts_chunk$set()` | Sets global chunk options in R Markdown. | Reproducible Reporting |
| knitr | `kable()` | Creates formatted tables for reports. | Reproducible Reporting |
| nycOpenData | `nyc311()` | Downloads NYC 311 Service Request data from NYC Open Data. | Reproducible Reporting |
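
To show how a few of the base R and stats entries in the table fit together, here is a minimal sketch. The data frame `scores`, its column names, and its values are hypothetical and made up for illustration; they do not come from the book's datasets.

```r
# Hypothetical toy data (illustrative values only, not from the book).
scores <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  score = c(10, 12, 11, 15, 17, 16)
)

# factor() converts the character column into a categorical variable;
# levels() then lists its categories.
scores$group <- factor(scores$group)
group_levels <- levels(scores$group)

# nrow() and table() describe the size and composition of the data.
n_rows <- nrow(scores)
group_counts <- table(scores$group)

# [] subsets rows by a logical condition, $ picks a column, mean() averages it.
mean_a <- mean(scores$score[scores$group == "a"])
mean_b <- mean(scores$score[scores$group == "b"])

# t.test() with a formula compares the two group means
# (Welch's test is the default in R).
result <- t.test(score ~ group, data = scores)

str(scores)
print(group_counts)
print(c(a = mean_a, b = mean_b))
print(result$p.value)
```

Only functions that ship with R are used here, so the sketch runs without installing any package. The tidyverse rows in the table (for example `group_by()` with `summarise()`) offer an alternative way to compute the same group means.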
citation("base")
To cite R in publications use:

  R Core Team (2025). _R: A Language and Environment for Statistical
  Computing_. R Foundation for Statistical Computing, Vienna, Austria.
  <https://www.R-project.org/>.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2025},
    url = {https://www.R-project.org/},
  }

We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also 'citation("pkgname")' for
citing R packages.
citation("ggplot2")
To cite ggplot2 in publications, please use

  H. Wickham. ggplot2: Elegant Graphics for Data Analysis.
  Springer-Verlag New York, 2016.

A BibTeX entry for LaTeX users is

  @Book{,
    author = {Hadley Wickham},
    title = {ggplot2: Elegant Graphics for Data Analysis},
    publisher = {Springer-Verlag New York},
    year = {2016},
    isbn = {978-3-319-24277-4},
    url = {https://ggplot2.tidyverse.org},
  }
citation("dplyr")
To cite package 'dplyr' in publications use:

  Wickham H, François R, Henry L, Müller K, Vaughan D (2023). _dplyr: A
  Grammar of Data Manipulation_. doi:10.32614/CRAN.package.dplyr
  <https://doi.org/10.32614/CRAN.package.dplyr>, R package version
  1.1.4, <https://CRAN.R-project.org/package=dplyr>.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {dplyr: A Grammar of Data Manipulation},
    author = {Hadley Wickham and Romain François and Lionel Henry and Kirill Müller and Davis Vaughan},
    year = {2023},
    note = {R package version 1.1.4},
    url = {https://CRAN.R-project.org/package=dplyr},
    doi = {10.32614/CRAN.package.dplyr},
  }
citation("tidyr")
To cite package 'tidyr' in publications use:

  Wickham H, Vaughan D, Girlich M (2024). _tidyr: Tidy Messy Data_.
  doi:10.32614/CRAN.package.tidyr
  <https://doi.org/10.32614/CRAN.package.tidyr>, R package version
  1.3.1, <https://CRAN.R-project.org/package=tidyr>.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {tidyr: Tidy Messy Data},
    author = {Hadley Wickham and Davis Vaughan and Maximilian Girlich},
    year = {2024},
    note = {R package version 1.3.1},
    url = {https://CRAN.R-project.org/package=tidyr},
    doi = {10.32614/CRAN.package.tidyr},
  }
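
The citation output above can also be collected programmatically. The following is a minimal sketch, assuming you want one BibTeX file covering R itself plus whichever of the listed packages are installed. The `pkgs` vector and the temporary output file are illustrative choices, not something the book prescribes.

```r
# Packages to cite; this list is an assumption for illustration.
pkgs <- c("base", "ggplot2", "dplyr", "tidyr")

# Keep only packages that are actually installed, so citation() does not fail.
installed <- pkgs[vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# citation() returns a bibentry object; toBibtex() renders it as BibTeX lines.
# A blank line is appended after each entry to keep the file readable.
bib_lines <- unlist(lapply(installed, function(pkg) {
  c(as.character(toBibtex(citation(pkg))), "")
}))

# Write to a temporary file so the sketch leaves the project directory untouched.
bib_file <- tempfile(fileext = ".bib")
writeLines(bib_lines, bib_file)

cat(readLines(bib_file), sep = "\n")
```

In a real project you would probably replace `tempfile()` with a path such as a `.bib` file referenced from your report. Note that the years and version numbers in the output depend on the installed versions, so they may differ from those printed above.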