class: center, middle, inverse, title-slide

# Introduction to DATA 606
## Statistics & Probability for Data Analytics
### Jason Bryer, Ph.D. and Angela Lui, Ph.D.
### Fall 2025

---
# Agenda

* About your instructor
* Syllabus
* Class meetups
* Course Schedule
* Assignments (how you will be graded)
    * Participation
    * Labs
    * Data Project
    * Exams
* Software
    * The `DATA606` R Package
    * Using R Markdown

---
# A little about me...

* Assistant Professor at CUNY in Data Science and Information Systems
* Principal Investigator for a Department of Education Grant to develop and test the Diagnostic Assessment and Achievement of College Skills ([www.DAACS.net](http://www.daacs.net))
* Authored over a dozen R packages including:
    * [likert](http://github.com/jbryer/likert)
    * [ShinyQDA](http://github.com/jbryer/ShinyQDA)
    * [DTedit](http://github.com/jbryer/DTedit)
    * [login](http://github.com/jbryer/login)
* Specialize in propensity score methods. Three new methods/R packages developed include:
    * [multilevelPSA](http://github.com/jbryer/multilevelPSA)
    * [TriMatch](http://github.com/jbryer/TriMatch)
    * [PSAboot](http://github.com/jbryer/PSAboot)

---
# Also a Father...

<img src="images/BoysFall2019.jpg" width="65%" style="display: block; margin: auto;" />

---
# Runner...

<table border='0' width='100%'><tr><td>
<center><img src='images/2025DisneyMarathon.jpeg' height='450'></center>
</td><td>
<center><img src='images/2019NYCMarathon.jpg' height='450'></center>
</td></tr></table>

---
# And photographer.

<img src="images/Sleeping_Empire.jpg" width="80%" style="display: block; margin: auto;" />

---
# A little about Angela...
.pull-left[
<img alt='Angela Lui' src='images/Lui.jpg' height = '500' />
]
.pull-right[
<center>
<br/>
<img height='75' alt='NYU' src='images/NYU.png' /><br/>
<img height='75' alt='Hunter' src='images/Hunter.png' /><br/>
<img height='75' alt='UAlbany' src='images/UAlbany.png' /><br/>
<img height='75' alt='Rutgers' src='images/Rutgers.png' /><br/>
<img height='75' alt='CUNY SPS' src='images/CUNYSPS.png' /><br/>
<img height='75' alt='DAACS' src='images/DAACS_Centered.png' /><br />
</center>
]

---
# Teaching Experience

* Introduction to Statistics in Social Sciences
* Special Issues in Testing
* Evaluation
* Motivation in Education
* Introduction to the Psychological Processing of Schooling
* Educational Psychology in Adolescent Development

---
# Homeowner

.pull-left[
<img src='images/Lui_Home.jpg' height='450' />
]
.pull-right[
<img src='images/Lui_home2.png' height='450' />
]

---

.pull-left[
<img src='images/Lui_chickens.jpg' height='450' />
]
.pull-right[
<img src='images/Lui_cat.jpg' height='450' />
]

---
# Syllabus <img src="images/hex/rmarkdown.png" class="title-hex"><img src="images/hex/blogdown.png" class="title-hex">

Syllabus and course materials are here: [https://fall2025.data606.net](https://fall2025.data606.net)

The site is built using [Quarto](https://quarto.org) and hosted on [GitHub](https://github.com/jbryer/DATA606-2025-Fall). Each page of the site has an "Edit this page" link at the bottom right; use it to start a pull request on GitHub.

We will use Brightspace primarily for submitting assignments. Please submit a PDF or a link to the built HTML (e.g. RPubs, [GitHub](http://htmlpreview.github.io/)). PDFs are preferred for the homework as there is some LaTeX formatting in the R markdown files.
The `tinytex` R package helps with installing LaTeX, but you can also install LaTeX using [MiKTeX](http://miktex.org) (for Windows) and [BasicTeX](http://www.tug.org/mactex/morepackages.html) (for Mac).

See this page for more information: https://fall2025.data606.net/course-overview/software/

---
class: font90
# Meetups

We will have meetups on Wednesday evenings at 8:00pm. Meetups will be recorded and made available the next day on the [course website](https://spring2024.data606.net/course-overview/meetups/). Though attending live is not strictly required, **I expect everyone to watch the lectures during the week.** I use the class meetups to convey important information and announcements. Very often I will cover some topics not in the textbook. Students who attend the meetups tend to do well on the assignments.

**One Minute Papers** - Complete the one minute paper after each Meetup (whether you watch live or watch the recordings). It should take approximately one to two minutes to complete. This allows me to 1) verify you have attended or watched the meetup and 2) get feedback about what you learned and what may still be unclear.

.font60[
**Please note:** *Students who participate in this class with their camera on or use a profile image are agreeing to have their video or image recorded solely for the purpose of creating a record for students enrolled in the class to refer to, including those enrolled students who are unable to attend live. If you are unwilling to consent to have your profile or video image recorded, be sure to keep your camera off and do not use a profile image. Likewise, students who un-mute during class and participate orally are agreeing to have their voices recorded.
If you are not willing to consent to have your voice recorded during class, you will need to keep your mute button activated and communicate exclusively using the "chat" feature, which allows students to type questions and comments live.* [Click here for CUNY's camera use policy](https://www.cuny.edu/wp-content/uploads/sites/4/page-assets/academics/faculty-affairs/Camera-Use-Guidance-for-Online-and-Hybrid-Courses_FINAL-JUNE-20-2024.pdf)
]

---
# Schedule

<table>
<thead>
<tr> <th style="text-align:left;"> Start </th> <th style="text-align:left;"> End </th> <th style="text-align:left;"> Topic </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> Monday, August 25, 2025 </td> <td style="text-align:left;"> Sunday, August 31, 2025 </td> <td style="text-align:left;"> Chapter 1 - Intro to Data, R, and RStudio </td> </tr>
<tr> <td style="text-align:left;"> Monday, September 01, 2025 </td> <td style="text-align:left;"> Sunday, September 14, 2025 </td> <td style="text-align:left;"> Chapter 2 - Summarizing Data </td> </tr>
<tr> <td style="text-align:left;"> Monday, September 15, 2025 </td> <td style="text-align:left;"> Sunday, September 21, 2025 </td> <td style="text-align:left;"> Chapter 3 - Probability </td> </tr>
<tr> <td style="text-align:left;"> Monday, September 22, 2025 </td> <td style="text-align:left;"> Sunday, September 28, 2025 </td> <td style="text-align:left;"> Chapter 4 - Distributions </td> </tr>
<tr> <td style="text-align:left;"> Monday, September 29, 2025 </td> <td style="text-align:left;"> Sunday, October 05, 2025 </td> <td style="text-align:left;"> Chapter 5 - Foundation for Inference </td> </tr>
<tr> <td style="text-align:left;"> Monday, October 06, 2025 </td> <td style="text-align:left;"> Sunday, October 12, 2025 </td> <td style="text-align:left;"> Chapter 6 - Inference for Categorical Data </td> </tr>
<tr> <td style="text-align:left;"> Monday, October 13, 2025 </td> <td style="text-align:left;"> Sunday, October 19, 2025 </td> <td
style="text-align:left;"> Chapter 7 - Inference for Numerical Data </td> </tr>
<tr> <td style="text-align:left;"> Monday, October 20, 2025 </td> <td style="text-align:left;"> Sunday, November 02, 2025 </td> <td style="text-align:left;"> Chapter 8 - Linear Regression </td> </tr>
<tr> <td style="text-align:left;"> Monday, November 03, 2025 </td> <td style="text-align:left;"> Sunday, November 09, 2025 </td> <td style="text-align:left;"> Chapter 9 - Logistic Regression </td> </tr>
<tr> <td style="text-align:left;"> Monday, November 24, 2025 </td> <td style="text-align:left;"> Sunday, November 30, 2025 </td> <td style="text-align:left;"> Thanksgiving </td> </tr>
<tr> <td style="text-align:left;"> Monday, December 01, 2025 </td> <td style="text-align:left;"> Sunday, December 07, 2025 </td> <td style="text-align:left;"> Intro to Bayesian Analysis </td> </tr>
<tr> <td style="text-align:left;"> Monday, December 08, 2025 </td> <td style="text-align:left;"> Sunday, December 14, 2025 </td> <td style="text-align:left;"> Final Exam </td> </tr>
</tbody>
</table>

---
# Textbooks <img src="images/hex/openintro.png" class="title-hex">

.pull-left[
Diez, D.M., Barr, C.D., & Çetinkaya-Rundel, M. (2019). *OpenIntro Statistics (4th Ed)*.

.font70[
This will be our primary textbook for most of the semester. Our goal is to cover all the chapters.
]

.center[
<a href = "https://github.com/jbryer/DATA606spring2024/blob/master/Resources/Textbooks/os4.pdf"><img src = 'images/openintro.jpeg' alt = 'Open Intro Statistics' height = '375px' /></a>
]
]

.pull-right[
Navarro, D. (2018, version 0.6). *Learning Statistics with R*

.font70[
This textbook has a chapter on Bayesian analysis that we will use at the end of the semester.
]

.center[
<a href = "https://github.com/jbryer/DATA606spring2024/blob/master/Resources/Textbooks/lsr-0.6.pdf"><img src = 'images/lsr.png' alt = 'Learning Statistics with R' height = '375px' /></a>
]
]

---
# Assignments

* Participation (10%)
    * [DAACS](https://fall2025.data606.net/assignments/daacs)
    * [One Minute Papers](https://fall2025.data606.net/assignments/participation)
* [Labs](https://fall2025.data606.net/assignments/labs) (35%)
    * Labs are designed to introduce you to doing statistics with R.
    * Answer the questions in the main text as well as the "On Your Own" section.
* [Data Project](https://fall2025.data606.net/assignments/project) (30%)
    * This allows you to analyze a dataset of your choosing. Projects will be shared with the class. This provides an opportunity for everyone to see different approaches to analyzing different datasets.
* [Exams](https://fall2025.data606.net/assignments/exams/)
    * Midterm (10%)
    * Final exam (15%)

---
# Use of Artificial Intelligence (AI)

First, AI is a marketing term. I prefer to be more specific regarding what we are doing:

1. Machine Learning (ML) - This course, along with IS382, will provide the foundations for how ML algorithms work. Generally speaking, the goal is to predict some known (and sometimes unknown in the case of unsupervised learning models) outcome.
2. Large Language Models (LLM) - This is often what people mean when they say AI. This includes products like ChatGPT, Anthropic's Claude, Google Gemini, etc. LLMs generate text, images, videos, etc. from a prompt.

The goal of this course is for *you to develop the foundational knowledge and skills to do statistics.* Using chat bots to do the assignments subverts this goal.

**The content generated by LLMs is often wrong!**

If you use LLMs to assist in completing the assignments, **you must include the prompt and response in your submission**.
---
# Communication

* Slack Channel: https://cuny-msds.slack.com
    * [Click here to join the group](https://cuny-msds.slack.com/archives/C08TB8BTZ8T)
* Email: [jason.bryer@cuny.edu](mailto:jason.bryer@cuny.edu)
* Phone/Zoom: Please email to schedule a time to meet.
* Office hours by appointment.

---
# Software <img src="images/hex/tinytex.png" class="title-hex"><img src="images/hex/RStudio.png" class="title-hex"><img src="images/hex/rmarkdown.png" class="title-hex">

This is an applied statistics course so we will make extensive use of the [R statistical programming language](https://www.r-project.org).

Install [R](https://cran.r-project.org) and [RStudio](https://rstudio.com) on your own computer. I encourage everyone to do this at some point by the end of the semester. I have instructions on the course website here: https://fall2025.data606.net/course-overview/software/

You will also need to have [LaTeX](https://www.latex-project.org) installed in order to create PDFs. The [`tinytex`](https://yihui.org/tinytex/) R package helps with this process:

```
install.packages('tinytex')
tinytex::install_tinytex()
```

---
# DATA 606 Package <img src="images/hex/rmarkdown.png" class="title-hex"><img src="images/hex/devtools.png" class="title-hex">

The [`DATA606`](https://github.com/jbryer/DATA606) R package contains many data sets and functions we will use throughout the semester. It also has a `startLab` function that will copy each of the labs to your current working directory. Use the following commands to install the package (only necessary once per R installation):

```
install.packages('remotes') # provides install_github(); skip if already installed
remotes::install_github('jbryer/DATA606')
```

To start the first lab...

```
DATA606::startLab('Lab1')
```

This will copy the R markdown file and any supporting files to your current working directory. Use the "Knit" button in RStudio to build a PDF of the document.

---
# Next steps...
<img src="images/hex/DAACS.png" class="title-hex"> Before Wednesday (August 31st): * Complete this Google form: https://forms.gle/6RmyywJ97L7iJBYb9 * Go to https://cuny.daacs.net and complete the self-regulated learning assessment * [Join the Slack channel](https://cuny-msds.slack.com/archives/C08TB8BTZ8T) Then: * Start Lab 1 (due August 31st) --- class: inverse, right, middle, hide-logo <!--img src="images/hex/DATA606.png" width="150px"/--> # Good luck with the semester! [<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M440 6.5L24 246.4c-34.4 19.9-31.1 70.8 5.7 85.9L144 379.6V464c0 46.4 59.2 65.5 86.6 28.6l43.8-59.1 111.9 46.2c5.9 2.4 12.1 3.6 18.3 3.6 8.2 0 16.3-2.1 23.6-6.2 12.8-7.2 21.6-20 23.9-34.5l59.4-387.2c6.1-40.1-36.9-68.8-71.5-48.9zM192 464v-64.6l36.6 15.1L192 464zm212.6-28.7l-153.8-63.5L391 169.5c10.7-15.5-9.5-33.5-23.7-21.2L155.8 332.6 48 288 464 48l-59.4 387.3z"></path></svg> jason.bryer@cuny.edu](mailto:jason.bryer@cuny.edu) [<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M94.12 315.1c0 25.9-21.16 47.06-47.06 47.06S0 341 0 315.1c0-25.9 21.16-47.06 47.06-47.06h47.06v47.06zm23.72 0c0-25.9 21.16-47.06 47.06-47.06s47.06 21.16 47.06 47.06v117.84c0 25.9-21.16 47.06-47.06 47.06s-47.06-21.16-47.06-47.06V315.1zm47.06-188.98c-25.9 0-47.06-21.16-47.06-47.06S139 32 164.9 32s47.06 21.16 47.06 47.06v47.06H164.9zm0 23.72c25.9 0 47.06 21.16 47.06 47.06s-21.16 47.06-47.06 47.06H47.06C21.16 243.96 0 222.8 0 196.9s21.16-47.06 47.06-47.06H164.9zm188.98 47.06c0-25.9 21.16-47.06 47.06-47.06 25.9 0 47.06 21.16 47.06 47.06s-21.16 47.06-47.06 47.06h-47.06V196.9zm-23.72 0c0 25.9-21.16 47.06-47.06 47.06-25.9 0-47.06-21.16-47.06-47.06V79.06c0-25.9 21.16-47.06 47.06-47.06 25.9 0 47.06 21.16 47.06 47.06V196.9zM283.1 385.88c25.9 0 47.06 21.16 47.06 47.06 0 25.9-21.16 47.06-47.06 47.06-25.9 
0-47.06-21.16-47.06-47.06v-47.06h47.06zm0-23.72c-25.9 0-47.06-21.16-47.06-47.06 0-25.9 21.16-47.06 47.06-47.06h117.84c25.9 0 47.06 21.16 47.06 47.06 0 25.9-21.16 47.06-47.06 47.06H283.1z"></path></svg> cuny-msds.slack.com](https://cuny-msds.slack.com) [<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> @jbryer](https://github.com/jbryer) [<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M433 179.11c0-97.2-63.71-125.7-63.71-125.7-62.52-28.7-228.56-28.4-290.48 0 0 0-63.72 28.5-63.72 125.7 0 
115.7-6.6 259.4 105.63 289.1 40.51 10.7 75.32 13 103.33 11.4 50.81-2.8 79.32-18.1 79.32-18.1l-1.7-36.9s-36.31 11.4-77.12 10.1c-40.41-1.4-83-4.4-89.63-54a102.54 102.54 0 0 1-.9-13.9c85.63 20.9 158.65 9.1 178.75 6.7 56.12-6.7 105-41.3 111.23-72.9 9.8-49.8 9-121.5 9-121.5zm-75.12 125.2h-46.63v-114.2c0-49.7-64-51.6-64 6.9v62.5h-46.33V197c0-58.5-64-56.6-64-6.9v114.2H90.19c0-122.1-5.2-147.9 18.41-175 25.9-28.9 79.82-30.8 103.83 6.1l11.6 19.5 11.6-19.5c24.11-37.1 78.12-34.8 103.83-6.1 23.71 27.3 18.4 53 18.4 175z"></path></svg> @jbryer@vis.social](https://vis.social/@jbryer) [<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> 
fall2025.data606.net](https://fall2025.data606.net)