R Programming for Data Science, Student Edition

Course Specifications

Course Number: ELK94–025_rev1.0

Course Length: 5 days

Course Description

In our data-driven world, organizations need the right tools to extract valuable insights from their data. The R programming language is one of the tools at the forefront of data science. Its robust set of packages and statistical functions makes it a powerful choice for manipulating and analyzing data, performing statistical tests, and building predictive models. R is also notable for its strong data visualization tools, which enable you to create high-quality, highly customizable graphs and plots.

This course will teach you the fundamentals of programming in R to get you started. It will also teach you how to use R to perform common data science tasks and achieve data-driven results for your organization.

Course Objective: In this course, you will use R to perform common data science tasks.

You will:

  • Set up an R development environment and execute simple code.
  • Perform operations on atomic data types in R, including characters, numbers, and logicals.
  • Perform operations on data structures in R, including vectors, lists, and data frames.
  • Write conditional statements and loops.
  • Structure code for reuse with functions and packages.
  • Manage data by loading and saving datasets, manipulating data frames, and more.
  • Analyze data through exploratory analysis, statistical analysis, and more.
  • Create and format data visualizations using base R and ggplot2.
  • Create simple statistical models from data.

Target Student: This course is designed for students who want to learn the R programming language, particularly students who want to leverage R for data analysis and data science tasks in their organization. The course is also designed for students with an interest in applying statistics to real-world problems.

A typical student in this course should have several years of experience with computing technology, along with proficiency in at least one other programming language.

Prerequisites: To ensure your success in this course, you should be comfortable with basic computer programming concepts, including but not limited to: syntax, data types, conditional statements, loops, and functions. You can obtain this level of skill and knowledge by taking the Logical Operations Introduction to Programming with Python® course.

You should also have at least a high-level understanding of fundamental data science concepts, including but not limited to: data engineering, data analysis, data storage, data visualization, and statistics. You can obtain this level of knowledge by taking the CertNexus DSBIZ™ (Exam DSZ-110): Data Science for Business Professionals course.

Hardware Requirements

For this course, you will need one computer for each student and one for the instructor. Each computer must meet the following minimum hardware requirements:

  • 1 gigahertz (GHz) 64-bit (x64) processor.
  • 8 gigabytes (GB) of Random Access Memory (RAM).
  • 32 GB available storage space.
  • Monitor capable of a screen resolution of at least 1,024 × 768 pixels, at least a 256-color display, and a video adapter with at least 4 MB of memory.
  • Bootable DVD-ROM or USB drive.
  • Keyboard and mouse or a compatible pointing device.
  • Fast Ethernet (100 Mb/s) or faster adapter, and cabling to connect to the classroom network.
  • IP addresses that do not conflict with other portions of your network.
  • Internet access (contact your local network administrator).
  • (Instructor computer only) A display system to project the instructor's computer screen.

Software Requirements

Each computer requires the following software:

  • Microsoft® Windows® 10 64-bit.
  • R version 4.1.1 (R-4.1.1-win.exe).
      • R is distributed with the course data files under version 2 of the GNU General Public License (GPL).
  • RStudio® Desktop version 2021.09.0-351 (RStudio-2021.09.0-351).
      • RStudio is distributed with the course data files under version 3 of the GNU Affero General Public License (AGPL).
  • (Instructor computer only) If necessary, software for viewing the course slides.

Dataset

This course uses a modified version of a third-party dataset to demonstrate data science concepts. The dataset was retrieved from: https://www.kaggle.com/aungpyaeap/supermarket-sales.

Course Content

Lesson 1: Setting Up R and Executing Simple Code

Topic A: Set Up the R Development Environment
Topic B: Write R Statements

Lesson 2: Processing Atomic Data Types

Topic A: Process Characters
Topic B: Process Numbers
Topic C: Process Logicals

Lesson 3: Processing Data Structures

Topic A: Process Vectors
Topic B: Process Factors
Topic C: Process Data Frames
Topic D: Subset Data Structures

Lesson 4: Writing Conditional Statements and Loops

Topic A: Write Conditional Statements
Topic B: Write Loops

Lesson 5: Structuring Code for Reuse

Topic A: Define and Call Functions
Topic B: Apply Loop Functions
Topic C: Manage R Packages

Lesson 6: Managing Data in R

Topic A: Load Data
Topic B: Save Data
Topic C: Manipulate Data Frames Using Base R
Topic D: Manipulate Data Frames Using dplyr
Topic E: Handle Dates and Times

Lesson 7: Analyzing Data in R

Topic A: Examine Data
Topic B: Explore the Underlying Distribution of Data
Topic C: Identify Missing Values

Lesson 8: Visualizing Data in R

Topic A: Plot Data Using Base R Functions
Topic B: Plot Data Using ggplot2
Topic C: Format Plots in ggplot2
Topic D: Create Combination Plots

Lesson 9: Modeling Data in R

Topic A: Create Statistical Models in R
Topic B: Create Machine Learning Models in R

Appendix A: Handling Issues in Code

Appendix B: R Resources
