An Introduction to R: Mastering the Basics

An overall introduction to R programming, covering the basics of objects, operations, functions, and data types.
R Basic
Functions
Data Types
Author

NING LI

Published

Nov 27, 2022

Introduction

R is a language and environment for statistical computing and graphics 1. It provides a comprehensive set of tools and libraries that make it a preferred choice among data scientists, statisticians, and researcher.

In this blog, I will mainly explore the basics of R programming, including objects, operations, functions, and data types.

However, before start this blog, please make sure you already set up R development environment.

For more information about how to setting up, you can read Posts page: how to install R and Rstudio.

How to Install R Packages?

Packages in R are collections of functions, sample data, and code in a well-defined format. They extend the capabilities of R by providing additional functions for a variety of statistical techniques, graphical devices, data manipulation, and more2.

I am highly recommend using Rstudio for package installation over R itself. The provided compatible with R and Rstudio. Rstudio offers a user-friendly graphical interface for package management, enabling effortless browsing, installation, loading, and updates of packages3.

Once a package has been installed, it is technically added onto R (even if you use RStudio to install it), which is why packages must be re-installed when R is updated. However, since we use R through RStudio, any packages that are installed can be used in both R and RStudio, regardless of which one was used to install the packages.

To install a package in Rstudio:

  • Install packages from Rstudio Interface (allows auto-complete):

    • Click on the “Packages” tab in the lower right window.
    • Click “Install” and a window will pop up.
    • Type the name of package you want to install (e.g., “ggplot2”) and click “Install”.
  • Install packages from R console:

    • install.packages("pkg_name")
    • The option dependencies = TRUE, which tells R to install the other things that are necessary for package or packages to run smoothly.

Once installed, you can use library(pkg_name) to load a package each time you want to use it.

1install.packages("dslabs")
2install.packages(c("tidyverse", "dslabs"))
3installed.packages()
4library("dslabs")
1
To install a single package
2
To install two packages at the same time
3
To see the list of all installed packages
4
To load a package called dslabs

Running Commands While Editing Scripts

Running commands while editing scripts in an IDE like Rstudio is an essential skill for efficient and effective programming. This practice allows you to test your code as you write it, helping you spot and correct any errors or bugs early in the development process4.

When working in Rstudio, you work area is typically divided into several panels:

  • The Script window, where you write longer pieces of code.

  • The Console window, where R code is executed.

  • Other window for view variables, history, files, plots, packages, and help.

Running a Single Line of Code

  • place the cursor anywhere on the line;

  • press `Ctrl + Enter` on a Windows/Linux machine;

  • press `Cmd + Enter` on a Mac;

  • Or, click “source” on the editor pane.

Running an Entire Script

  • place the cursor anywhere on the line;

  • press `Ctrl+Shift+Enter` on a Windows/Linux machine;

  • press `Command+Shift+Return` on Mac;

  • Or click “Source” on the editor pane.

R Basic

R is a versatile and powerful language due to its diverse range of objects, operations, functions ,and data type.

Objects

In R, everything is an object. Object are entitles that hold data, which can be manipulated by invoking functions5. Here are the common types of objects:

  1. Vectors: A vector is a basic data structure in R. It contains elements of the same type. For instance, numeric_vector <- c(1, 2, 3) creates a numeric vector.

  2. Matrices: A matrix is a two-dimensional data structure where elements are arranged in rows and columns, and all elements are of the same type.

  3. Lists: A list is an R-object that can contain many different types of elements inside it like vectors, functions, and even another list.

  4. Data frames: Data frames are used for storing data tables. They are a list of vectors of equal length. For example, you might create a data frame to hold a dataset for data analysis.

Operations

Operations in R refer to the tasks that can be performed on R objects. They’re a fundamental part of the language, allowing for the manipulation, comparison, and assignment of data6.

Arithmetic Operations

Arithmetic operations perform mathematical calculations. Here are some examples:

  • Addition (+): 5 + 3 returns 8

  • Subtraction (-): 5 - 3 returns 2

  • Multiplication (*): 5 * 3 returns 15

  • Division (/): 5 / 3 returns 1.6666667

  • Exponentiation (^ or **): 5^3 or 5**3 returns 125

  • Modulus (%%): 5 %% 3 returns 2 (remainder of the division)

  • Integer Division (%/%): 5 %/% 3 returns 1 (quotient of the division)

Relational Operations

Relational operations compare values and return a logical output (TRUE or FALSE). They’re often used in conditional statements. Here are some examples:

  • Less than (<): 5 < 3 returns FALSE

  • Greater than (>): 5 > 3 returns TRUE

  • Less than or equal to (<=): 5 <= 3 returns FALSE

  • Greater than or equal to (>=): 5 >= 3 returns TRUE

  • Equals to (==): 5 == 3 returns FALSE

  • Not equal to (!=): 5 != 3 returns TRUE

Logical Operations

Logical operations perform boolean logic on values. Here are some examples:

  • And (&): Returns TRUE if both operands are TRUE. For example, TRUE & FALSE returns FALSE.

  • Or (|): Returns TRUE if either operand is TRUE. For example, TRUE | FALSE returns TRUE.

  • Not (!): Negates the value of the operand. For example, !TRUE returns FALSE.

  • Elementwise And (&&): Similar to &, but only evaluates the first element of vectors and ignores the rest.

  • Elementwise Or (||): Similar to |, but only evaluates the first element of vectors and ignores the rest.

Assignment Operations

Assignment operations store a value in a variable:

  • <-: This is the most common assignment operator in R. For example, x <- 5 assigns the value 5 to the variable x.

  • =: Similar to <-, but usually used within functions. For example, mean(x = 1:5) computes the mean of the numbers from 1 to 5.

  • <<-: This is the global assignment operator. It assigns a value to a variable in the global environment, even from within a function.

Functions

Functions are the backbone of any programming language, and R is no exception. Functions are sets of instructions that perform a task, and R has a multitude of built-in functions, as well as the capability for users to define their own.

Built-in Functions

R provides many built-in functions, which perform predefined tasks. Here are a few examples:

  • mean(x): Computes the arithmetic mean of a numeric vector x.

  • sum(x): Calculates the sum of all the values in x.

  • max(x): Finds the maximum value in x.

  • min(x): Finds the minimum value in x.

  • sd(x): Calculates the standard deviation of x.

  • length(x): Returns the number of elements in x.

  • str(x): Provides a compact, human-readable description of x.

  • help(x) or ?function_name: To access help files documents of functions

For each of these, x would be the argument, and it is typically a vector of some kind.

User-Defined Functions

In addition to the built-in functions, R allows you to define your own functions. This is particularly useful for tasks you need to perform frequently. User-defined functions are created with the function() command. Here’s an example:

1# Define a function that calculates the average of squares of two numbers
avg_of_squares <- function(a, b) {
    return((a^2 + b^2)/2)
}
1
To make your code more readable, use intuitive variable names and include comments (using the “#” symbol) to remind yourself why you wrote a particular line of code.

In this function, a and b are the arguments. This function calculates the squares of a and b, sums them, and then divides by 2 to find the average. It then returns this result.

Special Functions

There are some special types of functions in R, such as:

  • Anonymous functions: These are functions that are defined without a name. They are used when a function is only needed once, often in the context of apply-type functions. For example, sapply(1:5, function(x) x^2) applies the anonymous function function(x) x^2 to the vector 1:5.

  • Primitive functions: These are basic functions in R that are implemented in C for efficiency. Examples include basic arithmetic operations (+, -, *, /) and others like sum(), prod(), and mean().

Data Types

Data types determine the kind of operations that can be performed on data, and R offers several basic data types7.

Note

The code data("dataset_name") and data(dataset_name) do the same thing. The code will work regardless of whether the quotes are present. It is a bit faster to leave out the quotes (as we do in the Code at the bottom of this page), so that is usually what we recommend, but it is your choice.

Numeric

The numeric (or double) data type is used for real numbers (numbers with decimal points). Numeric is the default computational data type in R. If you assign a number to a variable without explicitly declaring its type, R will interpret it as a numeric. For example, x <- 7.14 assigns a numeric value of 7.14 to x.

Integer

The integer data type is used for integer numbers (whole numbers without decimal points). In R, you declare an integer by appending an L to the integer value. For example, x <- 7L assigns an integer value of 7 to x.

Logical

The logical data type is used for boolean values: TRUE and FALSE. Logical data types are often the result of logical operations. For example, the operation 5 > 3 returns TRUE, which is a logical value.

Character

The character data type is used for text or string data. To create a character string in R, you enclose the text in either single or double quotes. For example, x <- “Hello, R!” assigns a character string of Hello, R! to x.

Complex

The complex data type is used for complex numbers, which have both real and imaginary parts. In R, complex numbers are represented as x + yi, where x is the real part and y is the imaginary part. For example, x <- 3 + 2i assigns a complex value of 3 + 2i to x.

Raw

The raw data type is used for “raw” bytes. It can hold a stream of raw bytes, which are displayed as hexadecimal. This is not commonly used unless you are doing something fairly advanced, like writing a package to interface to other software or reading binary data directly from a connection.

Knowledge Extension
flowchart LR
  A{Data Type}--->B[numeric]
  A{Data Type}--->C[integer]
  A{Data Type}--->D[complex]
  A{Data Type}--->E[character]
  A{Data Type}--->F[logical]
  A{Data Type}--->G[Raw]
  
linkStyle default stroke:red
linkStyle 0 stroke:green
linkStyle 3 stroke:blue
linkStyle 4 stroke:black
Figure 1: Data Type in R
# Loading the dslabs package and the murders dataset
library(dslabs) 
data(murders) 

# Determining that the murders dataset is of the "data frame" class
class(murders) 

# Finding out more about the structure of the object
str(murders) 

# Showing the first 6 lines of the dataset
head(murders) 

# Using the accessor operator $ to obtain the population column
murders$population 

# Displaying the variable names in the murders dataset
names(murders) 

# Determining how many entries are in a vector
pop <- murders$population 
length(pop) 

# Vectors can be of class numeric and character
class(pop) 
class(murders$state) 

# Logical vectors are either TRUE or FALSE
z <- 3 == 2 
z 
class(z) 

# Factors are another type of class
class(murders$region) 

# Obtaining the levels of a factor
levels(murders$region) 

Conclusion

In summary, R is an incredibly robust and flexible tool for a myriad of statistical and data analysis tasks. This guide should provide you with a solid foundation to start exploring and utilizing its power. Happy coding!

Social share button
Back to top

Footnotes

  1. https://www.r-project.org/about.html↩︎

  2. https://www.tutorialspoint.com/r/r_packages.htm↩︎

  3. https://www.theanalysisfactor.com/the-advantages-of-rstudio/↩︎

  4. https://support.posit.co/hc/en-us/articles/200484448-Editing-and-Executing-Code-in-the-RStudio-IDE↩︎

  5. https://www.geeksforgeeks.org/r-objects/↩︎

  6. https://www.w3schools.com/r/r_operators.asp↩︎

  7. https://www.programiz.com/r/data-types↩︎

Reuse