An overall introduction to R programming, covering the basics of objects, operations, functions, and data types.
R Basic
Functions
Data Types
Author
NING LI
Published
Nov 27, 2022
Introduction
R is a language and environment for statistical computing and graphics [1]. It provides a comprehensive set of tools and libraries that make it a preferred choice among data scientists, statisticians, and researchers.
In this blog, I will mainly explore the basics of R programming, including objects, operations, functions, and data types.
Before you start, please make sure you have already set up an R development environment.
Packages in R are collections of functions, sample data, and code in a well-defined format. They extend the capabilities of R by providing additional functions for a variety of statistical techniques, graphical devices, data manipulation, and more [2].
I highly recommend using RStudio for package installation rather than the plain R console. Installed packages are compatible with both R and RStudio. RStudio offers a user-friendly graphical interface for package management, enabling effortless browsing, installation, loading, and updating of packages [3].
Once a package has been installed, it is technically added onto R (even if you use RStudio to install it), which is why packages must be re-installed when R is updated. However, since we use R through RStudio, any packages that are installed can be used in both R and RStudio, regardless of which one was used to install the packages.
To install a package in RStudio:
Install packages from the RStudio interface (allows auto-complete):
Click on the “Packages” tab in the lower right window.
Click “Install” and a window will pop up.
Type the name of the package you want to install (e.g., “ggplot2”) and click “Install”.
Install packages from R console:
install.packages("pkg_name")
Adding the option dependencies = TRUE tells R to also install the other packages that the requested package or packages need in order to run smoothly.
Once installed, you can use library(pkg_name) to load a package each time you want to use it.
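The console commands above can be combined into a small install-if-missing pattern. This is only a sketch: load_or_install is a hypothetical helper name (not part of R), and "stats" is used as the example because it ships with every R installation, so nothing is downloaded here.

```r
# Hypothetical helper: install a package only when it is missing, then load it.
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE (instead of failing) if pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  # character.only = TRUE lets library() accept the package name as a string
  library(pkg, character.only = TRUE)
}

# "stats" is bundled with R, so this call only loads it
load_or_install("stats")
```

Replacing "stats" with a package such as "ggplot2" would trigger a download the first time the function is called.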
Running commands while editing scripts in an IDE like RStudio is an essential skill for efficient and effective programming. This practice allows you to test your code as you write it, helping you spot and correct any errors or bugs early in the development process [4].
When working in RStudio, your work area is typically divided into several panels:
The Script window, where you write longer pieces of code.
The Console window, where R code is executed.
Other windows for viewing variables, history, files, plots, packages, and help.
Running a Single Line of Code
place the cursor anywhere on the line;
press `Ctrl + Enter` on a Windows/Linux machine;
press `Cmd + Enter` on a Mac;
Or click “Run” on the editor pane.
Running an Entire Script
place the cursor anywhere in the script editor;
press `Ctrl + Shift + Enter` on a Windows/Linux machine;
press `Cmd + Shift + Enter` on a Mac;
Or click “Source” on the editor pane.
R Basic
R is a versatile and powerful language thanks to its diverse range of objects, operations, functions, and data types.
Objects
In R, everything is an object. Objects are entities that hold data, which can be manipulated by invoking functions [5]. Here are the common types of objects:
Vectors: A vector is a basic data structure in R. It contains elements of the same type. For instance, numeric_vector <- c(1, 2, 3) creates a numeric vector.
Matrices: A matrix is a two-dimensional data structure where elements are arranged in rows and columns, and all elements are of the same type.
Lists: A list is an R-object that can contain many different types of elements inside it like vectors, functions, and even another list.
Data frames: Data frames are used for storing data tables. They are a list of vectors of equal length. For example, you might create a data frame to hold a dataset for data analysis.
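As a rough illustration of the four object types listed above, the sketch below creates one of each. All variable names and values other than numeric_vector are made up for the example.

```r
# Vector: elements of the same type
numeric_vector <- c(1, 2, 3)

# Matrix: two-dimensional, one type; filled column by column by default
m <- matrix(1:6, nrow = 2, ncol = 3)

# List: can mix types, including vectors and functions
info <- list(name = "example", values = numeric_vector, square = function(x) x^2)

# Data frame: a list of equal-length vectors, shown as a table
df <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

dim(m)          # 2 3
info$square(4)  # 16
nrow(df)        # 3
class(df)       # "data.frame"
```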
Operations
Operations in R refer to the tasks that can be performed on R objects. They’re a fundamental part of the language, allowing for the manipulation, comparison, and assignment of data [6].
Arithmetic Operations
Arithmetic operations perform mathematical calculations. Here are some examples:
Addition (+): 5 + 3 returns 8
Subtraction (-): 5 - 3 returns 2
Multiplication (*): 5 * 3 returns 15
Division (/): 5 / 3 returns 1.666667 (as printed with R's default of 7 significant digits)
Exponentiation (^ or **): 5^3 or 5**3 returns 125
Modulus (%%): 5 %% 3 returns 2 (remainder of the division)
Integer Division (%/%): 5 %/% 3 returns 1 (quotient of the division)
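The arithmetic operators above can be tried directly in the console. The following sketch stores the two operands in variables a and b (names chosen only for this example) and also shows that the operators work elementwise on vectors.

```r
a <- 5
b <- 3

a + b     # 8
a - b     # 2
a * b     # 15
a / b     # 1.666667
a^b       # 125
a %% b    # 2  (remainder)
a %/% b   # 1  (integer quotient)

# Arithmetic is vectorised: each element is doubled
doubled <- c(1, 2, 3) * 2
doubled   # 2 4 6
```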
Relational Operations
Relational operations compare values and return a logical output (TRUE or FALSE). They’re often used in conditional statements. Here are some examples:
Less than (<): 5 < 3 returns FALSE
Greater than (>): 5 > 3 returns TRUE
Less than or equal to (<=): 5 <= 3 returns FALSE
Greater than or equal to (>=): 5 >= 3 returns TRUE
Equals to (==): 5 == 3 returns FALSE
Not equal to (!=): 5 != 3 returns TRUE
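Since relational operators are often used in conditional statements, here is a minimal sketch of both uses: an elementwise comparison on a vector and a single comparison inside if/else. The variable names (x, value, threshold, label) are illustrative only.

```r
# Comparisons on a vector are evaluated element by element
x <- c(1, 5, 8)
above_three <- x > 3
above_three   # FALSE TRUE TRUE

# A single comparison driving a conditional statement
value <- 5
threshold <- 3
if (value > threshold) {
  label <- "above"
} else {
  label <- "not above"
}
label         # "above"
```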
Logical Operations
Logical operations perform boolean logic on values. Here are some examples:
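And (&): Elementwise; returns TRUE wherever both operands are TRUE. For example, TRUE & FALSE returns FALSE.
Or (|): Elementwise; returns TRUE wherever either operand is TRUE. For example, TRUE | FALSE returns TRUE.
Not (!): Negates the value of the operand. For example, !TRUE returns FALSE.
Short-circuit And (&&): Similar to &, but works on single values only and skips the right-hand operand when the left-hand one is FALSE. In R 4.3.0 and later, passing a vector longer than one is an error.
Short-circuit Or (||): Similar to |, but works on single values only and skips the right-hand operand when the left-hand one is TRUE. In R 4.3.0 and later, passing a vector longer than one is an error.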
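The difference between the elementwise and the short-circuit operators is easiest to see side by side. In this sketch, stop() is used only to show that the right-hand side is never evaluated; the variable names are illustrative.

```r
# & and | compare vectors element by element
and_result <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
or_result  <- c(TRUE, FALSE, TRUE) | c(FALSE, FALSE, FALSE)
and_result   # TRUE FALSE FALSE
or_result    # TRUE FALSE TRUE

# && and || take single values and stop as soon as the answer is known,
# so the stop() calls below are never reached
short_and <- FALSE && stop("never evaluated")
short_or  <- TRUE  || stop("never evaluated")
short_and    # FALSE
short_or     # TRUE

!TRUE        # FALSE
```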
Assignment Operations
Assignment operations store a value in a variable:
<-: This is the most common assignment operator in R. For example, x <- 5 assigns the value 5 to the variable x.
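=: Works like <- for assignment at the top level (x = 5), but is mostly used inside function calls to pass arguments by name. For example, mean(x = 1:5) passes 1:5 as the argument x and computes the mean of the numbers from 1 to 5; it does not create a variable called x.
<<-: Often called the super-assignment (or global assignment) operator. Used inside a function, it looks through the enclosing environments for an existing variable with that name and modifies it; if none is found, it creates the variable in the global environment.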
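A small sketch of the three assignment operators follows. make_counter and counter are hypothetical names used only to show how <<- updates a variable that lives outside the function being run.

```r
x <- 5      # the usual assignment operator
y = 10      # = also assigns at the top level

# Inside a call, = names an argument; the variable x above is untouched
m <- mean(x = 1:5)
m           # 3
x           # 5

# <<- modifies 'count' in the enclosing function, not a local copy
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}

counter <- make_counter()
first  <- counter()
second <- counter()
first       # 1
second      # 2
```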
Functions
Functions are the backbone of any programming language, and R is no exception. Functions are sets of instructions that perform a task, and R has a multitude of built-in functions, as well as the capability for users to define their own.
Built-in Functions
R provides many built-in functions, which perform predefined tasks. Here are a few examples:
mean(x): Computes the arithmetic mean of a numeric vector x.
sum(x): Calculates the sum of all the values in x.
max(x): Finds the maximum value in x.
min(x): Finds the minimum value in x.
sd(x): Calculates the standard deviation of x.
length(x): Returns the number of elements in x.
str(x): Provides a compact, human-readable description of x.
help(function_name) or ?function_name: Opens the help file (documentation) for a function.
For each of these except help(), x is the argument, and it is typically a vector of some kind.
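To see the built-in functions listed above in action, the sketch below applies them to a small numeric vector. The values in x are invented for the example.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

mean(x)     # 5
sum(x)      # 40
max(x)      # 9
min(x)      # 2
length(x)   # 8
sd(x)       # sample standard deviation, about 2.138

# str() prints a compact description of the object
str(x)
```

Typing ?mean or help(mean) in the console opens the documentation for mean().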
User-Defined Functions
In addition to the built-in functions, R allows you to define your own functions. This is particularly useful for tasks you need to perform frequently. User-defined functions are created with the function() command. Here’s an example:
# Define a function that calculates the average of the squares of two numbers
avg_of_squares <- function(a, b) {
  return((a^2 + b^2) / 2)
}
To make your code more readable, use intuitive variable names and include comments (using the “#” symbol) to remind yourself why you wrote a particular line of code.
In this function, a and b are the arguments. This function calculates the squares of a and b, sums them, and then divides by 2 to find the average. It then returns this result.
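Calling the function might look like the sketch below. The definition is repeated so that the snippet runs on its own; the input values are arbitrary.

```r
# Same definition as in the text, repeated so this snippet is self-contained
avg_of_squares <- function(a, b) {
  return((a^2 + b^2) / 2)
}

# (3^2 + 4^2) / 2 = (9 + 16) / 2
single <- avg_of_squares(3, 4)
single      # 12.5

# Because ^, + and / are vectorised, vectors of inputs work as well
several <- avg_of_squares(c(1, 2), c(3, 4))
several     # 5 10
```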
Special Functions
There are some special types of functions in R, such as:
Anonymous functions: These are functions that are defined without a name. They are used when a function is only needed once, often in the context of apply-type functions. For example, sapply(1:5, function(x) x^2) applies the anonymous function function(x) x^2 to the vector 1:5.
Primitive functions: These are basic functions in R that are implemented in C for efficiency. Examples include basic arithmetic operations (+, -, *, /) and others like sum(), prod(), and length(). Note that mean() is not a primitive; it is an ordinary R function.
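Both kinds of special function can be checked from the console. This sketch runs the sapply() example from the text and uses the base function is.primitive() to test whether a function is a primitive.

```r
# Anonymous function passed straight to sapply()
squares <- sapply(1:5, function(x) x^2)
squares               # 1 4 9 16 25

# is.primitive() reports whether a function is implemented as a primitive
is.primitive(sum)     # TRUE
is.primitive(`+`)     # TRUE
is.primitive(mean)    # FALSE
```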
Data Types
Data types determine the kind of operations that can be performed on data, and R offers several basic data types [7].
Note
The code data("dataset_name") and data(dataset_name) do the same thing; the code works whether or not the quotes are present. Leaving out the quotes is slightly quicker to type (as we do in the code at the bottom of this page), so that is usually what we recommend, but it is your choice.
Numeric
The numeric (or double) data type is used for real numbers (numbers with decimal points). Numeric is the default computational data type in R. If you assign a number to a variable without explicitly declaring its type, R will interpret it as a numeric. For example, x <- 7.14 assigns a numeric value of 7.14 to x.
Integer
The integer data type is used for integer numbers (whole numbers without decimal points). In R, you declare an integer by appending an L to the integer value. For example, x <- 7L assigns an integer value of 7 to x.
Logical
The logical data type is used for boolean values: TRUE and FALSE. Logical data types are often the result of logical operations. For example, the operation 5 > 3 returns TRUE, which is a logical value.
Character
The character data type is used for text or string data. To create a character string in R, you enclose the text in either single or double quotes. For example, x <- "Hello, R!" assigns the character string Hello, R! to x.
Complex
The complex data type is used for complex numbers, which have both real and imaginary parts. In R, complex numbers are represented as x + yi, where x is the real part and y is the imaginary part. For example, x <- 3 + 2i assigns a complex value of 3 + 2i to x.
Raw
The raw data type is used for “raw” bytes. It can hold a stream of raw bytes, which are displayed as hexadecimal. This is not commonly used unless you are doing something fairly advanced, like writing a package to interface to other software or reading binary data directly from a connection.
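The six basic types described above can be inspected with class() and typeof(). The sketch below reuses the example values from the text; the variable names are chosen only for this illustration.

```r
num <- 7.14               # numeric (stored as double)
int <- 7L                 # integer, marked by the L suffix
lgl <- 5 > 3              # logical
chr <- "Hello, R!"        # character
cpl <- 3 + 2i             # complex
rw  <- charToRaw("R")     # raw: the byte behind the letter R

class(num)    # "numeric"
typeof(num)   # "double"
class(int)    # "integer"
class(lgl)    # "logical"
class(chr)    # "character"
class(cpl)    # "complex"
class(rw)     # "raw"
rw            # 52 (hexadecimal)
```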
Knowledge Extension
# Loading the dslabs package and the murders dataset
library(dslabs)
data(murders)

# Determining that the murders dataset is of the "data frame" class
class(murders)

# Finding out more about the structure of the object
str(murders)

# Showing the first 6 lines of the dataset
head(murders)

# Using the accessor operator $ to obtain the population column
murders$population

# Displaying the variable names in the murders dataset
names(murders)

# Determining how many entries are in a vector
pop <- murders$population
length(pop)

# Vectors can be of class numeric and character
class(pop)
class(murders$state)

# Logical vectors are either TRUE or FALSE
z <- 3 == 2
z
class(z)

# Factors are another type of class
class(murders$region)

# Obtaining the levels of a factor
levels(murders$region)
Conclusion
In summary, R is an incredibly robust and flexible tool for a myriad of statistical and data analysis tasks. This guide should provide you with a solid foundation to start exploring and utilizing its power. Happy coding!