Introduction to Programming
Last updated on 2024-10-28
Overview
Questions
- What is programming?
- What is object oriented programming?
- How do I document code?
- What is a directory?
Objectives
- Learn basic concepts of programming
What is programming?
Programmers use programming languages to give instructions to their computers. In this course, we will learn how to use the open source language R to complete common tasks required in the field of official statistics. This includes the basics of R, data manipulation, and best practices.
There are a few reasons why programming with R is useful for official statistics. Data manipulation and analysis with R is:
- Time-saving: R can quickly complete computations on large amounts of data that would take a person a long time to do manually
- Reproducible: Code can be re-run on other data with small modifications, and shared with others to be adapted for new purposes
- Transparent: When you’ve completed a script using best practices, you should be left with a clear list of instructions, in the form of code, for completing the data analysis. This avoids “black boxes” where an analyst is unsure what they’ve done to the data to get it to its final form
R is an object oriented programming language
Object oriented programming languages use objects as their
main tools. These objects have classes, which describe their
general properties. For example, in R you might work with
numeric objects, which would contain numbers. You could also
work with characters, which would be composed of text. We’ll
explore classes and data types thoroughly in Episode 3 (Data Types and
Structures). We can assign “labels” to these objects, creating
variables that we can use in place of the objects themselves. We assign
an object to a variable with an assignment operator. In R, the most
commonly used assignment operator is <-. Try reproducing the example
below on your machine by entering the code into the RStudio “Console”
and pressing “Enter”.
R
# Assign a number to a variable
number_flowers <- 8
# Print the variable's contents
print(number_flowers)
We can get the value stored within the variable by printing it.
OUTPUT
[1] 8
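As a brief illustration of the classes mentioned above, base R's class() function reports the class of an object. This is only a minimal sketch: flower_name is a hypothetical variable introduced for illustration, and classes are covered properly in Episode 3.

```r
# A numeric object, as in the example above
number_flowers <- 8

# A character object (flower_name is a hypothetical variable for illustration)
flower_name <- "tulip"

# class() reports the class of each object
print(class(number_flowers))
print(class(flower_name))
```

Run as written, the first print() should report "numeric" and the second "character".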
Assigning a new value to a variable breaks the connection with the old value; R forgets that number and applies the variable name to the new value.
When you assign a value to a variable, R only stores the value, not the calculation you used to create it. This is an important point if you’re used to the way a spreadsheet program automatically updates linked cells. Let’s look at an example.
R
# Reassign the variable
number_flowers <- 7
# Print the variable's contents
print(number_flowers)
OUTPUT
[1] 7
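The difference from a spreadsheet can be sketched with a variable that is calculated from other variables. The names flowers_per_vase, number_vases, and total_flowers are hypothetical and used only for illustration.

```r
# Hypothetical variables for illustration
flowers_per_vase <- 4
number_vases <- 3

# R stores the result of the calculation (12), not the calculation itself
total_flowers <- flowers_per_vase * number_vases
print(total_flowers)

# Changing an input does not update total_flowers
number_vases <- 5
print(total_flowers)

# To get an updated result, run the calculation again
total_flowers <- flowers_per_vase * number_vases
print(total_flowers)
```

The first two print() calls should both show 12, because total_flowers keeps its stored value until it is reassigned. Only the third shows 20.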
Variable Naming Conventions
Historically, R programmers have used a variety of conventions for
naming variables. The period (.) character can be a valid part of a
variable name in R; for example, weight.kg <- 57.5 is a valid
assignment. This is often confusing to R newcomers who have programmed
in languages where . has a more significant meaning. Today, most R
programmers 1) start variable names with lowercase letters, 2) separate
words in variable names with underscores, and 3) use only lowercase
letters, underscores, and numbers in variable names. This is known as
“snake case”. The Tidyverse Style Guide includes a section on this and
other style considerations.
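As a small sketch of the two naming styles, both assignments below are valid R. The snake case name weight_kg is a hypothetical counterpart to the period-separated name and is included only for comparison.

```r
# Older style: words separated by a period (valid, but can confuse newcomers)
weight.kg <- 57.5

# Snake case: lowercase letters with words separated by underscores
weight_kg <- 57.5

# Both variables hold the same value; only the naming style differs
print(weight.kg == weight_kg)
```

The comparison should print TRUE, since the choice of naming style has no effect on the stored value.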
Documenting Code
Notice that in the above examples, hashtags (#) are used before
instructions that are intended for you rather than R. Hashtags produce
comments, which R ignores when it runs the code. Comments are handy for
leaving information about the code that follows. Commenting as much code
as possible is part of best practices. Always comment your code! You owe
it to your colleagues who may see your code (not to mention your future
coding self).
R
# Hashtags go before commented code, which is not run
# print("This code will not be run")
print("Always comment your code!")
OUTPUT
[1] "Always comment your code!"
Directories
A directory is a location on your machine, also called a folder. Say you’d like to open a file that’s located in a folder on your computer. You need to tell R where to look for the file before R can find it. Directories are usually written as a path: a list of nested folders separated by slashes. There are small differences between operating systems (OS), so refer to documentation specific to your OS when learning to work with folder structures.
For example, /Users/Documents/Learning-R points to a folder called
“Learning-R” in a user’s documents folder. Depending on your IDE
(Integrated Development Environment) and setup, you can print your
current directory, known as the working directory. By default, R reads
files from and writes files to your current working directory.
R
# Print current working directory
getwd()
OUTPUT
[1] "/Users/Documents/
Before beginning our lessons, please set your working directory to
the folder that we created in the setup section with setwd(). For
example, if your folder is named Learning-R:
R
# Change current working directory
setwd("~/Documents/Learning-R")
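If you want to confirm that a folder exists before switching to it, the steps can be sketched with base R's file.path(), dir.exists(), and dir.create(). This is a minimal sketch under stated assumptions: tempdir() stands in for your own documents folder so the code can run on any machine, and project_dir and old_dir are hypothetical variable names.

```r
# Build a path from folder names; file.path() inserts the separators.
# tempdir() is a stand-in for your own documents folder (an assumption).
project_dir <- file.path(tempdir(), "Learning-R")

# Create the folder if it does not exist yet
if (!dir.exists(project_dir)) {
  dir.create(project_dir)
}

# Remember the current working directory, then switch to the new folder
old_dir <- getwd()
setwd(project_dir)

# Confirm the change: print the last part of the working directory path
print(basename(getwd()))

# Return to the original working directory
setwd(old_dir)
```

In your own session, you would replace the tempdir() path with the location of the folder you created during setup.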
Key Points
- Programming makes our work faster, more reproducible, and more transparent
- R is an object oriented programming language
- Document your code with comments
- A working directory is the active location on your computer where R can read and write files