List and Dictionary Methods

Last updated on 2024-07-11 | Edit this page

Overview

Questions

  • How can I store many values together?
  • How can I create a list succinctly?
  • How can I efficiently access nested data?

Objectives

  • Identify and create lists and dictionaries
  • Understand the properties and behaviours of lists and dictionaries
  • Access values in lists and dictionaries
  • Create and access values from nest lists and dictionaries

Values can also be stored in other Python data types such as lists, dictionaries, sets and tuples. Storing objects in a list is a fast and versatile way to apply transformations across a sequence of values. Storing objects in dictionary as key-value pairs is useful for extracting specific values i.e. performing lookup operations.

Create and access lists


Lists have the following properties and behaviours:

  • A single list can store different primitive object types and even other lists
  • Lists are ordered and have a 0-based index
  • Lists can be appended to using the methods append() or insert()
  • Values inside a list can be removed using the methods remove() or pop()
  • Two lists can be concatenated with the operator +
  • Values inside a list can be conditionally iterated through
  • A list is mutable i.e. the values inside a list can be modified in place

To create a list, values are contained within square brackets i.e. [] and individually separated by commas. The function list() can also be used to create a list of values from an iterable object like a string, set or tuple.

PYTHON

# Create a list of integers using []
list_1 = [1, 3, 5, 7]
print(list_1)

OUTPUT

[1, 3, 5, 7]

PYTHON

# Unlike atomic vectors in R, a list can contain multiple primitive object types
list_2 = [1, "one", 1.0, True]
print(list_2)

OUTPUT

[1, 'one', 1.0, True]

PYTHON

# You can also use list() on an iterable object to convert it into a list
string = 'abcdefg'  
list_3 = list(string)  
print(list_3)

OUTPUT

['a', 'b', 'c', 'd', 'e', 'f', 'g']

Because lists have a 0-based index, we can access individual values by their list index position. For 0-based indexes, the first value always starts at position 0 i.e. the first element has an index of 0. Accessing multiple values by their index positions is also referred to as slicing or subsetting a list.

Note that we can use negative numbers as indices in Python. When we do so, the index -1 gives us the last element in the list, -2 gives us the second to last element in the list, and so on.

PYTHON

# Extract individual values from list_3
print('first value:', list_3[0])
print('second value:', list_3[1])
print('last value:', list_3[-1])

OUTPUT

first value: a
second value: b
last value: g

PYTHON

# A syntax quirk for slicing values is to +1 to the last value's index 
# To extract from index 0 to 2, we need to slice from [0:2+1] or [0:3]

# Extract the first three values from list_3
print('first 3 values:', list_3[0:3])

# Start from index 0 and extract values from each subsequent second position
print('every second value:', list_3[0::2])

# Start from index 1, end at index 3 and extract from each subsequent second position
print('every second value from index 1 to 3:', list_3[1:4:2])

OUTPUT

first 3 values: ['a', 'b', 'c']
every second value: ['a', 'c', 'e', 'g']
every second value from index 1 to 3: ['b', 'd']

Change list values


Data which can be modified in place is called mutable, while data which cannot be modified is called immutable. Strings and numbers are immutable in that when we want to change the value of a string or number variable, we can only replace the old value with a completely new value.

PYTHON

string = 'abcde'
string[0] = 'b' # Produces a type error as strings are immutable

# TypeError: 'str' object does not support item assignment

In contrast, lists are mutable and we can modify them after they have been created. We can change individual values, append new values, or reorder the whole list through sorting.

PYTHON

list_4 = ['apple', 'pear', 'plum']
print('original list_4:', list_4)

# Change the first value i.e. modify the list in place
list_4[0] = 'banana'
print('modified list_4:', list_4)

# Add new value to list using the method .insert(index number, value)
list_4.insert(1, 'apple') # Index 1 refers to the second position
print('appended list_4:', list_4)

OUTPUT

original list_4: ['apple', 'pear', 'plum']
modified list_4: ['banana', 'pear', 'plum']
appended list_4: ['banana', 'apple', 'pear', 'plum']

PYTHON

# Sorting a list also modifies it in place
list_5 = [2, 1, 3, 7]
list_5.sort()
print('list_5:', list_5)

OUTPUT

list_5: [1, 2, 3, 7]

However, be careful when modifying data in-place. If two variables refer to the same list, and you modify the list value, it will change for both variables!

PYTHON

# When we assign list_6 to list_5, it means both list_6 and list_5 point to the
# same list object, not that list_6 is a copy of list_5.  

list_6 = list_5  
print('list_5:', list_5)
print('list_6:', list_6)

# Change the first value in list_6 from 1 to 2 
list_6[0] = 2 

print('modified list_6:', list_6)
print('unmodified list_5:', list_5)

# Warning: list_5 and list_6 have both been modified in place!

OUTPUT

list_5: [1, 2, 3, 7]
list_6: [1, 2, 3, 7]
modified list_6: [2, 2, 3, 7]
unmodified list_5: [2, 2, 3, 7]

Because of this behaviour, code which modifies data in place should be handled with care. You can also avoid this behaviour by expliciting creating a copy of the original list and modifying only the object copy. This is why creating a copy of the original data object can be useful in Python.

PYTHON

list_5 = [1, 2, 3, 7]
list_7 = list_5.copy()  
print('list_5:', list_5)
print('list_7:', list_7)

# As list_7 is a completely new object copied from list_5, modifying list_7 does
# not affect list_5.  

list_7[0] = 2 
print('modified list_7:', list_7)
print('unmodified list_5:', list_5)

OUTPUT

list_5: [1, 2, 3, 7]
list_7: [1, 2, 3, 7]
modified list_7: [2, 2, 3, 7]
unmodified list_5: [1, 2, 3, 7]

Useful list functions


There are a lot of functions and methods which can be applied to lists, such as len(), max(), index() and so forth. Mathematical operations do not work on lists of integers, with the exception of +.

Note that + concatenates two lists into a single longer list, rather than outputting the sum of two lists of numbers.

PYTHON

list_8 = [1, 2, 3]
list_9 = [4, 5, 6]

list_8 + list_9 # This concatenates the lists and does not sum the two lists together

OUTPUT

[1, 2, 3, 4, 5, 6]

In your spare time after this workshop, you can search for different list functions and methods and test them out yourselves.

Nested lists


We have previously mentioned that lists can be used to store other Python object types, including lists. This means that we can create nested lists in Python i.e. lists containing lists containing values. This property is useful when we have a collection of values that we want to access or transform as a subgroup.

To create a nested list, we also use [] or list() to contain one or more lists of values of interest.

PYTHON

veg_stock = [
    ['lettuce', 'lettuce', 'tomato', 'zucchini'],
    ['lettuce', 'lettuce', 'carrot', 'zucchini'],
    ['lettuce', 'basil', 'tomato', 'zucchini']
    ]

# Check that veg_stock is a list object
print(type(veg_stock))

# Check that the first value in veg_stock is itself a list
print(veg_stock[0], 'has type', type(veg_stock[0]))  

OUTPUT

<class 'list'>
['lettuce', 'lettuce', 'tomato', 'zucchini'] has type <class 'list'>

To extract the first sub-list within the veg_stock list object, we refer to its index like we would with any other value inside a list i.e. veg_stock[1] points to the second sub-list within the veg_stock list.

To access an individual string value inside a sub-list, we make use of a second index, which points to an individual value inside the sub-list.

PYTHON

print(veg_stock[0]) # Access the first sub-list 
print(veg_stock[0][0]) # Access the first value in the first sub-list 

print(type(veg_stock[0])) # The first value in veg_stock is a list
print(type(veg_stock[0][0])) # The first value in the first list in veg_stock is a string

OUTPUT

['lettuce', 'lettuce', 'tomato', 'zucchini']
lettuce
<class 'list'>
<class 'str'>

In general, however, when we are analysing a large collection of values, the best practice is to structure those values in columns and rows as a tabular Pandas data frame object. This is covered in another Carpentries Course called Python for Social Sciences.

Lists are still incredibly versatile and useful when you have a collection of values that need to be efficiently accessed or transformed. For example, data frame column names are commonly extracted and stored inside a list, so that the same transformation can then be mapped across multiple columns.

Create and access dictionaries


A dictionary is a Python data type that is particularly suited for enabling quick lookup operations on unstructured data sets.

A dictionary can therefore be thought of as an unordered list where every item or value is associated with a unique key (i.e. a self-defined index of unique strings or numbers). The index values are called keys and a dictionary contains key-value pairs with the format {key: value(s)}.

Dictionaries can be created by listing individual key-values pairs inside {} or using dict().

PYTHON

# A key-value pair can contain single or multiple values  
# Keys are treated as case sensitive and unique
# Multiple values are first stored inside a list  

teams = {
    'data science': ['Mei Ling', 'Paul', 'Gwen', 'Suresh'],
    'user design': ['Amy', 'Linh', 'Sasha'],
    'software dev': ['David', 'Prya'],
    'comms': 'Taylor' 
    } 

When using dict(), we need to indicate which key is associated with which value. This can be done directly using tuples, direct association i.e. using = or using zip(), which creates a set of tuples from an iterable list.

PYTHON

# To use dict(), key-value pairs are can be stored inside tuples  
ds_emp_status = dict([
        ('Mei Ling', 'full time'),
        ('Paul', 'full time'),
        ('Gwen', 'part time'),
        ('Suresh', 'part time')
    ])  

# Key-value pairs can also be assigned by direct association  
# Keys cannot be strings i.e. wrapped in '' using this approach
ud_emp_status = dict(
    Amy = 'full time',
    Linh = 'full time',
    Sasha = 'casual' 
    ) 

# zip() can also be used if each key has only one value  
sd_emp_status = dict(zip(
    ['David', 'Prya'],
    ['full time', 'full time']
    ))

To access a specific value inside a dictionary, we need to specify its key using []. This is similar to slicing or subsetting a list by specifying its index using [].

PYTHON

# Access the values associated with the key 'data science'
print(teams['data science'])

print('The object teams is of type', type(teams))
print('The dict value', teams['data science'], 'is of type', type(teams['data science']))

OUTPUT

['Mei Ling', 'Paul', 'Gwen', 'Suresh']
The data object teams is of type <class 'dict'>
The value ['Mei Ling', 'Paul', 'Gwen', 'Suresh'] is of type <class 'list'>

We can also access a value from a dictionary using the get() method.

PYTHON

print(teams.get('user design'))

# get() also enables us to return an alternate string when the key is not found   
# This prevents our code from returning an error message that halts the analysis

print(teams.get('data engineering', 'WARNING: key does not exist'))

OUTPUT

['Amy', 'Linh', 'Sasha']
WARNING: key does not exist

To access data inside a dictionary, we can also perform the following other actions:

  • Check whether a key exists in a dictionary using the keyword in
  • Retrieve unique dictionary keys using dict.keys()
  • Retrieve dictionary values using dict.values()
  • Retrieve dictionary items using dict.items()

PYTHON

# Check whether a key exists in a dictionary 
print('data science' in teams) 
print('Data Science' in teams) # Keys are case sensitive  

# Retrieve all dictionary keys  
print(teams.keys())
print(sd_emp_status.keys())

# Retrieve all dictionary values  
print(sd_emp_status.values())  

# Retrieve all dictionary key-value pairs
print(sd_emp_status.items())

OUTPUT

True
False
dict_keys(['data science', 'user design', 'software dev', 'comms'])
dict_keys(['David', 'Prya'])
dict_values(['full time', 'full time'])
dict_items([('David', 'full time'), ('Prya', 'full time')])

To add a new key-value pair to an existing dictionary, we can create a new key and directly attach a new value to it using = or alternatively use the method update().

PYTHON

print('original dict items:', sd_emp_status.items())  

# Add new key-value pair using direct assignment  
sd_emp_status['Mohammad'] = 'full time'

# Add new key-value pair using update({'key': 'value'})   
sd_emp_status.update({'Carrie': 'part time'})

print('updated dict items:', sd_emp_status.items())    

OUTPUT

original dict items: dict_items([('David', 'full time'), ('Prya', 'full time')])
updated dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
('Mohammad', 'full time'), ('Carrie', 'part time')])

Because keys are unique, a dictionary cannot contain two keys with the same name. This means that adding an item using a key that is already present in the dictionary will cause the previous value to be overwritten.

PYTHON

print('original dict items:', sd_emp_status.items())  

# As the key 'Carrie' already exists, its value will be overwritten
sd_emp_status['Carrie'] = 'full time'
print('updated dict items:', sd_emp_status.items())  

OUTPUT

original dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
('Mohammad', 'full time'), ('Carrie', 'part time')])
updated dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
('Mohammad', 'full time'), ('Carrie', 'full time')])

To remove a key-value pair for an existing dictionary, we can use the del keyword or the method pop(). Using pop() also enables us to return an alternate string if we trt to remove a non-existing key, which prevents our code from returning an error message that halts the analysis.

PYTHON

print('original dict items:', sd_emp_status.items())

# Delete dictionary keys using del and pop()
del sd_emp_status['Mohammad']
sd_emp_status.pop('Carrie')
sd_emp_status.pop('Anuradha', 'WARNING: key does not exist') # Does not generate an error

print('modified dict items:', sd_emp_status.items())  

OUTPUT

original dict items: dict_items([('David', 'full time'), ('Prya', 'full time'),
('Mohammad', 'full time'), ('Carrie', 'full time')])
modified dict items: dict_items([('David', 'full time'), ('Prya', 'full time')])

Nested dictionaries


Similar to lists, dictionaries can be nested as we can also store dictionaries as values inside a key-value pair using {}. Nested dictionaries are useful when we need to store unstructured data in a complex structure. For example, JSON data is commonly used for transmitting data in web applications and often exists in a nested structure that can be stored using nested dictionaries in Python.

PYTHON

# Individual dictionaries are enclosed in {} and separated by a comma
nested_dict = {
    'dict_1': { # First key is a dictionary of key-value pairs 
        'key_1a': 'value_1a',
        'key_1b': 'value_1b'
                },
    'dict_2': { # Second key is another dictionary of key-value pairs
        'key_2a': 'value_2a',
        'key_2b': 'value_2b'
                }
            }

print(nested_dict)

OUTPUT

{'dict_1': {'key_1a': 'value_1a', 'key_1b': 'value_1b'},
 'dict_2': {'key_2a': 'value_2a', 'key_2b': 'value_2b'}}

Similar to working with nested lists, to extract a value from the first sub-dictionary, we specify both the main dictionary and sub-dictionary keys using [].

PYTHON

# Extract the value for key 2a in dict_2
print('original value:', nested_dict['dict_2']['key_2a'])

# Adding or updating a value can be done through the same approach
nested_dict['dict_2']['key_2a'] = "modified_value_2a"  

print('modified value:', nested_dict['dict_2']['key_2a'])

OUTPUT

original value: value_2a
modified value: modified_value_2a

Optional: converting lists and dictionaries to Pandas data frames


Lists and dictionaries can be easily converted into a tabular Pandas data frame format. This can be useful when you need to create a small data set for unit testing purposes.

PYTHON

# Import pandas library
import pandas as pd

# Create a dictionary with each key-value pair representing a data frame column
data = {
    'col_1': [3, 2, 1, 0],
    'col_2': ['a', 'b', 'c', 'd']
    }

df = pd.DataFrame.from_dict(data) 

print(df) # Outputs data as a tabular Pandas data frame   
print(type(df))

OUTPUT

   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d
<class 'pandas.core.frame.DataFrame'>

Key Points

  • Lists can contain any Python object including other lists
  • Lists are ordered i.e. indexed and can therefore be sliced by index number
  • Unlike strings and integers, the values inside a list can be modified in place
  • A list which contains other lists is referred to as a nested list
  • Dictionaries behave like unordered lists and are defined using key-value pairs
  • Dictionary keys are unique
  • A dictionary which contains other dictionaries is referred to as a nested dictionary
  • Values inside nested lists and dictionaries can be accessed by an additional index