The aim of this Python challenge is to investigate how graphs can be used in Computer Science and investigate the key algorithms used when manipulating graphs in Python such as an algorithm to find the shortest path between two nodes of a graph.
For this challenge we will focus on a graph used to represent the London Underground Map (Zone 1).
Most programming languages do provide direct support for graphs as a data type. Python for instance does not have a graph data structure. However, graphs can be built out of lists and dictionaries. For instance, here is a graph to represent some of the connections between a selection of London tube stations:
tubemap = {"Aldgate" : ["Liverpool Street","Tower Hill"], "Aldgate East" : ["Tower Hill","Liverpool Street"], "Angel" : ["King's Cross St. Pancras","Old Street"], "Baker Street" : ["Marylebone","Regent's Park","Edgware Road (C)","Great Portland Street","Bond Street"], "Bank" : ["Liverpool Street","St. Paul's","London Bridge","Moorgate","Waterloo"], "Barbican" : ["Farringdon","Moorgate"] #... }
This graph is a dictionary where the keys are the nodes of the graph. For each key, the corresponding value is a list containing the nodes that are connected by a direct arc from this node.
The full dictionary with all the stations for London Underground can be found in the Python trinkets below.
Find Path Algorithm
Our first algorithm is used to determine if there is a path between two nodes of a graph, and if so it returns one path that can be used to connect both nodes. A path is a list of nodes which are connected to each others.
This algorithm uses a backtracking approach to explore different paths. It does stop as soon as a path has been found. Note that the path that will be found first may not be the shortest path though.
def find_path(graph, start, end, path=[]): path = path + [start] if start == end: return path if not start in graph: return None for node in graph[start]: if node not in path: newpath = find_path(graph, node, end, path) if newpath: return newpath return None
Find All Paths Algorithm
This second algorithm is very similar to the find_path() algorithm. The difference is that this time the backtracking algorithm will explore and return all the possible paths between the two nodes as a list of paths. (List of lists).
def find_all_paths(graph, start, end, path=[]): path = path + [start] if start == end: return [path] if not start in graph: return [] paths = [] for node in graph[start]: if node not in path: newpaths = find_all_paths(graph, node, end, path) for newpath in newpaths: paths.append(newpath) return paths
Find Shortest Path Algorithm
Our final algorithm is probably the most useful algorithm as it will identify amongst all the paths that have been found, which one is the shortest path (The path that has the smallest number of nodes).
To make this algorithm more effective we also have a stopping condition to stop exploring a path that would be longer than the shortest path found so far.
def find_shortest_path(graph, start, end, shortestLength=-1, path=[]): path = path + [start] if start == end: return path if not start in graph: return None shortest = None for node in graph[start]: if node not in path: if shortestLength==-1 or len(path)<(shortestLength-1): newpath = find_shortest_path(graph, node, end, shortestLength, path) if newpath: if not shortest or len(newpath) < len(shortest): shortest = newpath shortestLength = len(newpath) return shortest
Python Code
We will use the find_shortest_path() algorithm to find the shortest path between two tube stations.
Extension
A backtracking algorithm can sometimes take a very long time to explore all possible paths. This is due to the complexity of the graph (e.g. tube map) which contains many nodes and connections between nodes, hence the high number of potential routes to explore using a backtracking algorithm.
More advanced algorithms can be implemented to overcome this issue. You may want to explore the following algorithms: