The famous Travelling Salesman Problem (TSP) is about finding an optimal route between a collection of nodes (cities) and returning to where you started. It sounds simple, but is impossible to solve by brute force for large numbers of nodes, since the number of possible orderings of `n`

cities is `n!`

. This means that for even just 30 cities, the number of trips you would have to check is 265,252,859,812,191,058,636,308,480,000,000. Large TSP problems are impractical to solve by brute force, even by powerful computers.

Fortunately, some algorithms have been developed that dramatically reduce the amount of compute needed to solve large TSPs. One such piece of software, Concorde, was developed a couple of decades ago for use in the academic community. Although it is quite technical to use as stand-alone software, and is intended for specialists only, *pyconcorde* has been developed as a Python wrapper for Concorde. An explanation of the algorithms used in Concorde is outside the scope of this article. However, we will delve into the code needed to reproduce these problems and their solutions in Python.

How would someone go about solving a real-world, geographical travelling salesman problem? Real-world points are not connected by simple 2D lines like in the image above. Instead, geographical features are connected by various possible routes, and those routes will change depending on whether someone is walking, cycling or driving.

Why might a data scientist or software engineer want to solve a real-world TSP? Here are a few examples of use cases:

- A company employing couriers needs a way of calculating optimal routes through a city, minimizing the time spent on the road for each of its drivers.
- A tour operator needs to find the shortest route connecting a set of destinations, within a constrained amount of time.
- A waste disposal company or local authority needs to allocate its resources to ensure pickups are ordered in as efficient a manner as possible.

In order to solve a real-world TSP, the *routingpy** *library can be used to find routes, distances (in metres) and durations (in seconds) between geographical points in `[longitude, latitude]`

pairs. In this article we will describe the method that can be used for such a problem.

A guide to solving a geographic TSP using Python is outlined here. The broad structure of the problem-solving process is as follows:

- Obtain a list of
*n*coordinates as`[longitude, latitude]`

pairs. - Use a routing service to obtain a matrix (
*n*x*n*) of real-world durations between each of these coordinates, for the appropriate profile (walking, cycling, driving a car, driving an HGV, etc). This matrix will be asymmetric (driving from A to B is not the exact reverse of B to A). - Transform (
*n*x*n*) matrix into a symmetric matrix (*2n*x*2n*). - Feed this matrix into the Concorde solver to find an optimal ordering of coordinates.
- Create the real-world route using the routing service.
- Visualize the results on a map.
- Optionally, create a GPX file of the final route.

Each of these steps will be covered in detail.

## Step 1: Obtaining coordinates

For our example, we will consider the problem of driving a car between 79 cities in the UK. Shown below is a map of the UK cities in blue. A data scientist can find coordinates in a number of ways. If required, they can be found manually using Google Maps or Google Earth.

**The code structure and data used in this example is also available in ****this GitHub repository****.**

Here is a CSV file containing the coordinates of the cities (*gb_cities.csv* in the repo), and below it the code required to import it using pandas.

`Place Name,Latitude,Longitude`

Aberdeen,57.149651,-2.099075

Ayr,55.458565,-4.629179

Basildon,51.572376,0.470009

Bath,51.380001,-2.36

Bedford,52.136436,-0.460739

...

`import pandas as pd`

df = pd.read_csv('gb_cities.csv')

coordinates = df[['Longitude', 'Latitude']].values

names = df['Place Name'].values

## Step 2: Using a routing service to obtain duration matrix

There are several routing services available through the routingpy library. The API from Graphhopper includes a free tier which allows rate-limited use. Other routers that are available through routingpy are listed in the documentation.

`import routingpy as rp`

import numpy as npapi_key = # get a free key at https://www.graphhopper.com/

api = rp.Graphhopper(api_key=api_key)

matrix = api.matrix(locations=coordinates, profile='car')

durations = np.matrix(matrix.durations)

print(durations)

Here is `durations`

, a 79 x 79 matrix of the driving time in seconds between coordinates:

`matrix([[ 0, 10902, 30375, ..., 23380, 25233, 19845],`

[10901, 0, 23625, ..., 16458, 18312, 13095],

[30329, 23543, 0, ..., 8835, 9441, 12260],

...,

[23397, 16446, 9007, ..., 0, 2789, 7924],

[25275, 18324, 9654, ..., 2857, 0, 9625],

[19857, 13071, 12340, ..., 8002, 9632, 0]])

The driving time between cities can be determined as follows:

- Each row and column corresponds to a city: Aberdeen is the first row and column, Ayr the second, Basildon the third, and so on.
- To find the time between Aberdeen and Ayr, look at the 1st row, 2nd column: 10,902 seconds. The reverse time (Ayr to Aberdeen) is 10,901 seconds.
- In general, the time from the i-th to the j-th city is at the intersection between the i-th row and j-th column.

Notice that the matrix, as expected, has zeros along the diagonal, since each point is connected to itself with zero distance or duration. Also, the matrix is not quite symmetric: driving durations between cities are unlikely to be identical in opposite directions, due to different road layouts and traffic hotspots. They are broadly similar, though, as would be expected.

## Step 3: Transforming asymmetric matrix to symmetric

Before using this matrix to generate an optimal ordering in pyconcorde, we need to make the matrix symmetric. A method for transforming an asymmetric TSP into symmetric TSP is described by Jonker and Volgenant (1983): Transforming asymmetric into symmetric traveling salesman problems, Operations Research Letters, 2(4), 161–163. What follows is the theory behind this transformation. If desired, this section can be skipped (scroll down to the section titled *Transforming the geographic asymmetric TSP*)**.**

**Jonker/Volgenant asymmetric to symmetric transformation**

Below is a visualization of an asymmetric TSP with 3 nodes, and its distance matrix.

`matrix([[0, 5, 2],`

[7, 0, 4],

[3, 4, 0]])

Here is a sketch of the method used to transform this into a symmetric TSP:

- Create new
*ghost nodes*, A’, B’ and C’. Join A to A’, B to B’ and C to C’ with distance zero. - Connect the nodes with weights as follows:

A to B is now represented by A’ to B; B to A is now B’ to A.

B to C is now B’ to C; C to B is now C’ to B.

C to A is now C’ to A; A to C is A’ to C. - Set all other edge weights to be infinite, so any algorithm does not attempt to travel between them. Since this will be impractical later when using pyconcorde, instead set all other weights to be much higher than the highest weight we have. In this case, we will set them to equal 99.

Here is the resulting distance matrix. The ordering of the nodes in the matrix is: A, B, C, A’, B’, C’.

`matrix([[ 0, 99, 99, 0, 7, 3],`

[99, 0, 99, 5, 0, 4],

[99, 99, 0, 2, 4, 0],

[ 0, 5, 2, 0, 99, 99],

[ 7, 0, 4, 99, 0, 99],

[ 3, 4, 0, 99, 99, 0]])

Note again that the diagonal is zero, as would be expected, and that the matrix is now symmetric. The original matrix is in the bottom-left corner of the new matrix, and its transpose is in the top-right. Meanwhile, the top-left and bottom-right parts contain very high weights between nodes.

A, B and C (top-left) are no longer connected to each other (strictly speaking, they are connected but with very high instead of infinite weight, for practical purposes). This means that any algorithm will not seek to find a path between these nodes. Likewise, A’, B’ and C’ (bottom-right) are not connected to each other. Instead, the directional nature of the original asymmetric network is represented here by the weights on the original nodes A, B and C, together with their ghosts A’, B’ and C’.

There is a one-to-one mapping between solutions of the original asymmetric problem and the new, symmetric TSP:

- A — B — C — A corresponds to A — A’ — B — B’ — C — C’ — A
- A — C — B — A corresponds to A — A’ — C — C’ — B — B’ — A

In each case the ghost nodes A’, B’ and C’ alternate with the original nodes A, B and C, and each original node is adjacent to its ‘partner’ ghost node (A is adjacent to A’, and so on).

**Transforming the geographic asymmetric TSP**

Back to our practical example. We can create a function to transform an asymmetric TSP matrix into a symmetric one:

`def symmetricize(m, high_int=None):`# if high_int not provided, make it equal to 10 times the max value:

if high_int is None:

high_int = round(10*m.max())

m_bar = m.copy()

np.fill_diagonal(m_bar, 0)

u = np.matrix(np.ones(m.shape) * high_int)

np.fill_diagonal(u, 0)

m_symm_top = np.concatenate((u, np.transpose(m_bar)), axis=1)

m_symm_bottom = np.concatenate((m_bar, u), axis=1)

m_symm = np.concatenate((m_symm_top, m_symm_bottom), axis=0)

return m_symm.astype(int) # Concorde requires integer weights

`symmetricize(durations)`

returns:

`matrix([[ 0, 461120, 461120, ..., 23397, 25275, 19857],`

[461120, 0, 461120, ..., 16446, 18324, 13071],

[461120, 461120, 0, ..., 9007, 9654, 12340],

...,

[ 23397, 16446, 9007, ..., 0, 461120, 461120],

[ 25275, 18324, 9654, ..., 461120, 0, 461120],

[ 19857, 13071, 12340, ..., 461120, 461120, 0]])

This 158 x 158 matrix contains a copy of `durations`

in the bottom left and a transposed copy in the top right. The high value of 461,120 (10 times the maximum value in `durations`

) means that, for practical purposes, nodes with this duration are not connected.

This matrix can finally be fed into pyconcorde to calculate an optimal path.

## Step 4: Using the Concorde solver

**Installing pyconcorde**

Run the following commands to install pyconcorde (installation is available in Linux or Mac OS, but not in Windows at present):

`virtualenv venv # create virtual environment`

source venv/bin/activate # activate it

git clone https://github.com/jvkersch/pyconcorde # clone git repo

cd pyconcorde # change directory

pip install -e . # install pyconcorde

**Solving the TSP in Python**

Now we can import from `concorde`

in a Python script.

`from concorde.problem import Problem`

from concorde.concorde import Concordedef solve_concorde(matrix):

problem = Problem.from_matrix(matrix)

solver = Concorde()

solution = solver.solve(problem)

print(f'Optimal tour: {solution.tour}')

return solution

Our symmetric durations matrix can be fed into `solve_concorde()`

.

`durations_symm = symmetricize(durations)`

solution = solve_concorde(durations_symm)

Here is the print output:

`Optimal tour: [0, 79, 22, 101, 25, 104, 48, 127, 68, 147, 23, 102, 58, 137, 7, 86, 39, 118, 73, 152, 78, 157, 36, 115, 42, 121, 62, 141, 16, 95, 20, 99, 51, 130, 40, 119, 19, 98, 59, 138, 50, 129, 54, 133, 27, 106, 10, 89, 4, 83, 66, 145, 33, 112, 14, 93, 2, 81, 45, 124, 32, 111, 11, 90, 29, 108, 34, 113, 24, 103, 8, 87, 17, 96, 56, 135, 64, 143, 61, 140, 75, 154, 52, 131, 71, 150, 18, 97, 3, 82, 9, 88, 74, 153, 55, 134, 72, 151, 28, 107, 12, 91, 70, 149, 65, 144, 35, 114, 31, 110, 77, 156, 63, 142, 41, 120, 69, 148, 6, 85, 76, 155, 67, 146, 15, 94, 44, 123, 47, 126, 60, 139, 57, 136, 38, 117, 13, 92, 5, 84, 43, 122, 49, 128, 46, 125, 21, 100, 1, 80, 30, 109, 53, 132, 37, 116, 26, 105]`

This solution shows the ordering of nodes in the optimal tour. Note that this solution, as expected above, contains original nodes (numbered 0 to 78) alternating with their partner ghost nodes (79 to 157):

- 0 is partnered with 79,
- 22 with 101,
- 25 with 104, and so on…

This suggests that the solution has worked correctly.

## Step 5: Creating the real-world route

The next step is to pick alternate elements of the solution (the nodes corresponding to the original 79 cities), then order the coordinates accordingly.

`# pick alternate elements: these correspond to the originals`

tour = solution.tour[::2]# order the original coordinates and names

coords_ordered = [coordinates[i].tolist() for i in tour]

names_ordered = [names[i] for i in tour]

Here are the first few city names in `names_ordered`

, (the real ordering of the cities in the optimal tour):

`['Aberdeen',`

'Dundee',

'Edinburgh',

'Newcastle Upon Tyne',

'Sunderland',

'Durham',

...]

Now we add back in the first city to make a complete looped tour, and finally obtain the final route using the Graphhopper directions API.

`# add back in the first for a complete loop`

coords_ordered_return = coords_ordered + [coords_ordered[0]]# obtain complete driving directions for the ordered loop

directions = api.directions(locations=coords_ordered_return, profile='car')

## Step 6: Visualization on a map

Seeing the final route on a map will enable us to be confident in the result, as well as allowing us to use the solution in a practical setting. The following code will display a folium map which can be saved to HTML.

`import folium`

def generate_map(coordinates, names, directions):# folium needs lat, long

coordinates = [(y, x) for (x, y) in coordinates]

route_points = [(y, x) for (x, y) in directions.geometry]

lat_centre = np.mean([x for (x, y) in coordinates])

lon_centre = np.mean([y for (x, y) in coordinates])

centre = lat_centre, lon_centre

m = folium.Map(location=centre, zoom_start=1, zoom_control=False)

# plot the route line

folium.PolyLine(route_points, color='red', weight=2).add_to(m)

# plot each point with a hover tooltip

for i, (point, name) in enumerate(zip(coordinates, names)):

folium.CircleMarker(location=point,

tooltip=f'{i}: {name}',

radius=2).add_to(m)

custom_tile_layer = folium.TileLayer(

tiles='http://{s}.basemaps.cartocdn.com/light_all/{z}/{x}/{y}.png',

attr='CartoDB Positron',

name='Positron',

overlay=True,

control=True,

opacity=0.7 # Adjust opacity to control the level of greying out

)

custom_tile_layer.add_to(m)

folium.LayerControl().add_to(m)

sw = (np.min([x for (x, y) in coordinates]), np.min([y for (x, y) in coordinates]))

ne = (np.max([x for (x, y) in coordinates]), np.max([y for (x, y) in coordinates]))

m.fit_bounds([sw, ne])

return m

generate_map(coords_ordered, names_ordered, directions).save('gb_cities.html')

The result is shown at the top of this article. Click here to view as an interactive map. It’s possible to zoom in to the map to see more detail and to hover over individual cities which will reveal their number in the tour sequence. Below is a zoomed-in part of the map showing the route passing through Sheffield (between Lincoln and Chesterfield on the optimal tour).

## Step 7: Optional: Creating a GPX file

If the calculated route needs to be followed in real-life, for instance on a device with a GPS (such as a phone or car navigation system), a GPX can be created. This is not part of the optimization problem, but is an optional additional step available if you want to save the route to a file. The GPX file is created from the `directions`

variable:

`def generate_gpx_file(directions, filename):`

gpx_template = """<?xml version="1.0" encoding="UTF-8"?>

<gpx version="1.1" xmlns="http://www.topografix.com/GPX/1/1"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://www.topografix.com/GPX/1/1

http://www.topografix.com/GPX/1/1/gpx.xsd">

<trk>

<name>Track</name>

<trkseg>{}</trkseg>

</trk>

</gpx>

"""trkseg_template = """

<trkpt lat="{}" lon="{}"/>

"""

trkseg_elements = ""

for point in directions.geometry:

trkseg_elements += trkseg_template.format(point[1], point[0])

gpx_data = gpx_template.format(trkseg_elements)

with open(filename, 'w') as file:

file.write(gpx_data)

generate_gpx_file(directions, 'gb_cities.gpx')

The GPX file for this problem can be downloaded here.

We have seen how we can combine the following elements to solve real-world geographic travelling salesman problems:

- Directions and duration matrices from the routingpy library, specifying an appropriate
`profile`

(mode of transport). - Efficient and powerful Concorde solver via the pyconcorde wrapper, to provide an optimal route.
- Visualization using folium to create a map.

The driving tour shown above is a convincing solution to the 79-city travelling salesman problem, and according to the Concorde solver is provably ‘optimal’. Since we are working with real-world data, however, the end result is only as good as the input. We are relying that the point-to-point durations matrix obtained from routingpy is representative of the real-world. In reality, the time taken to walk, cycle or drive between points will depend on the time of day, or day of the week. This is a limitation of the method that we’ve used. One way of being more confident in the end result would be to try the same method with an alternative routing service. Each routing service (Graphhopper, ORS, Valhalla, and so on) has its own API which could be used for a TSP problem such as the one described here, and the results could be compared from different services.

Despite the real-world limitations of solving a problem such as this, the methodology above provides a good starting point for a salesperson or courier needing to make their way round a city in as efficient a manner as possible or a tourist hoping to catch as many sights as possible on their tour. By visualizing the results on an interactive map and storing the route as a GPX file, the solution is useful by the end user, not just the data scientist who implemented the code.