Visualizing Travel Times with Multidimensional Scaling

Which map is correct?

In a geography exam, the correct answer would be the left or upper one. It displays the actual locations of four cities in the US. The other map displays other data. Specifically, it approximates the travel times between the four cities. This means that the closer two cities are on the right map the faster you can travel between them with public transport. We can calculate such maps using Multidimensional Scaling. What is Multidimensional Scaling? How can it help us to approximate travel times? And what is the relationship between the left map with the geographic locations and the right map? We are about to find out.

Task

Take a look at the following table.

From \ To	San Francisco	Sacramento	Los Angeles	Las Vegas
San Francisco	-	121 km	559 km	671 km
Sacramento	121 km	-	582 km	622 km
Los Angeles	559 km	582 km	-	368 km
Las Vegas	671 km	622 km	368 km	-

It shows the great-circle distances between the cities in the two maps above. A strong cell color indicates a large distance. Now take a look at the second table below. Each cell displays the time needed to travel by public transportation from one city to another.

From \ To	San Francisco	Sacramento	Los Angeles	Las Vegas
San Francisco	-	1 h 50 min	7 h 18 min	17 h 7 min
Sacramento	2 h 12 min	-	11 h 30 min	16 h 51 min
Los Angeles	7 h 40 min	10 h 0 min	-	6 h 14 min
Las Vegas	14 h 45 min	18 h 28 min	6 h 5 min	-

The similarity of both color maps shows that, generally, the travel time is proportional to the distance between two cities. However, there are small irregularities: the distance between San Francisco and Los Angeles is in the same ballpark as the distance between San Francisco and Las Vegas. Yet, the time needed to travel to Las Vegas is more than twice the time needed to travel to Los Angeles from San Francisco. So, either San Francisco and Los Angeles are connected exceptionally well or San Francisco and Las Vegas exceptionally badly.

Overall, we see that the distance between two cities does not necessarily correspond to their proximity by public transport. So, if we try to visualize how well cities are connected to each other, their actual geolocation may not be representative. A better solution is to take the table of travel times and calculate a list of coordinates so that the distances between those positions resemble the travel times as best as possible. I will describe the required steps in detail below. But first, you can take a look at the result:

Result

I created a small tool to visualize the results of this post. In the upper left corner of the map below, you will find a search box. Select three to ten cities which are reachable by public transportation. In the left or upper map, you will see their actual locations. The right or lower map will display the cities in a way that well-connected cities are closer to each other.

If you want to understand why certain cities are close while others are not, you can take a look at the distance table and the table of travel durations below. Two cities being closer on the right map than on the left map indicates a good ratio between travel time and distance. If the cities' positions on the right map still seem arbitrary, please take a look at the "Solution" section below.

Distances:

Travel durations by public transport (next Monday at 12 pm in the time zone of the first city):

Solution

Given: The cities' geographic locations and a matrix of travel times between their central stations.

Multidimensional Scaling

We start with the matrix of travel times. Given a matrix of distances or dissimilarities between points, it is possible to calculate coordinates for each point that depict the similarities between the points when drawn onto a coordinate system. This is what Multidimensional Scaling (MDS) does. However, the distance between two points in the coordinate system may not always be equal to the value given by the matrix. Imagine a matrix with small entries for A-B and B-C but a very large entry for A-C. How can you draw A, B and C into a coordinate system without breaking the constraints given by the matrix?

Considering that a perfect solution might not exist, we therefore try to find points that approximate the given travel times best. This is similar to comparing the distances table with the travel durations table above, except now we may alter the cities' geographic coordinates to make both tables look as similar as possible. Specifically, we define "similar" via the squared difference between the distance between two points and the travel duration between the corresponding cities. Given a matrix $A$, where $a_{i,j}$ denotes the seconds needed to travel from city $i$ to city $j$, and the tuple $P$ containing $N$ two-dimensional points $\{p_1, ..., p_N\}$, the overall error $e(P)$ is defined as $$e(P) = \frac{1}{N^2}\sum _{i=1}^{N}{~} \sum _{j=1}^{N}{~}(a_{i,j}~-~dist(p_i,~p_j))^2$$ with $dist(p_i, p_j)$ being the Euclidean distance between $p_i$ and $p_j$. For our purpose, it would be better to instead use the great-circle distance between points on a sphere. However, Multidimensional Scaling is not restricted to geographic applications, but can be used to visualize any matrix of dissimilarity values, including dissimilarities between breakfast cereals, legal offences and water samples from wells. Therefore, classical Multidimensional Scaling maps points into an abstract n-dimensional Euclidean space.

Classical Multidimensional Scaling

Update (21 June 2020) : Initially, I used Ben Frederickson's implementation of the classical MDS algorithm. I later discovered the following note in a derivation of this algorithm:

"When the observed proximity matrix is not Euclidean, the matrix B is not positive-definite. In such case, some of the eigenvalues of B will be negative; correspondingly, some coordinate values will be complex numbers. If B has only a small number of small negative eigenvalues, it's still possible to use the eigenvectors associated with the q largest positive eigenvalues"

(for details, please refer to these slides). While the algorithm produces reasonable approximations for every set of cities that I have entered so far, I wondered whether the condition mentioned in the last sentence of the quote was true in every case. After all, public transport can be pretty non-linear. Out of interest, I therefore implemented gradient descent for solving a Multidimensional Scaling problem and compared its solution to the one produced by the classical MDS algorithm.

The classical MDS algorithm assumes that the given distance matrix contains Euclidean distances. Using the Singular Value Decomposition of the centered squared distance matrix, it computes a tuple of points with optimal coordinates, minimizing the sum of squared errors above (the details of this can be found here). However, since trains do not always go the same speed and not every city is directly connected to all others, the algorithm's assumption does not apply in our case. While it still produces reasonable approximations, the results turn out to be suboptimal.

Iterative Optimization

Therefore, I tried out using gradient descent to see if it finds better solutions. There are better ways to perform Multidimensional Scaling (see "Modern Multidimensional Scaling - Theory and Applications" by Borg and Groenen for a comprehensive overview), but I was curious about how well gradient descent would perform.

Since the error function is quite simple, implementing its gradient was straightforward. I also tried out TensorFlow.js for differentiating the error function automatically, but found it to be orders of magnitude slower than computing the gradient directly (with 800 kilobytes, the library is also quite heavy). Nevertheless, it was helpful for ensuring that the gradient computation is correct. Out of interest, I also implemented the Gauss-Newton algorithm for Multidimensional Scaling to see how fast it finds a solution. You can find the code for both iterative optimization algorithms, each one applicable to general MDS problems, at the bottom of this post. Here is how they compare to the classical MDS algorithm on the set of cities that you entered above:

"Gradient Descent" refers to gradient descent with a learning rate of 1, "Momentum" refers to gradient descent with a learning rate of 0.5 and a momentum of 0.5 and "Gauss-Newton" refers to the Gauss-Newton algorithm with a learning rate ($\alpha$) of 0.1. The error of each algorithm's final solution is appended in parentheses. I experimented a bit with the learning rates, but generally you should not take the error curves too seriously. The key takeaway is that they all usually find better solutions than the classical MDS algorithm. Another benefit of the gradient-based approach is that we can easily modify the error function to, for example, use the great-circle distance instead of the Euclidean distance for $dist(p_i, p_j)$.

Back To Earth

The points obtained by Multidimensional Scaling lie in some abstract Euclidean space and will not match the geographic locations of the cities. This is due to two facts:

The travel times are given in seconds on public transport, while the distances between cities are measured in meters. As these are two completely different scales, the points' coordinates first have to be scaled to a geometrically meaningful coordinate system.
Rotating, reflecting or moving the tuple of points does not change the distances between them. So, even if we scale the points to match the cities' scale, they may be upside down or at a completely different place on the earth. But since the tuple's inner distances do not change, it is still the optimal solution for approximating the matrix of travel times.

In short, we have to scale, rotate, reflect and move the tuple of points in order to get meaningful results. Luckily, Shinji Umeyama provides an algorithm that does exactly this in "Least-squares estimation of transformation parameters between two point patterns" (IEEE Transactions on Pattern Analysis & Machine Intelligence, 1991). For two tuples of n-dimensional points, the algorithm computes the rotation, translation and scale parameters to transform the points in one tuple such that the sum of squared Euclidean distances between corresponding points in the two tuples is minimized. I uploaded a JavaScript implementation of Umeyama's algorithm that optionally allows reflections here.

However, we run into the same problem as above: the points are two-dimensional and the algorithm minimizes Euclidean distances, but the cities lie on a sphere and we should therefore minimize great-circle distances. So, now we have three options:

Perform gradient-based optimization directly on geographic coordinates using great-circle distances.
Search for or think of extensions of Multidimensional Scaling and similarity transformations to geographic coordinates.
Hope that public transport near the poles or across the 180° line of longitude will not increase drastically in the near future. Then, we simply treat the two coordinates of each point as latitude and longitude and accept that the similarity transformation algorithm will focus a bit too much on matching coordinates of cities close to the poles. We also accept to be mocked by every geodesist, although I think most of them already stopped reading when we performed MDS in Euclidean space without thinking about map projections.

After a bit of option 2, I chose option 3. The resulting points, based on the MDS solution of vanilla gradient descent, are depicted in the right or lower map above.

Code

I uploaded the code for performing Multidimensional Scaling with gradient descent or the Gauss-Newton algorithm to GitHub, as well as the implementation of Shinji Umeyama's similarity transformation algorithm:

The code in these repositories is documented a bit more, but you can also take a quick look at the implementation below. As mentioned above, I sanity checked the gradient computation using TensorFlow.js.

/**
 * Transform a tuple of source points to match a tuple of target points
 * following equation 40, 41 and 42 of umeyama_1991.
 *
 * This function expects two mlMatrix.Matrix instances of the same shape
 * (m, n), where n is the number of points and m is the number of dimensions.
 * This is the shape used by umeyama_1991. m and n can take any positive value.
 *
 * The returned matrix contains the transformed points.
 *
 * @param {!mlMatrix.Matrix} fromPoints - the source points {x_1, ..., x_n}.
 * @param {!mlMatrix.Matrix} toPoints - the target points {y_1, ..., y_n}.
 * @param {boolean} allowReflection - If true, the source points may be
 *   reflected to achieve a better mean squared error.
 * @returns {mlMatrix.Matrix}
 */
function getSimilarityTransformation(fromPoints,
                                     toPoints,
                                     allowReflection = false) {
    const dimensions = fromPoints.rows;
    const numPoints = fromPoints.columns;

    // 1. Compute the rotation.
    const covarianceMatrix = getSimilarityTransformationCovariance(
        fromPoints,
        toPoints);

    const {
        svd,
        mirrorIdentityForSolution
    } = getSimilarityTransformationSvdWithMirrorIdentities(
        covarianceMatrix,
        allowReflection);

    const rotation = svd.U
        .mmul(mlMatrix.Matrix.diag(mirrorIdentityForSolution))
        .mmul(svd.V.transpose());

    // 2. Compute the scale.
    // The variance will first be a 1-D array and then reduced to a scalar.
    const summator = (sum, elem) => {
        return sum + elem;
    };
    const fromVariance = fromPoints
        .variance('row', {unbiased: false})
        .reduce(summator);

    let trace = 0;
    for (let dimension = 0; dimension < dimensions; dimension++) {
        const mirrorEntry = mirrorIdentityForSolution[dimension];
        trace += svd.diagonal[dimension] * mirrorEntry;
    }
    const scale = trace / fromVariance;

    // 3. Compute the translation.
    const fromMean = mlMatrix.Matrix.columnVector(fromPoints.mean('row'));
    const toMean = mlMatrix.Matrix.columnVector(toPoints.mean('row'));
    const translation = mlMatrix.Matrix.sub(
        toMean,
        mlMatrix.Matrix.mul(rotation.mmul(fromMean), scale));

    // 4. Transform the points.
    const transformedPoints = mlMatrix.Matrix.add(
        mlMatrix.Matrix.mul(rotation.mmul(fromPoints), scale),
        translation.repeat({columns: numPoints}));

    return transformedPoints;
}

/**
 * Compute the mean squared error of a given solution, following equation 1
 * in umeyama_1991.
 *
 * This function expects two mlMatrix.Matrix instances of the same shape
 * (m, n), where n is the number of points and m is the number of dimensions.
 * This is the shape used by umeyama_1991.
 *
 * @param {!mlMatrix.Matrix} transformedPoints - the solution, for example
 *   returned by getSimilarityTransformation(...).
 * @param {!mlMatrix.Matrix} toPoints - the target points {y_1, ..., y_n}.
 * @returns {number}
 */
function getSimilarityTransformationError(transformedPoints, toPoints) {
    const numPoints = transformedPoints.columns;
    const difference = mlMatrix.Matrix.sub(toPoints, transformedPoints);
    return Math.pow(difference.norm('frobenius'), 2) / numPoints;
}

/**
 * Compute the minimum possible mean squared error for a given problem,
 * following equation 33 in umeyama_1991.
 *
 * This function expects two mlMatrix.Matrix instances of the same shape
 * (m, n), where n is the number of points and m is the number of dimensions.
 * This is the shape used by umeyama_1991. m and n can take any positive value.
 *
 * @param {!mlMatrix.Matrix} fromPoints - the source points {x_1, ..., x_n}.
 * @param {!mlMatrix.Matrix} toPoints - the target points {y_1, ..., y_n}.
 * @param {boolean} allowReflection - If true, the source points may be
 *   reflected to achieve a better mean squared error.
 * @returns {number}
 */
function getSimilarityTransformationErrorBound(fromPoints,
                                               toPoints,
                                               allowReflection = false) {
    const dimensions = fromPoints.rows;

    // The variances will first be 1-D arrays and then reduced to a scalar.
    const summator = (sum, elem) => {
        return sum + elem;
    };
    const fromVariance = fromPoints
        .variance('row', {unbiased: false})
        .reduce(summator);
    const toVariance = toPoints
        .variance('row', {unbiased: false})
        .reduce(summator);
    const covarianceMatrix = getSimilarityTransformationCovariance(
        fromPoints,
        toPoints);

    const {
        svd,
        mirrorIdentityForErrorBound
    } = getSimilarityTransformationSvdWithMirrorIdentities(
        covarianceMatrix,
        allowReflection);

    let trace = 0;
    for (let dimension = 0; dimension < dimensions; dimension++) {
        const mirrorEntry = mirrorIdentityForErrorBound[dimension];
        trace += svd.diagonal[dimension] * mirrorEntry;
    }
    return toVariance - Math.pow(trace, 2) / fromVariance;
}

/**
 * Computes the covariance matrix of the source points and the target points
 * following equation 38 in umeyama_1991.
 *
 * This function expects two mlMatrix.Matrix instances of the same shape
 * (m, n), where n is the number of points and m is the number of dimensions.
 * This is the shape used by umeyama_1991. m and n can take any positive value.
 *
 * @param {!mlMatrix.Matrix} fromPoints - the source points {x_1, ..., x_n}.
 * @param {!mlMatrix.Matrix} toPoints - the target points {y_1, ..., y_n}.
 * @returns {mlMatrix.Matrix}
 */
function getSimilarityTransformationCovariance(fromPoints, toPoints) {
    const dimensions = fromPoints.rows;
    const numPoints = fromPoints.columns;
    const fromMean = mlMatrix.Matrix.columnVector(fromPoints.mean('row'));
    const toMean = mlMatrix.Matrix.columnVector(toPoints.mean('row'));

    const covariance = mlMatrix.Matrix.zeros(dimensions, dimensions);

    for (let pointIndex = 0; pointIndex < numPoints; pointIndex++) {
        const fromPoint = fromPoints.getColumnVector(pointIndex);
        const toPoint = toPoints.getColumnVector(pointIndex);
        const outer = mlMatrix.Matrix.sub(toPoint, toMean)
            .mmul(mlMatrix.Matrix.sub(fromPoint, fromMean).transpose());

        covariance.addM(mlMatrix.Matrix.div(outer, numPoints));
    }

    return covariance;
}

/**
 * Computes the SVD of the covariance matrix and returns the mirror identities
 * (called S in umeyama_1991), following equation 39 and 43 in umeyama_1991.
 *
 * See getSimilarityTransformationCovariance(...) for more details on how to
 * compute the covariance matrix.
 *
 * @param {!mlMatrix.Matrix} covarianceMatrix - the matrix returned by
 *   getSimilarityTransformationCovariance(...)
 * @param {boolean} allowReflection - If true, the source points may be
 *   reflected to achieve a better mean squared error.
 * @returns {{
 *   svd: mlMatrix.SVD,
 *   mirrorIdentityForErrorBound: number[],
 *   mirrorIdentityForSolution: number[]
 * }}
 */
function getSimilarityTransformationSvdWithMirrorIdentities(covarianceMatrix,
                                                            allowReflection) {
    // Compute the SVD.
    const dimensions = covarianceMatrix.rows;
    const svd = new mlMatrix.SVD(covarianceMatrix);

    // Compute the mirror identities based on the equations in umeyama_1991.
    let mirrorIdentityForErrorBound = Array(svd.diagonal.length).fill(1);
    let mirrorIdentityForSolution = Array(svd.diagonal.length).fill(1);
    if (!allowReflection) {
        // Compute equation 39 in umeyama_1991.
        if (mlMatrix.determinant(covarianceMatrix) < 0) {
            const lastIndex = mirrorIdentityForErrorBound.length - 1;
            mirrorIdentityForErrorBound[lastIndex] = -1;
        }

        // Check the rank condition mentioned directly after equation 43.
        mirrorIdentityForSolution = mirrorIdentityForErrorBound;
        if (svd.rank === dimensions - 1) {
            // Compute equation 43 in umeyama_1991.
            mirrorIdentityForSolution = Array(svd.diagonal.length).fill(1);
            if (mlMatrix.determinant(svd.U) * mlMatrix.determinant(svd.V) < 0) {
                const lastIndex = mirrorIdentityForSolution.length - 1;
                mirrorIdentityForSolution[lastIndex] = -1;
            }
        }
    }

    return {
        svd: svd,
        mirrorIdentityForErrorBound: mirrorIdentityForErrorBound,
        mirrorIdentityForSolution: mirrorIdentityForSolution
    }
}

/**
 * Solves a Multidimensional Scaling problem with gradient descent.
 *
 * If momentum is != 0, the update is:
 *
 * accumulation = momentum * accumulation + gradient
 * parameters -= learning_rate * accumulation
 *
 * like in TensorFlow and PyTorch.
 *
 * In the returned object, coordinates is a matrix of shape (n, 2) containing
 * the solution.
 *
 * @param {!mlMatrix.Matrix} distances - matrix of shape (n, n) containing the
 *   distances between n points.
 * @param {!number} lr - learning rate to use.
 * @param {!number} maxSteps - maximum number of update steps.
 * @param {!number} minLossDifference - if the absolute difference between the
 *   losses of the current and the previous optimization step is smaller than
 *   this value, the function will return early.
 * @param {!number} momentum - momentum of the gradient descent. Set this value
 *   to zero to disable momentum.
 * @param {!number} logEvery - if larger than zero, this value determines the
 *   steps between logs to the console.
 * @returns {{coordinates: mlMatrix.Matrix, lossPerStep: number[]}}
 */
function getMdsCoordinatesWithGradientDescent(distances,
                                              {
                                                  lr = 1,
                                                  maxSteps = 200,
                                                  minLossDifference = 1e-7,
                                                  momentum = 0,
                                                  logEvery = 0
                                              } = {}) {
    const numCoordinates = distances.rows;
    let coordinates = getInitialMdsCoordinates(numCoordinates);

    const lossPerStep = [];
    let accumulation = null;

    for (let step = 0; step < maxSteps; step++) {
        const loss = getMdsLoss(distances, coordinates);
        lossPerStep.push(loss);

        // Check if we should early stop.
        if (lossPerStep.length > 1) {
            const lossPrev = lossPerStep[lossPerStep.length - 2];
            if (Math.abs(lossPrev - loss) < minLossDifference) {
                return {coordinates: coordinates, lossPerStep: lossPerStep};
            }
        }

        if (logEvery > 0 && step % logEvery === 0) {
            console.log(`Step: ${step}, loss: ${loss}`);
        }

        // Apply the gradient for each coordinate.
        for (let coordIndex = 0; coordIndex < numCoordinates; coordIndex++) {
            const gradient = getGradientForCoordinate(
                distances, coordinates, coordIndex);

            if (momentum === 0 || accumulation == null) {
                accumulation = gradient;
            } else {
                accumulation = mlMatrix.Matrix.add(
                    mlMatrix.Matrix.mul(accumulation, momentum),
                    gradient
                );
            }
            const update = mlMatrix.Matrix.mul(accumulation, lr);
            const updatedCoordinates = mlMatrix.Matrix.sub(
                coordinates.getRowVector(coordIndex),
                update);
            coordinates.setRow(coordIndex, updatedCoordinates);
        }
    }

    return {coordinates: coordinates, lossPerStep: lossPerStep};
}

/**
 * Returns the gradient of the loss w.r.t. to a specific point in the
 * given solution.
 *
 * The returned matrix has the shape (1, d), where d is the number of
 * dimensions.
 *
 * @param {!mlMatrix.Matrix} distances - matrix of shape (n, n) containing the
 *   distances between n points, defining the MDS problem.
 * @param {!mlMatrix.Matrix} coordinates - a matrix of shape (n, d) containing
 *   the current solution, where d is the number of dimensions.
 * @param {!number} coordIndex - index of the point for which the gradient
 *   shall be computed.
 * @returns {mlMatrix.Matrix}
 */
function getGradientForCoordinate(distances, coordinates, coordIndex) {
    const coord = coordinates.getRowVector(coordIndex);
    const normalizer = Math.pow(coordinates.rows, 2);
    let gradient = mlMatrix.Matrix.zeros(1, coord.columns);

    for (let otherCoordIndex = 0;
         otherCoordIndex < coordinates.rows;
         otherCoordIndex++) {

        if (coordIndex === otherCoordIndex) continue;

        const otherCoord = coordinates.getRowVector(otherCoordIndex);
        const squaredDifferenceSum = mlMatrix.Matrix.sub(
            coord, otherCoord).pow(2).sum();
        const predicted = Math.sqrt(squaredDifferenceSum);
        const targets = [
            distances.get(coordIndex, otherCoordIndex),
            distances.get(otherCoordIndex, coordIndex)
        ];

        for (const target of targets) {
            const lossWrtPredicted = -2 * (target - predicted) / normalizer;
            const predictedWrtSquaredDifferenceSum = 0.5 / Math.sqrt(squaredDifferenceSum);
            const squaredDifferenceSumWrtCoord = mlMatrix.Matrix.mul(
                mlMatrix.Matrix.sub(coord, otherCoord), 2);
            const lossWrtCoord = mlMatrix.Matrix.mul(
                squaredDifferenceSumWrtCoord,
                lossWrtPredicted * predictedWrtSquaredDifferenceSum
            );
            gradient = mlMatrix.Matrix.add(gradient, lossWrtCoord);
        }
    }

    return gradient;
}

/**
 * Initializes the solution by sampling from a uniform distribution, which
 * only allows distances in [0, 1].
 *
 * @param {!number} numCoordinates - the number of points in the solution.
 * @param {!number} dimensions - the number of dimensions of each point.
 * @param {!number} seed - seed for the random number generator.
 * @returns {mlMatrix.Matrix}
 */
function getInitialMdsCoordinates(numCoordinates, dimensions = 2, seed = 0) {
    const randomUniform = mlMatrix.Matrix.rand(
        numCoordinates, dimensions, {random: new Math.seedrandom(seed)});
    return mlMatrix.Matrix.div(randomUniform, Math.sqrt(dimensions));
}

/**
 * Returns the loss of a given solution to the Multidimensional Scaling
 * problem by computing the mean squared difference between target distances
 * and distances between points in the solution.
 *
 * @param {!mlMatrix.Matrix} distances - matrix of shape (n, n) containing the
 *   distances between n points, defining the MDS problem.
 * @param {!mlMatrix.Matrix} coordinates - a matrix of shape (n, d) containing
 *   the solution, for example given by getMdsCoordinatesWithGaussNewton(...).
 *   d is the number of dimensions.
 * @returns {number}
 */
function getMdsLoss(distances, coordinates) {
    // Average the squared differences of target distances and predicted
    // distances.
    let loss = 0;
    const normalizer = Math.pow(coordinates.rows, 2);
    for (let coordIndex1 = 0;
         coordIndex1 < coordinates.rows;
         coordIndex1++) {

        for (let coordIndex2 = 0;
             coordIndex2 < coordinates.rows;
             coordIndex2++) {

            if (coordIndex1 === coordIndex2) continue;

            const coord1 = coordinates.getRowVector(coordIndex1);
            const coord2 = coordinates.getRowVector(coordIndex2);
            const target = distances.get(coordIndex1, coordIndex2);
            const predicted = mlMatrix.Matrix.sub(coord1, coord2).norm();
            loss += Math.pow(target - predicted, 2) / normalizer;
        }
    }
    return loss;
}

/**
 * Solves a Multidimensional Scaling problem with the Gauss-Newton algorithm.
 *
 * In the returned object, coordinates is a matrix of shape (n, 2) containing
 * the solution.
 *
 * @param {!mlMatrix.Matrix} distances - matrix of shape (n, n) containing the
 *   distances between n points.
 * @param {!number} lr - learning rate / alpha to use.
 * @param {!number} maxSteps - maximum number of update steps.
 * @param {!number} minLossDifference - if the absolute difference between the
 *   losses of the current and the previous optimization step is smaller than
 *   this value, the function will return early.
 * @param {!number} logEvery - if larger than zero, this value determines the
 *   steps between logs to the console.
 * @returns {{coordinates: mlMatrix.Matrix, lossPerStep: number[]}}
 */
function getMdsCoordinatesWithGaussNewton(distances,
                                          {
                                              lr = 0.1,
                                              maxSteps = 200,
                                              minLossDifference = 1e-7,
                                              logEvery = 0
                                          } = {}) {
    const numCoordinates = distances.rows;
    let coordinates = getInitialMdsCoordinates(numCoordinates);
    const dimensions = coordinates.columns;

    const lossPerStep = [];

    for (let step = 0; step < maxSteps; step++) {
        const loss = getMdsLoss(distances, coordinates);
        lossPerStep.push(loss);

        // Check if we should early stop.
        if (lossPerStep.length > 1) {
            const lossPrev = lossPerStep[lossPerStep.length - 2];
            if (Math.abs(lossPrev - loss) < minLossDifference) {
                return {coordinates: coordinates, lossPerStep: lossPerStep};
            }
        }

        if (logEvery > 0 && step % logEvery === 0) {
            console.log(`Step: ${step}, loss: ${loss}`);
        }

        // Apply the update.
        const {residuals, jacobian} = getResidualsWithJacobian(
            distances, coordinates);
        const update = mlMatrix.pseudoInverse(jacobian).mmul(residuals);
        for (let coordIndex = 0; coordIndex < numCoordinates; coordIndex++) {
            for (let dimension = 0; dimension < dimensions; dimension++) {
                const updateIndex = coordIndex * dimensions + dimension;
                const paramValue = coordinates.get(coordIndex, dimension);
                const updateDelta = lr * update.get(updateIndex, 0);
                const updatedValue = paramValue - updateDelta;
                coordinates.set(coordIndex, dimension, updatedValue);
            }
        }
    }

    return {coordinates: coordinates, lossPerStep: lossPerStep};
}

/**
 * Returns the residuals and the Jacobian matrix for performing one step of the
 * Gauss-Newton algorithm.
 *
 * The residuals are returned in a flattened vector as (target - predicted) /
 * numCoordinates. The flattened vector is ordered based on iterating the
 * matrix given by distances in row-major order. We divide by coordinates.rows,
 * so that the sum of squared residuals equals the MDS loss, which involves a
 * division by coordinates.rows ** 2.
 *
 * The element of the Jacobian at row i and column j should contain the
 * partial derivative of the i-th residual w.r.t. the j-th coordinate. The
 * coordinates are indexed in row-major order, such that in two dimensions,
 * the 5th zero-based index corresponds to the second coordinate of the third
 * point.
 *
 * @param {!mlMatrix.Matrix} distances - matrix of shape (n, n) containing the
 *   distances between n points, defining the MDS problem.
 * @param {!mlMatrix.Matrix} coordinates - a matrix of shape (n, d) containing
 *   the current solution, where d is the number of dimensions.
 * @returns {{jacobian: mlMatrix.Matrix, residuals: mlMatrix.Matrix}}
 */
function getResidualsWithJacobian(distances, coordinates) {
    const residuals = [];
    const numCoordinates = coordinates.rows;
    const dimensions = coordinates.columns;
    const jacobian = mlMatrix.Matrix.zeros(
        numCoordinates * numCoordinates,
        numCoordinates * dimensions);

    for (let coordIndex1 = 0;
         coordIndex1 < numCoordinates;
         coordIndex1++) {

        for (let coordIndex2 = 0;
             coordIndex2 < numCoordinates;
             coordIndex2++) {

            if (coordIndex1 === coordIndex2) {
                residuals.push(0);
                // The gradient for all coordinates is zero, so we can skip
                // this row of the Jacobian.
                continue;
            }

            // Compute the residual.
            const coord1 = coordinates.getRowVector(coordIndex1);
            const coord2 = coordinates.getRowVector(coordIndex2);
            const squaredDifferenceSum = mlMatrix.Matrix.sub(
                coord1, coord2).pow(2).sum();
            const predicted = Math.sqrt(squaredDifferenceSum);
            const target = distances.get(coordIndex1, coordIndex2);
            const residual = (target - predicted) / numCoordinates;
            residuals.push(residual);

            // Compute the gradient w.r.t. the first coordinate only. The
            // second coordinate is seen as a constant.
            const residualWrtPredicted = -1 / numCoordinates;
            const predictedWrtSquaredDifferenceSum = 0.5 / Math.sqrt(squaredDifferenceSum);
            const squaredDifferenceSumWrtCoord1 = mlMatrix.Matrix.mul(
                mlMatrix.Matrix.sub(coord1, coord2), 2);
            const residualWrtCoord1 = mlMatrix.Matrix.mul(
                squaredDifferenceSumWrtCoord1,
                residualWrtPredicted * predictedWrtSquaredDifferenceSum
            );

            // Set the corresponding indices in the Jacobian.
            const rowIndex = numCoordinates * coordIndex1 + coordIndex2;
            for (let dimension = 0; dimension < dimensions; dimension++) {
                const columIndex = dimensions * coordIndex1 + dimension;
                const jacobianEntry = jacobian.get(rowIndex, columIndex);
                const entryUpdated = jacobianEntry + residualWrtCoord1.get(
                    0, dimension);
                jacobian.set(rowIndex, columIndex, entryUpdated);
            }
        }
    }
    return {
        residuals: mlMatrix.Matrix.columnVector(residuals),
        jacobian: jacobian
    };
}

/**
 * Initializes the solution by sampling from a uniform distribution, which
 * only allows distances in [0, 1].
 *
 * @param {!number} numCoordinates - the number of points in the solution.
 * @param {!number} dimensions - the number of dimensions of each point.
 * @param {!number} seed - seed for the random number generator.
 * @returns {mlMatrix.Matrix}
 */
function getInitialMdsCoordinates(numCoordinates, dimensions = 2, seed = 0) {
    const randomUniform = mlMatrix.Matrix.rand(
        numCoordinates, dimensions, {random: new Math.seedrandom(seed)});
    return mlMatrix.Matrix.div(randomUniform, Math.sqrt(dimensions));
}

/**
 * Returns the loss of a given solution to the Multidimensional Scaling
 * problem by computing the mean squared difference between target distances
 * and distances between points in the solution.
 *
 * @param {!mlMatrix.Matrix} distances - matrix of shape (n, n) containing the
 *   distances between n points, defining the MDS problem.
 * @param {!mlMatrix.Matrix} coordinates - a matrix of shape (n, d) containing
 *   the solution, for example given by getMdsCoordinatesWithGaussNewton(...).
 *   d is the number of dimensions.
 * @returns {number}
 */
function getMdsLoss(distances, coordinates) {
    // Average the squared differences of target distances and predicted
    // distances.
    let loss = 0;
    const normalizer = Math.pow(coordinates.rows, 2);
    for (let coordIndex1 = 0;
         coordIndex1 < coordinates.rows;
         coordIndex1++) {

        for (let coordIndex2 = 0;
             coordIndex2 < coordinates.rows;
             coordIndex2++) {

            if (coordIndex1 === coordIndex2) continue;

            const coord1 = coordinates.getRowVector(coordIndex1);
            const coord2 = coordinates.getRowVector(coordIndex2);
            const target = distances.get(coordIndex1, coordIndex2);
            const predicted = mlMatrix.Matrix.sub(coord1, coord2).norm();
            loss += Math.pow(target - predicted, 2) / normalizer;
        }
    }
    return loss;
}

Notes

Transit vs. Driving: By a small modification of the JavaScript code, you can also calculate the driving durations (set TRAVEL_MODE to google.maps.TravelMode.DRIVING). I tried it out and the resulting map differed only slightly from the real map. So, driving durations seem to generally scale proportionally to distances between cities.
Credits to Ben Frederickson for providing insights and the JavaScript code used for classical Multidimensional Scaling.

Like what you read?

I don't use Google Analytics or Disqus because they require cookie popups. I would still like to know which posts are popular, so if you liked this post you can let me know here on the right.

You can also leave a comment if you have a GitHub account. The "Sign in" button will store the GitHub API token in a cookie.