### Multiplex transportation model

We consider a multiplex network composed of a slow layer and a fast layer (see Fig. 1). We denote with \(\mathcalG\) the set of nodes in the slow layer, and with \(\mathcalS\) the set of its edges; \({{{{\mathcalH}}}}\) and \(\mathcalF\) are respectively the set of nodes and edges in the fast layer. Both layers contain *N* nodes; each node *n* in the slow layer has a one-to-one correspondence with a node *F*(*n*) in the fast layer. Each edge (*F*(*n*), *F*(*m*)) in the fast layer has a replica edge (*n*, *m*) in the slow layer, but the vice versa is not necessarily true. The transit time of each edge in the slow layer equals one, whereas time required to traverse edges of the fast layer is reduced by a factor 0≤*η*≤1. Replica nodes are connected to each other by edges whose transit time is *c*≥0. Please note that in this mathematical framework entities have no physical meaning, thus we can interchange the notions of the length of an edge with that of the time required to traverse it, and simply refer to them with the generic term cost.

When considered in isolation, the slow layer forms a single connected component, whereas the fast layer is not necessarily connected. The connectedness of the slow layer implies, however, that in the overall system, composed of the interconnected slow and fast layers, there exists at least a path connecting any pair of nodes. The cost of a path in the network is given by the sum of the costs of all edges that compose the path. The minimum-cost path between two nodes can either use edges in the slow layer only or take advantage of some of the edges in the fast layer (see Fig. 1). In particular, the path *n* → *F*(*n*) → *F*(*m*) → … → *F*(*r*) → *F*(*s*) → *s* composed of *ℓ* edges in the fast layer only is preferred to its replica path *n* → *m* → … → *r* → *s* whenever *ℓ* is larger than

$$r_c=\frac2c1-\eta \,.$$

(3)

We identify a special node *o* in the slow layer of the network, i.e., the center of the network. We then define the weighted average cost to the center as in Eq. (1). We stress that the objective function of Eq. (1) is computed over all nodes in the slow layer only, but eventual minimum-cost paths can take advantage of edges in the fast layer. Clearly, *τ* depends on the various parameters of the model. In Eq. (1), we explicit, on purpose, only the dependence of *τ* on the fast layer \(\mathcalF\) as this is the primary object of our investigation. We consider in fact the optimization problem aimed at finding the best set of edges in the fast layer able to minimize the objective function of Eq. (1). The minimization is constrained by the number of edges *L* that are in the fast layer, with *L* still measured in the same units of costs as *τ* and *c*. Specifically, we aim at solving

$$\mathcalF^*=\arg \min _{| \mathcalF|=L}\,\tau ({{{{{{\mathcalF}}}}}}),$$

(4)

where we indicated with \(| {{{{{{{\mathcalF}}}}}}}|\) the number of edges in \({{{{{{{\mathcalF}}}}}}}\).

Finding the exact solution to the optimization problem of Eq. (4) is computationally infeasible as it requires a brute-force search over all possible \(\left(\beginarrayc| {{{{{\mathcalS}}}}}| \\ L\endarray\right)\) configurations of the fast layer. In the SM, we prove, however, that: (i) the optimal configuration of the fast layer is a connected tree with at least one edge incident to *F*(*o*), i.e., the replica node of the center; (ii) the objective function of Eq. (1) is a decreasing and submodular function. The relevance of property (i) is two-fold: first, it allows us to dramatically reduce the number of suitable solutions for the optimization problem of Eq. (4); second, it permits us to meaningfully describe the geometry of the optimal fast layer in terms of the number of branches departing from the replica node of the center, i.e.,

$$k^*=\mathop\sum_{(F(n),F(m))\in {{{{{{{{\mathcalF}}}}}}}}^*}\,\left[\delta _F(o),F(m)+\delta _F(n),F(o)\right],$$

(5)

where *δ*_{x,y} = 1 if *x* = *y* and *δ*_{x,y} = 0 otherwise. Property (ii) allows us to leverage a greedy optimization scheme to generate approximate solutions to the optimization problem of Eq. (4) that are at most a fraction (1 − 1/*e*) above the ground-truth minimum^{44}. In the construction of greedy solutions, we start from an empty set of edges in the fast layer, and we add one edge at a time. The edge that is added is the one corresponding to the best choice that can be made given the current set of edges in the fast layer. In the SM, we further describe how solutions obtained via greedy optimization can be further refined to get better approximations for the optimization problem of Eq. (4); the quasi-optimality of our greedy solutions is validated by comparing them to those obtained via simulated-annealing optimization (see Fig. 3 and SM for details).

### Two-dimensional triangular lattices

The slow layer used in the definition of the multiplex transportation model can be represented by any connected network. In the SM for example, we report on results obtained for a slow layer given by an instance of an Erdős-Rényi model.

The vast majority of the results reported in this paper are obtained for slow layers derived from triangular lattices. The coordinates of all sites of the lattice are in the form \((a+\fracb2,\frac\sqrt3b2)\) for integer values of *a* and *b* such that ∣*a*∣ + ∣*b*∣≤*R*, where *R* is the radius of the triangular lattice. The boundary conditions give the system a hexagonal shape. The sites of the lattice are the nodes of the slow layer; each pair of nodes in the slow layer is connected if the corresponding sites are at distance one in the triangular lattice; we identify the center of the network as the site with coordinates (0, 0), i.e., *a* = *b* = 0.

### Real-world cities

In this section, we describe how we model the transportation system of a real city. Our framework relies on the use of a multiplex network formed of two discrete triangular lattices, one used to describe slow transportation (e.g., cars) and the other used to model fast transportation (e.g., subways). As we are referring to a real physical system, all quantities that describe properties of the multiplex transportation model have an associated physical dimension. For simplicity, we still rely on the same notation as in the previous sections, however, we add a tilde on the top of the symbols to make clear that the notation is used to indicate physical quantities. For example, we use \(\tildeR\) to denote the city radius measured in units of length and distinguish it from *R* which serves to indicate the radius of the triangular lattice measured in dimensionless lattice units.

We obtain the population density at the census-tract level for Boston and Atlanta from the 2021 American Census Survey, and for Toronto from the 2021 Canadian Census of Population. The census data contains the number of individuals residing in relatively small geographic regions, i.e., census tracts, used for statistical purposes by national statistical agencies. Census tracts typically consist of 2500 to 8000 individuals.

The data on the four metro lines and their stations in Boston is obtained from the Massachusetts Bureau of Geographic Information (MassGIS) website. Similar data for the four metro lines in Atlanta is obtained from the Atlanta Regional Commission: Open Data website. Finally, the data for the three subway lines and their stations in Toronto is made available by the Toronto Transit Commission at the City of Toronto Open Data website. The data on the subway lines is available as shapefiles. The arcs for the rail lines are given by sets of points in the coordinate reference systems (CRS) applicable to the geographic location. The CRSs used for Atlanta, Boston, and Toronto are EPSG:2239, EPSG:26986, and, EPSG:2952, respectively. Similarly, the location of the stations is given by points in the appropriate CRS. The Atlanta, Boston, and Toronto subway systems total \(\tildeL=77\) km, \(\tildeL=109.6\) km, and \(\tildeL=69.6\) km of rail lines and \(\tilden_s=38,\tilden_s=125\), and \(\tilden_s=75\) stations, respectively. The average distance between the stations is 2.07 km, 0.88 km, and 0.94 km for Atlanta, Boston, and Toronto, respectively.

We choose a city radius \(\tildeR\) such that all stations are contained in the circle of radius \(\tildeR\) around the center. We find the appropriate choice for this radius to be \(\tildeR=25\) km in Atlanta and \(\tildeR=20\) km in Boston and Toronto. We assume that everything that lies inside this circular area constitutes the city.

We create a lattice model of the transportation system in a city by overlaying a triangular lattice of radius *R* on top of the circle of radius \(\tildeR\). Please note that we use *R* = 100 in all figures except for Fig. 5 where we use *R* = 25. The operation requires matching locations of the city that are given by points in continuous space to lattice sites. To this end, we fix the location of the city center as the one of the center *o* of the triangular lattice. Since the lattice covers the entire circular area of the city, the physical distance between neighboring sites on the lattice is \(\tildew_n,m=\tildeR/R\). The choice of *R*, for given \(\tildeR\), determines the granularity of the lattice mesh overlaid on the city landscape. For instance, *R* = 100 and \(\tildeR=25\) km give us neighboring lattice sites *n* and *m* at distance \(\tildew_n,m=0.25\) km, whereas *R* = 25 and \(\tildeR=25\) km give \(\tildew_n,m=1\) km. Note that the physical distance between nodes *n* and *m* in the two layers is the same as the physical distance between their replica nodes *F*(*n*) and *F*(*m*) in the fast layer, i.e., for all \((F(n),F(m))\in {{{{{{{\mathcalF}}}}}}}\,\tildew_F(n),F(m)=\tildew_n,m\). The distance between the replica nodes *n* and *F*(*n*) is a parameter of the model \(\tildew_n,F(n)=\tilde\ell \).

The weight \(\tildep_n\) associated with node *n* in the slow layer is given by the population density of the associated census tract containing the site. Note that the size of census tracts varies significantly as the population density varies. Consequently, depending on the choice of *R* and \(\tildeR\), we may have no lattice sites in very small census tracts. We find that this issue can be resolved by choosing *R* = 100 for the values of \(\tildeR\) indicated above.

We assume that the travel speed \(\tildev_s\) on the slow layer is between 5 km h^{−1} and 20 km h^{−1}, and the speed on the fast layer is \(\tildev_f=40\,\,\mboxkm\,\,\mboxh^-1\). These are both realistic (ranges of) values for the travel speeds of cars and subways, respectively. Clearly, we have \(\eta=\tildev_s/\tildev_f\). We note that the time required to traverse the edge \((n,m)\in {{{{{{{\mathcalS}}}}}}}\) is \(\tildet_n,m=\tildew_n,m/\tildev_s\), whereas the time required to traverse the edge \((F(n),F(m))\in {{{{{{{\mathcalF}}}}}}}\) is \(\tildet_F(n),F(m)=\eta \,\tildet_n,m\). For example, if \(R=25,\tildeR=25\) km, \(\tildev_s=20\,\,\mboxkm\,\,\mboxh^-1\), and \(\tildev_f=40\,\,\mboxkm\,\,{{\mboxh}}^-1\), we have \(\tildet_n,m=\frac120\) hours or 3 min, and \(\tildew_F(n),F(m)=\frac140\) h or 1.5 min. Finally, we assume that a change of layers occurs also at speed \(\tildev_s\), meaning that the time required to switch layers is \(\tildet=\tilde\ell /\tildev_s\). In our simulations, we impose \(\tildet=3\) mins. This choice means that 6 mins are required to change mode of transportation, since the actual cost associated to the use of the fast layer is \(2\tildet\).

We denote with \(\tildet_n\) the time required to reach the center *o* from node *n*. This is given by the time corresponding to the fastest path connecting the two nodes. We finally rewrite Eq. (1) as

$$\tilde\tau ({{{{{{{\mathcalF}}}}}}})=\frac{\sum _{n\in {{{{{\mathcalG}}}}}}\,\tildet_n\,\tildep_n}{\sum _{n\in {{{{{{{\mathcalG}}}}}}}}\,\tildep_n},$$

(6)

and use this expression while solving the optimization problem of Eq. (4).

Next, we explain the modeling framework used to obtain the results for the real subway systems in Fig. 5. The slow layer is modeled identically as described above. We use \(\tilden_s\) stations as the nodes of the fast layer. Connections between stations are given by actual railways, with length learned directly from the data. For each station, we identify its replica in the slow layer as the node corresponding to the closest (in terms of geographical distance) site of the triangular lattice.

To properly compare the objective function of Eq. (6) for the real city against the one obtained after greedy optimization, we set *R* = 25 when constructing the triangular lattice. This allows us to obtain comparable numbers of subway stations between the real cities and the synthetic ones. We respectively have *L* = 77, 132, and 87 for the synthetic versions of Atlanta, Boston, and Toronto.

### Continuous-space approximation for one-dimensional lattices

To ease analytical calculations, we adopt a continuous-space approximation for a multiplex transportation model where the slow layer is given by a one-dimensional lattice. Here for simplicity, we assume that the weight associated to each node *n* in the slow layer is *p*_{n} = const. In the SM, we prove that the symmetry breaking occurs for arbitrary non-negative functions *p*_{n} that are symmetric with respect to the center of the lattice. Under the continuous-space approximation, the slow layer has the center located in the origin, and is formed of two segments of length *R* extending symmetrically to the left and the right of the center (see Fig. 2a). The fast layer extends to the right of the origin with a segment of length *α**L* and to the left with a segment of length (1 − *α*)*L*, where 0≤*α*≤1/2 is a tunable parameter and *L*≤*R* is the total length of the fast layer. The goal of our calculation is to find the value *α*^{*} corresponding to the optimal configuration of the fast layer, i.e., the solution of the continuous-space approximation of the optimization problem of Eq. (4).

Consider first the right side of the fast layer which is serving the right side of the slow layer. The objective function relative to the right side of the system is

$$\tau _\rmright(\alpha )=\left\{\beginarrayllR^2/2 \hfill &\,\mboxif\,\,0 \, \le \, \alpha \, \le \, r_c/L\\ C+(1-\eta )\, L\left[\alpha ^2L/2-\alpha R\right]&\,\mboxoth.\hfill\,\endarray\right.,$$

(7)

where

$$C=\frac12(1-\eta )r_c^2+2c(R-r_c)+\frac12R^2\,.$$

(8)

If *α**L* < *r*_{c}, the fast layer does not serve any portion of the slow layer, thus \(\tau _{{{{{{{\rmright}}}}}}}(\alpha )=\int\nolimits_0^R\,dx\,x=R^2/2\). If *α**L*≥*r*_{c}, we need to solve the integral \(\tau _{{{{{{{{\rmright}}}}}}}}(\alpha )=\int\nolimits_0^r_c\,dx\,x+\int\nolimits_r_c^\alpha L\,dx\,\left(2c+\eta x\right)+\int\nolimits_\alpha L^R\,dx\,\left(2c-(1-\eta )\alpha L+x\right)\), leading to the second case of Eq. (7). We note that the term *C* appearing in Eq.(8) does not depend on *α*, but only on *r*_{c} and *R*.

For the left portion of the fast layer, we simply have *τ*_{left}(*α*) = *τ*_{right}(1 − *α*). For the entire system, the objective function reads *τ*(*α*) = *τ*_{right}(*α*) + *τ*_{left}(*α*).

We now distinguish two cases: (i) *r*_{c}/*L* ≤ 1/2 and (ii) *r*_{c}/*L* ≥ 1/2. In case (i), we can write:

$$\tau (\alpha )=\left\{\beginarrayllR^2/2+C+(1-\eta )\, L\left[(1-\alpha )^2L/2-(1-\alpha )R\right]&\,\mboxif\,\,0\le \alpha \le r_c/L \\ 2C+(1-\eta )\, L\left[((1-\alpha )^2+\alpha ^2)L/2-R\right]\hfill&\,\mboxif\,\,r_c/L\le \alpha \le 1/2.\endarray\right.\,$$

(9)

thus,

$$\fracd\tau (\alpha )d\alpha =\left\{\beginarrayll(1-\eta )\,L\left(R-(1-\alpha )L\right)\ge 0&\,\mboxif\,\,0\le \alpha \le r_c/L\\ (1-\eta )\,L\left[-L-R\right]\le 0\hfill&\,{\mboxif}\,\,r_c/L\le \alpha \le 1/2.\endarray\right.\,$$

We see therefore that the function reaches its maximum at *α* = *r*_{c}/*L*, and displays its minimum value either in *α* = 0 or *α* = 1/2. To determine where the minimum of the objective function of Eq. (9) is obtained, we need to solve the equation *τ*(*α* = 0) = *τ*(*α* = 1/2). After some simple calculations, we arrive to

$$r^\dagger =R\left[1-\sqrt1-\frac12\left(\fracLR\right)^2\right]\,.$$

(10)

For *r*_{c}≥*r*^{†} the optimal configuration is the one obtained for *α*^{*} = 1/2, whereas for *r*_{c}≤*r*^{†} the optimal configuration is the one corresponding to *α*^{*} = 0.

In case (ii), we can repeat a similar derivation. We find that the maximum of the objective function is reached in *α* = 1 − *r*_{c}/*L*, and the function displays its minimum value either in *α* = 0 or *α* = 1/2. Also, here we find the critical value of Eq. (10) where the optimal configuration of the fast layer changes from being perfectly symmetric to being asymmetric. Alternatively, we can determine the critical length *L*^{†} of the fast layer as shown in Eq. (2).

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

link