# Graph-based data-driven methods to design future-generation public transportation

Forum 'Stages' - Sujet créé le 2022-02-11
par
**Dimitri Watel**

Category : Research Internship

Duration : 5-6 months

Contacts : Andrea Araldo (andrea.araldo@telecom-sudparis.eu), Stefania Dumbrava (stefania.dumbrava@ensiie.fr) and Dimitri Watel (dimitri.watel@ensiie.fr)

Location : SAMOVAR Laboratory (Evry / Paris Saclay)

Pay : 600€ / Month

**Context and high level goal**

Current public transportation does not offer an equitable Level of Service (LoS). Indeed, fixed-schedule bus and train lines offer good LoS in city centers, but they impose poor LoS in the suburbs. On the other hand, it is expected that in the near future our cities will be traveled by automated vehicles, serving user requests in an Uber-like fashion. Such a service is referred to as Automated Mobility on Demand (AMoD) [1]. AMoD would be more suited to serve the low demand density of suburbs [9]. Therefore, we believe that, by deploying **AMoD ***together with ***classic fixed-schedule lines**, we may achieve a **more equitable distribution of LoS**. The goal of this internship is to** find the structure of such a future generation ***hybrid*** transit**, combining fixed-schedule lines and AMoD.

**Methodology **

We rely on **open data**, **graph analysis** and **algorithms design**.

We create a graph model of the public transportation of several metropolitan areas, e.g., Île de France, parsing GTFS data [3], a standard open data format detailing the current transit schedules. In such a graph, a train or bus line is a sequence of arcs, labeled with travel times. While the graph obtained by GTFS data only represents current public transportation, made of fixed-schedule lines, we are interested in future mobility, where AMoD is also available. Therefore, we need to augment the graph with analytical models [5] describing AMoD performance. We partition the study area in small cells and calculate on each cell the *accessibility score* [2], which measures the LoS therein, in terms of ease to reach other destinations via public transportation. Including AMoD may change the graph structure and the labels which influences the accessibility score. The main objective of the internship is then to improve this score as much as possible.

One first objective of this internship is to devise fast methods of computing accessibility on a real and large transportation network, which can potentially be composed of millions of arcs. To this aim, we will leverage the storage and management mechanisms of graph databases, such as Neo4j [5], optimized for handling highly interconnected data. As highlighted in [6], they provide a user-friendly approach to visualize, analyze, and transform large scale graphs. Once the transportation graph is loaded in Neo4j, we will extract its syntactical structure [7], and design a suitable abstraction for computing accessibility through lightweight queries. With respect to previous initial work on the topic [8], we will develop a more efficient methodology based on storing and incrementally maintaining intermediate computations, represented as graph views.

Since our goal is to change the graph structure to achieve a more equitable geographical distribution of the accessibility score, we need to evaluate the quality of many modified graph structures, and calculating accessibility scores every time from scratch would be too long. A second objective is thus to design incremental algorithms for accessibility computation, in order to make as few computations as possible every time we change an arc or a label of the network.

The third and last objective is to design optimization algorithms to achieve a distribution of accessibility as equitable as possible (e.g., max-min optimization of accessibility scores), using operations research methods that can benefit from the two previous parts of the internship (like greedy algorithms, branch and bound, local search or more generally black box optimization where the black box is our algorithm measuring the accessibility) and evaluate them on real case instances.

**References**

[1] Basu, R., Araldo, A., Akkinepally, A. P., Nahmias Biran, B. H., Basak, K., Seshadri, R., Deshmukh, N., Kumar, N., Azevedo, C. L., & Ben-Akiva, M. (2018). Automated Mobility-on-Demand vs. Mass Transit: A Multi-Modal Activity-Driven Agent-Based Simulation Approach. Transportation Research Record, 8(2672), 608–618.

[2] Biazzo, I., Monechi, B., & Loreto, V. (2019). General scores for accessibility and inequality measures in urban areas. Royal Society Open Science, 6(8).

[3] https://data.iledefrance.fr/explore/dataset/horaires-prevus-sur-les-lignes-de-transport-en-commun-dile-de-france-gtfs/information/

[4] D. Huang et al., “An analytical model for the many-to-one demand-responsive transit systems”, Sustainability 2020

[5] Sakr, Sherif et al., “The future is big graphs: a community view on graph processing systems”, Communications of the ACM 64(9): 62-71 (2021)

[6] Neo4j Graph Database System, https://neo4j.com/

[7] Angela Bonifati, Stefania Dumbrava, Nicolas Mir, Hierarchical Clustering for Property Graph Schema Discovery, International Conference on Extending Database Technology (EDBT), April 2022.

[8] Fortin, P., Morency, C., & Trépanier, M. (2016). Innovative GTFS data application for transit network analysis using a graph-oriented method. Journal of Public Transportation, 19(4), 18–37. https://doi.org/10.5038/2375-0901.19.4.2

[9] Calabrò, G., Araldo, A., Oh, S., Seshadri, R., Inturri, G., & Ben-Akiva, M. (2021). Integrating Fixed and Demand Responsive Transportation for Flexible Transit Network Design. TRB Annual Meeting.