Overview
DISC (Driving Styles In Simulated Crashes) is a pioneering dataset capturing diverse human driving behaviors in pre-crash scenarios within mixed autonomy settings. Collected via the TRAVERSE VR-based simulator, DISC includes data from hundreds of drivers facing rare-event traffic scenarios in a virtual city. This dataset supports the classification and prediction of driving behaviors, bridging the gap in human-centric data and enhancing autonomous vehicle safety and response in interactions with human drivers.
Authors
- Sandeep Thalapanane
- Sandip Sharan Senthil Kumar
- Guru Nandhan A D P
- Sourang SriHari
- Laura Zheng
- Ming Lin
Abstract
Handling pre-crash scenarios is still a major challenge for self-driving cars due to limited practical data and human-driving behavior datasets. We introduce DISC (Driving Styles In Simulated Crashes), one of the first datasets designed to capture various driving styles and behaviors in pre-crash scenarios for mixed autonomy analysis. DISC includes over 8 classes of driving styles/behaviors from hundreds of drivers navigating a simulated vehicle through a virtual city, encountering rare-event traffic scenarios. This dataset enables the classification of pre-crash human driving behaviors in unsafe conditions, supporting individualized trajectory prediction based on observed driving patterns. By utilizing a custom-designed VR-based in-house driving simulator, TRAVERSE, data was collected through a driver-centric study involving human drivers encountering twelve simulated accident scenarios. This dataset fills a critical gap in human-centric driving data for rare events involving interactions with autonomous vehicles. It enables autonomous systems to better react to human drivers and optimize trajectory prediction in mixed autonomy environments involving both human-driven and self-driving cars. In addition, individual driving behaviors are classified through a set of standardized questionnaires, carefully designed to identify and categorize driving behavior traits. We correlate data features with driving behaviors, showing that the simulated environment reflects real-world driving styles. DISC is the first dataset to capture how various driving styles respond to accident scenarios, offering significant potential to enhance autonomous vehicle safety and driving behavior analysis in mixed autonomy environments.
Contents
- Dataset Description
- Dataset Structure
- Author Statement
- Scenarios
- Radar Plot Visualization
- Scenario Correlations
- Demographics Data
- Scenario Visualization
- SUMO FCD Data Visualization
- Random Forest Classification
Dataset Description
The dataset will be available on the Hugging Face Datasets Hub along with its documentation. Hosting it there preserves the data for future use, supports multiple versions of the dataset, and lets users access the data through the Hugging Face APIs without storing it locally. Access is free of charge. The link will be made public after the review process. We are committed to keeping the dataset available indefinitely by relying on Hugging Face's infrastructure for long-term data preservation and accessibility.
Languages
The dataset contains English text only.
Documentation
The documentation will also be made available in the Hugging Face repository along with the link to the code.
Dataset Structure
The dataset contains 1,207 trajectories from 107 users and is organized into four folders:
- Sensory data: one directory per user, with a separate subdirectory for each scenario containing that scenario's sensory data (eye-tracking data, ego-vehicle position data, and deer and pedestrian position data). All files in this directory are comma-separated values (CSV).
- Floating Car Data (FCD), i.e., timestamped geo-localization and speed data collected directly from moving vehicles: follows a similar per-user structure, with one XML file per scenario that covers all vehicles involved. For each vehicle, the data includes the x and y coordinates, speed, angle, position, and lane ID.
- Parsed FCD data: the FCD data after processing, stored in Excel (.xlsx) format, where each sheet in a file holds the data of a different traffic vehicle, including the ego vehicle.
- Accident data (Trimmed_Data): ego-vehicle data at keyframe moments, focusing on collision events. The data is truncated to several timestamps before and after each collision, allowing analysis of user reactions to stressor events.

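As an illustration of how the FCD files might be read, the sketch below groups SUMO FCD records by vehicle ID using only the Python standard library. It assumes the standard SUMO `fcd-export` layout (`<timestep time=...>` elements containing `<vehicle>` elements with `x`, `y`, `speed`, `angle`, `pos`, and `lane` attributes). The vehicle IDs and lane names in the sample are hypothetical, and this is not the parsing script shipped in the dataset's scripts folder.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Numeric per-vehicle attributes mentioned for the DISC FCD files.
NUMERIC_FIELDS = ("x", "y", "speed", "angle", "pos")


def parse_fcd(xml_text):
    """Group SUMO FCD samples by vehicle ID.

    Returns {vehicle_id: [row, ...]}, where each row holds the timestep time,
    the numeric fields above (NaN if an attribute is missing), and the lane ID.
    """
    root = ET.fromstring(xml_text)
    per_vehicle = defaultdict(list)
    for timestep in root.iter("timestep"):
        t = float(timestep.get("time"))
        for veh in timestep.iter("vehicle"):
            row = {"time": t, "lane": veh.get("lane")}
            for field in NUMERIC_FIELDS:
                row[field] = float(veh.get(field, "nan"))
            per_vehicle[veh.get("id")].append(row)
    return dict(per_vehicle)


# Tiny hand-written example in the SUMO fcd-export layout (values are made up).
SAMPLE = """<fcd-export>
  <timestep time="0.00">
    <vehicle id="ego" x="10.0" y="5.0" angle="90.0" speed="12.5" pos="3.0" lane="E0_0"/>
    <vehicle id="car1" x="20.0" y="5.0" angle="90.0" speed="10.0" pos="13.0" lane="E0_1"/>
  </timestep>
  <timestep time="0.10">
    <vehicle id="ego" x="11.25" y="5.0" angle="90.0" speed="12.6" pos="4.25" lane="E0_0"/>
  </timestep>
</fcd-export>"""

tracks = parse_fcd(SAMPLE)
print(sorted(tracks))          # ['car1', 'ego']
print(len(tracks["ego"]))      # 2
```

Each per-vehicle list could then be converted into a table and written to its own sheet (for example with a pandas `ExcelWriter` and `DataFrame.to_excel(writer, sheet_name=...)`), which mirrors the one-sheet-per-vehicle layout of the parsed FCD files.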
Apart from these folders, the dataset includes two additional files: the participants' responses to the MDSI (Multidimensional Driving Style Inventory) questionnaire, and the driving-style vector label of each participant obtained by processing those responses. The code for processing the responses and parsing the FCD data is provided in the scripts folder of the dataset.
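The step from questionnaire responses to a driving-style vector can be sketched as follows. This is a minimal illustration under two stated assumptions: the item-to-factor key below is hypothetical (it is not the real MDSI scoring key), and the vector is assumed to be the per-factor mean rating normalized to sum to 1. Consult the scripts folder for the processing actually used.

```python
# HYPOTHETICAL item-to-factor key, for illustration only: replace it with the
# real MDSI scoring key used by the scripts in the dataset's scripts folder.
FACTOR_ITEMS = {
    "dissociative": [1, 2],
    "anxious": [3, 4],
    "risky": [5, 6],
    "angry": [7, 8],
    "high_velocity": [9, 10],
    "distress_reduction": [11, 12],
    "patient": [13, 14],
    "careful": [15, 16],
}


def style_vector(responses):
    """Map {item_number: rating} to a driving-style vector.

    Each factor score is the mean rating of its items; dividing by the total
    (an assumption) makes the vector sum to 1.
    """
    scores = {}
    for factor, items in FACTOR_ITEMS.items():
        ratings = [responses[i] for i in items]
        scores[factor] = sum(ratings) / len(ratings)
    total = sum(scores.values())
    return {factor: score / total for factor, score in scores.items()}


def dominant_style(vector):
    """Factor with the largest weight (the first one wins on ties)."""
    return max(vector, key=vector.get)


# Made-up answers: every item rated 1 except the two high-velocity items.
answers = {item: 1 for item in range(1, 17)}
answers[9] = answers[10] = 6
vec = style_vector(answers)
print(dominant_style(vec))   # high_velocity
```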
License
The dataset is released under the Creative Commons Attribution 4.0 International License and will be publicly available.
Dataset Curation
The dataset was collected from the users of our VR driving simulators with IRB approval and created by the authors.
Author statement
We, as the authors of the dataset titled “DISC,” declare that:
- We bear all responsibility for the content and any rights violations, including copyright and privacy.
- The dataset complies with all ethical guidelines and legal regulations.
- Data collection involved informed consent and anonymization where applicable. The dataset is accurate and complete to the best of our knowledge.
- All sources and contributors are properly acknowledged.
Scenarios
1. Sudden Lane Intercept: Unforeseen lane changes without prior signaling can cause potential collisions. This situation replicates a sudden lane change by a nearby vehicle, requiring the participant to respond swiftly.
2. Crash at T-Bone Intersection: A vehicular collision wherein the front end of one vehicle collides with the side of another, typically resulting in substantial damage and injuries due to the perpendicular nature of the impact.
3. Sudden Vehicle Stop: An unexpected halt, whether due to a sudden obstacle or an emergency, necessitates quick responses from trailing drivers to prevent rear-end collisions.
4. Running Red Lights: Vehicles ignoring traffic signals and entering intersections at a red light significantly increase the risk of collisions with other vehicles or pedestrians.
5. Deer Crossing: Facing an unexpected animal crossing, such as a deer, requires rapid decision-making to prevent accidents and protect both passengers and wildlife.
6. Crash at Roundabouts: Accidents in roundabouts often result from incorrect yielding or improper lane changes, emphasizing the need to understand and follow roundabout rules and yielding protocols.
7. Crash at Ramp Mergers: Crashes frequently occur at highway on-ramps due to improper merging or failure to yield, highlighting the necessity for careful merging techniques.
8. Jaywalking Pedestrians: Pedestrians crossing streets outside of marked crosswalks raise the risk of accidents and require drivers to stay alert for sudden pedestrian movements.
9. Lane Shifting Behavior: Interstate highways often comprise multiple lanes, making lane shifting imperative. This scenario tests the driver’s choice in shifting lanes at the appropriate time.
10. Compliance to Yellow Light: Evaluates compliance with traffic signals by deliberately changing a green light to yellow while the participant is attempting to cross the intersection.
11. Slow Car Encounter: Simulates the unpredictable behaviors of other vehicles, such as encountering a slow-moving vehicle ahead, and evaluates the driver’s decision-making process and patience.
12. Crash at Zipper Lane Merge: Assesses participants' proficiency in merging lanes and their attentiveness to signage and lane markings in complex road configurations.
Radar Plot Visualization
The plots display the driving-style distributions of six randomly selected participants for each driving style, from left to right: Angry, Careful, High Velocity, Risky, Patient, and Distress Reduction.
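A selection like the one behind these radar plots could be sketched as below. The sketch assumes each participant has a driving-style vector and that "for each driving style" means grouping participants by their highest-weighted style; that grouping rule, the participant IDs, and the weights are all hypothetical.

```python
import random
from collections import defaultdict


def sample_by_style(labels, k=6, seed=0):
    """Pick up to k participants per dominant driving style.

    labels: {participant_id: {style: weight}}. Grouping by the highest-weighted
    style is an assumption about how "for each driving style" was defined.
    """
    groups = defaultdict(list)
    for pid, vector in labels.items():
        groups[max(vector, key=vector.get)].append(pid)
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return {style: rng.sample(sorted(pids), min(k, len(pids)))
            for style, pids in groups.items()}


# Made-up labels: p0-p7 lean angry, p8-p9 lean careful.
labels = {f"p{i}": {"angry": 0.6, "careful": 0.4} for i in range(8)}
labels.update({f"p{i}": {"angry": 0.3, "careful": 0.7} for i in range(8, 10)})
picked = sample_by_style(labels)
print({style: len(pids) for style, pids in picked.items()})
# {'angry': 6, 'careful': 2}
```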





Scenario Correlations
The tables below show the statistical correlation between the driving-style components derived from the MDSI and the sensor measurements in our dataset for two example pre-crash scenarios.

Table 1: Pearson and Spearman correlation coefficients between MDSI driving-style components and sensor measurements for Scenario 5 - Sudden Deer Crossing

| Driving style | Coefficient | ΣMagnitude | ΣAcceleration | ΣSpeed | ΣSteering Angle | ΣLane | ΣJerk |
|---|---|---|---|---|---|---|---|
| Dissociative | r (Pearson) | 0.0022 | 0.0009 | 0.0413 | 0.0001 | NaN | -0.0004 |
| | r (Spearman) | 0.0227 | 0.0107 | 0.0250 | 0.0323 | NaN | 0.0005 |
| Anxious | r (Pearson) | -0.0071 | -0.0000 | -0.1266 | -0.0003 | NaN | 0.0013 |
| | r (Spearman) | -0.1560 | -0.0081 | -0.1642 | -0.0785 | NaN | 0.0046 |
| Risky | r (Pearson) | 0.0133 | -0.0002 | 0.2342 | 0.0005 | NaN | -0.0027 |
| | r (Spearman) | 0.2266 | 0.0209 | 0.2407 | 0.1185 | NaN | -0.0089 |
| Angry | r (Pearson) | 0.0027 | 0.0002 | 0.0493 | 0.0001 | NaN | -0.0005 |
| | r (Spearman) | 0.0272 | -0.0136 | 0.0309 | 0.0059 | NaN | 0.0118 |
| High Velocity | r (Pearson) | 0.0113 | -0.0004 | 0.2053 | 0.0007 | NaN | -0.0018 |
| | r (Spearman) | 0.2114 | 0.0121 | 0.2186 | 0.0989 | NaN | -0.0118 |
| Distress Reduction | r (Pearson) | 0.0029 | 0.0005 | 0.0506 | 0.0000 | NaN | -0.0007 |
| | r (Spearman) | 0.0216 | 0.0006 | 0.0252 | 0.0264 | NaN | -0.0038 |
| Patient | r (Pearson) | -0.0125 | -0.0001 | -0.2233 | -0.0005 | NaN | 0.0024 |
| | r (Spearman) | -0.1931 | -0.0079 | -0.2043 | -0.1022 | NaN | 0.0060 |
| Careful | r (Pearson) | -0.0119 | -0.0005 | -0.2129 | -0.0005 | NaN | 0.0023 |
| | r (Spearman) | -0.1912 | -0.0115 | -0.2019 | -0.1078 | NaN | 0.0053 |


Table 2: Pearson and Spearman correlation coefficients between MDSI driving-style components and sensor measurements for Scenario 11 - Slow Car Encounter

| Driving style | Coefficient | ΣMagnitude | ΣAcceleration | ΣSpeed | ΣSteering Angle | ΣLane | ΣJerk |
|---|---|---|---|---|---|---|---|
| Dissociative | r (Pearson) | 0.0845 | 0.0007 | 0.1063 | 0.0005 | 0.0084 | -0.0020 |
| | r (Spearman) | 0.0606 | 0.0160 | 0.0720 | 0.0001 | 0.0067 | 0.0009 |
| Anxious | r (Pearson) | -0.0196 | -0.0003 | -0.0254 | 0.0028 | -0.0007 | 0.0004 |
| | r (Spearman) | 0.0217 | 0.0079 | 0.0169 | 0.0049 | -0.0003 | -0.0065 |
| Risky | r (Pearson) | 0.0372 | -0.0003 | 0.0465 | -0.0025 | 0.0040 | -0.0010 |
| | r (Spearman) | 0.0084 | -0.0059 | 0.0131 | -0.0001 | 0.0036 | 0.0024 |
| Angry | r (Pearson) | 0.0874 | 0.0013 | 0.1097 | 0.0040 | 0.0042 | -0.0019 |
| | r (Spearman) | 0.0442 | 0.0164 | 0.0512 | -0.0090 | 0.0095 | 0.0039 |
| High Velocity | r (Pearson) | 0.1424 | 0.0005 | 0.1784 | -0.0057 | 0.0100 | -0.0034 |
| | r (Spearman) | 0.1805 | 0.0342 | 0.1994 | -0.0015 | 0.0100 | -0.0064 |
| Distress Reduction | r (Pearson) | -0.1139 | -0.0006 | -0.1429 | -0.0074 | -0.0074 | 0.0026 |
| | r (Spearman) | -0.1445 | -0.0136 | -0.1581 | -0.0045 | -0.0073 | 0.0032 |
| Patient | r (Pearson) | -0.1105 | -0.0007 | -0.1390 | 0.0115 | -0.0083 | 0.0026 |
| | r (Spearman) | -0.1401 | -0.0329 | -0.1562 | 0.0035 | -0.0085 | 0.0044 |
| Careful | r (Pearson) | -0.0602 | -0.0002 | -0.0740 | -0.0027 | -0.0060 | 0.0015 |
| | r (Spearman) | -0.0343 | -0.0214 | -0.0406 | -0.0015 | -0.0057 | -0.0032 |

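The coefficients in these tables come from the authors' own pipeline. As a sketch of how the two coefficients can be computed, the code below correlates a hypothetical per-participant summary (reading the Σ prefix as a sum over each participant's trajectory is an assumption) with one driving-style component, using only the standard library instead of, say, `scipy.stats.pearsonr` and `scipy.stats.spearmanr`. Note that Pearson's r is undefined when one variable is constant; one plausible explanation for the NaN ΣLane entries in Table 1 is that this measurement did not vary across participants in that scenario.

```python
import math


def pearson(x, y):
    """Pearson's r; NaN when either variable is constant (r is undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-participant ΣSpeed values and Risky components.
sum_speed = [100.0, 120.0, 150.0, 155.0]
risky = [0.10, 0.12, 0.20, 0.18]
print(round(spearman(sum_speed, risky), 4))        # 0.8
print(math.isnan(pearson([1, 2, 3], [5, 5, 5])))   # True
```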
Demographics Data
The figure below shows the distribution of driving styles by the country in which participants learned to drive, and by gender.
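Tallies like those behind this figure can be sketched with `collections.Counter`. The sketch assumes each participant record carries a single driving-style label plus demographic fields; the field names (`style`, `gender`, `license_country`) and the records are hypothetical.

```python
from collections import Counter, defaultdict


def style_distribution(participants, by):
    """Count driving styles within each group of a demographic field.

    participants: dicts with a "style" key plus demographic keys.
    by: the demographic key to group on (field names are hypothetical).
    """
    table = defaultdict(Counter)
    for person in participants:
        table[person[by]][person["style"]] += 1
    return {group: dict(counts) for group, counts in table.items()}


# Made-up records for illustration.
people = [
    {"style": "careful", "gender": "F", "license_country": "country_A"},
    {"style": "risky", "gender": "M", "license_country": "country_B"},
    {"style": "careful", "gender": "M", "license_country": "country_A"},
]
print(style_distribution(people, "gender"))
# {'F': {'careful': 1}, 'M': {'risky': 1, 'careful': 1}}
```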

Scenario Visualization
The figure below shows the speed-colored trajectories of four participants in Scenario 1: Sudden Lane Intercept, illustrating varied responses to the unexpected lane change. In the first plot, the driver appears to deftly avoid a collision and steer past the other vehicle. In the second plot, the driver did not anticipate the situation and had to go off-road to avoid a collision. In the third plot, the driver stopped when the other vehicle arrived and let it pass, while in the fourth plot, the ego vehicle accelerated past the other vehicle.







SUMO FCD Data Visualization
In the first plot of the figure below, which relates time and distance, participants categorized as high-velocity drivers covered the distance much faster than those in other categories: they drove at noticeably higher speeds and completed the route in less time. Risky and angry drivers were the next fastest. These drivers often behave rashly and overtake more, which lets them cover the distance quickly, although at higher risk. In contrast, careful, distress-reduction, and dissociative drivers reached their destinations more slowly. These drivers tend to be more cautious, easily vexed, or distracted, and their focus on safety or their emotional state slows their pace, so they take longer to complete the route.

Similarly, in the second plot, which relates time and speed, high-velocity drivers maintained high speeds for longer periods, in contrast to patient drivers, who were more deliberate about when to accelerate and when to drive slowly. Dissociative and risky drivers show more sporadic jumps in speed, consistent with their driving styles, whereas careful drivers show a smooth curve across speed ranges.
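The time-distance comparison can be sketched as follows, assuming ego positions are available as (time, x, y) samples, for instance from the FCD `x` and `y` attributes. Path length is accumulated as straight-line segments between consecutive samples, and participants are grouped by a driving-style label. The participant IDs, style assignments, and target distance are hypothetical, and the figure itself may have been produced differently.

```python
import math
from statistics import mean


def cumulative_distance(track):
    """track: [(time, x, y), ...] ordered by time -> [(time, distance so far), ...]."""
    out = [(track[0][0], 0.0)]
    for (_, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        out.append((t1, out[-1][1] + math.hypot(x1 - x0, y1 - y0)))
    return out


def time_to_cover(track, distance):
    """First sample time at which the travelled distance reaches `distance`, else None."""
    for t, d in cumulative_distance(track):
        if d >= distance:
            return t
    return None


def mean_time_by_style(tracks, styles, distance):
    """Mean time-to-cover per driving style; runs that never reach `distance` are skipped."""
    grouped = {}
    for pid, track in tracks.items():
        t = time_to_cover(track, distance)
        if t is not None:
            grouped.setdefault(styles[pid], []).append(t)
    return {style: mean(times) for style, times in grouped.items()}


# Made-up straight-line ego tracks for two hypothetical participants.
tracks = {
    "fast": [(0.0, 0.0, 0.0), (1.0, 10.0, 0.0), (2.0, 20.0, 0.0)],
    "slow": [(0.0, 0.0, 0.0), (1.0, 5.0, 0.0), (2.0, 10.0, 0.0), (3.0, 15.0, 0.0)],
}
styles = {"fast": "high_velocity", "slow": "careful"}
print(mean_time_by_style(tracks, styles, distance=15.0))
# {'high_velocity': 2.0, 'careful': 3.0}
```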














Random Forest Classification
The study processes driving-behavior data and questionnaire responses to predict driving styles and evaluate model performance. Driving metrics (magnitude, acceleration, speed, angle difference, lane change, and jerk) are extracted from each participant's files. Questionnaire responses serve as ground-truth labels because they capture the driving styles of interest: dissociative, anxious, risky, angry, high velocity, distress reduction, patient, and careful. A random forest model was trained on the preprocessed driving metrics and then used to make predictions on a test set. These predictions were aggregated to identify the dominant driving style of each participant. Finally, the model was evaluated by comparing the predicted driving styles with those derived from the questionnaire responses, using confusion matrices to assess accuracy and misclassification patterns.

These confusion matrices are the result of the random forest model for Scenarios 1 - 6.






These confusion matrices are the result of the random forest model for Scenarios 7 - 12.






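The aggregation and evaluation steps described above can be sketched as follows. The classifier itself (for example scikit-learn's `RandomForestClassifier`) is omitted: the sketch assumes it has already produced one predicted style per test sample, and it assumes majority voting as the aggregation rule, which the description above does not specify. Participant IDs and predictions are made up; `sklearn.metrics.confusion_matrix` computes the same kind of table as the helper below.

```python
from collections import Counter

STYLES = ["dissociative", "anxious", "risky", "angry",
          "high_velocity", "distress_reduction", "patient", "careful"]


def dominant_predictions(sample_predictions):
    """Majority vote over per-sample predictions for each participant.

    sample_predictions: {participant_id: [predicted_style, ...]}.
    On ties, Counter.most_common returns the style encountered first.
    """
    return {pid: Counter(preds).most_common(1)[0][0]
            for pid, preds in sample_predictions.items()}


def confusion_matrix(true_labels, predicted, labels=STYLES):
    """Rows are questionnaire (true) styles, columns are predicted styles."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for pid, truth in true_labels.items():
        matrix[index[truth]][index[predicted[pid]]] += 1
    return matrix


# Made-up per-sample predictions and questionnaire labels.
sample_preds = {"p1": ["risky", "risky", "angry"],
                "p2": ["careful", "patient", "careful"]}
truth = {"p1": "risky", "p2": "patient"}

dominant = dominant_predictions(sample_preds)   # {'p1': 'risky', 'p2': 'careful'}
cm = confusion_matrix(truth, dominant)
accuracy = sum(cm[i][i] for i in range(len(STYLES))) / len(truth)
print(accuracy)   # 0.5
```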