The Big Picture
A driving model can learn long-term user habits from recorded driving and follow short-term natural-language instructions to match individual styles, while maintaining safety and competitive performance in simulation.
Key Findings
A single end-to-end model that fuses front camera views, route goals, a driver profile embedding, and natural-language instructions can reproduce distinct driving styles. Learning per-user embeddings from a curated dataset of real drivers lets the model shift its speed, lane-change, and car-following behavior to match individuals. Fine-tuning with reward signals for safety, comfort, and efficiency lets the model adapt to on-the-spot commands (like “I’m in a rush” or “Take it easy”) without degrading safety in closed-loop simulation tests.
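The idea of a profile embedding shifting a base policy can be sketched minimally. This is an illustrative toy, not the paper's implementation: the field names, offsets, and base values are all assumptions, standing in for a learned embedding that biases speed and following behavior per driver.

```python
# Hypothetical sketch: a per-user profile vector biases a base driving policy.
# All names and constants are illustrative assumptions, not the paper's values.

BASE_TARGET_SPEED = 12.0   # m/s, nominal policy speed target
BASE_FOLLOW_GAP = 2.0      # s, nominal time headway to the lead vehicle

def personalize(profile: dict) -> dict:
    """Shift base behavior by a per-user profile.

    `profile` stands in for the learned embedding: each field is a signed
    tendency in [-1, 1] (e.g. speed_pref=+1 means a habitually fast driver).
    """
    return {
        "target_speed": BASE_TARGET_SPEED * (1.0 + 0.25 * profile["speed_pref"]),
        "follow_gap": BASE_FOLLOW_GAP * (1.0 - 0.3 * profile["close_follow_pref"]),
    }

cautious = personalize({"speed_pref": -1.0, "close_follow_pref": -1.0})
hurried = personalize({"speed_pref": 1.0, "close_follow_pref": 1.0})
```

The same base policy thus produces distinct speeds and following gaps for different drivers, which is the behavioral separation the findings describe.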
Key Data
1. Collected a personalized driving dataset from 30 human drivers across 20 realistic driving scenarios in the CARLA simulator.
2. Tested preference alignment on 25 in-distribution drivers and 5 out-of-distribution drivers; conducted a human similarity study with 10 evaluators rating model roll-outs against driver logs on a 1–10 scale.
3. Model training and fine-tuning ran on eight GPUs; policy fine-tuning generated 4 candidate responses per input for gradient updates to adapt to instructions.
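The "4 candidate responses per input" detail suggests a group-relative update scheme: sample several roll-outs for the same input, score each with the reward, and weight updates by each candidate's advantage over the group mean. The sketch below illustrates only that advantage computation; the scheme and names are assumptions, not the paper's confirmed algorithm.

```python
# Illustrative sketch of candidate-based fine-tuning: sample a group of
# candidates per input, score each, and center rewards on the group mean so
# better-than-average candidates receive positive update weight.
# This is an assumed scheme for illustration, not the paper's implementation.

def group_advantages(rewards: list[float]) -> list[float]:
    """Return each candidate's reward minus the group mean."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Four candidate roll-outs for one input, scored on safety/comfort/efficiency.
rewards = [0.9, 0.4, 0.6, 0.7]
adv = group_advantages(rewards)
```

Candidates scoring above the group mean get positive weight; the advantages always sum to zero, so the update is relative rather than absolute.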
Why It Matters
Engineers building in-car personalization and product teams designing user-adjustable driving modes will care because this shows a practical way to combine long-term user history with simple spoken instructions. Researchers working on human-centered vehicle autonomy can use the public-style dataset and evaluation approach to compare personalization methods under closed-loop conditions.
Key Figures

Fig 1: Drive My Way (DMW) achieves end-to-end personalized driving via both long-term preference alignment and short-term style instruction adaptation.

Fig 2: An overview of the Personal Driving Dataset, which consists of the driving data and structured driver profile data.

Fig 3: An overview of the DMW framework with a pretrained VLA backbone. The model takes in front-view camera images, instructions, route target points, and user profile as inputs, while the motion predictor outputs route and speed waypoints, which derive the base action (throttle, steer angle). The residual decoder outputs a discrete residual applied to the base to produce the final personalized action.
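The base-plus-residual composition in Fig 3 can be sketched as follows. The residual bin values, clamp ranges, and function names are illustrative assumptions; the paper specifies only that a discrete residual is added to the base action.

```python
# Sketch of the base-plus-discrete-residual action from Fig 3: the motion
# predictor yields a base (throttle, steer), and a residual decoder selects a
# discrete bin added on top to personalize the control.
# Bin values and ranges are assumptions for illustration.

RESIDUAL_BINS = [-0.1, -0.05, 0.0, 0.05, 0.1]  # assumed discrete residual values

def apply_residual(base_throttle: float, base_steer: float,
                   throttle_bin: int, steer_bin: int) -> tuple[float, float]:
    """Add the selected residual bins to the base action, clamped to
    throttle in [0, 1] and steer in [-1, 1]."""
    throttle = min(1.0, max(0.0, base_throttle + RESIDUAL_BINS[throttle_bin]))
    steer = min(1.0, max(-1.0, base_steer + RESIDUAL_BINS[steer_bin]))
    return throttle, steer

t1, s1 = apply_residual(0.5, 0.0, 4, 2)   # aggressive driver: more throttle
t2, s2 = apply_residual(0.98, 0.2, 4, 0)  # residual clamped at throttle limit
```

Keeping the residual small and discrete lets the personalized action stay close to the safety-supervised base action while still expressing per-driver style.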

Fig 4: The contrastive learning mechanism on the long-term preference encoder and route processor.
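The contrastive mechanism in Fig 4 can be sketched with an InfoNCE-style objective: pull a driver's preference embedding toward its matching route representation and push it away from non-matching ones. The vectors, temperature, and loss form here are assumptions; the paper's exact objective may differ.

```python
import math

# Minimal InfoNCE-style contrastive loss sketch: the anchor (preference
# embedding) should score higher against its positive (matching route
# representation) than against negatives. Illustrative only.

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def info_nce(anchor, positive, negatives, temperature=0.1) -> float:
    """-log( exp(sim(a,p)/t) / sum_k exp(sim(a,k)/t) ) over {positive} + negatives."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)

aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])   # near zero
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])  # large
```

The loss is near zero when the embedding matches its own route and large when a mismatched pair is treated as positive, which is what drives the encoder and route processor toward a shared space.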
Yes, But...
Results are demonstrated in the CARLA simulator, not on physical vehicles, so real-world transfer and sensor noise remain open questions. The dataset covers 30 drivers and 20 scenario types, which is useful but not exhaustive of global driving behaviors. The approach depends on privileged expert signals and simulated conditions for supervision and reward shaping, which may need redesign for on-road deployment and raises privacy considerations around storing user profiles.
Deep Dive
The system learns a per-driver embedding from a new Personal Driving Dataset (PDD) collected in simulation: each participant provided a short profile and drove twenty standardized scenarios using a natural steering-wheel setup. Those embeddings capture long-term tendencies such as preferred speed, following distance, and lane-change frequency.

At runtime the model fuses front-view images, navigation waypoints, a short natural-language instruction (for example, “I’m running late” or “I want a smooth ride”), and the user embedding to predict waypoints and low-level controls. Supervision comes from expert trajectories and auxiliary reasoning signals; a subsequent fine-tuning stage uses reward signals that trade off safety, comfort, and efficiency with dynamically adjusted weights so the model adapts to short-term instructions.

Evaluation uses closed-loop routes in a standard benchmark, terminating episodes on collision to emphasize safety. The model is compared to a baseline without user embeddings, a style-conditional baseline that maps instructions to a small set of pre-defined styles, and a multi-objective preference-conditioned policy. Performance is measured by driving score, success rate, efficiency, and comfort, plus low-level metrics such as average speed and lane changes. Results show clearer per-user behavioral distinctions and higher judged similarity to human logs in user studies, while maintaining competitive safety and route success in the simulator. The work points to a practical path toward personalized, language-guided driving in end-to-end models, with the caveat that sim-to-real validation and broader datasets are the next steps.
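The "dynamically adjusted weights" trading off safety, comfort, and efficiency can be sketched as an instruction-conditioned weighted reward. The weight values and instruction keys below are assumptions chosen to illustrate the idea; note the safety weight is held fixed here, reflecting the claim that instruction adaptation should not collapse safety.

```python
# Hedged sketch of instruction-conditioned reward shaping: the instruction
# re-weights comfort vs. efficiency while the safety weight stays fixed.
# All weight values and keys are illustrative assumptions.

WEIGHTS = {
    "neutral":   {"safety": 0.5, "comfort": 0.3, "efficiency": 0.2},
    "in_a_rush": {"safety": 0.5, "comfort": 0.1, "efficiency": 0.4},
    "smooth":    {"safety": 0.5, "comfort": 0.4, "efficiency": 0.1},
}

def shaped_reward(terms: dict, instruction: str = "neutral") -> float:
    """Weighted sum of per-episode reward terms under the active instruction."""
    w = WEIGHTS[instruction]
    return sum(w[k] * terms[k] for k in w)

# One episode's raw term scores (illustrative).
terms = {"safety": 1.0, "comfort": 0.5, "efficiency": 0.8}
r_rush = shaped_reward(terms, "in_a_rush")
r_smooth = shaped_reward(terms, "smooth")
```

For this efficiency-heavy episode, the "in a rush" weighting rewards the roll-out more than the "smooth" weighting does, which is the mechanism by which a short-term instruction steers fine-tuning without touching the safety term.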
Credibility Assessment:
Authors are affiliated with KU Leuven (a recognized institution), but individual h-indices are low and the paper is an arXiv preprint with minimal citations; solid but not top-tier by the rubric.