AI Self-Driving Race Car
Beating OpenAI Gym's CarRacing environment (920+ score) with a deep reinforcement learning algorithm (DQN).
This project takes us back to 2020, a time before LLMs took over the world and three years before ChatGPT became a household name. It was an era of intense academic research at Instituto Superior Técnico (IST) in Lisbon, where we were obsessed with bridging the gap between raw pixels and autonomous decision-making.
The Reinforcement Learning Era: Back then, the frontier of AI was all about Reinforcement Learning. It was the era of AlphaGo and OpenAI Gym, where the ultimate challenge was building self-learning systems that could master complex games and physics-based environments through trial and error. Long before the current LLM landscape, we were focused on the raw mechanics of machine learning agents that could learn to drive from scratch.
📺 Project Showcase
The Training Evolution
A timelapse of the DQN agent's learning process, showing the transition from random exploration to deterministic, goal-oriented behavior over thousands of episodes.
Autonomous Racing Performance
The final agent navigating the track with high precision, maintaining a consistent score of 900+ by optimizing racing lines and velocity control.
🏗️ Technical Methodology
The core of the project was an Adapted Deep Q-Network (DQN) developed at IST. We had to be extremely efficient; GPUs weren't as accessible as they are today.
- State Representation: We stacked 4 consecutive grayscale frames (96x96) to give the agent a "sense of motion," effectively teaching it physics through temporal pixel deltas.
- Reward Shaping: The breakthrough came from a custom reward function that penalized "hesitation" and rewarded aggressive yet stable racing lines.
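The exact reward terms from the project aren't reproduced in this text, but the shaping idea can be sketched as a wrapper over the environment reward. The constants and the `speed`/`on_track` signals below are illustrative assumptions, not the original values:

```python
# Hypothetical sketch of the reward-shaping idea: penalize "hesitation"
# (crawling or standing still) and discourage leaving the track, while
# passing through the environment's own progress reward.
def shape_reward(base_reward: float, speed: float, on_track: bool) -> float:
    """Augment the raw environment reward with shaping terms."""
    shaped = base_reward
    if speed < 1.0:      # assumed hesitation threshold
        shaped -= 0.5    # assumed penalty magnitude
    if not on_track:     # off-track excursions are heavily discouraged
        shaped -= 2.0
    return shaped
```

Because the shaping only subtracts penalties from undesirable states, a fast, stable racing line keeps the full base reward.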
Performance Metrics
The CarRacing-v0 environment is considered solved when the agent reaches an average score of 900 over 100 consecutive runs. Our optimized agent, developed at IST, consistently exceeded this benchmark during final testing.
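The solved criterion above is a rolling average over the last 100 episodes, which a minimal check can express directly:

```python
def is_solved(scores, window=100, threshold=900.0):
    """CarRacing-v0 counts as solved when the mean score over the last
    `window` consecutive episodes reaches `threshold`."""
    if len(scores) < window:
        return False
    recent = list(scores)[-window:]
    return sum(recent) / window >= threshold
```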
🔬 Deep Technical Dive: The 2019/2020 Architecture
This wasn't just about training; it was about architectural efficiency. In an era where consumer GPUs like the GTX 1050 Ti were our main workhorses at Instituto Superior Técnico, we had to optimize every byte of the input space.
1. Advanced Input Data Preprocessing
Raw pixels (96x96x3) are noisy. We implemented a custom pipeline to sharpen the agent's focus:
- Grayscale Conversion: Dropping color channels to focus on track contrast.
- Grass Removal: Replacing green field pixels with white to make the gray track "pop" for the CNN.
- Score Masking: Neutralizing the telemetry digits at the bottom with a black square to prevent the agent from over-fitting to the speed display.
- Temporal Stacking: Stacking 4 consecutive frames to give the "static" CNN a sense of velocity and inertia.
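The four steps above can be sketched in NumPy as a single pipeline. The grayscale weights, the green-dominance test for grass, and the exact mask rows are assumptions for illustration, not the project's recorded values:

```python
import numpy as np
from collections import deque

def preprocess(frame: np.ndarray) -> np.ndarray:
    """Grayscale, grass removal, and score masking for a 96x96x3 frame."""
    # 1. Grayscale conversion: standard luminance channel weights (assumed).
    gray = frame @ np.array([0.299, 0.587, 0.114])
    # 2. Grass removal: green-dominant pixels become white so the track pops.
    green = (frame[..., 1] > frame[..., 0]) & (frame[..., 1] > frame[..., 2])
    gray[green] = 255.0
    # 3. Score masking: black out the telemetry strip at the bottom.
    gray[84:, :] = 0.0
    return gray.astype(np.float32) / 255.0  # normalize to [0, 1]

class FrameStack:
    """4. Temporal stacking: keep the most recent k frames so the 'static'
    CNN sees velocity and inertia as pixel deltas between channels."""
    def __init__(self, k: int = 4):
        self.frames = deque(maxlen=k)

    def push(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(frame)
        while len(self.frames) < self.frames.maxlen:
            self.frames.append(frame)  # pad with copies at episode start
        return np.stack(self.frames, axis=0)  # shape (4, 96, 96)
```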
2. Cognitive Simplification & Action Space
One of our biggest breakthroughs was reducing the action space. Instead of the default continuous space, we discretized it into 5 highly effective actions:
- Steer Left
- Steer Right
- Accelerate
- Brake
- Idling (Do Nothing)
This simplification allowed the DQN to converge significantly faster, as it didn't have to "learn" the infinite nuances of partial steering while still mastering the track.
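Since CarRacing's native interface expects a continuous `[steer, gas, brake]` vector, the 5-way discrete head needs a lookup table mapping each Q-value index back to a control vector. The magnitudes below are illustrative assumptions:

```python
import numpy as np

# Discrete index -> (steer, gas, brake) for CarRacing's continuous interface.
# Exact magnitudes are assumptions; the original values aren't documented here.
ACTIONS = {
    0: np.array([-1.0, 0.0, 0.0]),  # steer left
    1: np.array([ 1.0, 0.0, 0.0]),  # steer right
    2: np.array([ 0.0, 1.0, 0.0]),  # accelerate
    3: np.array([ 0.0, 0.0, 0.8]),  # brake (partial, to avoid wheel lock)
    4: np.array([ 0.0, 0.0, 0.0]),  # idle (do nothing)
}

def to_continuous(action_index: int) -> np.ndarray:
    """Map the DQN's argmax output to the env's 3-dim action vector."""
    return ACTIONS[action_index]
```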
3. The 3-Layer Optimized Network
The brain of our agent was a Dual-Convolutional architecture:
- CNN Layer 1: 8 kernels (7x7) with stride 4, extracting coarse track features.
- CNN Layer 2: 16 kernels (3x3) with stride 1, refining edge detection.
- Fully Connected Head: A 400-neuron Dense layer with ReLU activation, mapping spatial features to the 5 output actions.
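With a stacked 4x96x96 input, those layer specs give 23x23 feature maps after the 7x7/stride-4 conv and 21x21 after the 3x3 conv. A sketch of the architecture in PyTorch (the framework choice is an assumption; the text doesn't name one):

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Sketch of the 3-layer network described above."""
    def __init__(self, n_actions: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 8, kernel_size=7, stride=4),   # coarse track features
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=1),  # refined edge detection
            nn.ReLU(),
            nn.Flatten(),
        )
        # 96x96 -> 23x23 after conv1 -> 21x21 after conv2
        self.head = nn.Sequential(
            nn.Linear(16 * 21 * 21, 400),
            nn.ReLU(),
            nn.Linear(400, n_actions),  # one Q-value per discrete action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))
```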
📄 Publication
Optimizing Agent Training with Deep Q-Learning on a Self-Driving Reinforcement Learning Environment
- Publisher: IEEE
- Venue: IWSSIP (Lisbon, Portugal)
Looking back, this project was the foundation of my engineering philosophy: build from first principles. This was 2020, a time when "agent" meant a DQN model running on a GTX 1050 Ti laptop GPU, and "OpenAI" was more of a technical playground than a retail giant.
Working at the Instituto Superior Técnico lab taught me that while models (and buzzwords) evolve, the core challenge of AI (aligning raw sensory input with deterministic utility) is timeless. We were ahead of the curve then, and that same "early adopter" mindset drives our work today.
— Pedro
Founder & Principal Engineer