Published · R&D · OpenAI · Reinforcement Learning · Computer Vision · Python · PyTorch

AI Self-Driving Race Car

Beating OpenAI Gym's CarRacing environment (920+ peak score) with an adapted Deep Q-Network (DQN).


This project takes us back to 2020, a time before LLMs took over the world and three years before ChatGPT became a household name. It was an era of intense academic research at Instituto Superior Técnico (IST) in Lisbon, where we were obsessed with bridging the gap between raw pixels and autonomous decision-making.

The Reinforcement Learning Era: Back then, the frontier of AI was all about Reinforcement Learning. It was the era of AlphaGo and OpenAI Gym, where the ultimate challenge was building self-learning systems that could master complex games and physics-based environments through trial and error. Long before the current LLM landscape, we were focused on the raw mechanics of machine learning agents that could learn to drive from scratch.

📺 Project Showcase

The Training Evolution

A timelapse of the DQN agent's learning process, showing the transition from random exploration to deterministic, goal-oriented behavior over thousands of episodes.

Autonomous Racing Performance

The final agent navigating the track with high precision, maintaining a consistent score of 900+ by optimizing racing lines and velocity control.

🏗️ Technical Methodology

The core of the project was an Adapted Deep Q-Network (DQN), developed as part of research at IST. We had to be extremely efficient; GPUs weren't as accessible then as they are today.

  1. State Representation: We stacked 4 consecutive grayscale frames (96x96) to give the agent a "sense of motion," effectively teaching it basic physics through temporal pixel deltas.
  2. Reward Shaping: The breakthrough came from a custom reward function that penalized "hesitation" and rewarded aggressive yet stable racing lines (sketched below).
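In spirit, the shaping looked something like the sketch below; the coefficients and the speed/idle signals are illustrative assumptions, not the published values:

```python
def shaped_reward(env_reward, speed, steering, is_idle):
    """Illustrative reward shaping: punish hesitation, reward carrying speed.

    env_reward is the raw CarRacing reward (tile progress minus a time penalty);
    speed, steering and is_idle are assumed to be read from the simulator state.
    All coefficients below are placeholders.
    """
    reward = env_reward
    if is_idle or speed < 1.0:
        reward -= 0.5                        # "hesitation" penalty
    reward += 0.05 * speed                   # encourage maintaining velocity
    reward -= 0.02 * abs(steering) * speed   # damp erratic steering at high speed
    return reward
```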

Performance Metrics

The CarRacing-v0 environment is considered solved when the agent reaches an average score of 900 over 100 consecutive runs. Our optimized agent, developed at IST, consistently exceeded this benchmark during final testing.

Metric            | Base DQN        | Adapted DQN (Our Work)
------------------|-----------------|-----------------------
Convergence Speed | Slow / Unstable | Fast / Stable
Peak Score        | ~600            | 920+
Final Test Avg    | < 400           | 905.38 ± 23.82
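The 900-over-100-runs check itself reduces to a plain evaluation loop. A minimal sketch, assuming a trained `agent` exposing a greedy `act(state)` method and the legacy `CarRacing-v0` Gym API:

```python
import gym
import numpy as np

env = gym.make("CarRacing-v0")
scores = []
for episode in range(100):
    state, done, total = env.reset(), False, 0.0
    while not done:
        action = agent.act(state)              # greedy policy, no exploration
        state, reward, done, _ = env.step(action)
        total += reward
    scores.append(total)

print(f"Average over 100 runs: {np.mean(scores):.2f} ± {np.std(scores):.2f}")
# The environment counts as solved when this average exceeds 900.
```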

🔬 Deep Technical Dive: The 2019/2020 Architecture

This wasn't just about training; it was about architectural efficiency. In an era where consumer GPUs like the GTX 1050 Ti were our main workhorses at Instituto Superior Técnico, we had to optimize every byte of the input space.

1. Advanced Input Data Preprocessing

Raw pixels (96x96x3) are noisy. We implemented a custom pipeline to sharpen the agent's focus (a condensed sketch follows this list):

  • Grayscale Conversion: Dropping color channels to focus on track contrast.
  • Grass Removal: Replacing green field pixels with white to make the gray track "pop" for the CNN.
  • Score Masking: Neutralizing the telemetry digits at the bottom with a black square to prevent the agent from over-fitting to the speed display.
  • Temporal Stacking: Stacking 4 consecutive frames to give the "static" CNN a sense of velocity and inertia.
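Here is a condensed sketch of that pipeline; the green-pixel thresholds and the score-bar coordinates are assumptions rather than the exact published values:

```python
import numpy as np
from collections import deque

def preprocess(frame):
    """96x96x3 RGB frame -> 96x96 grayscale, grass whitened, telemetry masked."""
    img = frame.astype(np.float32)
    # Grass removal: predominantly green pixels become white so the gray track pops.
    grass = (img[:, :, 1] > 150) & (img[:, :, 0] < 120) & (img[:, :, 2] < 120)
    img[grass] = 255.0
    # Grayscale conversion: a simple channel average is enough for track contrast.
    gray = img.mean(axis=2)
    # Score masking: black out the telemetry bar at the bottom of the frame.
    gray[84:, :] = 0.0
    return gray / 255.0

class FrameStack:
    """Keeps the last 4 preprocessed frames as the agent's state (4 x 96 x 96)."""
    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        first = preprocess(frame)
        for _ in range(self.frames.maxlen):
            self.frames.append(first)
        return np.stack(self.frames)

    def step(self, frame):
        self.frames.append(preprocess(frame))
        return np.stack(self.frames)
```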

2. Cognitive Simplification & Action Space

One of our biggest breakthroughs was reducing the action space. Instead of the default continuous space, we discretized it into 5 highly effective actions:

  1. Steer Left
  2. Steer Right
  3. Accelerate
  4. Brake
  5. Idling (Do Nothing)

This simplification allowed the DQN to converge significantly faster, as it didn't have to "learn" the infinite nuances of partial steering while still mastering the track.
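The translation from those five discrete indices back to CarRacing's continuous (steering, gas, brake) triple might look like this; the exact magnitudes (full steering lock, partial braking) are assumptions:

```python
import numpy as np

# CarRacing-v0 expects a continuous action: [steering -1..1, gas 0..1, brake 0..1].
# The DQN outputs one of 5 discrete indices, translated here.
DISCRETE_ACTIONS = {
    0: np.array([-1.0, 0.0, 0.0]),  # steer left
    1: np.array([ 1.0, 0.0, 0.0]),  # steer right
    2: np.array([ 0.0, 1.0, 0.0]),  # accelerate
    3: np.array([ 0.0, 0.0, 0.8]),  # brake (partial, to avoid wheel lock)
    4: np.array([ 0.0, 0.0, 0.0]),  # idle / do nothing
}

def to_env_action(q_index: int) -> np.ndarray:
    return DISCRETE_ACTIONS[q_index]
```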

3. The 3-Layer Optimized Network

The brain of our agent was a Dual-Convolutional architecture:

  • CNN Layer 1: 8 kernels (7x7) with stride 4, extracting coarse track features.
  • CNN Layer 2: 16 kernels (3x3) with stride 1, refining edge detection.
  • Fully Connected Head: A 400-neuron Dense layer with ReLU activation, mapping spatial features to the 5 output actions.
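In PyTorch terms, the network reduces to roughly the sketch below; padding, bias and initialization details are assumptions, and the flattened size is inferred from a dummy pass rather than hard-coded:

```python
import torch
import torch.nn as nn

class RacingDQN(nn.Module):
    """4 stacked 96x96 frames in, Q-values for the 5 discrete actions out."""
    def __init__(self, n_actions: int = 5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 8, kernel_size=7, stride=4),   # coarse track features
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=1),  # refined edge detection
            nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened feature size from a dummy 96x96 input.
        with torch.no_grad():
            n_flat = self.conv(torch.zeros(1, 4, 96, 96)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(n_flat, 400),
            nn.ReLU(),
            nn.Linear(400, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.conv(x))
```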

Published Research

Optimizing Agent Training with Deep Q-Learning on a Self-Driving Reinforcement Learning Environment

Affiliation: Instituto Superior Técnico
Publisher: IEEE
Venue: IWSSIP (Lisbon, Portugal)

Looking back, this project was the foundation of my engineering philosophy: build from first principles. This was 2020, a time when "agent" meant a DQN model running on a GTX 1050 Ti laptop GPU, and "OpenAI" was more of a research playground than a consumer brand.

Working at the Instituto Superior Técnico lab taught me that while models (and buzzwords) evolve, the core challenge of AI (aligning raw sensory input with deterministic utility) is timeless. We were ahead of the curve then, and that same "early adopter" mindset drives our work today.

Pedro

Founder & Principal Engineer