AI Self-Driving Race Car
Beating OpenAI Gym's CarRacing environment (920+ score) with a deep reinforcement learning algorithm (DQN).
This project takes us back to 2020, a time before LLMs took over the world and three years before ChatGPT became a household name. It was an era of intense academic research at Instituto Superior Técnico (IST) in Lisbon, where we were obsessed with bridging the gap between raw pixels and autonomous decision-making.
The Reinforcement Learning Era: Back then, the frontier of AI was all about Reinforcement Learning. It was the era of AlphaGo and OpenAI Gym, where the ultimate challenge was building self-learning systems that could master complex games and physics-based environments through trial and error. Long before the current LLM landscape, we were focused on the raw mechanics of machine learning agents that could learn to drive from scratch.
📺 Project Showcase
The Training Evolution
A timelapse of the DQN agent's learning process, showing the transition from random exploration to deterministic, goal-oriented behavior over thousands of episodes.
Autonomous Racing Performance
The final agent navigating the track with high precision, maintaining a consistent score of 900+ by optimizing racing lines and velocity control.
🏗️ Technical Methodology
The core of the project was an Adapted Deep Q-Network (DQN) developed at IST. We had to be extremely efficient; GPUs weren't as accessible as they are today.
- State Representation: We stacked 4 consecutive grayscale frames (96x96) to give the agent a "sense of motion," effectively teaching it physics through temporal pixel deltas.
- Reward Shaping: The breakthrough came from a custom reward function that penalized "hesitation" and rewarded aggressive yet stable racing lines.
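The exact reward terms from the project aren't reproduced in this text, but the shaping idea can be sketched as a wrapper over the environment reward. The constants and the `speed`/`on_track` signals below are illustrative assumptions, not the original values:

```python
# Hypothetical sketch of the reward-shaping idea: penalize "hesitation"
# (crawling or standing still) and discourage leaving the track, while
# passing through the environment's own progress reward.
def shape_reward(base_reward: float, speed: float, on_track: bool) -> float:
    """Augment the raw environment reward with shaping terms."""
    shaped = base_reward
    if speed < 1.0:      # assumed hesitation threshold
        shaped -= 0.5    # assumed penalty magnitude
    if not on_track:     # off-track excursions are heavily discouraged
        shaped -= 2.0
    return shaped
```

Because the shaping only subtracts penalties from undesirable states, a fast, stable racing line keeps the full base reward.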
Performance Metrics
The CarRacing-v0 environment is considered solved when the agent reaches an average score of 900 over 100 consecutive runs. Our optimized agent, developed at IST, consistently exceeded this benchmark during final testing.
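The solved criterion above is a rolling average over the last 100 episodes, which a minimal check can express directly:

```python
def is_solved(scores, window=100, threshold=900.0):
    """CarRacing-v0 counts as solved when the mean score over the last
    `window` consecutive episodes reaches `threshold`."""
    if len(scores) < window:
        return False
    recent = list(scores)[-window:]
    return sum(recent) / window >= threshold
```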
🔬 Deep Technical Dive: The 2019/2020 Architecture
This wasn't just about training; it was about architectural efficiency. In an era where consumer GPUs like the GTX 1050 Ti were our main workhorses at Instituto Superior Técnico, we had to optimize every byte of the input space.
1. Advanced Input Data Preprocessing
Raw pixels (96x96x3) are noisy. We implemented a custom pipeline to sharpen the agent's focus:
- Grayscale Conversion: Dropping color channels to focus on track contrast.
- Grass Removal: Replacing green field pixels with white to make the gray track "pop" for the CNN.
- Score Masking: Neutralizing the telemetry digits at the bottom with a black square to prevent the agent from over-fitting to the speed display.
- Temporal Stacking: Stacking 4 consecutive frames to give the "static" CNN a sense of velocity and inertia.
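The four steps above can be sketched in NumPy as a single pipeline. The grayscale weights, the green-dominance test for grass, and the exact mask rows are assumptions for illustration, not the project's recorded values:

```python
import numpy as np
from collections import deque

def preprocess(frame: np.ndarray) -> np.ndarray:
    """Grayscale, grass removal, and score masking for a 96x96x3 frame."""
    # 1. Grayscale conversion: standard luminance channel weights (assumed).
    gray = frame @ np.array([0.299, 0.587, 0.114])
    # 2. Grass removal: green-dominant pixels become white so the track pops.
    green = (frame[..., 1] > frame[..., 0]) & (frame[..., 1] > frame[..., 2])
    gray[green] = 255.0
    # 3. Score masking: black out the telemetry strip at the bottom.
    gray[84:, :] = 0.0
    return gray.astype(np.float32) / 255.0  # normalize to [0, 1]

class FrameStack:
    """4. Temporal stacking: keep the most recent k frames so the 'static'
    CNN sees velocity and inertia as pixel deltas between channels."""
    def __init__(self, k: int = 4):
        self.frames = deque(maxlen=k)

    def push(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(frame)
        while len(self.frames) < self.frames.maxlen:
            self.frames.append(frame)  # pad with copies at episode start
        return np.stack(self.frames, axis=0)  # shape (4, 96, 96)
```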
2. Cognitive Simplification & Action Space
One of our biggest breakthroughs was reducing the action space. Instead of the default continuous space, we discretized it into 5 highly effective actions:
- Steer Left
- Steer Right
- Accelerate
- Brake
- Idling (Do Nothing)
This simplification allowed the DQN to converge significantly faster, as it didn't have to "learn" the infinite nuances of partial steering while still mastering the track.
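Since CarRacing's native interface expects a continuous `[steer, gas, brake]` vector, the 5-way discrete head needs a lookup table mapping each Q-value index back to a control vector. The magnitudes below are illustrative assumptions:

```python
import numpy as np

# Discrete index -> (steer, gas, brake) for CarRacing's continuous interface.
# Exact magnitudes are assumptions; the original values aren't documented here.
ACTIONS = {
    0: np.array([-1.0, 0.0, 0.0]),  # steer left
    1: np.array([ 1.0, 0.0, 0.0]),  # steer right
    2: np.array([ 0.0, 1.0, 0.0]),  # accelerate
    3: np.array([ 0.0, 0.0, 0.8]),  # brake (partial, to avoid wheel lock)
    4: np.array([ 0.0, 0.0, 0.0]),  # idle (do nothing)
}

def to_continuous(action_index: int) -> np.ndarray:
    """Map the DQN's argmax output to the env's 3-dim action vector."""
    return ACTIONS[action_index]
```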
3. The 3-Layer Optimized Network
The brain of our agent was a Dual-Convolutional architecture:
- CNN Layer 1: 8 kernels (7x7) with stride 4, extracting coarse track features.
- CNN Layer 2: 16 kernels (3x3) with stride 1, refining edge detection.
- Fully Connected Head: A 400-neuron Dense layer with ReLU activation, mapping spatial features to the 5 output actions.
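With a stacked 4x96x96 input, those layer specs give 23x23 feature maps after the 7x7/stride-4 conv and 21x21 after the 3x3 conv. A sketch of the architecture in PyTorch (the framework choice is an assumption; the text doesn't name one):

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Sketch of the 3-layer network described above."""
    def __init__(self, n_actions: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 8, kernel_size=7, stride=4),   # coarse track features
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=1),  # refined edge detection
            nn.ReLU(),
            nn.Flatten(),
        )
        # 96x96 -> 23x23 after conv1 -> 21x21 after conv2
        self.head = nn.Sequential(
            nn.Linear(16 * 21 * 21, 400),
            nn.ReLU(),
            nn.Linear(400, n_actions),  # one Q-value per discrete action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))
```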
📄 Publication
Optimizing Agent Training with Deep Q-Learning on a Self-Driving Reinforcement Learning Environment
- Publisher: IEEE
- Venue: IWSSIP (Lisbon, Portugal)
Looking back, this project was the foundation of my engineering philosophy: build from first principles. This was 2020, a time when "agent" meant a DQN model running on a GTX 1050 Ti laptop GPU, and "OpenAI" was more of a technical playground than a retail giant.
Working at the Instituto Superior Técnico lab taught me that while models (and buzzwords) evolve, the core challenge of AI (aligning raw sensory input with deterministic utility) is timeless. We were ahead of the curve then, and that same "early adopter" mindset drives our work today.
— Pedro
Founder & Principal Engineer