Graphics processing units (GPUs) are designed for parallel workloads and perform parallel operations much faster than central processing units (CPUs). In this work, we focus on the performance of the Advanced Encryption Standard (AES) on GPUs. We present optimizations that remove bank conflicts in shared memory accesses and provide 878.6 Gbps throughput for AES-128 encryption on an RTX 2070 Super, which is equivalent to 4.1 Gbps per watt. Our optimizations provide a speed-up of more than 2.56x over the best GPU results in the literature. Our optimized AES implementations on GPUs even outperform any CPU using the hardware-level AES New Instructions (AES-NI), as well as legacy FPGA-based cluster architectures like COPACOBANA and RIVYERA. Even on a low-end GPU like the MX 250, we obtained 60.0 Gbps throughput for AES-256, which is generally faster than the read/write speeds of solid-state disks. Thus, a transition from AES-128 to AES-256 when using GPUs would provide military-grade security with no visible performance loss. With this level of performance, GPUs can be used as cryptographic co-processors for file or full disk encryption to remove the performance loss caused by CPU encryption. With a single GPU as a co-processor, busy SSL servers can be freed from the burden of encryption and use their whole CPU power for other operations. Moreover, these optimizations can help GPUs practically verify theoretically obtained cryptanalysis results, or their reduced versions, in reasonable time.
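
To make the notion of a shared memory bank conflict concrete, the following minimal sketch simulates how the 32 threads of a warp map table lookups onto memory banks. It is an illustration under stated assumptions, not the implementation evaluated in this work: it assumes 32 banks of 4-byte words, a 256-entry lookup table of 32-bit words, and one possible conflict-free layout in which the table is replicated once per thread of the warp. The names `worst_conflict`, `naive_addresses`, and `replicated_addresses` are hypothetical.

```python
import random

NUM_BANKS = 32    # assumed number of shared memory banks (4-byte words)
WARP_SIZE = 32    # threads per warp
TABLE_SIZE = 256  # entries in one AES lookup table of 32-bit words


def worst_conflict(word_addresses):
    """Return the largest number of distinct words that map to one bank.

    A value of 1 means the warp's access is conflict-free. Threads that
    read the same word are served by a broadcast, so duplicates are
    removed before counting.
    """
    per_bank = {}
    for addr in set(word_addresses):
        bank = addr % NUM_BANKS
        per_bank[bank] = per_bank.get(bank, 0) + 1
    return max(per_bank.values())


def naive_addresses(indices):
    # Single shared copy of the table: thread reads table[index].
    return list(indices)


def replicated_addresses(indices):
    # Assumed layout table[index][lane]: one copy per lane, so the word
    # address is index * WARP_SIZE + lane and the bank is always `lane`.
    return [idx * WARP_SIZE + lane for lane, idx in enumerate(indices)]


# Data-dependent indices, as produced by AES state bytes (random here).
random.seed(0)
indices = [random.randrange(TABLE_SIZE) for _ in range(WARP_SIZE)]

naive = worst_conflict(naive_addresses(indices))
replicated = worst_conflict(replicated_addresses(indices))
print("worst-case words per bank, single copy:", naive)
print("worst-case words per bank, replicated :", replicated)
```

In the replicated layout each thread always lands in its own bank, so the access is conflict-free for any input. The cost is memory: one 256-entry table of 4-byte words replicated 32 times occupies 32 KiB of shared memory.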