We present a new algorithm for residue multiplication modulo the Mersenne prime p = 2(521) - 1 based on the Toeplitz matrix-vector product. For this modulus, our algorithm yields better result in terms of the total number of operations than the previously known best algorithm of Granger and Scott presented in Public Key Cryptography (PKC) 2015. We have implemented three versions of our algorithm to provide an extensive comparison - according to the best of our knowledge with respect to the well-known algorithms and to show the robustness of our algorithm for this 521-bit Mersenne prime modulus. Each version is having less number of operations than its counterpart. On our machine, Intel Pentium CPU G2010 @ 2.80 GHz machine with gcc 5.3.1 compiler, we find that for each version of our algorithm modulus p is more efficient than modulus 2p. Hence, by using Granger and Scott code, constant-time variable-base scalar multiplication, for modulus p we find 1, 251, 502 clock cycles for P-521 (NIST and SECG curve) and 1, 055, 105 cycles for E-521 (Edwards curve). While, on the same machine the clock cycles counts of Granger-Scott code (modulus 2p) for P-521 and E-521 are 1, 332, 165 and 1, 148, 871 respectively.