Language Benchmark

Version 8 Jan. 2004.

This is a comparison of the performance of different programming languages. The benchmark problem is to compute the sum of all entries of the square of an n × n matrix M where M(i,j) = i + j + 0.5. The implementation in R is the shortest:

matrixmult <- function(n) {
  M <- array(0.,dim=c(n,n));
  for (i in 1:n) {
    for (j in 1:n) {
      M[i,j] <- i+j+0.5;
    }
  }
  r <- 0;
  for (i in 1:n) {
    for (j in 1:n) {
      for (k in 1:n) {
        r <- r+M[i,k]*M[k,j];
      }
    }
  }
  r;
}

The operator <- means assignment.

The matrix multiplication part of the algorithm needs n³ "operations" (i.e. additions and multiplications) and is the same in all languages, except perhaps in the fast versions of the R and Maple implementations, where the built-in matrix multiplication is used, which may be implemented by an algorithm faster than O(n³).

The table shows the processor cycles needed per "operation", i.e. CPU frequency · total time / n³, for different matrix sizes n.
Language  Version  Variant  Bytes  n = 32  64  128  256  512  1024  2048
C gcc 2.95.2 fast 460 19 18 18 17
C gcc 3.0.2 fast 460 16 18 17 17
C icc 5.0.1 fast 460 21 18 17 17
C++ g++ 2.95.2 fast 599 18 18 17 17
C++ g++ 3.0.2 fast 599 29 27 28 27
C++ icpc 5.0.1 fast 599 26 27 26 26
C gcc 2.95.2 407 82 94 111 189
C gcc 3.0.2 407 76 204 262 239
C icc 5.0.1 407 73 162 217 271
C++ g++ 2.95.2 500 21 34 45 89
C++ g++ 3.0.2 500 38 58 87 142
C++ icpc 5.0.1 500 40 62 76 101
Fortran g77 0.5.25 599 85 225 226 243
Java jdk 1.1.8 417 194 213 240 259
R 1.1.1 fast 120 2051 433 210 195
Haskell ghc 4.08.1 870 501 496 625 739
Perl 5.6.0 287 2370 2376 2424
Python 2.0 279 4740 3828 3800 4006
Haskell hugs February 2000 870 8750 5150 4831 6045
Maple 4 fast 212 7656 9343 17785
Maple 4 223 11302 14128 19386
R 1.1.1 187 49220 38692 37285
BASIC bwBASIC 2.20 192 80574 78022 75966
Bash 2.04.0 348 111564 169352 1539263

The "fast" variants are meant to be very fast implementations of the matrix multiplication algorithm in the respective programming language. The fast implementations in C and C++ ensure that memory is addressed consecutively. The fast implementations in R and Maple use the built-in matrix multiplication of these languages. For small matrix sizes n the naive C and C++ implementations already perform worse than the fast ones, and for large problems these naive implementations fall back almost to Java performance.

The difference between C compilers (GNU gcc and Intel C++ Compiler Version 5.0.1 Build 010730D0) for the fast versions is negligible, but g++ 3.0.2 and the Intel C++ compiler seem to have problems with the fast C++ version. Java is the fastest interpreter, even three times faster than the Glasgow Haskell Compiler ghc (which compiles to C and uses gcc as a backend). All other interpreters have more or less serious performance problems. Haskell has the potential to optimize the program so that no memory access is needed at all, and therefore to outperform all other languages, but the ghc version seems to use even more memory than the n² doubles that are needed. The Bash implementation shows that Bash has a serious performance problem with large array sizes, because its running time grows faster than O(n³).

Code sizes are given in characters, not counting leading spaces. The implementation in Haskell is the longest; the one in R is the shortest (among the implementations that do not use a high-level built-in matrix multiplication).

The benchmarks were performed on a dual Athlon MP 1.2 GHz machine with DDR-RAM (FSB 266 MHz).

Full source (17 KB) of the benchmark.

If you know another programming language in which you can provide the source for the algorithm above, I'd be glad to hear from you ( mail@Franosch.org ) so that I can include that language in the comparison.

Martin Nilsson has contributed a Pike version of the benchmark.

Last modified 2008-04-17 by webmaster@Franosch.org .