Monthly Archives: July 2011

Summing floats, more than one way to do it

So you summed some floats and have some unexpected results? Don’t be afraid, it’s normal.

When summing floats you have errors, not only because some numbers can’t be represented exactly as a float, but you get errors from differences in magnitude too.

Say you have a large float and add a small one to it, the small one will loose a lot of it’s precision in the process. You can see what can happen if you use fixed-width decimal numbers:

12345.6 +
    0.12345
------------
12345.72345

If you only have one addition, the error could be acceptable, but if you have to sum a lot of numbers the small ones could start to be a significant part of the sum, and that part can get lost in the process.

If you had:

12345.6 +
    0.05123
------------
12345.65123 +
    0.05123
------------
12345.65123

You just lost .1, even if you could have avoided that by reordering the operations:

    0.05123 +
    0.05123 
    ---------
    0.10246 +
12345.6
-------------
12345.70246

So what can you do?

Actually, you have multiple choices, depending on what you want to achieve:

  • You can sort the numbers (small to big) and sum them after sorting. This will group together numbers of similar magnitude and it will yield smaller errors. But this is not the best method, there are more accurate methods that will not need a sort.
  • You can use Kahan summation. This is more accurate than sorting and it should be faster too.
  • You can use use doubles. This is the best method, but what if you’re summing a vector of doubles in the first place?
  • If you’re feeling adventurous you can run the sum on the FPU, but not many places let you insert assembly in the codebase.

If you want to see some code, you can download this file: main.zip. That code works on GCC 4.5 with the -std=c++0x flag.

Here is the output of a run, summing 10M random floats:

Starting: /home/daniel/projects/floats/build/floats
2916.660400390625 <- Reverse sorted
3057.517578125 <- Normal sum
3192.88330078125 <- Sorted
3194.340576171875 <- Kahan summation
3194.340576171875 <- FPU
3194.340576171875 <- Double truncated
3194.34058601572951374691911041736602783203125 <- Double precision

How to check if a type is a function

The first time I saw variadic templates, I knew they are useful (as useful as lambdas in my opinion).

Today, we’ll take a look at how to check if some type is a function or not.

Keep in mind that this is compile time code.

The code goes something like this:

This code is using template specialization on function types.

The variadic Args are there just so that we can catch any function, with any number of parameters.

If you want to use the value at runtime, there are a couple of ways to do it, one of them using the above code and decltype, and the one I’ll show you below, using the same method but function templates instead of structs.

That’s just the same thing, but using overloading instead of specialization.

You can use them like this:

where test_fun is defined like: