Wednesday, May 14, 2008

Why not Embedded C++?

How does Embedded C++ differ from ISO C++?

The Embedded C++ (EC++) language is a subset of ISO C++, and a program written in EC++ can be compiled with any ISO C++ compiler. EC++ is formed by omitting the following features of ISO C++.

1. Multiple Inheritance
2. Virtual base classes
3. Run Time Type Information
4. New style casts
5. The mutable type qualifier
6. Namespaces
7. Exceptions
8. Templates

Now let's look at the implications of omitting each of these features.

1. Multiple Inheritance

The reason to avoid multiple inheritance is quite simple: it is complex, and even an expert programmer finds it hard to design a class hierarchy that uses it well. But there are many superb interface hierarchies that can only be expressed with multiple inheritance. The right way to control that complexity is to set rules, not to take the feature out. A guideline that allows some hierarchies and disallows others is perfectly sufficient to avoid the confusion multiple inheritance can cause.

2. Virtual base classes

If there is no multiple inheritance, why would we need virtual base classes? Or, put the other way around: to avoid virtual base classes, multiple inheritance had to be omitted as well. :-)

3. Run Time Type Information

Run-time type information can cause a program-size overhead, because type information must be stored for every polymorphic class. It is only an advantage for a program that is heavily polymorphic, and it adds no merit to a program that is not. But every compiler, including gcc, has a flag to enable or disable RTTI, and any novice programmer can switch it off. So what is the need for such an omission, instead of a guideline not to use RTTI?

4. New style casts

Since EC++ omits RTTI, dynamic_cast cannot work. But even though EC++ drops all the new-style casts, static_cast would have been perfectly valid without RTTI. This omission is either pointless or merely an artifact of disabling RTTI.

5. The mutable type qualifier

Rather than removing the feature, there could simply be a guideline not to use mutable where an object is meant to stay const. One of the best demonstrations of the power of guidelines is that there are hardly any goto statements in programs today, even those written by novices. Omitting mutable adds no advantage to a language for embedded systems.

6. Namespaces

This is one of the weirdest omissions from the language. The stated reason is pretty simple: the "EC++ committee thinks it is not essential to have namespaces".

7. Exceptions

There are 2 major reasons to avoid exception handling in EC++.

1. It is difficult to estimate the time between when an exception has occurred and control has passed to a corresponding exception handler.
2. It is difficult to estimate memory consumption for exception handling.

Even in a real-time environment, what is the big deal about the time between an exception being thrown and being caught? Even if the throughput of exception handling is unpredictable, it is still better than crashing your program or writing your own exception-handling mechanism. And if exception handling really must be avoided in a given program or environment, that should be a guideline rather than a frozen-out language feature.

8. Templates

The disadvantages of templates are that they can introduce code bloat and increase program size. But templates make development and code maintenance far better, and template specialization, or limiting each class to the necessary types, can be used to avoid the bloat. The proper way to prevent careless use of templates is never to take them out of the language.


It is always better to write guidelines than to make a new language out of a subset of an existing one. Stroustrup's comments on Embedded C++ can be found in Stroustrup's FAQ.

Wednesday, April 23, 2008

Virtual destructors (What? When? Why?)

There are some questions about virtual destructors that I am asked very frequently. The answers are all related, so I am collecting them in one note.

• What is the use of virtual destructor?
• When to make base class destructor virtual?
• Is there any overhead for virtual destructor?
• Why don’t we make all destructors virtual?
• Will the delete operator be overridden?
• Will the delete[] operator be overridden?

What is the use of virtual destructor?

The virtual destructor comes into play when a class is used as a base class. If a derived-class object is deleted through a base-class pointer and only the base-class destructor runs, the necessary destruction of the derived part never happens. To simplify: if the base-class destructor is not virtual and we delete a derived object through a base pointer, the derived-class destructor will not be called. Destruction proceeds from derived to base, so once the derived destructor is called it will call the base destructor in turn, and the object is destroyed properly. Altogether, making the base-class destructor virtual makes the destructor capable of being overridden.

When to make base class destructor virtual?

Make the destructor of a class virtual if the class has any virtual function. This rule should be safe for two reasons.

1. Most real-world base classes have at least one virtual function.
2. In real-world problems, if there is no virtual function in the base class there is no specific advantage to using a base-class pointer in the first place.

Is there any overhead for virtual destructor?

The per-object space cost is nothing, because we only make the destructor virtual when there is at least one other virtual function in the base class. The moment the first virtual function is added, the per-object space cost (the vtable pointer) has already been paid, which makes it essentially free to make the destructor virtual as well.

Why don’t we make all destructors virtual?

If there is no virtual function in the base class and you make the destructor virtual, it increases the per-object space cost for no added advantage. It is safe not to make the destructor virtual if your class,

1. is not a base class
2. or does not have any virtual function

Will the delete operator be overridden?

Yes. The standard states that operator delete behaves as if overridden: when the destructor is virtual, the derived class's operator delete is used to free the derived object.

Will the delete[] operator be overridden?

No. The standard explicitly states that operator delete[] (deleting an array of objects) will not behave as if overridden: the operator corresponding to the static type of the pointer is used for the deletion.

For a better understanding of the delete and delete[] operators with virtual destruction, please read the post Virtual Destructor delete and delete[].

Friday, April 11, 2008

The development of NULL - A history


The null pointer is something that has changed and advanced throughout the development of C and C++. Every implementation has had its drawbacks, and now a better design is about to arrive in C++0x. Let us analyze the null pointer in C, C++ and C++0x.

K&R style(C Style)

A null pointer is a constant expression that evaluates to either 0 or ((void*) 0). There are machines that use different internal representations for pointers to different types. Even on such machines, the standard guarantees that 0 cast to void* is assignable to every pointer type. For example, if you assign ((void*) 0) to a pointer to FILE, it is guaranteed to be initialized to a null pointer without any error.

C++ style(C++98)

A null pointer constant is an integral constant expression rvalue of integer type that evaluates to zero. A null pointer constant can be converted to any pointer type; the result is the null pointer value of that type and is distinguishable from every other value of that type. So the macro NULL is equivalent to the integer 0, and it is therefore better to avoid the NULL macro by using 0 directly.

C++0x style

In all the standards so far, the constant 0 has played the double role of integer zero and null pointer, and this behavior has persisted throughout the development of C and C++. In C, NULL is a macro defined as either 0 or ((void*) 0); in C++, NULL is always just 0. But even using 0 has a drawback with overloading. For example, suppose we have two declarations,

void foo( char* p );
void foo( int p );

and then call foo(NULL). This calls the integer overload of foo, where the programmer probably intended to call the char* overload with a null pointer.

To get rid of this issue, the new standard will include an additional keyword, 'nullptr', which is only assignable and comparable to pointer types. The existing 0 will have to keep its double role for backward compatibility. But sooner or later the C++ committee will likely deprecate the use of 0 and NULL as null pointers, and eventually retire the double role.

Thursday, February 21, 2008

A volatile reference and const reference

A volatile reference

void foo( volatile double& bar )
{
    cout << bar << endl;
}

The above function accepts a volatile reference to double. What happens if you call,

int nVal = 0;
foo( nVal );

The compiler will generate an error saying that the conversion from 'int' to 'volatile double&' is not possible.

Why shouldn’t it convert?

A conversion has to be done when an integer needs to be passed to a method that takes a double. When we do the standard conversion from int to double, a temporary object is created by the implicit conversion. A function that takes a volatile reference parameter may change the value of that parameter. Now put these two facts together: the implicit conversion creates a temporary object, and if we pass that temporary to a method that accepts a volatile reference, what happens? The method may modify the temporary object, which would not affect the original at all. For example,

void foo( volatile double& bar )
{
    bar = 0;
}

int nVal = 10;
foo( nVal );   // suppose this call were allowed
cout << nVal;

What do we expect at the output? The intention was to set nVal to 0. But what would happen if the code above ran? An implicit conversion would be done from int to double, and the resulting temporary passed to foo. So foo would set the value of the temporary object instead of nVal. Now it is easy to see why the function taking a non-const reference to double does not compile when called with an int.

A constant reference

void foo( const double& bar )
{
    cout << bar << endl;
}

Now what should happen? Function foo accepts a const reference to double, which in turn guarantees that the parameter bar will not be modified. In this case it is safe to pass a temporary object, so the compiler will allow the following call.

int nVal = 0;
foo( nVal );


If a temporary object could be used to call a function taking a volatile reference, there would be many hard-to-find bugs.

Tuesday, February 12, 2008

Compromising quality for performance

It has always been a big deal to compromise accuracy for performance. In most cases, a highly complex and time-consuming application also needs the highest accuracy. Accuracy has always been the problem in using the GPU to speed up such algorithms: since GPUs do not support double-precision arithmetic, it looks hard to achieve high precision with them. A CPU does floating-point division using double-precision arithmetic, but even the latest GPU from nVIDIA (the 8800) does division by single-precision reciprocal multiplication.

There will be situations where you have to deal with very convoluted shapes. In such cases it becomes hard to settle for single-precision accuracy. On the other hand, if the CPU is used, it might be impractical to get the algorithm working in real time. Then the tough question comes: can accuracy be compromised for performance?

If the answer is Yes!

If it is not so important to get the highest accuracy, we can of course go for the GPU. Its massive computation power can get the algorithm executed in real time. Allowing a percentage of tolerance in the output becomes an easy way of optimizing your algorithm. In this case you must be sure that the tolerance stays within a range that keeps the algorithm usable.

If it is a Big No!

Here comes the problem. You have an algorithm that cannot run in real time because of the limited computation power of your available resources, and with the GPU you must settle for single-precision arithmetic. Now what can we do to improve performance while keeping the highest accuracy? One idea is to use both the CPU and the GPU. First run all the parallel code on the GPU and compute the output. Then compute the tolerance using a CPU version of the code, which gives the highest accuracy. Finally, do a very small number of iterations of your algorithm on the CPU to find the best value.

A case study

Suppose you have to register two 3D surfaces. First calculate the parameters needed to register the surfaces using the GPU; it may take many iterations to find the rotation, translation, scaling and shearing parameters. Once approximately correct registration parameters are found on the GPU, do some iterations with the CPU to find the best convergence. You achieve the performance improvement by doing the large number of iterations on the GPU, and the accuracy is also good because a CPU-based calculation is done at the end, starting from the approximate parameters the GPU calculated. This strategy can be used in most cases where the highest accuracy is needed.

Monday, February 11, 2008

Did Herb Sutter fight with the Amdahl’s law?

Understanding Amdahl’s law

No matter how much speedup you get for the parallel code, it is impossible to reach a 2x speedup if 75% of the algorithm cannot be parallelized. Even if you did the other 25% in zero seconds, the gain would be at most about 1.33x. To quantify: if your algorithm takes 1 second and 0.75 second of it is non-improvable, it does not matter how much you improve the other 0.25 second; you cannot get the total time down to half.

Understanding Gustafson’s law

Rather than speedup, Gustafson takes work into consideration. If your algorithm takes 1 second and has a 0.75-second non-improvable part, you can still get a 2x speedup by adding a new feature that takes another 1 second, of which 0.75 second is improvable. In the total of 2 seconds, 1 second (0.75 + 0.25) of the code is non-improvable and the other 1 second is improvable. Suppose you get unlimited improvement, reducing that 0.25 + 0.75 second to 0 seconds: the total time of the algorithm goes down to 1 second. You got a 2x speedup by adding more work.

So, did Herb fight Amdahl’s law and win?

Negative. Herb did not even try to fight Amdahl’s law. Herb only shows that it is better to take Gustafson’s law into consideration when deciding whether or not to parallelize. It is Amdahl’s law that must be taken into consideration when calculating the speedup that can be achieved by increasing the number of cores. But the important point is that you should not turn pessimistic because of an Amdahl’s-law calculation. It is much more practical to understand that your application will have more features added in future, and those may be drastically improvable by executing tasks in parallel. It also means the number of cores must be taken into consideration rather than only fixed-size problems.


“Break Amdahl’s law” is the right attitude: you must not get stuck just because an Amdahl’s-law calculation says your algorithm is not very improvable.
“Herb fought the law—Amdahl's Law, that is—and Herb won” (DDJ) is not a good title. If anyone ever tried to break Amdahl’s law it was Gustafson, and even Gustafson titled his paper “Reevaluating Amdahl's Law”.

Friday, January 18, 2008

Limiting the possible data types given for a template class during instantiation

How to limit the data types that can be used for instantiating a template class?

In C++ there is no standard mechanism to limit the data types a template can be instantiated with. For example,

template < class T >
class ATemplate
{
};

The above class (ATemplate) can be instantiated with any data type, so ‘T’ can be anything: basic data types, structures or classes. But what if you want to accept only selected data types, for example a template that supports only char, int and double? C++ does not have a standard method for this, although Microsoft has put some effort into .NET generics (something like C++ templates) to limit the supported data types. Even though C++ does not support it directly, it is possible to limit the data types with a small trick. Let us see an example.

#include <iostream>

template < class T >
class MyTemplate
{
    T m_x;
    void AllowThisType( int& obj ){}
    void AllowThisType( char& obj ){}
    void AllowThisType( double& obj ){}

public:
    MyTemplate()
    {
        T tmp;
        AllowThisType( tmp );
    }
    void SetX( T val )
    {
        m_x = val;
    }
    void print()
    {
        std::cout << m_x << std::endl;
    }
};

int main()
{
    MyTemplate < int > objint;
    MyTemplate < char > objchar;
    MyTemplate < double > objdouble;
    MyTemplate < float > objfloat; //<< Error when you create this
    return 0;
}

How does the above program limit the data types that can be used to create an object of the template class?
From the constructor of MyTemplate, an overloaded function AllowThisType is called. This function has three overloads: one taking an int reference, one a char reference and one a double reference. So when you make an object with int, char or double, there is an appropriate AllowThisType overload the compiler can find and match. But when you create an object with any other data type, the compiler cannot find a matching AllowThisType overload and the compilation fails. Let us see an example.

MyTemplate < float > objfloat;

When you make an object of MyTemplate with T as float, the compiler looks for an AllowThisType( float& ). Since we did not write such an overload, the compilation fails. So it is clear that the data types that can be given are limited here. Whenever an additional type needs to be supported, a new AllowThisType overload must be added for it.

Monday, January 14, 2008

Teraflop processors are not far away. Are we ready to use them?

The prototype 80-core Polaris processor on a single chip delivered the super computer like performance of a trillion floating-point operations per second (one teraflop) while consuming less than 62 watts – Intel.

It will take less than ten years from now for a common man to have a PC running on a teraflop processor. The prototype teraflop processor has 80 cores that can execute in parallel, so to get the most out of them we need to run 80 threads concurrently. Hence it is clear that a program must be heavily threaded to make use of all the cores.

The question that comes up is, “It can deliver up to a teraflop, but how are we going to get the most out of it?”

These are the days when programmers are trying hard to get most out of a quad core or a dual core processor. These processors can give a lot but it is up to the programmer to make use of it.

Threading for parallelism

Most programmers used threads only to separate the user interface from the time-consuming operations triggered by user actions. But those days are gone. A thread is no longer just a way to do things without blocking something else; it is all about performance. Threading for performance is the key now.

The hard part

Plenty of tools are available to help analyze, debug and optimize threads. It is not hard to detect a synchronization problem or a thread overrun, and it is easy these days to debug chunks of code running in different threads. So why are all algorithms not yet threaded? What is the big deal? It is discovering parallelism! The hardest thing in optimization is finding a parallel way to speed up the most time-consuming part of the algorithm. Almost every time, when we look at the code of an algorithm, the most time-consuming part is entirely sequential; it looks like something that can never be parallelized. That is where it gets tricky, and only real innovation can get such things parallelized. It is not about parallelizing the code of the algorithm; it is all about restating the algorithm in a parallel way!