Monday, December 10, 2007

What if someone comes up with a compiler capable of parallelizing algorithms?

In many training sessions I have heard a common question from businessmen: if I train all my employees in parallel computing, can you guarantee that no one will come up with an automatically parallelizing compiler tomorrow? Won't all the money I spent on the training be wasted?

Is it a big deal?

This has been a very serious question all this time. Maybe most companies are still hoping that something will happen and performance will improve automatically. But it is quite easy to show that automatic parallelization of an algorithm is not going to work out.

How easy is it to prove?

Maybe someone can come up with a compiler that does some amount of parallelization. Such compilers already exist!!! But they definitely cannot parallelize the logic of your algorithm, can they? The core logic cannot be parallelized without a skilled programmer and an algorithm expert. A compiler can do some amount of logic-free parallelization, such as spreading independent loop iterations across cores, which is not going to increase performance drastically. For a drastic improvement almost all algorithm implementations need to be rewritten in a parallel way. Back when these algorithms were implemented there was no parallel computing on a personal computer. Only supercomputers had parallel processors. Now times have changed. Almost every desktop user in the world can afford a multi-core machine, and many have a high-end graphics card inside as well. So if you become skilled in parallel programming, it is not going to be a waste. No compiler can be as intelligent as you at finding the parallelism. Only a skilled human can do the parallel programming. Get yourself trained or leave your job to others!!!

Wednesday, December 5, 2007

Think parallel, save your product

I always wonder why we all speak about performance. We use a lot of jargon like performance, optimization, throughput, scalability, parallel thinking, hybrid computing, threads, locks, semaphores, synchronization, penalty, data parallelism, instruction parallelism and much more.

Why are we aiming at performance? What happened to the software industry in such a short span? Instead of just making more and more new software, why is everyone thinking about increasing the performance of the existing ones?

What is the motivation?

Motivation number 1... The customer satisfaction...

I never want to know how complex the algorithm is, I want to get it done in a fraction of a second...
I paid $10000 for your product and now there are 100 other products which take a second to do what yours gets done in a day... Did you cheat me?
My system has a quad core processor and you take 100 seconds to get things done. When I look at the CPU usage it is just 25%, what are you doing inside your program? Why did I buy a quad core machine spending all my money?

Motivation number 2... A stitch in time saves nine...

What if your software takes an hour to detect blood spreading inside the brain? The patient may die before you detect the problem.
What if it takes an hour to diagnose your vehicle's electrical problem? Don't you have a tight schedule?

Motivation number 3... There is no point in watering a dead plant...
What is the use if you detect a tsunami after it has hit the shores?
What is the use if you predict tomorrow's weather 2 days later?
What are you going to achieve if you detect a fire in a chamber and the alarm gets triggered 10 seconds late?

I conclude it like this.
So it is certain that time matters. If your competitor does a thing much faster than you, what happens? The answer is clear. You can’t compete anymore.

Parallel Extension to .NET Framework

Something to hope for .NET programmers!!!

Microsoft has come up with a parallel extension to the .NET framework (managed code). This may be a revolution in making high performance programs using .NET. But I wonder how many HPC applications can be done using .NET, because of its lack of speed compared to a C/C++ program. I have written another article about the performance difference between C++ and C# at http://amalp.blogspot.com/2007/10/performance-analysis-c-vs-c.html. To make use of all the cores in a multicore environment we definitely need threading, so using the parallel extensions can make a program run considerably faster on a system with more than one core. But even with the parallel extensions, managed code is unlikely to run as fast as unmanaged code. So for performance either C or C++ is the best. For more information about the parallel extensions visit the blog http://blogs.msdn.com/somasegar/archive/2007/11/29/parallel-extensions-to-the-net-fx-ctp.aspx

Still C# zags in performance?

C# increases productivity by compromising on performance. But if someone else makes the same application using C++ and it takes half the time, what is the use of that productivity? Maybe you can start selling earlier and stop selling earlier.

Monday, December 3, 2007

High performance computing

Why?
Being greedy humans, we never settle for what we have got. We always look for more. That is exactly what is happening in the high performance computing industry. Long ago we had very slow processors which took seconds to sort a small chunk of data. Now we want to predict the climate of each and every location in the world within seconds. There are lots of medical imaging algorithms waiting on the shore for more computation power. There are lots of generic algorithms which would solve many real-world problems but need far more computational power. That is why the high performance computing industry is quite hot.


The business

The first name that comes up when looking into high performance computing is Intel. They are coming out with processors with more and more cores. Then there is AMD, with both CPUs and GPUs (ATI is now AMD's). But the leader in GPUs is still NVIDIA, with their latest GPU, the 8800 Ultra, which has 128 stream processors. There is a brand new architecture from IBM called Cell, and Intel is going to release Larrabee next year.

Looking into software, Intel has its own compiler which compiles high level code to machine code with the highest optimization for the CPU. NVIDIA has CUDA for writing general purpose programs on the GPU. There is DirectX and HLSL from Microsoft for the GPU, Cg and CgFX from NVIDIA, OpenGL from the ARB, and RapidMind has its own stream programming libraries. And we no longer hear of the PeakStream brand, because Google bought it.


The Free Lunch Is Over, A Fundamental Turn Toward Concurrency in Software - Herb Sutter

Until today programmers didn’t need to think much about performance optimization. The hardware vendors kept improving their hardware, which needed no change in software to improve performance. But that has reached its limit. The clock speed can’t be increased much further; the power can’t be increased because of heat dissipation; the physics is catching up. The free lunch is over. Now it is multicore. It is parallel thinking that can improve performance.

Think parallel or perish - Intel

Intel says either think parallel or perish. If you keep on writing single-threaded serial code your software will become outdated. Do you think someone is going to buy your software when another product can do the same thing in one tenth of the time? It is only parallelism that can improve the performance now.

The GPGPU

What is this GP in GPGPU? Looking weird? But it is reality. You can write a lot of multi-threaded applications on the GPU which do general purpose tasks instead of the usual graphics tasks. CUDA from NVIDIA is the best way to write a GPGPU program.

Conclusion

If your algorithm is not parallel; if your application is still running on a single thread; if you still keep on thinking someone else will speed up your algorithm; you will perish. Your algorithm will cease to exist. Better late than never!!!

Saturday, December 1, 2007

CUDA - Compute Unified Device Architecture

Compute Unified Device Architecture is an easy way to use the GPU for general purpose programming. No graphics knowledge is required to write a GPU program with CUDA. A CUDA program is almost the same as a C program but has some additional features. In CUDA a function can be run by many threads by giving an execution configuration when calling it. There are 3 kinds of functions in CUDA: a device function, which can only be executed on the device and called from the device; a global function, which is called from the host (CPU) with an execution configuration and executes on the device; and a pure host function, which is executed on the CPU only.

There are some additional specifiers to distinguish the function type.

__device__ - a function prefixed with __device__ becomes a device function, which can only be executed on the device and can only be called from a function that executes on the device.
__host__ - These functions are the normal C functions that are executed on the host (CPU).
__global__ - A function prefixed with __global__ can be called from the CPU. But when calling this function the execution configuration must be given. The execution configuration decides how many blocks, and how many threads per block, are created to execute this function.

There are also different kinds of memory, selected with variable qualifiers,

__shared__ - a variable prefixed with this lives in shared memory and can be shared across the threads of a block. It is much faster than global device memory.
__constant__ - memory that can be written from the host only; the device can only read it.
__device__ - memory that can be written from both device and host.


Each thread has a thread ID and a block ID, from which it works out which area of the data it needs to process. This is the tricky area where all the performance improvement lies.
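To make the thread ID and block ID idea concrete, here is a rough sketch in plain C++ (not CUDA) that emulates the usual one-dimensional mapping serially on the CPU. The names ThreadCoord, global_index and square_all are made up for this illustration; in a real kernel the three numbers would come from CUDA's built-in blockIdx.x, blockDim.x and threadIdx.x.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for CUDA's built-in blockIdx.x, blockDim.x and threadIdx.x.
struct ThreadCoord {
    std::size_t block_id;   // which block the thread belongs to
    std::size_t block_dim;  // number of threads per block
    std::size_t thread_id;  // index of the thread inside its block
};

// The usual one-dimensional mapping from (block, thread) to a data element.
std::size_t global_index(const ThreadCoord& t) {
    return t.block_id * t.block_dim + t.thread_id;
}

// Serial emulation of launching blocks x threads_per_block threads,
// each of which squares the single element it owns.
std::vector<float> square_all(std::vector<float> data,
                              std::size_t blocks,
                              std::size_t threads_per_block) {
    for (std::size_t b = 0; b < blocks; ++b) {
        for (std::size_t t = 0; t < threads_per_block; ++t) {
            const std::size_t i = global_index({b, threads_per_block, t});
            if (i < data.size()) {  // guard: the grid may be larger than the data
                data[i] *= data[i];
            }
        }
    }
    return data;
}
```

On a GPU the two outer loops disappear: every (block, thread) pair runs at the same time, and only the index computation and the bounds guard remain inside the kernel.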

Thursday, November 1, 2007

Singleton design pattern

What is a singleton?

It means nothing more than the word suggests. From a C++ point of view, a singleton is a class that can have only one object. So,

A singleton class is a class that will have only one instance at any time.

When to go for singleton?

You must use a singleton pattern if your class must only have one instance. For example,

* If your class does some logging to a console, you may need to log all information to the same console. Then go for a singleton. Allow only one instance of the class to be created.
* If your class uses some device which must only be created/initialized once, and multiple objects should not access it, go for a singleton.
* If you want to use the same instance of a class from more than one place, you can definitely think about using a singleton.

How to make a class singleton?

There are different ways of making a singleton class. But there are some common things to do in all of them. They are,

* Make the constructor of the class private
* Make the copy constructor of the class private
* Overload the = operator as private
* Write a static member function which returns the static object created inside class/function.



Implementation

#include <iostream>
using namespace std;

class SingleTon
{
static SingleTon m_OneAndOnlyObject;
int m_nValue;
private:
// Constructor is declared as private
SingleTon( int nVal ):m_nValue( nVal ){;}
// Copy constructor is declared as private
SingleTon( const SingleTon& );
// = operator is declared as private
void operator = ( SingleTon& );
public:
static SingleTon& getInstance()
{
// The same instance will be always returned
return m_OneAndOnlyObject;
}
void SetValue( int nVal ){ m_nValue = nVal; }
int GetValue( void ){ return m_nValue; }
};

// Initializing the static object
SingleTon SingleTon::m_OneAndOnlyObject( 10 );

int main(int argc, char* argv[])
{
// Getting a reference
SingleTon& Obj1 = SingleTon::getInstance();
// Printing the value using Obj1-
// and setting the value using Obj1
cout << "Obj1.m_nValue: " << Obj1.GetValue() << endl;
Obj1.SetValue( 20 );

// Making second reference and printing the value
SingleTon& Obj2 = SingleTon::getInstance();
cout << "Obj2.m_nValue: " << Obj2.GetValue() << endl;

// Changing the value of m_nValue using Obj2-
// and Printing using Obj1
Obj2.SetValue( 30 );
cout << "Obj1.m_nValue: " << Obj1.GetValue() << endl;
return 0;
}

In the above program we have created 2 references, but both refer to the same object. If we set the value of m_nValue through either reference, the change is visible through the other because there is only one object.

Variants:

There are different methods for making a singleton class

1. By creating static object inside the getInstance function.

Code:

static SingleTon& getInstance()
{
// The same instance will be always returned
static SingleTon OneAndOnlyObject(10);
return OneAndOnlyObject;
}

2. Creating in heap

Code:
static SingleTon* m_OneAndOnlyObjectHeap; // Change the static object to a pointer

static SingleTon* getInstance()
{
// The same instance will be always returned
if( !m_OneAndOnlyObjectHeap )
{
m_OneAndOnlyObjectHeap = new SingleTon( 10 );
}
return m_OneAndOnlyObjectHeap;
}
SingleTon* SingleTon::m_OneAndOnlyObjectHeap( 0 );//Static member initialization


The variants differ in creation of object. Everything else is the same.
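As a side note, on a compiler that supports C++11 (an assumption, since the listings above use the older private-declaration style), the first variant can be sketched with deleted copy operations instead of private ones. The class name Settings and its members are hypothetical and only for illustration.

```cpp
// A minimal sketch of variant 1, assuming a C++11 compiler.
class Settings {
public:
    // The same instance is always returned; it is created on the first call.
    static Settings& instance() {
        static Settings only(10);
        return only;
    }

    // Copying is switched off explicitly instead of being declared private.
    Settings(const Settings&) = delete;
    Settings& operator=(const Settings&) = delete;

    void set_value(int v) { value_ = v; }
    int value() const { return value_; }

private:
    // The constructor stays private so nobody else can create an object.
    explicit Settings(int v) : value_(v) {}
    int value_;
};
```

Since C++11 the initialization of a function-local static is guaranteed to happen only once even when several threads call instance() at the same time, which is one reason this variant is often preferred over the heap variant.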

Wednesday, October 31, 2007

What is a protected abstract virtual base pure virtual private destructor?

A protected abstract virtual base pure virtual private destructor.

This is one of those funny questions that is very rarely answered. It may be a very long sentence, but the code needed to make a protected abstract virtual base pure virtual private destructor is quite simple.
The code below makes a protected abstract virtual base pure virtual private destructor.

Program:

class BaseClass // An abstract class
{
public:
virtual void MakeAbstract() = 0;
};

// A class derived as an protected abstract virtual base
class AbstractBase : virtual protected BaseClass
{
private:
void MakeAbstract(){;}
// A pure virtual private destructor
virtual ~AbstractBase() = 0;
friend class Derived;
};

AbstractBase::~AbstractBase()
{

}
class Derived : protected AbstractBase
{

};

int main(int argc, _TCHAR* argv[])
{
// You can definitely make an object of class Derived
Derived obj;
return 0;
}


Explanation:

In the above program AbstractBase::~AbstractBase() can be called as a protected abstract virtual base pure virtual private destructor. Let us see how it can be called so.

1. The class AbstractBase is derived as "virtual protected" from an "abstract base" class. So we can call class AbstractBase as a "protected abstract virtual base".

2. Now let us check the destructor of class AbstractBase. It is made as a "pure virtual private" one. So we can call it as a "pure virtual private destructor".

3. Now combining both, we can call the destructor of AbstractBase as "protected abstract virtual base pure virtual private destructor"


Use:

This question can measure knowledge of C++. Practically it won't have much use from an implementation point of view. This question was made to prove that C++ is too complex and weird. But when we see the code behind such a big definition, it works the opposite way. It's quite easy to express long sentences in very few lines of code.

RapidMind - Stream programming

Why RapidMind?

RapidMind helps us to introduce data parallelism in our program. Data parallelism can optimize the program speed to a big extent. RapidMind gives data parallelism by using stream computing.
Stream computing and stream processors are nothing new to most of us by now. They help us to execute kernels (functions) on multiple data elements. Intel SSE, GPUs etc. are examples of stream computing. As of this date RapidMind supports both the Cell BE (the IBM Cell architecture) and GPUs (like NVIDIA, ATI), and it will support the x86 architecture in the near future.
It is quite easy to convert your serial program to a stream program using RapidMind. RapidMind is implemented purely in C++. Everything is wrapped in the namespace rapidmind, so development also becomes easy.

Now I will quote an example for converting a normal serial program to a RapidMind program.

The below programs does operations on 4 floating point values. The operations are done on 16 bytes. Let us see how the implementation differs for a RapidMind program and a normal program.

Normal program

float SquareofIndividualSquare( float a, float b )
{
return(( a*a + b*b ) * ( a*a + b*b ));
}


int _tmain(int argc, _TCHAR* argv[])
{
float* fFirstElement = new float[2048*2048*4];
float* fSecondElement = new float[2048*2048*4];
int nIndex = 0;;
for( int i = 0; i < 2048; i++ )
{
for ( int j = 0; j < 2048; j++ )
{
for ( int floatnum = 0; floatnum < 4; floatnum++, nIndex++ )
{
fFirstElement[nIndex] = float(floatnum);
fSecondElement[nIndex] = float(floatnum);
}
}
}
nIndex = 0;
for( int i = 0; i < 2048; i++ )
{
for ( int j = 0; j < 2048; j++ )
{
for ( int floatnum = 0; floatnum < 4; floatnum++,nIndex++ )
{
fFirstElement[nIndex] = SquareofIndividualSquare( fFirstElement[nIndex],fSecondElement[nIndex] );
}
}
}
}

RapidMind Program

#include
using namespace rapidmind;

int main()
{
// Do the initialization of rapid mind platform
rapidmind::init();
// Since GPU is used set the backend as GLSL( OpenGL shader )
use_backend("glsl");

// Array is template class.
// Value4f means 4 floats per each element
Array<2,Value4f> a(2048,2048);
Array<2,Value4f> b(2048,2048);

// This is how we get access to the actual array location.
// Now we can use these pointer to -
// manipulate internal data using CPU.
float* fFirstElement = a.write_data();
float* fSecondElement = b.write_data();

// Fill the input arrays
int nIndex = 0;;
for( int i = 0; i < 2048; i++ )
{
for ( int j = 0; j < 2048; j++ )
{
for ( int floatnum = 0; floatnum < 4; floatnum++, nIndex++ )
{
fFirstElement[nIndex] = float(floatnum);
fSecondElement[nIndex] = float(floatnum);
}
}
}

// This array can get the output data. A normal array.
Array<2,Value4f> output;

// The stream program that will be executed on the data
// This will be executed on GPU.
Program prg = RM_BEGIN {
In<Value4f> a; // First input
In<Value4f> b; // Second input
Out<Value4f> c; // Output

c = (a*a + b*b)*(a*a+b*b); // Data manipulation
} RM_END;

// Execute the stream program
// The output will be available in output array.
output = prg(a, b);
}



Description:


We can see that in the RapidMind program the innermost for loop is replaced. The multiplication of 4 floats is done in one line. This is the advantage of using stream computing. You can process more than one data element at a time.


Important:

When we check the performance of the above program we will find that the CPU version is faster. But this won't be the case if we do a lot of processing. The CPU gives better performance for programs that do very little processing per data element, because of CPU caching and memory speed, while the GPU version also has to move the data to and from the graphics card. But if we have a big chunk of data and need to do a lot of processing on it, RapidMind will be the better option. We are also expecting an x86 version of RapidMind. If an x86 version becomes available, it may take away this problem as well.

Saturday, October 27, 2007

Performance analysis C++ vs C#

Performance analysis C++ vs C#

Description

C++ or C#, Which is the best language?
This question has a clear answer if you are thinking from a performance point of view. In the performance area C++ zigs where C# zags.

Let us take a small example,

Here I am doing some matrix operations using C++ and C#. Both are executing same algorithm.

C++ Program

int nRetCode = 0;
const int nSize = 500;
int* nMatrix1 = new int[nSize*nSize];
int* nMatrix2 = new int[nSize*nSize];
int* nMultipliedMatrix = new int[nSize*nSize];
for (int i = 0; i < nSize; i++)
{
for (int j = 0; j < nSize; j++)
{
nMatrix1[i*nSize+j] = i;
nMatrix2[i*nSize+j] = j;
}
}
int nElapsed = 0;
int nLoopCount = 5;
for( int nVal = 0; nVal < nLoopCount; nVal++ )
{
int nStart = GetTickCount();
for (int i = 0; i < nSize; i++)
{
for (int j = 0; j < nSize; j++)
{
for (int k = 0; k < nSize; k++)
{
nMultipliedMatrix[i*nSize+j] = nMatrix1[i*nSize+k] + nMatrix2[k*nSize+j];
}
}
}
nElapsed += GetTickCount()- nStart;
}
delete[] nMatrix1;
delete[] nMatrix2;
delete[] nMultipliedMatrix;
std::cout << nElapsed / nLoopCount;
return nRetCode;

C# Program
class Program
{
public const int nSize = 500;
static void Main(string[] args)
{
int[] nMatrix1 = new int[nSize * nSize];
int[] nMatrix2 = new int[nSize * nSize];
int[] nMultipliedMatrix = new int[nSize * nSize];
for (int i = 0; i < nSize; i++)
{
for (int j = 0; j < nSize; j++)
{
nMatrix1[i*nSize + j] = i;
nMatrix2[i*nSize + j] = j;
}
}
int nLoopCount = 5;
int nElapsed = 0;
for (int nVal = 0; nVal < nLoopCount; nVal++)
{
int nStart = Environment.TickCount;
for (int i = 0; i < nSize; i++)
{
for (int j = 0; j < nSize; j++)
{
for (int k = 0; k < nSize; k++)
{
nMultipliedMatrix[i * nSize + j] = nMatrix1[i * nSize + k] + nMatrix2[k * nSize + j];
}
}
}
nElapsed += Environment.TickCount - nStart;
}
Console.WriteLine(nElapsed/nLoopCount);
}
}


The above programs do some basic matrix operations. Comparing the same algorithm implemented in C++ and C#, we can see that C++ gives excellent performance.
When these programs were run on an Intel Pentium 4 3.2GHz machine with 1GB RAM, the times taken were as follows.

C++ code(Average of 5 execution) = 785ms.
C# code(Average of 5 execution) = 1465ms.

So clearly we can see that C++ outplays C# when it comes to performance. Even when executing a basic algorithm that uses few OOP features such as overloading or runtime polymorphism, the C# program suffers this much performance loss.
C# has many other advantages like maintainability, understandability etc. But all these come at the cost of performance.
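One caveat about the measurement: GetTickCount typically has a resolution of around 10 to 16 milliseconds. That is fine for timings in the hundreds of milliseconds like these, but too coarse for short runs. A portable sketch, assuming a C++11 compiler, could use std::chrono::steady_clock instead. The helper name average_ms is made up for this example and is not part of the original benchmark.

```cpp
#include <chrono>

// Runs work() the given number of times and returns the average
// wall-clock time per run in milliseconds.
template <typename Work>
double average_ms(Work work, int runs) {
    using clock = std::chrono::steady_clock;  // monotonic, never jumps backwards
    double total_ms = 0.0;
    for (int r = 0; r < runs; ++r) {
        const auto start = clock::now();
        work();
        const auto stop = clock::now();
        total_ms += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / runs;
}
```

The triple loop from the C++ program above could be passed in as a lambda, for example average_ms([&] { /* matrix loops */ }, 5), to get the same average of 5 executions.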

SIMD - Intel SSE( Streaming SIMD Extension )

Intel SSE - Streaming SIMD Extension

What is SSE?

SSE is an instruction set extension which has come in 4 series: SSE, SSE2, SSE3 and SSE4. These instructions work on 128-bit registers called XMM registers. One instruction can therefore process four 32-bit floats at once, so a suitable application can run up to 4 times faster. The instruction set reference can be downloaded directly from the Intel website. There are two different concepts in SSE which help you when reading and writing blocks of data in memory.

Prefetching: Prefetching helps you to bring data from memory into the cache before it is needed. You can select which cache level the data is brought into.

Non-temporal storing: Non-temporal storing helps you to write data to memory bypassing the cache. This can help you to avoid cache pollution.

Example:

An example is given below which does a memory copy. A simple memory copy on a 32-bit machine moves at most 4 bytes per instruction. But using SSE instructions we can copy 16 bytes of data at a time. Also, a normal memory copy pollutes the cache whereas the SSE instructions can bypass the cache. The following code is just a stub.

Normal memory copy:

mov ecx, count
shr ecx, 2 // count / 4: copying 4 bytes (one dword) at a time
mov esi, source
mov edi, destination
rep movsd // moves the data from source to destination. ecx is used as the number of dwords to copy

SSE memory copy:

mov ecx, count
shr ecx, 4 // count / 16: copying 16 bytes at a time
mov esi, source
prefetchnta [esi] // prefetching the source data into the cache
mov edi, destination
cmp ecx, 0
jz END

NEXT:
movdqa xmm0, [esi] // reading the data from memory expecting memory is 16 byte aligned
movntdq [edi], xmm0 // writing data directly to memory bypassing cache expecting destination memory is 16 byte aligned
add esi, 16 // advance the source pointer to the next block
add edi, 16 // advance the destination pointer to the next block
dec ecx // one block fewer left to copy
jnz NEXT

END:


I have written a small code block which copies memory with and without SSE. The code with SSE works much faster than the one without SSE.

So if the program can be data parallelized SSE instructions can improve the performance of a program quite heavily.
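For readers who prefer C++ to inline assembly, the same idea can be sketched with the SSE2 compiler intrinsics from emmintrin.h. This is only an illustration, assuming an x86 target; the function name stream_copy16 is made up here, and like the stub above it expects both pointers to be 16 byte aligned and the size to be a multiple of 16.

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics

// Copies bytes from src to dst, 16 bytes per step, bypassing the cache on writes.
// Assumes src and dst are 16 byte aligned and bytes is a multiple of 16.
void stream_copy16(void* dst, const void* src, std::size_t bytes) {
    const __m128i* in = static_cast<const __m128i*>(src);
    __m128i* out = static_cast<__m128i*>(dst);
    for (std::size_t i = 0; i < bytes / 16; ++i) {
        const __m128i block = _mm_load_si128(in + i);  // aligned load (movdqa)
        _mm_stream_si128(out + i, block);              // non-temporal store (movntdq)
    }
    _mm_sfence();  // make the non-temporal stores visible before returning
}
```

Whether this actually beats the library memcpy depends on the data size and the machine, so it is worth measuring before relying on it.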

Friday, July 27, 2007

typedef name as identifier of constructor/destructor

Description: As per the C++98 a constructor declarator can be a typedef name if the declaration is inside class member specification. It must not be allowed if the declaration is done outside the class member specification.

But GCC and VC++ differ in their behaviour. VC++ has the correct implementation compared to gcc 3.4.2. Consider the following example,

Example:

class Alpha;
typedef Alpha Constructor;

class Alpha
{
public:
Constructor::Constructor(){}
};

int main()
{
Alpha obj;
}
Explanation:

If you compile the above program using VC++ it compiles. But if you use gcc 3.4.2 it will not compile.

I think it is a mistake in GCC, because they did not take care of this point in the ISO C++98 standard. It may also be an implementation artifact of Visual C++ that gets this code compiled.

Friday, July 20, 2007

Calling a virtual function of a class from its constructor/destructor

Description:

As you know, construction is done from base to derived and destruction in just the opposite order, so you should avoid calling a virtual function from both constructor and destructor. When you call the virtual function from the constructor the derived class is not yet constructed, and when you call it from the destructor the derived class is already destructed. Hence, while a constructor or destructor is running, a virtual call resolves to the function of the class currently being constructed or destroyed (i.e. the Base class), never to an override in a derived class.

If you call a pure virtual function from your constructor/destructor, directly or indirectly, the behaviour is undefined. In practice you can get either a compile/link time error or a runtime error "Pure virtual function called" (normally the runtime error occurs in the case of an indirect call from the constructor, i.e. calling a local non-virtual member function which in turn calls the pure virtual function).

Example 1

class Base
{
public:
Base()
{
CallPureVirtual(); // Calling pure virtual indirectly from constructor // LINE 6
}
private:
virtual void PureVirtual() = 0;
void CallPureVirtual(){ PureVirtual(); };
};

class Derived : public Base
{
public:
Derived()
{
}
void PureVirtual()
{
std::cout << "Derived::PureVirtual()" << std::endl;
}
};

int main()
{
Derived obj;
return 0;
}


Output and explanation
If the above program is compiled using gcc/VC++2005/VC6 it will give a runtime error, because a pure virtual function is called.

Example 2

class Base
{
public:
Base()
{
std::cout << "From Base Constructor" << std::endl;
VirtualFun();
}
~Base()
{
std::cout << "From Base Destructor" << std::endl;
VirtualFun();
}
void CallVirtual()
{
std::cout << "From Base CallVirtual" << std::endl;
VirtualFun();
}
private:
virtual void VirtualFun()
{
std::cout << "Base::VirtualFun()" << std::endl;
}
};

class Derived : public Base
{
public:
Derived()
{
}
void VirtualFun()
{
std::cout << "Derived::PureVirtual()" << std::endl;
}
};

int main()
{
Derived obj;
obj.CallVirtual();
return 0;
}

Output and explanation
The above program will output,

From Base Constructor
Base::VirtualFun()
From Base CallVirtual
Derived::PureVirtual()
From Base Destructor
Base::VirtualFun()


Here we can see that from the constructor and destructor of the base class the local virtual function gets called, and in other cases the virtual function of the actual object is called.

So it is not that you can't call a virtual function from a constructor or destructor, but it may not get you what you desired.
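If the derived version really has to run during setup, one common workaround is to finish construction first and make the virtual call afterwards, for example from a factory function. The following is only a sketch, assuming a C++11 compiler; Shape, Circle, create and init are hypothetical names used for illustration.

```cpp
#include <memory>
#include <string>

class Shape {
public:
    virtual ~Shape() {}
    const std::string& label() const { return label_; }

    // Factory: the object is fully constructed first, then the virtual call is made.
    template <typename T>
    static std::unique_ptr<T> create() {
        std::unique_ptr<T> obj(new T());
        obj->init();  // T is complete here, so name() dispatches to T's override
        return obj;
    }

protected:
    Shape() {}

private:
    void init() { label_ = name(); }
    virtual std::string name() const = 0;
    std::string label_;
};

class Circle : public Shape {
private:
    std::string name() const override { return "Circle"; }
};
```

Calling Shape::create<Circle>() gives an object whose label() is "Circle". Calling name() from the Shape constructor instead would have hit the pure virtual function.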

Friday, July 6, 2007

Making a class non derivable / Disabling Inheritability of a class

Description: Have you ever thought about how to make a class non-derivable in C++? You can do it by making use of the rule "The most derived class calls the constructor/destructor of the virtual base class".

To understand it we should first look at how the diamond inheritance problem is solved. For that we use the concept of a virtual base class, with the intermediate classes inheriting virtually. Then the most derived class (the class of which the object is created, the bottom one in the diamond pattern) calls the constructor/destructor of the virtual base class. This is the feature that we can use to make a class non-derivable.

Code snippet: Here the class NonDerivable is a class from which you can't derive any other class.

class NonDerivable; /* This is just a forward declaration of class that need to be made non derivable*/

class DisableDerive /* A base class */
{
private:
DisableDerive(){}; /* Made the constructor as private*/
friend class NonDerivable; /* Made NonDerivable as friend so that it can access private constructor*/
};
class NonDerivable: virtual public DisableDerive /* Intermediate class - Virtual base*/
{
private:
int m_nVal;
public:
NonDerivable(){ m_nVal = -1;};
void SetValue( int nVal ){ m_nVal = nVal; }
};

class TryDerive: public NonDerivable /* Deriving from non derivable class*/
{
};

int main(int argc, _TCHAR* argv[])
{
TryDerive Alpha; /* This will cause compile error because TryDerive cant call constructor of DisableDerive.*/
}


Here the TryDerive tries to derive from NonDerivable. So TryDerive is the most derived class. Hence the constructor of virtual base class DisableDerive will have to be called by TryDerive. But TryDerive cannot access the constructor of DisableDerive since it is private. Hence it will create a compile time error.

Error: error C2248: 'DisableDerive::DisableDerive' : cannot access private member declared in class 'DisableDerive'

But you can of course create the object of class NonDerivable.

And if you just change the line, class NonDerivable: virtual public DisableDerive to class NonDerivable: public DisableDerive, you can derive from the class NonDerivable because the constructor will be called by NonDerivable which is a friend of DisableDerive (It happens because the next derived class calls the constructor).
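For comparison, compilers that support C++11 (newer than the ones discussed in this post) offer a direct way to get the same effect: the final specifier. A minimal sketch, in which GetValue is added only for illustration:

```cpp
// Assumes a C++11 compiler; no friend class or virtual base is needed.
class NonDerivable final {
public:
    NonDerivable() : m_nVal(-1) {}
    void SetValue(int nVal) { m_nVal = nVal; }
    int GetValue() const { return m_nVal; }

private:
    int m_nVal;
};

// Uncommenting the next line gives a compile-time error,
// because a class marked final cannot be used as a base class.
// class TryDerive : public NonDerivable {};
```

The virtual base trick is still worth knowing, since it shows clearly which class is responsible for constructing a virtual base.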

Microsoft VC++ 2005 Compiler Bug #2

During a friend class specification the compiler is accepting a typedef name. ISOIEC14882-1998 section 9.1 point 5 states that a "typedef-name that names a class is a class-name, but shall not be used in an elaborated-type-specifier". Sections regarding typedef and friend also defines the same. So this usage must raise a compile time error. Instead the vc++2005 compiler accepts the program and successfully compiles it.

Program 1:

#include "stdafx.h"
#include <iostream>
using namespace std;

class Alpha;
typedef Alpha Constructor ;

class Beta
{
public:
Beta(){;}
private:
int nVal;
friend class Constructor;
};

class Alpha
{
private:
public:
Alpha() {;}
};

int main()
{
Alpha obj1;
Beta obj2;
return 0;
}

Again, if the private data of class Beta is accessed by class Alpha, a compile time error is generated regarding access of private data. Here too the error should instead be about the usage of a typedef name in a place where an elaborated-type-specifier must be used. Program 2 given below demonstrates this issue.

Program 2
using namespace std;

class Alpha;
typedef Alpha Constructor ;

class Beta
{
private:
int nVal;
friend class Constructor;
};

class Alpha
{
private:
Beta obj;
public:
Alpha::Alpha()
{
obj.nVal = 10;
cout<<"Constructor\n";
}
};

int main()
{
Alpha obj1;
return 0;
}

Microsoft Comment

Thanks for your feedback. We have tried to reproduce the issue with the latest Visual Studio build (Orcas Beta 1) (http://www.microsoft.com/downloads/details.aspx?FamilyId=36B6609E-6F3D-40F4-8C7D-AD111679D8DC&displaylang=en) and we cannot reproduce it. It is likely a known issue fixed in the latest build. If the issue is critical to you with the release of Visual Studio you are working with, you may contact our Product Support Services (http://support.microsoft.com). Our dedicated Support Engineer will work with you to investigate to issue further.

Microsoft VC++ 2005 Compiler Bug #1

It seems there is an implementation difference from the ISOIEC14882-1998 rules regarding multiple inheritance and the private access specifier.
The private destructor of the virtual base is accessible from the most derived class. There is no compile time error and no runtime error for the program.

class DisableDerive
{
public:
DisableDerive(){ ; }

private:
~DisableDerive(){ ; }
friend class NonDerivable;
};

class NonDerivable: virtual protected DisableDerive
{
private:
int m_nVal;
public:
NonDerivable()
{
std::cout<<"NonDerivable::NonDerivable()\n";
m_nVal = -1;
}
~NonDerivable()
{
std::cout<<"NonDerivable::~NonDerivable()\n";
}
void SetValue( int nVal ){ m_nVal = nVal; }
};

class TryDerive: public NonDerivable
{
public:
TryDerive(){ ; }
~TryDerive(){ ; }
};

int main()
{
TryDerive a;
return 0;
}


This code compiles and executes. But there must be an error, because DisableDerive::~DisableDerive must not be accessible from TryDerive. A private constructor is correctly reported as inaccessible; the problem is only with the destructor.

This code is giving expected compile time error in Visual Studio 6.0.

Microsoft Comments on this bug

Thanks for your feedback. We have reproduced this bug on Win2003 SP2 and OrcasBeta1VSTS, and we are sending this bug to the appropriate group within the Visual Studio Product Team for triage and resolution. Thank you, Visual Studio Product Team.
Posted by Microsoft on 6/26/2007 at 8:24 PM

Virtual destruction, delete() and delete[]

Pre-Information
Q. Why must a base class destructor be virtual?

A. If a derived class object is deleted through a base class pointer and the base class destructor is not virtual, the behaviour is undefined (typically only the base class destructor is called). So a base class destructor must be virtual.

Tip:
The above rule only makes the program correct if we are NOT deleting an array of objects (i.e. delete[]). If we delete an array of derived class objects through a base class pointer, the behaviour is undefined even with a virtual destructor.

Example:
struct Base
{
virtual ~Base(){ cout << "~Base()" << endl; }
void operator delete[](void* pObj, size_t){ cout << "Base::delete[] operator" << endl;
::operator delete[]( pObj ); } // release the raw memory; a delete expression on void* is not valid
void operator delete(void* pObj ){ cout << "Base::delete() operator" << endl;
::operator delete( pObj ); }
};

struct Derived : Base
{
~Derived(){ cout << "~Derived()" << endl; }
void operator delete[](void* pObj, size_t){ cout << "Derived::operator delete[]" << endl;
::operator delete[]( pObj ); } // release the raw memory; a delete expression on void* is not valid
void operator delete(void* pObj ) { cout << "Derived::operator delete()" << endl;
::operator delete( pObj ); }
};

int main()
{
cout << "Creating one derived class object" << endl;
Base* pBase = new Derived;
delete pBase; // Deleting a derived class object with base class pointer
cout << "Creating array of derived class objects";
Base* BasePointer = new Derived[3];
delete[] BasePointer; // Deleting array of derived class objects with base class pointer
getch();
}

In the above program delete pBase guarantees proper destruction and deallocation. But delete[] BasePointer is not guaranteed to work. Using delete[] BasePointer gives the program undefined behaviour.

If you run the above program it will output:

Creating one derived class object
~Derived()
~Base()
Derived::operator delete()
Creating array of derrived class objects~Derived()
~Base()
~Derived()
~Base()
~Derived()
~Base()
Base::delete[]

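The derived class objects are deallocated using the base class's deallocation function.
The C++98 standard leaves this usage undefined (section 5.3.5, paragraph 3), so the output above is only what one particular compiler happened to produce.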
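A safer way to hold several derived objects behind base class pointers is to allocate each object separately and keep the pointers in a container, so that every object is deleted on its own through the virtual destructor. The sketch below assumes a C++11 compiler; the two counters and the function make_objects are made up just to show that both destructors run for every element.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

int base_dtor_calls = 0;     // illustration only: counts ~Base() calls
int derived_dtor_calls = 0;  // illustration only: counts ~Derived() calls

struct Base {
    virtual ~Base() { ++base_dtor_calls; }
};

struct Derived : Base {
    int payload[4];  // makes sizeof(Derived) differ from sizeof(Base)
    ~Derived() override { ++derived_dtor_calls; }
};

// Each element is its own allocation, deleted singly through Base*,
// which is well defined because ~Base() is virtual.
std::vector<std::unique_ptr<Base>> make_objects(std::size_t count) {
    std::vector<std::unique_ptr<Base>> objects;
    for (std::size_t i = 0; i < count; ++i) {
        objects.push_back(std::unique_ptr<Base>(new Derived()));
    }
    return objects;
}
```

When the vector goes out of scope every element is destroyed with a plain delete, so the array form delete[] is never applied to a base class pointer.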