Confusions about abstraction

What does the concept “abstract” really mean?

The term abstract is widely used, but in many cases its meaning is vague or fuzzy. To calibrate the term, let’s try to define exactly what we mean by the concept (the word).

The following is a quote from the English Wikipedia, defining the word abstraction:

Abstraction is the process or result of generalization by reducing the information content of a concept or an observable phenomenon, typically in order to retain only information which is relevant for a particular purpose. For example, abstracting a leather soccer ball to a ball retains only the information on general ball attributes and behavior.

Abstraction uses a strategy of simplification, wherein formerly concrete details are left ambiguous, vague, or undefined; thus effective communication about things in the abstract requires an intuitive or common experience between the communicator and the communication recipient.

Abstraction is reducing the information content

So the point here is simplification by reducing the amount of detail. The purpose of this is to emphasize the aspects that are important from the chosen point of view.

To say that abstraction is simplification is true, but this expression can be interpreted too strongly. In most cases the simplification is achieved by reducing details, so there is still room to move within the simplification. The amount and nature of the simplification depend on the situation and change case by case.

From a purely theoretical (mathematical) point of view we can say that abstraction is a mapping f : L -> R, where L is the source set (in our case a subset of reality) and R is the result set (the model of reality). The mapping f abstracts (simplifies) the complexity of L if the result set is simpler than the source set. In its most general form this means a reduction in the number of details. This implies that the mapping f is a homomorphism but not an isomorphism; otherwise the reduction would be a 0-reduction. Here this means that f is not one-to-one: many source points map to one result point. As a result a reverse mapping is not possible, so the mapping genuinely destroys information. The lost information cannot be recovered in any way from the model alone.
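The many-to-one nature of such a mapping can be sketched in a few lines of Java. This is only an illustration under assumptions: the class names SoccerBall and Ball, their fields, and the method abstractOf are hypothetical, chosen to echo the soccer ball example in the Wikipedia quote, and are not taken from any library.

```java
// Hypothetical sketch: abstraction as a many-to-one mapping f : L -> R.
final class AbstractionSketch {

    // Element of the source set L: a concrete ball with many details.
    static final class SoccerBall {
        final String material;
        final int panels;
        final double diameterCm;

        SoccerBall(String material, int panels, double diameterCm) {
            this.material = material;
            this.panels = panels;
            this.diameterCm = diameterCm;
        }
    }

    // Element of the result set R: only a general ball attribute survives.
    static final class Ball {
        final double diameterCm;

        Ball(double diameterCm) {
            this.diameterCm = diameterCm;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Ball
                    && Double.compare(((Ball) o).diameterCm, diameterCm) == 0;
        }

        @Override
        public int hashCode() {
            return Double.hashCode(diameterCm);
        }
    }

    // The mapping f: material and panel count are dropped on purpose.
    static Ball abstractOf(SoccerBall source) {
        return new Ball(source.diameterCm);
    }

    public static void main(String[] args) {
        SoccerBall leather = new SoccerBall("leather", 32, 22.0);
        SoccerBall synthetic = new SoccerBall("synthetic", 14, 22.0);

        // Two different source points land on the same result point,
        // so nothing can map the Ball back to "the" original SoccerBall.
        System.out.println(abstractOf(leather).equals(abstractOf(synthetic))); // true
    }
}
```

Given only the resulting Ball, there is no way to tell whether it came from the leather or the synthetic ball; that is the destroyed information.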

A terrain and a map of this terrain are a good example of this kind of modeling.

Simplification is a double-edged sword. When the point of view and the usage are straightforward and simple, there are normally no problems. In the geographical map example, if we need driving directions in southern Finland or a street map for walking in Helsinki city center, then the scale of the map is no big issue. The situation changes a lot when the “map” (model) is used from several different points of view and for several interests.

Common confusions about the concept “abstract”

Non-concrete is abstract
The concept “abstract” is often used loosely as a synonym for non-concrete and/or difficult. Sometimes people say that mathematics or geometry is “abstract” as such. This does not, however, conform to the previous definition: neither simplification nor reduction of detail takes place. Both of these mainstream fields of mathematics are axiomatic systems with rules of manipulation. It is a completely different matter that mathematics and geometry can be used as instruments of abstraction, but in such a situation the mathematics is only the vehicle or tool used to create the mapping between the sets. In the same way one might say that the game of chess is abstract. This is false again. The game is the most concrete thing in the world, with its board, chess pieces and rules. By the same reasoning programming languages are not abstract either, but well-defined games in their own world.

Programming is abstract

The world of computer programming is defined by the physical structure of the von Neumann machine. The next question is then: do programming languages form layers of abstraction on top of each other? Many people say that Cobol is more abstract than assembler. My opinion is NO, in my strict use of the concept abstract. My argument is the following. As our current programming languages are all deterministic, the code cannot truly simplify anything that affects the decisions made along the path of execution. This implies that deterministic programs that do the same thing are isomorphic with each other. This means that when we have a function f from programming language a to b that gives the mapping, there always exists a reverse function f^-1 that gives the reverse mapping from the program in b back to the original program in a. What some call abstraction is really only compression! Here is a small, simple example of that compression.

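Example: the for loop in Java 4 and Java 5

// Explicit iterator loop (the style needed before Java 5)
void cancelAll(Collection<TimerTask> c) {
    for (Iterator<TimerTask> i = c.iterator(); i.hasNext(); )
        i.next().cancel();
}

// Enhanced for loop (Java 5): the same iteration, written more compactly
void cancelAll(Collection<TimerTask> c) {
    for (TimerTask t : c)
        t.cancel();
}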

Simplification always boils down to reducing the number of elements, namely the less important ones. When the program is deterministic, all those elements (attributes and their values) that control the flow of execution must be present regardless of the chosen programming language. So in every equivalent program exactly the same if statement must be present in one form or another.
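The claim that the controlling condition survives in every equivalent form can be illustrated with a small, hypothetical Java sketch. The class name LoopForms and both method names are assumptions made for this illustration only. The two methods differ in how the iteration is written, but the if statement that steers the execution appears unchanged in both.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: two spellings of the same deterministic program.
final class LoopForms {

    // Explicit iterator loop.
    static int sumOfEvensWithIterator(List<Integer> numbers) {
        int sum = 0;
        for (Iterator<Integer> i = numbers.iterator(); i.hasNext(); ) {
            int n = i.next();
            if (n % 2 == 0) {   // the decision that controls the flow
                sum += n;
            }
        }
        return sum;
    }

    // Enhanced for loop: shorter to write, but the same decision is still there.
    static int sumOfEvensWithForEach(List<Integer> numbers) {
        int sum = 0;
        for (int n : numbers) {
            if (n % 2 == 0) {   // the same decision, in the same place
                sum += n;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumOfEvensWithIterator(numbers)); // 6
        System.out.println(sumOfEvensWithForEach(numbers));  // 6
    }
}
```

Only the bookkeeping around the iterator is compressed away; nothing that affects the result has been removed.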

UML is abstract

UML is a one-to-one mapping between a well-defined set of concepts and their corresponding graphical signs. This mapping is isomorphic between the graph and the graph’s “verbal structure”, which can be expressed in any programming language. In this way UML is a transformation algorithm between a diagram and a description (which can use, for instance, the Java language).

About real simplification

Let’s return to real simplification. When we model reality we constantly face the question of how much to simplify. When we make the mapping between reality and the model we have to choose the scale. The following diagram illustrates the two sides of the decision. The y-axis is the amount of abstraction (or simplification) and the x-axis describes the amount of semantics within a concept. When you pick a point on the y-axis and so decide the level of abstraction, you get at the same time the amount of semantic value, which is the width of the triangle at that point. So the higher we are, the smaller the amount of semantics the concept gives.

Abstraction Triangle

There is a shaded area in the center of the y-axis. This shows the optimal (read: the best possible) level of abstraction, and the red thread indicates that different individual concepts (read: classes) can be at different levels of abstraction. This actually means that the benefit we get from abstraction increases to a maximum somewhere in the middle between 0-abstraction and total abstraction, which is a single point with no content. See the next diagram:

A common myth is that the higher the level of abstraction, the better. As this discussion has shown, this is not the case; on the contrary. As we can see from the parabola, at first a clear increase in clarity follows, but at some point raising the level reaches a point where the simplification starts to corrupt the most essential parts of the information, and finally the mapping collapses to zero. So both ends of this graph are danger areas.

The first example of a big collapse at the left end of the graph was IBM’s ambitious attempt with objects. The project was called San Francisco (3200 classes, 24600 methods) and dates from the end of the previous millennium. The attempt was more or less to produce a model that would cover all possible businesses. As you can see from the figures above, the level of abstraction was far too low. The model was finally constructed with huge effort. The trouble was that it was totally useless with that amount of information. I still quite frequently run into the attitude that really big (read: detailed) models are bold and something to be proud of, but sadly the truth is almost the opposite.

The second extreme is of course at the other end of the function. These are models with very general concepts, of which only a few are needed. Usually these models are technically correct but semantically completely empty. So they look nice but do not contain any real value for developing applications.

This is the point where I can return to Grady Booch, who asked his audience: “When is a domain model ready?” His answer was that it is not ready when all the possible classes have been added to the model; it is ready when you cannot remove a single class from the model without totally collapsing it!

My experience is that such a model typically consists of 30 – 60 classes. So even here the famous rule of Albert Einstein: “Everything should be made as simple as possible, but not simpler.”  is completely valid!

By the way, the modeler can decide the number of classes in the model even without knowing anything about the target reality. This is of course done by either raising or lowering the level of abstraction of several classes of the model.

Abstraction within programming

The level of abstraction of the classes is not directly reflected in the absolute number of classes but rather in the relative numbers of classes and methods together. This means that the higher the level of abstraction in the model, the more complicated the implemented methods are, and vice versa.

This way the logical 3-tier architecture can lift the abstraction level for the GUI programmer by encapsulating the lower-level details inside the object boundary, in the method implementations.
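A minimal Java sketch of this kind of encapsulation might look as follows. All names here (OrderStore, InMemoryOrderStore, OrderService, outstandingTotal) are hypothetical and only stand in for the data tier, the logic tier and a GUI call; the text above does not prescribe any particular design.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a logical 3-tier split.
final class ThreeTierSketch {

    // Data tier: how order amounts are stored is hidden behind this interface.
    interface OrderStore {
        List<Integer> unpaidAmountsInCents(String customerId);
    }

    // A stand-in store kept in memory; a real one might talk to a database.
    static final class InMemoryOrderStore implements OrderStore {
        private final Map<String, List<Integer>> unpaid = new HashMap<>();

        void put(String customerId, Integer... amounts) {
            unpaid.put(customerId, Arrays.asList(amounts));
        }

        @Override
        public List<Integer> unpaidAmountsInCents(String customerId) {
            return unpaid.getOrDefault(customerId, Collections.<Integer>emptyList());
        }
    }

    // Logic tier: the lower-level details live inside the method implementation.
    static final class OrderService {
        private final OrderStore store;

        OrderService(OrderStore store) {
            this.store = store;
        }

        int outstandingTotal(String customerId) {
            int total = 0;
            for (int amount : store.unpaidAmountsInCents(customerId)) {
                total += amount;
            }
            return total;
        }
    }

    // Presentation tier: the GUI programmer sees one call and nothing else.
    public static void main(String[] args) {
        InMemoryOrderStore store = new InMemoryOrderStore();
        store.put("c-1", 1500, 250);
        OrderService service = new OrderService(store);
        System.out.println(service.outstandingTotal("c-1")); // 1750
    }
}
```

The caller works with a single, higher-level concept (an outstanding total), while the loop and the storage access stay inside the method, which is where the complexity has moved.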

