Why the term “database” in an object context is also evil

1. Impedance mismatch
In the early days of objects there was much talk about “impedance mismatch”. The phrase comes from physics. The intention was to say that objects and relational databases differ so crucially in their points of view that they cannot be naturally integrated even at the concept level.
2. The fundamental difference in the core of concepts
The idea of “data” doesn’t exist in the world of objects at all! What does this mean? Well, “data” is an organic, inseparable part of an object. Compare this with a car. The body is such an essential part of a car that it makes no sense at all to talk about a “car without a body”. Treating data separately is a partial analysis, and it finally led to concepts like “data” warehouse and “data” mining. These reveal the absurdity of such a separation.
So “data” is an inseparable part of an object. This means that the concept of an object database is actually a total misinterpretation. The right term would be closer to object community, society, tribe or similar. Even “object base” (without the data) would be misleading. Any collection of unrelated, arbitrary objects is worthless waste, next to pollution. Objects make sense only in association with other objects! These are not drawbacks of our viewpoint; on the contrary, they are the restrictions that we want to impose on our model to make our lives easier in the future!

3. Association is a first class citizen

To present this in a more organized and formal way, let’s start by creating the model (metamodel) of our OO-analysis domain. The object model of a class model is the following:

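[Figure: metamodel (metaMalli)]

Suppose our class diagram looks like the following:

[Figure: example class diagram (oEsim1.1)]

and a corresponding object diagram can look, for instance, like this:

[Figure: example object diagram (oEsim1.2)]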

Now suppose I want to get rid of my Renault and sell it to Bill. The association between me and that car is then removed. The point here is that both ends of the association go away at exactly the same time. The result looks like this:

What we really mean, i.e. try to say exactly, is that there exists an object-typed association, the ownership, which is connected to two entity objects of distinct types (person and car). From this we can easily see that an association is not two mutually independent collections of objects, as it is in most cases implemented, but one thing with two aspects. When you remove the association object, both ends are removed simultaneously, and it is logically impossible to remove just one end because such an association has no meaning.
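A minimal sketch of this idea in Python, under stated assumptions: the class names `Person`, `Car` and `Ownership` and the `remove()` method are illustrative and do not come from any particular product. The association is a single object, and removing it detaches both ends in one step.

```python
class Person:
    def __init__(self, name):
        self.name = name
        self.ownerships = set()   # association objects this person takes part in

    def cars(self):
        return [ownership.car for ownership in self.ownerships]


class Car:
    def __init__(self, make):
        self.make = make
        self.ownership = None     # assumption: at most one owner at a time

    def owner(self):
        return self.ownership.person if self.ownership else None


class Ownership:
    """One association object with two ends: a person and a car."""

    def __init__(self, person, car):
        if car.ownership is not None:
            raise ValueError("car already has an owner")
        self.person = person
        self.car = car
        person.ownerships.add(self)
        car.ownership = self

    def remove(self):
        # Both ends are detached in the same step; there is no half-removed state.
        self.person.ownerships.discard(self)
        self.car.ownership = None


me = Person("Me")
bill = Person("Bill")
renault = Car("Renault")

mine = Ownership(me, renault)
mine.remove()                 # selling the car removes the whole association
Ownership(bill, renault)      # Bill's ownership is a new association object
```

After the sale, `me.cars()` is empty and `renault.owner()` is `bill`. Neither end can be left pointing at the other, because neither end holds the link on its own.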

4. Object societies shouldn’t have query languages
Query languages are a side effect of this data aspect. Technically the SQL query feature gives one the illusion of being able to combine almost anything with everything. In reality this is of course not true. On the contrary, the structure of reality heavily restricts the meaningful use of such a feature. Another sad thing is that current relational database implementations don’t check the structural validity of a query. This means that one can join tables through columns while the semantic validity (i.e. the meaning) is not checked at all.
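To illustrate the point about unchecked joins, here is a small sketch using Python's standard `sqlite3` module. The `person` and `car` tables and their columns are made up for this example. The first query is accepted and returns a row even though its join condition has no meaning in the domain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, birth_year INTEGER);
    CREATE TABLE car (id INTEGER PRIMARY KEY, model TEXT, production_year INTEGER,
                      owner_id INTEGER REFERENCES person(id));
    INSERT INTO person VALUES (1, 'Bill', 2003);
    INSERT INTO car VALUES (1, 'Renault', 2003, NULL);
""")

# Syntactically valid, but the join compares a birth year with a production
# year; the ownership association is not involved at all.
meaningless = conn.execute("""
    SELECT person.name, car.model
    FROM person JOIN car ON person.birth_year = car.production_year
""").fetchall()

# The join that follows the established association finds nothing,
# because the car has no owner.
owned = conn.execute("""
    SELECT person.name, car.model
    FROM person JOIN car ON car.owner_id = person.id
""").fetchall()
```

Here `meaningless` contains `('Bill', 'Renault')` while `owned` is empty. The engine treats both queries as equally legitimate.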
The right way to do this would be to restrict the navigational possibility only to structurally established associations. In an object network (not in a database) navigation through the network can only be done through established associations, and this traversal should happen using the get-methods of the corresponding objects. In every object network an object depends heavily on its direct associative neighbors. The dependencies beyond these get rapidly weaker and weaker. In this way each class has a fuzzy set of dependencies on other classes. One fundamental principle of a domain class model is that all classes are connected. This means that from every class each other class is reachable and thus has an associative weight > 0. In other words the network is connected, and when one picks an object from any class one can (at least in the best circumstances) reach an object in every other class of the graph.
The point here is that, strictly speaking, every class knows only its directly associated classes. To navigate a route longer than one leg through the network, the navigator is responsible for knowing the route. There is a strikingly exact analogy here to a street network in a city. The streets don’t know all the routes through the city, but the corners know which next corners can be reached from them through their “associations”, namely the street parts connecting two corners.
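The street analogy can be sketched as follows. The names `Corner`, `connect`, `get_neighbour` and `navigate` are assumptions made for this illustration. Each corner knows only its direct neighbours, and the caller supplies the route.

```python
class Corner:
    def __init__(self, name):
        self.name = name
        self._streets = {}            # neighbour name -> neighbouring Corner

    def connect(self, other):
        # A street part is one association with two ends.
        self._streets[other.name] = other
        other._streets[self.name] = self

    def get_neighbours(self):
        return list(self._streets.values())

    def get_neighbour(self, name):
        return self._streets[name]    # KeyError if no street leads there


def navigate(start, route):
    """Follow a route the caller already knows, one association at a time."""
    here = start
    for next_name in route:
        here = here.get_neighbour(next_name)
    return here


a, b, c = Corner("A"), Corner("B"), Corner("C")
a.connect(b)
b.connect(c)

destination = navigate(a, ["B", "C"])   # A -> B -> C
```

`navigate(a, ["C"])` raises `KeyError`, because no street part joins A and C directly. The only way from A to C is a route the navigator knows.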
The only thing to add to complete the required “query language” is multi-attribute filters that make it easy to filter collections. When class A has an association to a collection of Bs, then A should have getBs(), which returns the whole collection. To be effective, A should also have a selectBs() method that optionally takes all the free attribute types of B. For example, when our Person has a one-to-many association to cars, the Person class should have as a standard feature a method selectCars() with optional color and production year (and all other free attributes that a car happens to have). So I could get a subset of the whole car collection with the following method call: selectCars("blue", 2003).
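A possible shape for such a filter, sketched in Python. Here `get_cars()` and `select_cars()` stand in for the `getCars()` and `selectCars()` of the text, and `add_car()` is a helper added only for this example. A filter left as `None` is simply ignored.

```python
class Car:
    def __init__(self, color, production_year):
        self.color = color
        self.production_year = production_year


class Person:
    def __init__(self):
        self._cars = []               # the one-to-many association to cars

    def add_car(self, car):
        self._cars.append(car)

    def get_cars(self):
        return list(self._cars)       # the whole collection

    def select_cars(self, color=None, production_year=None):
        # Every filter that is given must match; omitted filters match anything.
        return [
            car for car in self._cars
            if (color is None or car.color == color)
            and (production_year is None or car.production_year == production_year)
        ]


me = Person()
blue_2003 = Car("blue", 2003)
blue_1999 = Car("blue", 1999)
red_2003 = Car("red", 2003)
for car in (blue_2003, blue_1999, red_2003):
    me.add_car(car)

selected = me.select_cars("blue", 2003)
```

With these three cars, `select_cars("blue", 2003)` returns only the blue 2003 car, `select_cars(color="blue")` returns both blue cars, and `select_cars()` returns the same collection as `get_cars()`.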

7 Responses

  1. “The right way to do this would be to restrict the navigational possibility only to structurally established associations.”

    I think that you’re missing two things.

    1. These tools do exist; whether they are tightly coupled with the RDBMS engine or an add-on is irrelevant. If this implementation is important to you, it’s been done. You’re not describing anything novel.

    2. Maybe in the OO world there is a limited set of relationships, but I can come up with an infinite number of valid relationships in the “Data” world.
    Let’s take the simple model of hospital employees. I have a collection of hospitals and their addresses, and employees and their addresses, and a relationship between the two. In your comment, I should ONLY be allowed to query for who works where. In the data world, I could ask questions like these:

    Which employees of one hospital actually live closer to a different hospital? Which employees live at the same address as other employees of the same hospital? of a different hospital? Which employees live within walking distance of the hospital they work at? of another hospital? of a coworker?

    These questions are answerable even though I’ve only set up a single relationship between the two object sets. To think that any one person can exhaustively define every relationship between objects is hubris of the worst kind. The greatest achievements of man come when someone recognizes a pattern that is completely novel. Look at vaccines. They were “discovered” based on the identification of patterns in the natural world that had gone unnoticed for millennia. On a more recent and personal note, I’m always amazed at the way people choose to use the applications that I write. When I wrote the app, I had one way in mind to use it. But when it’s given to hundreds of people to use, they find more ways for it to be useful to them than I could have dreamed of.

    I’m sure that in the world of OO, where you can build in completely sterile, dust-free, clean-rooms the world looks tidy. The reality is that the world isn’t.

  2. Hi Mark

    I feel emotional anger in your text. Well, for me this is an unemotional intellectual exercise: only arguments and pure logic.

    I studied relational calculus as part of my computer studies in 1975 at Turku University. Since then I was a strong promoter of SQL. In the late 80’s the relational community started to understand that the most serious drawback we had was the lack of behavior within the db. At that time people researched things called semantic databases, but they were extremely complex and clumsy. When I heard of objects in 1990 from Peter Coad I understood immediately that these guys had found the solution, and how beautifully simple! I acquired my first (and so far only) OO persistence product, named GemStone. I never had to regret my decision afterwards. It proved to be a real gem! From the early 90’s I understood that SQL had come to its end. Today it is actually the worst obstacle to advances in computing, but I leave that for another story.

    From Wikipedia, “Oracle Corporation”:

    Ellison took inspiration[citation needed] from the 1970 paper written by Edgar F. Codd on relational database systems named “A Relational Model of Data for Large Shared Data Banks”.[3] He had heard about the IBM System R database from an article in the IBM Research Journal provided by Ed Oates (a future co-founder of Oracle Corporation). System R also derived from Codd’s theories, and Ellison wanted to make his Oracle product compatible with System R, but IBM stopped this by keeping the error codes for their DBMS secret. Ellison co-founded Oracle Corporation in 1977 under the name Software Development Laboratories (SDL). In 1979 SDL changed its name to Relational Software, Inc. (RSI). In 1982, RSI renamed itself as Oracle Systems[4] to align itself more closely with its flagship product Oracle Database. At this stage Robert Miner served as the company’s senior programmer.

    So SQL is technology from the early 70’s! The concepts that made it so good in the early 80’s were completely outdated by 2000. The simple type structure of columns is at the core of the impedance mismatch. The flatness of the structure also causes an enormous impedance mismatch.

    Here are my comments on your points:

    1) The fact that the code exists is a very poor argument. Actually most of the code produced so far is total rubbish. The most productive thing (usually expensive though) would be to throw it away. The reason for this is clarified in the next point, but the most important reason is that the code is too primitive today and using it is actually counterproductive (negatively productive).
    2) The fact is that SQL is too rudimentary and simple. IBM developed the first implementation of SQL. After Codd’s paper we had several competing relational manipulation languages, for instance Alpha. As so often happens, the best did not win the competition; almost the opposite.
    Its very primitive nature makes it possible, even obvious, that someone creates a technically totally correct SQL statement which doesn’t make any sense semantically, with complete bullshit as the result. I remember once in a (here unnamed) company, which had for years been running an SQL economic status report, we noticed a serious bug in the statement, so that the figures it produced were actually totally invented (and wrong of course). This made us think: should we leave the error unchanged, or correct it and thus potentially cause unpredictable turmoil in the company? I no longer remember how this ended. The lesson here is that if you have a slightly more complicated SQL statement, it is very difficult to verify its correctness. The reason for this is the serious lack of semantics.

    Your address example is very good indeed. Actually, in finding people who live in the same household or nearby, your SQL helps very little when on two different occasions the street names are entered in slightly different ways, or at a street corner where one place has two street names meaning the same thing. If we treat addresses as a set of mutually independent string collections we are very far from a solution. In the OO realm we take a class Location and inherit from it another class called StreetLocation. Both contain real, unique objects from the surface of the earth. When these objects are created they assure the correctness of the new location, in other words that it exists. The next thing to do is to teach our Location what equal means. This means that we define, within a predefined tolerance, when two places are the same, no matter how the place was expressed in the first place (country, city, street name and number, or for instance GPS coordinates). This will also take care of all spelling errors and writing styles of any name and will force uniqueness in them all. The second thing that we teach it is distance(aLocation). This gives the distance between this place and a given place.
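One way the two behaviours described above could look, as a sketch only. It assumes that a location is held as latitude and longitude, that distance is the great-circle distance computed with the haversine formula, and that the tolerance is 25 metres. None of these choices comes from the discussion itself; `same_place` and `TOLERANCE_M` are illustrative names.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


class Location:
    TOLERANCE_M = 25.0  # assumed tolerance for "the same place"

    def __init__(self, latitude, longitude):
        # Only points that exist on the surface of the earth can be created.
        if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
            raise ValueError("not a point on the surface of the earth")
        self.latitude = latitude
        self.longitude = longitude

    def distance(self, other):
        """Great-circle distance to other in metres (haversine formula)."""
        lat1 = math.radians(self.latitude)
        lat2 = math.radians(other.latitude)
        dlat = lat2 - lat1
        dlon = math.radians(other.longitude - self.longitude)
        a = (math.sin(dlat / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
        return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))

    def same_place(self, other):
        # Equality within the predefined tolerance.
        return self.distance(other) <= self.TOLERANCE_M


here = Location(60.4518, 22.2666)
nearby = Location(60.4519, 22.2667)
far_away = Location(60.1699, 24.9384)
```

With these coordinates `here.same_place(nearby)` is true and `here.same_place(far_away)` is false. The sketch deliberately does not override `__eq__`: tolerance-based equality is not transitive, so using it for hashing or set membership would need more care than is shown here.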

    So we should not have anything called data, but an abstract mapping of real objects that is at all times consistent with the reality it maps.

    You: “ should ONLY be allowed to query for who works where. In the data world, I could ask questions like these:
    Which employees of one hospital actually live closer to a different hospital? Which employees live at the same address as other employees of the same hospital? “

    No, you got me wrong. What I said is that the persistent objects offer you only the established navigational service, i.e. who works in the hospital, and the rest is up to you to decide (see above for the problems of defining address equality and distance, for example).

    Finally, my experience of several really big operational management systems (like the one at Finnair) makes me confident that almost all significant access needs are met with good and rich domain models. For instance, your example of hospital workers and their home addresses is more than theoretical. I would call it completely superficial. Situations of this kind don’t come from real life; their most common source is bad textbooks, which of course is a most sad thing, especially for students.

  3. “This will also take care of all spelling errors and writing styles of any name and will force uniqueness in them all.”

    What is “this”? What will enforce uniqueness? Are you saying that there’s a method to enforce uniqueness among objects that won’t work on rows as well?
    Please explain how this would work?

    • The cornerstone of the whole OO paradigm is the object border: in versus out. What is inside is strictly private to the object. Inside the object you can find “data”, the physical entities that finally represent the object’s concept and state.
      To continue with my Location example: I, as the designer of the concept, define that the abstraction Location is a representation of a geographical area. Then I inherit StreetLocation from it and define that this abstraction is a genuine subset of Location, namely those locations that reside by some street.

      Now the first thing in implementing this is to define a constructor method for these objects. Here I decide that I will allow only real (i.e. existing) locations to be created. This means that when someone wants to create a location, he or she is required to give valid location information as a parameter. I start INTERNALLY in the object with the quartet of street number, street name, city name and country name. The first thing my constructor checks is that all 4 parameters are provided, so that they are non-void. If this is true, the constructor checks whether this street exists. If this is also true, the new location object is created. The checking can be implemented in many ways. If the validity of this information is really important, then we would obtain enough street information to implement it.

      For the outside world the StreetLocation object is NOT the quartet but the services that our StreetLocation object provides, which are implemented in the StreetLocation class. I could have for instance a getStreetAddress() method. The next methods to implement are distance(location) and finally getStreetRoute(location).
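The constructor described above might be sketched like this. `KNOWN_STREETS` is a hypothetical stand-in for whatever street register would really be consulted, and its single entry is invented. Only `get_street_address()` is shown; `distance` and `getStreetRoute` are left out.

```python
class StreetLocation:
    # Hypothetical stand-in for a real street register: (country, city, street).
    KNOWN_STREETS = {
        ("Exampleland", "Sampletown", "Main Street"),
    }

    def __init__(self, number, street, city, country):
        parts = (number, street, city, country)
        # First check: all four parameters must be provided (non-void).
        if any(p is None or str(p).strip() == "" for p in parts):
            raise ValueError("number, street, city and country are all required")
        # Second check: the street must exist.
        if (country, city, street) not in self.KNOWN_STREETS:
            raise ValueError("unknown street")
        # Internal representation: private to the object, free to change later.
        self._number, self._street, self._city, self._country = parts

    def get_street_address(self):
        # A service offered to the outside world; callers never see the quartet.
        return f"{self._street} {self._number}, {self._city}, {self._country}"


home = StreetLocation(12, "Main Street", "Sampletown", "Exampleland")
address = home.get_street_address()
```

An object that fails either check is never created, so every `StreetLocation` that exists is valid by construction. Callers depend only on `get_street_address()`, so the private fields could later be replaced by coordinates without changing them.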

      Now, as time goes by and new opportunities arise, we might notice that Google Maps provides a service to find street addresses globally based on a GPS point. Now we can change our class’s private implementation of these methods to be based completely on GPS coordinates. From then on we won’t save name strings at all, only the coordinates. This change doesn’t affect the users of the class at all. They won’t even be aware of it!

      This seems pretty easy and straightforward, but somehow it is difficult for the inexperienced. When we consider a class as simple as Day, it should be quite obvious that the object should be able to show itself in all possible calendar systems. The Day object should know whether it is a bank day, a holiday and so forth. It should support at least basic day calculations like getOtherDay(integer), which would return the day that many days forward if the integer is positive and that many days backward if it is negative, and finally a method dayDifference(aDay), which of course would return the number of days in between, negative or positive indicating the direction. The fact remains that, for example, the Java designers have screwed up this class at least twice!
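The two day calculations can be sketched on top of Python's standard `datetime.date`. Here `get_other_day` and `day_difference` correspond to the `getOtherDay` and `dayDifference` of the text. Bank days, holidays and other calendar systems need calendar data and are not attempted.

```python
from datetime import date, timedelta


class Day:
    def __init__(self, year, month, day):
        self._date = date(year, month, day)   # private state

    def get_other_day(self, days):
        """The day `days` forward if positive, backward if negative."""
        other = self._date + timedelta(days=days)
        return Day(other.year, other.month, other.day)

    def day_difference(self, other):
        """Days from this day to other; negative if other is earlier."""
        return (other._date - self._date).days

    def __eq__(self, other):
        return isinstance(other, Day) and self._date == other._date

    def __hash__(self):
        return hash(self._date)

    def __repr__(self):
        return f"Day({self._date.isoformat()})"


today = Day(2003, 2, 27)
later = today.get_other_day(2)       # crosses the end of February
earlier = today.get_other_day(-27)
```

Here `later` equals `Day(2003, 3, 1)` and `earlier` equals `Day(2003, 1, 31)`. `today.day_difference(later)` is 2 and the reverse call gives -2.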

      The essential difference between relational and OO is this behavior, which includes and is based upon the object’s state (what you call “data”), and the two are inseparable. OO is a genuine extension of relational, so when you take the OO paradigm you always get everything relational plus something additional. This addition is the structure that ties up and packages the behavior with the “data”. This is a considerable lift of the model towards a consistent real-world mapping, away from low-semantic fragments that cannot protect themselves. The implementation of this behavior follows the object throughout its life, and the people who come across this object don’t and shouldn’t know anything of the internals, just the externals.

  4. You make some gratuitous assertions. Gratuitous assertions can be equally gratuitously denied.

    Anything you call a constructor in the OO world can be a CRUD package or Trigger in the relational world.

    Where the OO world falls down is EXACTLY in the situation you just described. How would the constructor check for BILLIONS of other existing locations quickly? Call the equivalence function of every object sequentially? Yes. That’s the only choice you have, there’s no INDEX in the OO world. While you’re doing that, can you possibly allow another object get added? No. It would face the possibility of allowing a duplicate with the one you’re working on at the moment.

    The funny thing is, I don’t disagree with your statements on OO. It’s not a complex subject. But you see it as being entirely evolutionary, replacing the “previous” world; whereas I see it as nothing more than best practices from the world of coding. If there were an AlQaeda of OO, you’d be the Usama bin Laden. Rigid adherence to a cult of monolithic thinking. OO is a tool. It works well in some places, not so well in others. The theory of OO is pretty interesting. Businesses, and all people btw, survive on practicality. They get a little theory each Saturday or Sunday.

    You remind me of the SOA evangelists. Everything SOA is great, everything not SOA sucks. I’ve seen well engineered applications long before SOA came around. I’ve seen well built code, long before OO became a star. I’ve seen people put together unbelievable systems with Foxpro 2.5. I’ve seen people build pure garbage in C# and Java.

    • Hi Mark
      Some comments on obvious mistakes or misunderstandings:

      “You make some gratuitous assertions. Gratuitous assertions can be equally gratuitously denied.”

      I do not understand this at all???

      Anything you call a constructor in the OO world can be a CRUD package or Trigger in the relational world.
      Where the OO world falls down is EXACTLY in the situation you just described. How would the constructor check for BILLIONS of other existing locations quickly?

      Call the equivalence function of every object sequentially? Yes. That’s the only choice you have, there’s no INDEX in the OO world

      NO, of course OO doesn’t prevent predefined or dynamic B-trees (or other search algorithms) from being implemented and used as needed for quick search. The location information is of course also naturally subdivided into smaller groups by country, district, city, city block etc. This natural grouping cuts the search space to a small fraction of the relevant search base. Concurrency is easier to manage with objects than otherwise.
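As a rough sketch of this reply: an object collection can carry its own index. `LocationRegistry` and its methods are names invented for this example. A plain dictionary serves as the hash index, a second dictionary holds the natural grouping by city, and a lock is one simple way to keep two callers from creating the same location at once. This is not a description of how any object database actually does it.

```python
import threading
from collections import defaultdict


class Location:
    def __init__(self, country, city, street, number):
        self.country, self.city = country, city
        self.street, self.number = street, number


class LocationRegistry:
    def __init__(self):
        self._by_key = {}                    # hash index: full key -> Location
        self._by_city = defaultdict(list)    # natural grouping by (country, city)
        self._lock = threading.Lock()

    def get_or_create(self, country, city, street, number):
        key = (country, city, street, number)
        with self._lock:                     # lookup and insert happen as one step
            existing = self._by_key.get(key) # one hash lookup, no sequential scan
            if existing is not None:
                return existing
            location = Location(country, city, street, number)
            self._by_key[key] = location
            self._by_city[(country, city)].append(location)
            return location

    def in_city(self, country, city):
        with self._lock:
            return list(self._by_city.get((country, city), []))


registry = LocationRegistry()
first = registry.get_or_create("Exampleland", "Sampletown", "Main Street", 12)
again = registry.get_or_create("Exampleland", "Sampletown", "Main Street", 12)
other = registry.get_or_create("Exampleland", "Sampletown", "Side Street", 3)
```

Here `again` is the very same object as `first`, so no duplicate is created. `registry.in_city("Exampleland", "Sampletown")` returns the two Sampletown locations without touching any other city's objects.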

      Here is a short quote from Versant’s product:

      Object Database:
      When applications have complex in-memory object models with predominantly navigational access, object databases provide higher performance than mapping to relational databases.
      ………….
      Objects with moderate complexity are typically 3x faster in an object database, objects with high levels of complexity, such as many-to-many associations are typically 30x faster when using an object database. For collections of collections and recursive relationships, a 50x speed advantage is possible.

      You:
      “If there were an AlQaeda of OO, you’d be the Usama bin Laden. Rigid adherence to a cult of monolithic thinking.”

      Again emotional stuff. For me this is pure logic & facts, not emotions & beliefs. Actually the key issue here has nothing to do with fanaticism but with paradigm. A paradigm is the outermost framework for understanding the world, and one can (in a given dimension) choose only ONE paradigm at a time. The chosen paradigm, so to speak, shuts the other paradigms out.
      The analogy between OO and procedural could be steam engine and internal-combustion engine. The era of steam engines was fine and noble. The men who invented and developed it were clever and productive. They actually caused the industrial revolution of the developed world. But today the steam engine has fully contributed in the chain of development. It has done its part and done it well, but those days are gone and the steam engine is outdated and history. It has been replaced with the better internal-combustion engine. The exact same analogy is valid between SQL and OO.

      You:

      “I’ve seen people build pure garbage in C# and Java.”

      Me too! I completely agree with you on this point. One of the most difficult things is the huge lack of professional talent and especially skills.

      This concludes this discussion (at least from my part). I think that important viewpoints got dealt with. More detailed discussion is not possible within this kind of technical environment.

      Thank you for your opinions.

  5. “The analogy between OO and procedural could be steam engine and internal-combustion engine. The era of steam engines was fine and noble. The men who invented and developed it were clever and productive. They actually caused the industrial revolution of the developed world. But today the steam engine has fully contributed in the chain of development.”

    This is a GREAT analogy. Have you seen the new six cycle engines? If you take a modern 4 stroke and add a water injection on to the hot block and use the steam to create a 2nd power stroke you can create engines that get even GREATER efficiency than the internal combustion alone.
    Perfect example, thanks for teeing that one up. You can read more here.

    http://www.autoweek.com/article/20060227/free/302270007

    Combining the two technologies is better than either one alone. Steam took us so far, Internal Combustion went farther, now both together could take us even farther by half.

    BTW, OO doesn’t allow for data to exist outside the object. That’s been your point this entire debate. You’ve insisted on strict OO and now you’re confusing that with implementation. If you’re willing to do that there’s zero difference between SQL and OO. A collection of objects with a hash table defining a rapid search is no different than a table of rows and an index. I can do both, either or neither and have the EXACT same outcome… it’s a distinction without a difference.

    You see a statement as emotional because that suits you. The analogy wasn’t emotional; it was as close as I could come to anthropomorphizing your opinions, and in no way was it an analogy for you as a person. In radical Islam, strict adherence isn’t optional and the penalties are severe. It seems the same in your technical world. You see one and only one answer to every problem. I’m far more plural. I see every technology as a tool in a toolbox that can be used in combination to solve every problem. I would never say that a SQL compliant database is the only answer to every problem. I’ve recently been reading about column oriented databases… completely sublime, amazing ideas about solving specific problems. But those who design, develop and sell these readily admit they have limitations.

    Also, I used that analogy because you keep saying this is about “facts” when you’ve yet to mention a single “fact”. Saying that SQL was invented in the 70’s may be a fact, but it doesn’t support your argument. Your entire piece is a long opinion and that’s fine… but claiming that it’s all factually based is disingenuous and doesn’t help anyone.

    I’m glad you’re not going to respond, thanks for the debate.
