Sunday, August 06, 2006

Domain Driven Design based on Entity Model vs E/R Model

Continued from

Survic said
“I agree with you about the size of projects; but deep in my heart, I also believe that all larger business projects can be split into smaller projects, so, I do not believe “TDD with mocks and DDD without data” is good for anything -- I will keep an open mind though”

Hi Survic,
Thanks for pointing me to Scott W. Ambler’s book ‘The Object Primer’. It helped me write this post, which says what I wanted to articulate in the first place.
First of all, nobody is underestimating the importance of data in Domain Driven Design. But enterprise data design has the following problems or realities (Scott calls it legacy database design, but DBAs continue to employ it for the sake of consistency):

1. A single data field is used for several purposes.
2. The purpose of a data field is determined by the value of one or more other columns.
3. Inconsistent values are stored in a single data field.
4. There is inconsistent or incorrect data formatting within a column.
5. Some data values are missing within a data field.
6. One or more data fields that you require do not exist.
7. Additional data fields exist that your application will need to support if it uses the legacy data.
8. Multiple sources exist for the same data and it is not clear which one to use.
9. Important entities, attributes and relationships are hidden and floating in text fields.
10. Data values can stray from their field descriptions and business rules.
11. Various key strategies are used to identify the same type of entity.
12. You require a relationship between data records that is not supported by the legacy data.
13. One attribute is stored in several fields.
14. Special characters within a data field are inconsistently used.
15. Different data types are used for similar columns.
16. The legacy data do not contain sufficient detail.
17. The legacy data contain too much data.
18. The legacy data are read-only, yet you require update access.
19. The timeliness of the data varies from what you require.
20. The default value used by a legacy application does not reflect the default value required by your system.
21. Different representations of the data exist.
22. The naming conventions used are difficult to understand.

In my past experience with DBAs, some of their decisions may be right in their own way. They are more concerned with efficient storage of data, the need to be consistent with existing databases (what Scott calls legacy), and future extensibility at the data storage level.

Some of the above factors limit how well a database design diagram can serve as an effective communication tool with all stakeholders. We need an abstraction above the database that hides these complexities and communicates the problem domain and system blueprints more clearly to users.

This is where entity class diagrams come in (we both already agree on the importance of process class diagrams, activity diagrams, sequence diagrams, etc.). They can hide all of the above-mentioned storage details from system users. They help the programmer nail down the requirements and prepare the blueprints without waiting for the database design to be finished. Once the requirements are completely nailed down, data storage and retrieval are more of an implementation detail.
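
As a rough illustration of that abstraction, here is a minimal Python sketch. Everything in it is hypothetical and not taken from any real schema: the column names (CUST_NO, CUST_TYP, CUST_NM, MISC_1), the type codes, and the Customer entity are made up to show problems 1, 2 and 3 from the list above being hidden behind an entity that speaks the users' vocabulary.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Customer:
    """Domain entity in the users' vocabulary (hypothetical example)."""
    customer_id: str
    name: str
    is_corporate: bool
    tax_id: Optional[str]
    birth_date: Optional[date]


def customer_from_legacy_row(row: dict) -> Customer:
    """Translate one row of an assumed legacy table into a Customer.

    Assumed legacy quirks (all invented for this sketch):
    - CUST_TYP uses inconsistent codes ("C" or "CORP") for companies.
    - MISC_1 is a multi-purpose field: a tax id for companies,
      a birth date in YYYYMMDD form for persons.
    """
    kind = row["CUST_TYP"].strip().upper()
    is_corporate = kind in {"C", "CORP"}

    misc = (row.get("MISC_1") or "").strip()
    tax_id = None
    birth_date = None
    if misc:
        if is_corporate:
            tax_id = misc
        else:
            birth_date = date(int(misc[0:4]), int(misc[4:6]), int(misc[6:8]))

    return Customer(
        customer_id=str(row["CUST_NO"]).lstrip("0"),  # drop zero padding
        name=row["CUST_NM"].strip().title(),
        is_corporate=is_corporate,
        tax_id=tax_id,
        birth_date=birth_date,
    )
```

The point of the design is that users and analysts only ever see Customer; the meaning of MISC_1 and the spelling of the type codes stay inside one translation function that can change when the DBA's design changes.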

This debate is similar to the XML Hell debate. Nobody is arguing against the importance of XML, but we need an abstraction or graphical tools to hide its rawness, as discussed in the following mail

The Object Primer – By Scott W. Ambler


Vikas said...

Survic wrote
“So, if it is a new database, we design it ourselves; if it is a legacy one, then, we need the data within it. Either way or both ways, we need to deal with database early on.”

Let us keep the context right.
So, if it is a new database, we help the DBA design it; if it is a legacy one, then we need the data within it. Either way or both ways, we need to deal with DBAs early on. DBAs will have their own deadlines and their own priorities.

Now I will switch to your context. When you are designing the database, it is very important to make sure that you have all the data that the user is asking for in reports. One can use screen mockups, the relevant E/R diagram (60% to 90%), and process logic in the form of class diagrams, sequence diagrams and activity flow charts to communicate with users. That is perfect. My contention was that it is fine to do the database design simultaneously, but hide it from users by exposing them to the entity domain model instead. A model that uses vocabulary users are more familiar with, and hides the complex storage and retrieval details from them, will be more useful. But if you think that it would be a waste of time, I again agree with you 100 percent.

Survic wrote
I am waiting for your ideas on a domain model without being “anemic”.

I did write about how I see the Domain Model as one level of abstraction over the database, hiding data storage and retrieval.
The domain model will be 60% to 90% anemic. Let me tell you about my experience with Domain Driven Design. Six years ago I was working on a project where a DBA was in charge of the database design and we were pulling a lot of data from a legacy system. So the database diagram was a secret until the end of the design phase. We developers did give a lot of input about data requirements and about how we thought certain aspects of data storage should look. Besides creating the screen shots and data flow diagrams, I also prepared a Domain Model by finding all the actors and their characteristics and behavior. Finally, I optimized my domain model by merging entities that had one-to-one relationships into single entities.
When I saw the database E/R diagrams at the end of the design phase, they exactly matched my domain model. I felt like a fool, but it was not in my hands. My regret is that I didn’t scope the process classes in the design phase. I ended up writing a thin client with all the process/workflow/coordination logic in the UI layer. I corrected this during refactoring in the next phase of development. We were following the waterfall methodology.
I think that it is criminal not to use the E/R diagram to derive the entity classes when it is handy. Somewhere you have to stop being self-righteous and deliver solutions to the users who pay you your monthly check.
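
To make the idea of deriving entity classes from an E/R diagram concrete, here is a small Python sketch. It is only an assumption-laden illustration: the two tables (ACCOUNT and ACCOUNT_LIMIT, in a one-to-one relationship), their column names, and the Account class are all invented. It shows two one-to-one tables folded into a single entity, and a little behavior added so that the entity is not purely anemic.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Single entity derived from two hypothetical one-to-one tables."""
    # From the assumed ACCOUNT table
    account_id: int
    owner_name: str
    # From the assumed ACCOUNT_LIMIT table (one row per account)
    balance_cents: int
    credit_limit_cents: int

    def can_withdraw(self, amount_cents: int) -> bool:
        """Business rule kept on the entity rather than in the UI layer."""
        return 0 < amount_cents <= self.balance_cents + self.credit_limit_cents

    def withdraw(self, amount_cents: int) -> None:
        if not self.can_withdraw(amount_cents):
            raise ValueError("withdrawal exceeds balance plus credit limit")
        self.balance_cents -= amount_cents


def account_from_rows(account_row: dict, limit_row: dict) -> Account:
    """Merge the two one-to-one rows into one Account entity."""
    if account_row["ACCOUNT_ID"] != limit_row["ACCOUNT_ID"]:
        raise ValueError("rows belong to different accounts")
    return Account(
        account_id=account_row["ACCOUNT_ID"],
        owner_name=account_row["OWNER_NAME"],
        balance_cents=limit_row["BALANCE_CENTS"],
        credit_limit_cents=limit_row["CREDIT_LIMIT_CENTS"],
    )
```

The data fields come straight from the (assumed) E/R diagram, which is the "starting point" part; the withdraw rule is the part the diagram cannot give you.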

Survic wrote:
I have seen a lot of derogative comments about it, but have seen nothing that can solve the problem

You should sugar-coat it for Object Purists by saying “Let us use the database diagram as a starting point for our Domain Driven Design.” And you can stretch the starting point as much as you like.

survic said...

I agree with you now, 100% :-)

The database storage details -- you are right: I have to hide them if they are too messy. If I insist on the “data driven” stuff, then, I have to cheat, and persuade myself with the differences among conceptual/logical/physical data models.

However, as you may guess, I do not like those demarcations, because the advantage of a database is that the turn-around time between schema design and real data (the “manual Fit”) is within minutes.

More about “manual Fit”: I remember that originally (in the age of DB2 and Oracle) relational databases were built under the assumption that users were supposed to use SQL*Plus directly. That is the spirit of “manual Fit”!

Admittedly, that turned out to be an invalid assumption; however, business analysts, or developers who wear analyst hats, should do such “manual Fit”.

Anyway, I found “manual Fit” is a good concept. I am going to use it everywhere. On the other hand, it adds some non-pure-OO elements to Fit though ;-)

Vikas said...

Survic wrote

An "anonymous" sent me to a link, where I found this:

Relax – I know you will not agree with many things it says. However, I promise it is a very good read.