Tuesday, September 05, 2006

Why do I prefer Custom Collections over DataSets?

I don't like the DataSet as a mechanism for transferring data between two layers, for the following reasons:

1. Not a true OOP concept
I closely follow Rockford Lhotka's writing and agree with him on this point.

2. Poor Performance
I ran some benchmark tests comparing custom collections with DataSets as the transport mechanism in a Web Services scenario on .NET 1.0. Custom collections performed four times better than DataSets. I am sure the difference would be even more dramatic in a remoting scenario. I will run some benchmark tests on .NET 2.0 very soon.

More Details

3. More Code
One ends up writing more code to package and unpackage data in a DataSet. The Remoting and Web Services infrastructure takes care of most serialization and deserialization issues for custom collections. Before generics, by getting rid of DataSets I was able to eliminate thousands of lines of code and gain performance.
Recently I ran some benchmark tests on generics from my office computer (which has less memory) and got even more encouraging results. I will post them on my blog very soon.
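To illustrate the "less code" point: a sketch of what the custom-collection side can look like on .NET 2.0 (the Order/OrderService names are made up for illustration; they are not from my actual framework). Marking the type [Serializable] is all the plumbing remoting needs, the XmlSerializer handles public fields for Web Services automatically, and a List&lt;T&gt; replaces the hand-written strongly typed collection classes we needed before generics:

```csharp
using System;
using System.Collections.Generic;

// [Serializable] is enough for the remoting infrastructure to
// handle (de)serialization; Web Services serialize public
// fields/properties via the XmlSerializer automatically.
[Serializable]
public class Order
{
    public int OrderId;
    public decimal Amount;
}

// Before generics, each entity needed its own hand-written
// strongly typed collection class; with .NET 2.0 a List<Order>
// suffices, which is where the thousands of saved lines come from.
public class OrderService
{
    public List<Order> GetOrders()
    {
        List<Order> orders = new List<Order>();
        Order o = new Order();
        o.OrderId = 1;
        o.Amount = 99.5m;
        orders.Add(o);
        return orders;
    }
}
```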

4. Not SOA friendly
If DataSets are exposed through Web Services, only .NET clients can consume those Web Services.

Chirdeep Shetty has a very nice post about the problems he faced converting C# Web Services to Java Web Services because of DataSets, and a workaround.

Having said that, I may still use a DataSet/DataTable in the following situations:
1. When the columns of a custom collection are variable. A DataTable/DataSet, IMO, offers a cleaner solution.
2. Disconnected rich-client architecture.


survic said...

Nice blog. However, here are the counter-developments:

(a) http://www.lhotka.net/weblog/CommentView,guid,ad5be814-6063-43e0-b703-932771444b98.aspx

(b) the link you gave; at the end of it, it seems that performance is not that bad?

(c) CSLA etc. has a lot of code and requires a lot of custom code (hence code generation). Also, when we (count me as one) use DataSets, we tend to have the attitude "just make it work now". If we try to design a few rules and techniques, would it be that bad? -- perhaps it is time to have a serious try -- will it be worth it? -- it is just a feeling. I will examine the databinding book I read (it has a pretty strict 3-layer view, so we have a good base), and try it.

Vikas said...

Measure, Measure and Measure
Did I post this link?
Darn me.
The last reply really shattered me and my beliefs, and it still haunts me.
That is why this benchmarking test has been on my mind ever since.
Now your switching to the other side is too much for me.
Jokes apart, I did some benchmarking tests for DataSet vs. custom collection on the .NET 2.0 platform.

Creating a simple DataSet with 50,000 rows is three times slower than a generic collection.
Creating a simple DataSet with 50,000 rows and sending it across a Web Service is six times slower than a generic collection.

My computer has 1 GB of memory and a Pentium 4 (3 GHz).
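A minimal sketch of the kind of creation benchmark I mean (this is not my actual test harness; the Customer type and the column layout are illustrative, and absolute timings will vary by machine):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Diagnostics;

// Hypothetical row type for the custom-collection side of the test.
[Serializable]
public class Customer
{
    public int Id;
    public string Name;
}

public static class CollectionBenchmark
{
    public const int Rows = 50000;

    // Time filling a DataTable with 50,000 rows.
    public static long FillDataTable()
    {
        Stopwatch sw = Stopwatch.StartNew();
        DataTable table = new DataTable("Customer");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        for (int i = 0; i < Rows; i++)
            table.Rows.Add(i, "Customer " + i);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Time filling a generic List<T> with the same data.
    public static long FillGenericList()
    {
        Stopwatch sw = Stopwatch.StartNew();
        List<Customer> list = new List<Customer>(Rows);
        for (int i = 0; i < Rows; i++)
        {
            Customer c = new Customer();
            c.Id = i;
            c.Name = "Customer " + i;
            list.Add(c);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        Console.WriteLine("DataTable:      {0} ms", FillDataTable());
        Console.WriteLine("List<Customer>: {0} ms", FillGenericList());
    }
}
```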

For an industrial-strength application, I am not comfortable making these performance tradeoffs in favor of the DataSet. For a prototype, or an application with fewer than 4 users, it may not matter.

We already have a code-generation tool (5 working days' worth of work) and the rest of the framework ready. It needs some modifications to accommodate generics and generate less code.

I feel relaxed.

survic said...

Nice measurements! I will take them blindly also :-)

I am just following Rocky in switching sides ;-) or, trying to achieve the Q continuum.


1. Will you use “dataportal”?

2. After generics, how many collection classes are left?

Kevin I said...

I've done the metrics for DataSet vs. custom collections as well, and it isn't as bad as it is made out to be.

If you use BeginEdit/EndEdit, it is a small bit slower on storing values, but the serialization issues can be remedied by writing a simple utility function to convert the DataSet to binary (using the binary serializer). We did this for our typed DataSets, and we can rehydrate the entire thing on the other side at about 1/10th the size of the XML payload.
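A minimal sketch of the kind of utility described above, assuming the standard BinaryFormatter (DataSetUtil is a made-up name, not an API). Note that on .NET 1.1 the DataSet still emits XML internally even under the BinaryFormatter; on .NET 2.0, setting RemotingFormat makes the payload truly binary:

```csharp
using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static class DataSetUtil
{
    // Serialize a DataSet to a binary blob instead of the default XML.
    public static byte[] ToBinary(DataSet ds)
    {
        // .NET 2.0 only: without this the BinaryFormatter still
        // wraps the DataSet's XML representation.
        ds.RemotingFormat = SerializationFormat.Binary;
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, ds);
            return stream.ToArray();
        }
    }

    // Rehydrate the DataSet on the other side.
    public static DataSet FromBinary(byte[] data)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream(data))
        {
            return (DataSet)formatter.Deserialize(stream);
        }
    }
}
```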

And updates are MUCH faster with DataSets, since the DataAdapter supports sending multiple update statements in one call. You'd have to write munge code for a custom object to create the SQL strings and do the work otherwise. So if I have a DataSet with about 8 different tables, depending on my batch size it could be as few as 8 SQL calls, whereas it would be 8 times the number of rows for custom objects. The DataSet smoked custom collections in this scenario.

As far as interop goes, I agree: if you need cross-platform, this isn't the way to go -- but if you're on the MS platform, it works very nicely. One of the neat things is that when you make updates, you don't have to send the whole object back; just use .GetChanges() and you get a much smaller payload -- it can even be smaller than the static object you created to hold the data in the first place.
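A sketch of that changes-only round trip, using the real DataSet.GetChanges and Merge APIs (the class and method names around them are illustrative):

```csharp
using System.Data;

public static class ChangesOnlyExample
{
    // On the client: extract only the added/modified/deleted rows,
    // which is usually a far smaller payload than the full DataSet.
    public static DataSet ExtractChanges(DataSet fullDataSet)
    {
        // GetChanges returns null when nothing has changed.
        return fullDataSet.GetChanges();
    }

    // On the server: fold the small changes payload back into the
    // server's copy before persisting it.
    public static void ApplyChanges(DataSet serverCopy, DataSet changes)
    {
        if (changes != null)
            serverCopy.Merge(changes);
    }
}
```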

So, if you're comparing the speed of loading 50,000 rows but in actuality you're working with fewer than 500, then I'd reconsider, because there are a lot of great things in the DataSet (relations, column-definition metadata, bindability to a number of controls if you're a WinForms or ASP.NET developer).

Comparing a DataSet to a generic collection is not even an apples-to-apples comparison. Heck, if I need a list of ints, List<int> is what I use. If I need something to hold data that I need to bind, or I need some sort of edit ability (Begin/End/Cancel), then the DataSet is the tool.

BK said...

Vikas, I have not read this thread from start to end... I ran some similar tests last year. What I found is:

1. If you use a compression sink and custom serialization with the DataSet, it is as fast as a custom collection when transferring data with volumes as big as a million rows.

2. With .NET 2.0, DataSet serialization is much improved (and I believe it can use binary serialization, not XML serialization) -- there is no need to implement the custom serialization logic that was unfortunately required in .NET 1.1.

Hope this helps.