Archive

Posts Tagged ‘SQL Server’

Using INT or GUID for Primary Keys? Or COMB GUID?

January 21, 2010 3 comments

Today I read an interesting article by Jimmy Nilsson (the author of .Net Enterprise Design) about using auto-generated primary keys in SQL Server. Especially for Surrogate Keys.

Of course, most of our decisions would be to use INT (or BIGINT) as data type and set IDENTITY to true. There is another solution: using GUID. [NEWID in SQL Server].

When a comparison is done between using INT and GUID as primary key, we can list out few important differences:

Pros of using GUID

  1. GUID will be unique through out the database; while INT is not
  2. @@IDENTITY can be a problem while doing INSERT through TRIGGERS. And using @@IDENTITY to get the identity for the recently added row in the table brings the created identity for the current connection.[This could be solved using SCOPE_IDENTITY (see here for an excellent explanation on this!)]
  3. GUID can be used to create primary keys before inserting the value to the database
  4. Integers are not big enough for most of the scenarios; even though you could start from smallest negative value. Anyway BIGINT can be a solution here.
  5. Using INT could be a real night mare when doing manual merge of tables.

Cons of using GUID

  1. 1. In theory GUID presents a risk of getting duplicate GUIDs; but in practice it is not. In the recent windows version (from windows 2000) the algorithm to generate GUIDs doesn’t use the MAC address of the network cars. Instead it is just a random value.
  2. GUID is 4 times larger in size.
  3. There are portability problems, not all the database have GUID as a data type.
  4. Huge overhead; 4 times larger than INT.
    Evolution of COMB GUID

So apparently overhead is the main problem here. When Jimmy Nilsson did real  inserts using GUID as primary key he found out that time taken for inserts with GUID is 30 times greater than inserts with INT. That happened for 2 reasons

1. The generated GUIDs lacked the order; they were random. As you see in the below table only the 4 is common.

C87FC84A-EE47-47EE-842C-29E969AC5131
2A734AE4-E0EF-4D77-9F84-51A8365AC5A0
70E2E8DE-500E-4630-B3CB-166131D35C21
15ED815C-921C-4011-8667-7158982951EA
56B3BF2D-BDB0-4AFE-A26B-C8F59C4B5103

2. SQL insert statement makes the indexes to be reordered and this takes a lot of time. The new ordering for indexes depended on the last bytes

Therefore Jimmy Nilsson arrived at a new algorithm and his new GUID is called as COMB GUIDs.

Jeffery Palermo in his book ASP.NET MVC in Action uses this COMB GUID for his sample code to explain NHibernate. I am using this COMB GUID for my pet project.

Other Resources:

  1. GUIDs as PRIMARY KEYs and/or the clustering key
  2. NHibernate POID Generators revealed : COMB GUID does not cause database fragmentation
  3. INT vs Unique-Identifier for ID field in database
  4. GUIDs are one of the leading causes of index fragmentation
  5. Top tips for effective database managements
Advertisements