Sunday, November 23, 2008

Database benchmark - OLAP database with or withouth primary keys

At work we are implementing new database engine for our warehouse and we also want to redesign our db structure, methodlogies etc. Currently we do not define primary keys and foreign keys constraints in our tables. The main reason is speed and problems when data is loaded into tables using DTS (when there is a single foreign key violation DTS fails, which is nightmare when you load millions of rows). I was given task to evaluate the impact of primary keys and foreign keys on our ETL process. In this article I will focus on speed - does primary keys and foreign constraints slows down the ETL process and how much?
I took some typical loading scenarios in our company and try to load tables without keys and tables with keys. Then I compared times of both. So here are the results:
  • Test 1: small star scheme
    Number of tables: 4
    Number of relationships: 3
    Number of rows: around 100 000
    Units in graph: seconds

    Conclusion: Insert is 9% slower and select is the same.


  • Test 2: medium sized FACT and DIM table
    Number of tables: 2
    Number of relationships: 1
    Number of rows: around 500 000
    Units in graph: seconds
    Comment: FACT table had lots of rows and inserting data is much more demanding than checking the constraints = keys does not slow down much

    Conclusion: INSERT is a bit slower and SELECT is about 11% faster

  • Test 3: massive FACT and DIM table
    Number of tables: 2
    Number of relationships: 1
    Number of rows: around 5 000 000
    Units in graph: seconds for SELECT and minutes for INSERT
    Comment: FACT table had lots of rows and inserting data is much more demanding than checking the constraints = keys does not slow down much

    Conclusion: INSERT is a bit slower and SELECT is about 11% faster

  • Test 4: snowflake scheme
    Number of tables: 8
    Number of relationships: 7
    Number of rows: around 600 000
    Units in graph: seconds

    Conclusion: INSERT is slowed down by 28%
Conclusion: Proper definition of PK and FK constraints will slow down inserts (on average by 15%) and will not speed up selects significantly. It will also make the ETL more demanding since the data must be perfectly clean and consistent (no foreign key violations). However you will gain a lot - 100% consistent data.

No comments: