ppetr's blog

Sunday, August 29, 2010

wallpapers from Germany

Recently I have visited some places in Germany so I made some wallpapers - most of them are from Berlin (Berlin wall) but some from car museum from all around the country.

Friday, August 27, 2010

wallpapers from Russia and Ukraine

I took some nice pictures during my roadtrip around Russia and Ukraine. I chose few to create some fine wallpapers. Just click on the picture and it will take you to Picasa album when you can get the HD version. Few pictures are from Chernobyl (Pripayt city); the rest is from Moscow and St. Petersburg.

erwin data modeller tutorial - reverse engineering tool

ERwin modeller has a reverse engineering functionality implemented. Reverse engineering is nothing else than creating a model of existing database. If you already have some tables in the database but you have not modelled them first you might want to use reverse engineer to have to model automatically created for you. The fact is that many business simply do not model and create the tables directly in the database. This is obviously malicious practice because when creating a model first you have enough time to thing about the design (you see the desing in WYSIWIG editor) which obviously leads to better DB design. If you create tables directly in the DB you don’t see how it actually look and effectively bypass the design phase. Model is also form of documentation and having documentation is always a good thing. Documentation can be easily shared and other developers can see what is already in the database and maybe reuse the data which leads to lower data redundancy. Lastly when using ERwin you can create logical model which is DB independent and then you create different physical models for different DBs. This is particularly useful when you migrate from one DB vendor to another (preventing vendor lock in) or if you have heterogeneous DB environment.

So how does the reverse engineer works? You simply tell ERwin how to connect to the database you want to reverse engineer and select the type of objects you want to reverse engineer. Erwin then does everything for you and creates automatically a logical/physical model. So the first step is to select Reverse Engineer from Tools. In the following diagram you select if you require Physical or Logical/Physical model. Better practise is to have Logical model as well. Also select the DB engine you reverse engineer from. The next thing to do is to select objects you want to reverse engineer. It depends on your modelling strategy but I recommend to select as few objects as possible. You don’t want your model to be overly complicated. You should however tick following: tables and views. Click next and in the last step you just need to fill in all the credentials – tell Erwin to where and how is he supposed to create the connection. Just select the type of authentication Windows or SQL Authenticaiton, fill in the user (in case of SQL Authentication) and fill the server and database name. Once this is done click connect and ERwin will reverse engineer database you have selected and create model like the one below.

Reverse engineering is an effective way of creating a model from an existing database. Since some developer might bypass the design phase (this should not be encouraged!) the reverse engineering tool might be useful. The only limitation is that you cannot select the tables you want to reverse engineer – it will always pull meta data about all the tables in the database you have selected. This is particularly annoying if you connect to a massive database like (example: datawarehouse) – it will take ages to reverse engineering all the tables.

Complete compare

Another useful tool is called complete compare. It allows you to compare different models you have created and detect changes. It is particularly useful tool when you make some change to the model for which physical implementation in DB already exists. You can then clearly see what the differences are and do some impact analysis before you actually deploy the changes to the DB. To do complete compare just select Complete Compare from the Tools menu. Then you just select left and right model (two models to be compared), the types of objects (tables, views etc.) you want to compare and finally the objects you want to compare (you can select just some tables to be part of the comparison etc.). Then you compare the model and a comparison dialog containing all the detected differences will pop up and you can decided which differences should be carried over and which not. This whole process is fairly simple and straightforward.

However there is another use of complete compare – to merge meta data from two different sources. Just imagine a situation that you have a model of a database (tables) but some meta data is stored somewhere else – in some other modelling tool and you would like to have all the meta data in one model in ERwin. So what to do? Simply use the import wizard of ERwin which allows you to import models from different tools and compare the models you have imported. Using complete compare you can merge meta data from one model with meta data from other model. Real world example can be from area of business intelligence. One of my customers had a datawarehouses without keys defined (they had reasons for that) and Business Objects universes containing joins and keys. They wanted to have a data model containing both the meta data about the tables form the data warehouse as well as the joins and some other meta data from the Business Objects universes. The process of merging is fairly simple. First step is to reverse engineer the database and get a create a model; after create a model import the model from the Business Objects universe (or any other data source you might have). When you have both models just complete compare them and migrate the changes from one to other. In our case we have migrated the joins from the Business Objects universe into our DW model.

In this chapter we have discussed the reverse engineer and complete compare tool. When you have a database but you have not used ERwin to design it you might want to reverse engineer the model from the database. Or if you made any changes to the database without changing the model you can use complete compare tool to find the differences. Complete compare is also useful for merging meta data from different sources.

Thursday, January 22, 2009

erwin data modeller tutorial - forward engineering tool

In last post we have designed simple model of a bank and save it either locally or to data mart. But usually what you want to do with model is to export it to the database so it can be used by you and other developers. By export I actually mean creating all tables, relationships etc. in the database. ERwin’s export dialog is really sophisticated and allows you to properly set what you want to export. If you have not chosen target db engine do it now clicking on Database/Choose Database. If you have chosen your target db engine you are ready to export your model. Click on Tools select Forward Engineer and then Scheme generation. This dialog should pop up:

This dialog allows you to fully set up export. Here is description of main fields:

Schema Generation file – any configuration you set up can be saved into XML file and used later. This is extremely useful when you have one file for creating tables, relationships etc. and one for dropping them. Thing is when you make changes to the model you have to recreate tables in the DB and in order to do so you have to drop them first (when you use PK and FK constraint you have to drop them in special order) and create them again. So just create two config export files – one for dropping and one for creating.
In Option tab there are two windows: DB Schema Generation and Schema where you actually set up what is ERwin going to do. I recommend you focus on Table and Referential integrity in left window where you can set whether and what to create or drop. I also recommend to uncheck creation of ERwin triggers.
On the bottom there are few buttons that might be interesting for you:
- Filter allows you to filter out tables you do not want to be generated
- Preview actually shows you the SQL query that will be sent to db
- Generate will connect (ask to select connection) to the database. If you have already selected connection ERwin will NOT ask you again so if you want to export your model to different db you have to change in main window in Database/Database Connection…

One important thing is that you do not make changes to design directly in the database but you do them first in ERwin and then recreate tables changed. Developers tends to skip ERwin and make some little changes directly to the DB.
Well you know what is forward engineering tool. But what if you already have some tables in database and you want to create model according to them? ERwin allows you to reverse engineer your database and export all metadata from DB and get a model. Just click Tools/Reverse engineering tool… and short wizard will guide you through the process. In this post you have learnt how to use forward engineering tool to export your model into DB and reverse engineering tool to create model according to existing DB.

Saturday, December 13, 2008

ERwin Data Modeller tutorial - Our First Model

In the last post I have described how to set up ERwin so you can store all your models into database using AllFusion ERwin Model Manager. This is very useful since it allows you to:

share models with other team members – yeap, teamwork!!!
make different versions of your model - ERwin has its own versioning tool which works fine; whenever you save current version of model old version (one before changes were made) is saved as well, so you can roll back
data mine Model Mart database and get some interesting info about the model (meta-data) you might need – I will talk about this later since it is very important

This post will be about creating your first model. I will show you how to create simple model and explain the difference between logical and physical model, identifying and non-identifying relationships. So let’s get started. Shall we?

To create new model, click on File/New and in the dialog select if you want to create Logical, Physical or Logical/Physical model. If creating physical model, choose the database you want to create it for. For those who don’t know the difference between logical and physical: logical is about the business (gather business requirements and transfer them into model), physical is purely about the db. To be honest I usually draw logical model on a piece of paper and then create physical model in Erwin.
If you have created new model, your screen should look like this:
The whole screen consists from 3 main areas – menu and toolbars, object explorer on the left and main workplace on the right.
So how do I create a model? First of all you have to create some tables (in logical model they are called entities). Just click on and create entity in the main workspace area. Give this entity name. After that you have to add some columns (in logical model they are called attributes). If you create more than one table, you can connect them with relationships. You can basically choose (by clicking on ) between two types of relationships:
- Identifying relationship – PRIMARY key from parent table migrates to the child table as a PRIMARY key, which means that the row in the child table cannot exist without row with the same PRIMARY key in parent table. I use identifying relationship only when I want to resolve many-to-many relationships using additional “key ”table.
- Non-identifying relationship – PRIMARY key from parent table migrates to the child table as a FOREIGN key, which means that the row in child table can exist without row in parent table (FOREIGN key can be NULL, PRIMARY cannot be NULL).

Scenario: Let’s create simple model. In CreditCrunch Bank Ltd. they store data about Customers and their bank Accounts. Every customer can have more than one Account and one or more customers can be assigned to one Account (for example company account can be accessed by more people). Every customer can have Personal Banker assigned. Bank is supposed to store Customer’s and Personal Banker’s name and surname. You also have to store currency and type of Account. Do not forget about proper definition of PRIMARY keys!

Solution: You have to create 4 tables: Customer, Account, PersonalBanker and CustomerAccount. The last one just resolves the many-to-many problem between Customer and Account. Others contain info about business entities (Customer, Account, Personal Banker). What about the relationships? You use identifying relationship between Customer and Customer_Account and also between Account and CustomerAccount. Why? Because when there is an Account it must have Customer assigned and also if someone is a Customer he/she must have at least one Account. One cannot exist without the other one and so you need to use identifying relationships. The last relationship between Customer and PersonalBanker is non-identifying because Customer can (but does not have to) have Personal Banker. And here is the picture:

If you have created your model you probably want to save it. You can store it either locally or in a database. To save it locally just click File and save or save as… as with any other application. What is more interesting is that you can store models in a database. Click Services and Connections. Dialog allowing you to connect to Model Mart should appear. This dialog contains connection we have created in the previous post so just select this connection and click Connect. Now you are connected to Model Mart and can save all your models into DB (click Services and Save/Save as…).

Sunday, November 23, 2008

Database benchmark - OLAP database with or withouth primary keys

At work we are implementing new database engine for our warehouse and we also want to redesign our db structure, methodlogies etc. Currently we do not define primary keys and foreign keys constraints in our tables. The main reason is speed and problems when data is loaded into tables using DTS (when there is a single foreign key violation DTS fails, which is nightmare when you load millions of rows). I was given task to evaluate the impact of primary keys and foreign keys on our ETL process. In this article I will focus on speed - does primary keys and foreign constraints slows down the ETL process and how much?
I took some typical loading scenarios in our company and try to load tables without keys and tables with keys. Then I compared times of both. So here are the results:

Test 1: small star scheme
Number of tables: 4
Number of relationships: 3
Number of rows: around 100 000
Units in graph: seconds

Conclusion: Insert is 9% slower and select is the same.

Test 2: medium sized FACT and DIM table
Number of tables: 2
Number of relationships: 1
Number of rows: around 500 000
Units in graph: seconds
Comment: FACT table had lots of rows and inserting data is much more demanding than checking the constraints = keys does not slow down much

Conclusion: INSERT is a bit slower and SELECT is about 11% faster

Test 3: massive FACT and DIM table
Number of tables: 2
Number of relationships: 1
Number of rows: around 5 000 000
Units in graph: seconds for SELECT and minutes for INSERT
Comment: FACT table had lots of rows and inserting data is much more demanding than checking the constraints = keys does not slow down much

Conclusion: INSERT is a bit slower and SELECT is about 11% faster

Test 4: snowflake scheme
Number of tables: 8
Number of relationships: 7
Number of rows: around 600 000
Units in graph: seconds

Conclusion: INSERT is slowed down by 28%

Conclusion: Proper definition of PK and FK constraints will slow down inserts (on average by 15%) and will not speed up selects significantly. It will also make the ETL more demanding since the data must be perfectly clean and consistent (no foreign key violations). However you will gain a lot - 100% consistent data.

Saturday, November 22, 2008

Job hunting - Summary

So called Graduate Development Programmes offered by many companies are basically dream jobs for fresh graduates. You have a opportunity to be involved in more teams and departments of the company, you can still develop your theoretical knowledge but you also work on real-world projects and...it is all well paid! What more to wish for, right? Only problem is to get to one of those programmes. As far as I know there are usually 30 applicants for 1 position offered. On this blog I tried to describe my experience with the selection process. This article is kind of summary - the essential stuff you should know if you want to succeed and get your dream job so it will not be as consistent as the other articles. So here is my advice:

Start early - you can start applying for the programs during late November and you can apply (usually) till end of January/February. To choose companies to apply for, filling in all online applications and taking part in the assessment center will be time consuming. Get ready for it...and start early!
Choose companies wisely (more here) - how to choose the best companies for you? Try to consider following:
- company's industry - what I did first was I selected industries I find interesting and then selected companies from these areas. So if you find banking boring (as I do) do not apply for banks. However this is very subjective and therefore not very scientific but it can help reduce the number of companies to choose from.
- Image of the company - everyone wants to work for cool company so try to involve this factor in your choice - again very subjective, but if you want to be satisfied with your job it is a lot about feelings which are subjective. You do want to be proud of a company you are working for!
- Structure of the graduate dev. programme offered - try to get some info about what you are going to do if you get the place. Is it just a work or are there possibilities to develop yourself in other areas (languages, international placements, getting certifications etc.).
- How difficult it is going to get the place - when you are selecting your companies do not focus only on the industry leaders. It is probably very difficult to get a job within such company. So do apply for jobs in "less cool" companies. It is not a good idea to apply only for Google, McKinsey & Company and IBM.

Do some research - you need to do some research before you take the telephone interviews (see here and here). The research should take at least 30 minutes for every company. What to read?
- Wikipedia entry about the company
- Company's webpage with focus on products and services
- Google out some recent stuff about the company (use news.google.com)

Do even more research before the AC (see here) - during the assessment center you will take part in several different tasks. Usually you cannot prepare for them. One of them will be interview and you will be talking about yourself. This is a great selling point but only if you are prepared. Try to link your experience/theoretical knowledge to the company's industry - but you need not know a lot about this industry, so do your research!

I hope this series of articles helped you to get a nice graduate job. For me it was very demanding but definitely worth it.

ppetr's blog

Sunday, August 29, 2010

wallpapers from Germany

Friday, August 27, 2010

wallpapers from Russia and Ukraine

erwin data modeller tutorial - reverse engineering tool

Complete compare

Thursday, January 22, 2009

erwin data modeller tutorial - forward engineering tool

Saturday, December 13, 2008

ERwin Data Modeller tutorial - Our First Model

Sunday, November 23, 2008

Database benchmark - OLAP database with or withouth primary keys

Saturday, November 22, 2008

Job hunting - Summary

Blog Archive

Labels

Google Analytics