Database Answers  
You are not logged in: login | why register ?
Home Ask a Question Careers Clients Community Data Models Search Site Map Contact Us
Data Governance
BI+Performance Rpts
Data Marts
Data Integration
Data Sources
Info Catalogue


Architecture
 
    If you have a Question that is not addressed here, please feel free to email us your Question.
    Question 1: How do we measure progress in Data Integration ?
    Look for the existence of the following items :-
    
    • Generic Data Models
    • An Enterprise Data Platform
    • Identify the Data Sources
    • Selection of a MDM Product
    • Implementation of a Customer Master Index or appropriate alternative
    Question 2: How do I teach Young People Data Modelling ? Database Answers has a Tutorial on Data Modelling for Young People Question 3: What is the difference between Conceptual, Logical and Physical Data Models ? Wikipedia has some useful entries on Conceptual Models, Logical Models and Data Models. Conceptual Data Models do not conventionally show Foreign Keys and are very useful for making clear the Entities and Relationships in a Data Model without any Keys or Attributes. They are very useful for discussing Requirements with Users because they show only the basics. Logical Data Models add Foreign Keys and Attributes. They are very useful for publishing a complete statement of the data involved. Physical Data Models are very close to the Database design. They are very useful for discussions between the Data Analyst, DBAs and developers. Question 4: What does ETL stand for ? Wikipedia has an entry on ETL which is worth a look. ETL stands for Extract, Transform and Load. • Extract means Extracting data from Data Sources. • Transform covers many tasks, including – o Selection of the data of interest o Validation and clean-up of the selected data o Changing the format and content of the data o Loading into the designated Target. In practice, there are three options for implementing ETL:- • Develop bespoke SQL • Use a commercial package, such as Informatica or Microsoft’s Integration Services • Some combination of these two. For example, developing basic SQL to clarify the Requirements and then looking for a commercial product to meet the Requirements. Question 5: How do I establish a Strategy for Data Quality as an Enterprise Issue ? A successful Strategy must include both organization and technical aspects. Typical Organization aspects are :- • Commitment from senior management • Establishing the slogan “Data Quality is an Enterprise Issue” as a top-down edict. • Identification of the ‘Top 20’ Applications and Data Owners across the Enterprise • Agree sign-off procedures with Data Owners and Users Technical aspects • Establish Key Quality Indicators (‘KQIs), for example Duplicate Customers records • Agree target Data Quality percentage • Define KQI Reports and dashboards • Develop SQL to measure KQIs • Define procedures to improve KQIs Question 6: How do I handle multiple types of Database, such as SQL Server and Oracle ? The key to handling multiple types of Database is to thing of them in terms of an Integrated Data Platforms, where all types of data are presented in a common fashion. This then defines the logical requirement. There is a then a number of options to physically meet this logical requirement. The Enterprise-level option is to use an appropriate commercial product, such as Informatica Question 7: How do I obtain a Single View of the Customer ? This requires a method of matching Customers, de-duplication and the maintenance of a Customer Master Index, (‘CMI’) supports a Single View of a Customer. When there are many sources of similar data, such as Customers, there are frequently duplicate records. For example, in the US, John Doe could be also called Jon Doe, Johnny Doe, Mr.J..Doe and so on. In the UK, Joe Bloggs could also be called Joseph Bloggs, Joey Bloggs, Mr.J.Bloggs and so on. The rules for recognizing and resolving this kind of problem has led to the development of software for De-duplicating records. This process is informally referred to as ‘de-duping’, especially by people who do a great deal of it. Best Practice is to look for a commercial product, rather than to write your own bespoke software because it usually takes longer than expected and commercial products can be quite cheap. This page on the Database Answers Web Site is an excellent starting-point for reviewing "De-duping" Question 8: What is Data Lineage ? Data Lineage can be defined as being able to the trace the derivation of all items of data that appear in any important Performance Reports and Management Information. That includes :- • Who owns the original source data • What validation and transformations are applied to the data in its life cycle Question 9: How do I verify the derivation of all data (the Data Lineage’) in the Reports ? This requires the use of an Information Catalogue that will record the source, processing steps and final delivery in a Report. The transformations in the processing Steps must be specified in both the processing language, for example SQL, and in plain, unambiguous English so that the Data Owner can sign-off on the Data Lineage. Question 10: What is Master Data Management (MDM) ? One of the major components in Master Data Management (‘MDM’) is Customers. MDM can be defined a ‘Providing a Single View of the Things of Importance within an organisation’ Master Data Management applies the same principles to all the ‘Things of Interest’ in an organisation. This can typically include Employees, Products and Suppliers. We have discussed ‘A Single View of the Customer’ and MDM involves the same kind of operations as a CMI. That is, identification and removal of duplicates, and putting in place to eliminate duplicates in any new data loaded into the Databases. There is a wide choice of software vendors offering MDM products. De-duplication and Address validation is a niche market in this area. On the Database Answers Web Site, there is a Tutorial on Getting Started in MDM There is a sister Web Site devoted to the topic of MDM-As-a-Service
 

© Database Answers Ltd. 2010

Home Ask a Question Careers Clients Community Data Models Search Site Map Contact Us