Database Answers  
Home About Us Ask a Question Careers Clients Contact Us Data Models Tutorials Site Map
Topics and FAQs    
  Topics of current Interest
 
This purpose of this page is to provide definitions of some important Topics of current Interest, such as Master Data Management.


TOPICS
Agile Database Techniques Master Data Management
Cloud Computing Mashups
Data Marts Primary Keys
Data Modelling Semantic Web
Data Services  
Data Virtualization  
Database Administration  
Industry-Specific Data Models  
  1. Agile Database Techniques Agile Development is an approach that emphases close collaboration with End-Users, and incremental delivery on short time-scale, typically about six weeks. The recommended reference here is a book entitled “Agile Database Techniques” by Scott Ambler, a highly-respected writer in the general field of Agile Development.
  2. BPMN BPMN stands for Business Process Modelling Notation.
    Wikipedia says "Business Process Modelling Notation (BPMN) is a graphical representation for specifying business processes in a workflow" and is currently maintained by the Object Management Group.
  3. Calendar Tables I use Calendar Tables all the time in Data Marts, and here's an interesting article on SQL Server Central.
  4. Cloud Computing Cloud Computing can be thought of as Computing over the Internet. In other words, the Internet is used to deliver results and all your data is stored off your premises somewhere remote.
    This removes the need for you to be concerned with infrastructure and a whole host of other topics that can consume valuable man-hours. time and money. Here is
    Wilkipedia's definition
  5. Data Marts Being able to produce Reports from Databases is a facility that justifies the existence of many Database. Users frequently have flexible Report Requirements which change from time to time. A very powerful technique to meet these Requirements is the approach of building Data Marts. These can be informally described as ‘throwing all the data into a huge basket and providing flexible Report Templates’. Data Marts usually have a clearly defined scope and a clearly defined User community. Data Warehouses, in contrast, are intended to be ‘a solution to everybody’s Report Requirements’. In other words, Data Marts have a smaller scope and are easier to cost-justify. Some examples are shown on the Database Answers Web Site.
  6. Data Modelling Data Modelling is a core discipline that provides a solid foundation for Data Analysis and comprehensive Data Management. The market leader in Data Modelling Tools is ERWin from Computer Associates. My favourite is Dezign from Datanamic. Both of these can generate SQL Scripts to create Tables for a wide variety of Databases, including Oracle and SQL Server, IBM’s DB2, mySQL and so on. Data Modelling therefore offers an excellent path away from vendor-specific thinking to a higher level of abstraction. A Tutorial on Data Modelling is available on the Database Answers Web Site. It is also available in Portuguese and Spanish During the Data Analysis Stage, Data Modelling can be very effective as a vehicle for establishing communication with a range of interested parties, including End-Users, Managers, Developers and so on. During Data Migration, Data Modelling is essential because it provides the details necessary to define the field-level Mapping from Sources to Targets. For Data Architecture, Modelling can be used to create simple Architecture diagrams, with easy expansion to the underling Entities. From time to time, efforts have been made to extend the scope of Modelling techniques to include more of the System Development Lifecycle. These have met with mixed success. Therefore, although Data Modelling remains a powerful and widely-used technique, Systems Modelling is not common practice except in the area of Object-Orientated Design, where the Rational Unified Process (RUP) is common practice.
  7. Data Services I think of Data Services as Web Services which are specifically designed to move data around a Service-Oriented Architecture (SOA). There is a Wikipedia definition which is full of intriguing hints, and one of the links which is worth following up is to an interesting page from Laura Gibbons, who is a Senior Business Intelligence Solutions Architect, and self-titled ‘Sassy Data Chic’. Microsoft has published some early draft material for SQL Server which is worth a look - Modelling Services Architecture and Modeling Services Design Patterns. The University of Sussex in the UK has posted a Design Pattern for Data Integration via Web Services.
  8. Data Virtualization Data Virtualization is a hot topic right now (June, 2010) and here's a reasonable definition :- "Data Virtualization is the integration of different sources of information through a logical data layer". Our context is shown as the Generic Data Platform in the diagram on this page. We recommend Wikipedia's introduction, which lists a number of vendors. Data Virtualization is the integration of different sources of information through a logical data layer". Our context is shown as the Generic Data Platform in the diagram on this page.
  9. Data Warehouses Data Warehouses, in contrast to Data Marts, are intended to be ‘a solution to everybody’s Report Requirements’. They require a corporatew commitment and budget and may not have any committed Users until late in the development process. Data Marts (q.v.) have a smaller scope and are easier to cost-justify.
  10. Database Administration A Database Administrator or ‘DBA’ is responsible for planning, installing and maintaining Databases. Being a DBA, particularly for production Databases, can be a very challenging and rewarding role, but it is not for the faint-hearted. The current trend is towards ‘Point and Click’. This means that many functions that used to require a DBA to learn complex command-line instructions have now been deskilled. This trend has been accelerated by Microsoft with SQL Server where they have exploited their expertise in Windows Interfaces. Therefore, a Microsoft SQL Server DBA can look after more Databases than an Oracle DBA. This means that it is important for a DBA to acquire skills in SQL Server’s T-SQL and Oracle’s PL/SQL. The move in the marketplace is away from Oracle to SQL Server. This is especially encouraged by the Global credit crunch, where Oracle might lose out to SQL Server on grounds of cost
  11. Design Patterns Design Patterns are generic solutions to a particular group of problems. In the case of Data Models, we can easily identify that a Design Pattern would be a generic Data Model for Master / Detail situations. For example, Customers and Service Requests or Voters and Elections. On the Database Answers Web Site, there are several Design Patterns, which are referred to as Generic Models. An excellent one to start with is this one which provides a Generic Foundation for a family of related Data Models. You should also check out SOA Patterns, and here's an interesting Book on SOA Patterns SOA Symposuium
  12. Global Reporting Global Reporting presents a range of demands for Data Integration and Performance Reporting. The Databases must provide multiple languages in a range of aspects, such as product descriptions, messages for Users in the User Interfaces. In order to avoid major redesign o the databases, these need to be built-in from the start, rather than bolted on later.
  13. Handling Hierarchies in a Database Hierarchies are very common in the real world. The two most common examples are Employees Hierarchies and Organisational Hierarchies. These are shown in this simple Data Model on the Database Answers Web Site. Using ANSI-standard SQL it is relatively straightforward to handle Hierarchies by joining a table to itself and using the ‘parent’ concept. Joe Celko is a very well-respected writer on SQL, and Joe has written a book devoted to the topic of Trees and Hierarchies in SQL His book scores 3.5 Stars on Amazon and is definitely worth a look. Microsoft’s SQL Server provides a feature called a Hierarchy ID Datatype, which is described in this MSDN Magazine Article and this Technet Library Article Oracle provides CONNECT BY, which is very easy to use and very powerful.
  14. Industry-Specific Data Models Industry-Standard Data Models are extremely useful because they facilitate Shared Services and development by Vendors and data exchange between interested parties, such as Industry Associations. Wikipedia contains some useful background entries for • A Generic Justice XML Data Model “for the exchange of information within the justice and public safety communities”. • A Standard Data ModelA Standard User Model At Database Answers, we have defined some generic Data Models for specific Industries or ‘Vertical Applications’. These are documented in this Section in the form of Entity-Relationship Diagrams (ERDs) and also Data Marts, which are particularly useful for exchanging data at summary level. These Data Models cover the following areas :- 1) Customer Services and Local Government 2) Entertainment 3) Law Enforcement 4) Students 5) Web Site Analytics The ERDs are included in this Section, and the Data Marts are in Section A.6. There is a small number of standard Data Models that have been proposed by other organizations. Two of these are IBM and Oracle. 1) IBM offers a Banking Data Warehouse and an n Insurance Application Architecture (IAA). 2) Oracle offers a Retail Data Model.
  15. Master Data Management Master Data Management, or MDM, can be defined as 'A Single View of the important Things of Interest' in an organisation. These will typically include Customers, Products and Suppliers. The MDM Approach is commonly used when data from a number of different sources is being integrated. When there are many different definitions of a Customer, it is necessary to consolidate them and to match the same Customer across different Sources. Microsoft considers this are so important that they acquired a company called Stratature, and integrated the Stratature product into their own MDM offering Here’s an excellent article by Roger Wolter and Kirk Haseldon of Microsoft discussing the “What, Why and How” of Master Data Management.
    We attended a Conference on MDM in London on April 19th to 21st..
    One of the Vendors exhibiting and speaking was Oracle.
  16. Mashups We can define Enterprise Mashups simply as a Web 2.0 technique that allows data to be integrated by using APIs that deliver the data to the page or screen. This greatly simplifies the tasks involved, compared to the conventional techniques of data integration closer to the database sources. How can I check if anyone is using Mashups in my organisation ? This question is important because it provides a measure of unsatisfied demand within your organization for new reports and analysis of data. The best way to address this question is to use the Intranet to build a Knowledge Management ‘Community of Professionals’ for Mashups. Wikipedia provides a useful introduction to Enterprise Mashups and defines Enterprise Mashups as :- “A Web application that combines data from one or more sources … thereby creating a new and distinct Web service that was not originally provided by either source. A Mashup [is] frequently done by APIs, [for example] data from Google Maps combined with location real estate data.” Wikipedia has an interesting entry on Wikipedia on Google Mashuop Editors Best Practice Best Practice will discuss how to prepare a Business Case and create Mashups Mashups are most frequently created by Developers, but they are increasingly being produced using Web Sites like Boomi, that offer high-level Tools that can be used by non-professional developers. Mashups are frequently associated with Widgets that are placed on PC Pages or Mobile screens to run the Mashup. Templates Templates cover specification of User Requirements and compliance with Data Governance. Tools Tools for creating Mashups are provided by a number of commercial Suppliers, such as :- Tutorials This simple Tutorial which provides an insight contains the following steps :- 1) Identify the Data Sources, such as RSS Feeds or Google Maps 2) Obtain the APIs for each Source 3) Map the API data items to a common data structure. 4) Using the appropriate programming language such as Apex, code the User Interface (UI) 5) Finally, behind the UI, code the integration of the data. Additional Tutorials will be derived from a review of Mashup Data Architectures. Microsoft’s highly respected Scott Guthrie mentions a Mashup Tutorial in his Blog Microsoft has an excellent page on Enterprise Mashups in their Architecture Journal, which says “… all mashups are RESTful in nature (they conform to the Representational State Transfer principles)” and contains this very helpful diagram of the Architecture of a typical Mashup :- It's worth checking out the highly recommended Microsoft Architecture Center. Question : How do I Demonstrate the potential value of Mashups ? The best way to demonstrate the potential is to produce something quickly that meets a genuine business requirement. The best way to identify the requirement is to engage with Stakeholders to determine what kind of data they need that they are not currently getting, especially data that comes from more than one source. Qualities for Success in creating Mashups To be successful in creating Mashups it is necessary to possess a number of qualities :- To have some level of experience as a developer to like to work with user people to enjoy seeing results produced quickly that meet a clearly defined User requirement. to enjoy keeping up-to-date with innovations in fast-moving technology to be able to recognize innovations that are important and have lasting power.
  17. Microsoft Unified Dimension Model With the Unified Dimension Model, Microsoft offers a Generic Data Mart to provide a common front-end for Business Users and other interested parties to share a common perspective on data for Business Intelligence (BI) and Performance reports. The Microsoft TechNet Library has a useful Weblink.
  18. Mobile Data Platforms and Applications Mobile applications are increasing being used for serious commercial and business uses, such as data entry of observations in the environment. The implications are replication and synchronization of remote Databases running on Cellphones and Mobile phones, typically running Windows Mobile with SQL Server CE. Business Intelligence is also being adapted for mobile working. There are three broad categories :- * Mobile Alerts – eg Inventory is too low * Mobile Reports – eg Total Employee Headcount today * Mobile Dashboards – eg provide broad summary of performance for senior management All of these categories benefit from the specification of a Mobile Data Platform which provides a consistent view of data that is being delivered by mobile applications. Commercial products in this area include mobiScaler from Vaultus, (who work with Salesforce.com) using the Vaultus Enterprise AppGenerator.
  19. Multiple Languages in a Database When Users need to logon to a system, the Database will store a preferred language field for every User. UNICODE is used to correctly store different characters sets, such as Japanese, French or Hungarian Multiple languages are commonly used for Multi-national Corporations, such as Microsoft, where they need to store the description of the same Product in many different languages. In this case there would be a language field in every table that stores text that must be displayed in the language suitable for the country (or the preferred language for a User). There may not be any text of this nature that must be displayed in different languages. There may be messages that the application will display to the User. In other words, they exist in the interface between the application and the User. For example, the Japanese characters for "Welcome" would be displayed as ????.
  20. Open Source Databases The Open Source approach offers free software with support available at a charge. The most widely-used Open Source Database is mySQL, and here’s their Web Site http://dev.mysql.com/ A while ago, MySQL was taken over by Sun, which was recently taken over by Oracle, (in place of IBM). It will be interesting to see how the future of MySQL evolves.
  21. Oracle EXTERNAL Tables Oracle offers a neat facility to read CSV Files as External Tables. One important point to keep in mind is that you must put the CSV Files on the Oracle Server. In other words, where Oracle can read them. Simply putting them on your PC (eg C Drive) won't do . Here is a Useful Link on OraFAQs.
  22. Primary Keys A Primary Key is a Record Identifier that identifies each record in a Database Table uniquely. For example, a record for a Person will have a Person ID as the Primary Key. There are three options, when choosing a Primary Key - a Natural Key, an Artificial Key or a Surrogate Key. 1. Natural Keys Candidates for this kind of Key occur naturally in the real world and the data being studied in any situation. They are very rare in practice and are therefore very rarely used. The approach of using a combination of columns to define a natural key can be used more widely. However, this approach introduces major housekeeping problems. 1.1 Examples 1) A Calendar, where the date and time provide a natural key. 2) Colours - Names of primary Colours – eg Red, Yellow and so on. 3) A Country name 4) A City name, qualified as necessary by County/State, Country. 5) An Email Address might qualify as a unique identifier for a person or organisation. Here’s a good place to say “Class, please discuss“ 6) A Language name – might need qualification, eg British English or American English 7) A pair of longitude and latitude values to identify a geographic location or a point on the map. 8) The natural key for a person could be a combination of Full Name, Date of Birth and Place of Birth. Unfortunately this means that the Primary Key consists of these three columns, which all have to be repeated if they appear as a Foreign Key on other tables. Changes to the name typically occur when a woman gets married, and these must also be handled. Candidates for Primary Keys can be found in data which occurs in the real world, but which is created by people or systems. Examples are discussed in the next Section under ‘Artificial Keys’. 2. Artificial Keys Artificial Keys are man-made codes, such as an ISBN for books, a Vehicle Registration Number for a car or a National Insurance number for people. In theory, these values are unique and are therefore excellent candidates as natural keys. In practice, they are never unique because of administrative problems or human error, either accidental or intentional. Therefore they cannot safely be used as natural ‘artificial’ primary keys. 2.1 Examples 1) An ISBN for a Book 2) A National Insurance number for people (not everybody has one) 3) A Vehicle Registration Number for a car (this changes) 3. Surrogate Keys Surrogate Keys are generated by the DBMS and the users never see them. They are part of the Database ‘housekeeping’ and remain internal to the Database application. 3.1 Examples 1) A Customer ID 2) A Trader ID
  23. Products and Prices The design of a database will always reflect the business requirement. In the specific case of the price of a product, this can frequently be flexible and change to reflect supply and demand. For example, you can negotiate the price of a hotel room or a plane flight. Therefore the design of the database must accommodate the requirement to store a standard price and then the actual discounted price. This is usually stored with the Order Item details. I recommend this basic Customers and Products Data Model as a good starting-point. Then add a field called actual_price_paid to the Order_Items table. This Hotel Researvations Data Model shows a good example variable pricing in the Daily_Room_Rates table.
  24. Reference Data Reference Data is an important part of any Database. Its importance frequently goes unrecognized. The specification and sourcing of Reference Data should be one of the early tasks in creating an operational Database. Where possible, it should conform to National or International Standards. For example, Currency Codes and Currency Codes. Reference Data is an important part of Master Data Management, which requires a single, consistent definition of ‘Things of Interest’ in a Database. An exxcellent reference is our sister Master-as-a-Service Web Site.
  25. SBVR The Wikipedia entry says "SBVR stands for the Semantics of Business Vocabulary and Business Rules is an adopted standard of the Object Management Group (OMG) intended to be the basis for formal and detailed natural language declarative description of a complex entity, such as a business".
    SBVR is related to BPMN(qv) and both of these approaches have been addressed by a product called Collibra.
  26. Semantic Web We have a page devoted to the Semantic Web.
  27. Service-Oriented Architecture (SOA) The Service-Oriented Architecture ‘SOA’ approach is important and will become increasingly important in the future. It is based on providing data processing functions as a series of Services, which can be Web Services. It can include an Enterprise Service Bus. It is an essential design approach for Cloud Computing because it makes it possible to think about Data Services as independent functions that can be executed as required in an environment that is loosely coupled around an SOA and an Enterprise Service Bus. Microsoft’s position on SOA is worth reading, and they also have a Self-Assessment Tool to help organizations identify where they are in SOA and Business Process capabilities. You should also check out SOA Patterns, an interesting Book on SOA Patterns and the SOA Symposuium. A particularly helpful Pattern is for User Interfaces (UI). Finally, Gil Fink has posted a Blog suggesting a great idea to generate Templates for the Microsoft Entity Framework.
  28. Sharepoint Sharepoint is a very powerful and very popular Microsoft toolset for creating Intranets. It is particularly appropriate for global organisations who want to share documents with access control for different groups of people within the organization. It can also be used for creating and publishing Internet Web Sites. Sharepoint includes a Business Data Catalog which can be used to integrate information from a variety of Databases, including Oracle. It also provides Business Intelligence in the form of Key Performance Indicators,(‘KPIs’). It can therefore be used to implement the Database Answers Road Map for Enterprise Data Management. A good starting-point is Here’s a link to Microsoft’s home page for Sharepoint ,and Wikipedia has a useful entry.
  29. Single Version of the Truth Wikipedia describes a Single Version of the Truth in the following terms :- "... a technical concept describing the data warehousing ideal which stores all of an organisation's data in a consistent and non-redundant form." IBM offers the WebSphere Commerce. Customer Data Integration. Microsoft has defined a Single View Platform for Government. It describes it as "a collaboration and data visualization platform that provides government agencies with solutions for one comprehensive geospatial view of critical data.
    Microsoft SVP provides a single, geographic view of complex information and data sets across multiple roles, locations and user interfaces, which can
    vastly improve agency communication, collaboration and decision-making, and significantly increase the success of government missions."
  30. SQL and the System Date A frequent requirement is to pick up the current date in an SQL statement. This is done by SELECT 'The date stored in the System'. This is not covered by ANSI SQL, but each vendor has implemented their own variation of a common theme. For IBM's DB2 use CURRENT DATE (no underscore) :- SELECT CURRENT DATE FROM SYSIBM.SYSDUMMY1 You can also use CURRENT TIME or CURRENT TIMESTAMP depending on your needs. For MySQL, the syntax is :- SELECT CURRENT_DATE For Oracle, which provides a dummy table called Dual, and the Oracle Syntax for SYSDATE looks like this :- SELECT TO_CHAR (SYSDATE, 'MM-DD-YYYY HH24:MI:SS') "NOW" FROM DUAL; NOW ------------------- 04-13-2001 09:45:51 For SQL Server, the equivalent of Oracle's SYSDATE is the GETDATE() date function, for example :- SELECT GETDATE() AS [SYSDATE] GETDATE() returns the current system date and time in the Microsoft® SQL Server™ standard internal format for datetime values. Date functions can be used in the SELECT statement select list or in the WHERE clause of a query.
  31. Trusted Information Trusted Information is a neat phrase that is achieving some currency these days. It refers to the need for information that appears in mission-critical Performance reports to be of good quality, correct, consistent and trustworthy. It implies Data Governance, Data Quality and a published set of Business Rules for Data Clean-Up. IBM has given exposure to Trusted Information as part of their Information Agenda Concept The phrase ‘Trusted Information’ appears on page 9 of the document entitled “Accelerating an information agenda with IBM InfoSphere Foundation Tools”. Microsoft has also given thought to a Trusted Information Class and a Trusted Domain Object Data Model The slightly more focussed expression “Trusted Intelligence” appears on the Global Data Excellence Web Site
  32. XML Database Schema The ability to export Data Schema or MetaData in industry-standard XML is a powerful facility that should be included in the criteria you follow when choosing the right Data Modelling Tool to meet your requirements. This would provide a facility to export Database Schemas into a Data Dictionary or Information Catalogue. In other words, this very useful facility would be one that exported records of tables and fields from a Table into a CSV file which could then be imported into a Data Dictionary Table. Of course, even better would be an ODBC or Web Service interface to the System Catalogues in the Data Modelling Tool. My favourite Data Modelling Tool is Dezign from Datanamic because it produces diagrams that I like to look at, it has excellent features and it is very affordable. Dezign stores Data Models in a .dez file, which is itself is an xml file. This could probably be transformed into a csv file with XSL transformations. Here is an article contributed by Robert Allen of SlaphappyGeeks.com, which provides an excellent introduction to the subject. And this one shows how it can be used in practice.

Home About Us Ask a Question Careers Clients Contact Us Data Models Tutorials Site Map