Home > Developers > Wiki > Example Data Mart
Views

This page is under construction

We want to be sure that the description of the sample datamart is as complete as possible!

In v1.1, the Mifos application will support reports that are

  • built directly from the Mifos database repository using SQL, and
  • built by accessing the Mifos business layer using BIRT scripted data sources.

This approach has several drawbacks:

  • Report designs are tightly coupled to operational database schema. When that schema changes for operational reasons, existing reports must be rebuilt.
  • Using scripted data sources insulates the reports from changes to the database. However, creating reports may impact the overall performance of MifOS as the process contends with MifOS for resources.

After v1.1 our goal is to provide a datamart that will make building new reports

  • easier and less error-prone -- the datamart's star-schema are optimized for reporting and tend to more intuitively reflect the business. This makes SQL queries far simpler than those that access fully normalized schema that are optimized for daily operations.
  • not require Java skills and deep knowledge of MifOS's object model in order to design scripted data sources
  • less subject to underlying schema changes required for operational efficiency
  • not have to contend with MifOS operational resources -- the tasks to populate datamarts and build reports from them can occur in a separate environment.

To that end, the version 1.1 relase comes with a small sample datamart and a collection of batch jobs that populate it. The jobs were designed with the open-source tool Talend Open Studio. The tool generates Java source that can be executed within an application server like Tomcat or run directly from the command line. You can find on-line documentation on Talend at http://www.talend.com/resources/documentation.php and a helpful tutorial at http://www.talendforge.org/tutorials/menu.php .

This should give developers an introduction to what we plan to build going forward. Consider this a work in progress for evaluation, but not yet production-ready. Your feedback is appreciated. Please post comments on MifOS's forums.

The MifOS team thanks Ravi Kutaphale, IBM-Dublin, for this important contribution to the MifOS project.

Version 1.1 contents

MifOS? release 1.1 includes this work in progress:

  • database schema for the sample datamart, which currently contains client dimensions, office dimensions, a time dimension, and fact tables for client status and client loan accounts.
  • Talend-generated source code for batch jobs that populate the datamart's dimensional and fact tables.

Datamart design

We have used dimensional modeling, a technique that seeks to

present the data in a standard framework that is intuitive and allows for high-performance access. In this model, a datamart consists of one table with a multi-part key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single part primary key that corresponds exactly to one of the components of the multipart key in the fact table. This characteristic star-like structure is often called a star join. (R. Kimball, L. Reeves, M. Ross, W. Thornwaite. The Data Warehouse Lifecycle Toolkit. Wiley, 1998. )

Datamart tables

Dimension tables

Dimension tables contain information about business entities that do not change over time, or at worst are slowly changing. The function of the attributes in a dimension table are to filter, group, and label facts to be reported on.

Dimension Tables
Name Description
head_office_dim This and the following tables reflect the standard office hierarchy
regional_office_dim  
subregional_office_dim  
area_office_dim  
branch_office_dim  
center_dim Standard client hierarchy
group_dim  
client_dim  
client_loan_account_dim  
loan_officer_dim  
client_status_dim Has pre-populated values for differrnt client statuses'
time_dim Designed to hold maximum information about particular day.

Fact Tables

Fact Tables
Name Description
client_loan_fact Daily snapshot data for the client loans
client_status_fact Periodic snapshot of the status of all clients in the system
group_status_fact Periodic snapshot of the status of all groups in the system

Overview of the jobs

The sample Talend design jobs can be found in subversion at : trunk/mifos/TalendETL ( https://mifos.dev.java.net/source/browse/mifos/trunk/mifos/TalendETL/ ).

The ETL jobs read the data from the MifOS transactional schema, apply transformations to the data and persist it in the new data warehouse. Talend provides a graphical user interface to generate the ETLs. The output from Talend design is a plain JDBC client. This java code can be executed as a standalone client, and thus can be executed via cron or a similar scheduler. For editing the ETLs, Talend generates the design files which can be imported in Talend and edited accordingly.

Building the jobs

The ETLs are in the SVN repository under trunk/mifos/TalendETL This package is self sustained. After you checkout the ETLs, you can build the ETLs using the build file in TalendETL.

Simple steps to follow

  1. Checkout trunk/mifos/TalendETL.
  2. Build the jobs by running ant on the build file under TalendETL.
  3. Create the datamart using the SQL scripts in TalendETL/sql.
  4. Execution : The build compiles all the ETLs and copies all the relevent files in target folder. Edit the hibernate.properties under target/calsses/conf to point to correct transaction and datmart schemas. Run the execute_master_ETL.bat. This will execute all the ETLs in required sequence.
relationships.real.compact_small.png

ETL Job Details

ETL Jobs
Name Source Tables Destination Tables Comment
MasterETLLoader?   All executes all the ETL jobs in a proper sequence. When running the ETLs? for the first time, the master ETL is the best option. It runs all the ETLs populating the parent tables and then the child tables.
RegionalOfficeLoader office head_office_dim, regional_office_dim the regional office is linked to head office by a foreign key, so it is necessary to first populate the head_office_dim and then the regional_office_dim. The correlation between the office table (transaction schema)and various office dimensions is done via "global_office_num".
SubregionalOfficeLoader office subregional_office_dim  
AreaOfficeLoader office area_office_dim  
BranchOfficeLoader office branch_office_dim  
ClientLoanOfficeLoader personnel, customer loan_officer_dim  
CenterDimensionLoader customer center_dim  
GroupDimensionLoader customer group_dim  
ClientDimensionLoader customer client_dim  
ClientLoanAccountLoader customer, account, loan_account client_loan_account_dim  
ClientStateFactLoader customer, customer_state, customer_state_flag, customer_flag_detail client_state_fact  
ClientLoanFactLoader account, loan_account, loan_arrears_aging client_loan_fact  
GroupStateFactLoader customer, customer_state, customer_state_flag, customer_flag_detail group_state_fact  
TimeDimension   time_dim  
DailyTimeDimensionLoader   time_dim The ETL to populate the time_dim dimension with the current date. Not all the attributes in the time dimension are populated by this ETL.
populate_client_status.sql   client_status_dim  
populate_group_status.sql   group_status_dim  


subtopics:

... --ecable, Wed, 09 Apr 2008 10:42:38 -0700 reply

fixed some of the wiki-formatting


Grameen logo