Data Integration project (ETL)
Introduction
Need to develop an ETL Data bridge between Source Application Database (SQL Server Transaction DB) to the Target Application database (Oracle Transaction DB). The Data Bridge will perform the following
a) Publish data from the Source system as a file in XML format.
b) Parse the published XML and transform into a format acceptable by the target system.
c) Load the transformed data into corresponding tables of the Target Database.
The ETL process will need to be implemented as a schedulable batch interface with Logging and Error reporting capabilities. The ETL Process will have to Scalable, Robust and easily extendable to add more attributes in future.
Scope
a) Study the Source system and come up with the XML format for publishing the data (the actual publishing will be done by another service provider). This will involve understanding the different entities in the source system (Example: Person) and their attributes (Example: Name, Age Sex etc) and coming up with a format in which each entity will be published.
b) Study the Target System Oracle Database. Understand the tables and their relations. (Documentation of the target database is not available, the Application Screen flow will have to be traced to locate where the data is getting stored). This will involve understanding how each entity is stored in different tables , for example how the person entity is stored , how his email is stored , how his address is stored .Also understanding the relation between the person , email and address tables.
c) Arrive at Data mapping Document (Source Attribute to Target Database table. column mapping)
d) Arrive at Data Transformation Rules for each attribute
e) Build API layer (Java/ PLSQL ) which will be used to Load data to the Target database . The api's will be entity levels and will accept the attributes of the entity as input and then fire the required DML to store the data into the corresponding tables.
f) Build the Transform layer which will parse the XML data published and apply transformation and invoke the API's created in step-e. (Java)
g) Wrap everything in a schedulable job with Error Logging and Reporting Capabilities (Shell or Java)
h) Test, Deploy and Subsequent Maintenance & Support for a period of 1 month
Estimated Timeframe : 2 months + 1 Month Production support after deployment
Estimated Cost : 20,000 $
Technologies Involved : SQL Server , Oracle , JAVA , PLSQL , KSH , CRON