W5 asig ETL and Data Warehousing

ETL and Data Warehousing al Affiliation ETL and Data Warehousing Select a company from the United s. Briefly describe the company’s business and its existing or planned data warehouse environment.
The company selected to comply with the requirements of the paper is American Airlines. American Airlines is reported to be one of the world’s largest carriers with historical roots traced in the 1920s (American Airlines, 2011). It is currently headquartered in Fort Worth, Texas and was noted to be “one of the largest scheduled air freight carriers in the world, providing a wide range of freight and mail services onboard Americans passenger fleet” (American Airlines, n.d., p. 1). The company’s data warehouse requirements are being served by Sybase (Sybase, 2014). Accordingly, the need for locating an appropriate ETL provider was the aim of increasing “revenue by reducing fraudulent ticket processing. That meant finding a way to quickly and efficiently query their data warehouse. That meant Sybase” (Sybase, 2014, p. 1). In addition, the data warehouse system of American Airlines through Sybase enabled the provision of the following results: “detect fraudulent ticket-processing, track ticket sales properly and ensure proper revenue is flowing into the company” (Sybase, 2014, p. 1).
Research the leading ETL products available in the marketplace and write a comparison of their major features.
The leading ETL products available in the market are diverse. The features are traditionally compared according to the following:
“√ Infrastructure √ Functionality √ Usability
√ Platforms supported √ Debugging facilities √ Data Quality / profiling
√ Performance √ Future prospects √ Re-usability
√ Scalability √ Batch vs Real-time √ Native connectivity”
(Passionned Group, 2014, p. 1)
The comparison of results is presented in a tabular format so that the individual product features are easily compared across vendors.
Version—-&gt.
5.0 InformaticaPowerCenter
5.1Ascential Data Stage XE
11.5 BODI
Architecture
Architecture
Client and Server Architecture
Client Server Architecture
Clinet Server Architecture
Scalable and Extensible Technology
Highly scalable and extensible technology. Scale up as the data and load grows. Scales up w.r.t the hardware and software
Highly scalable Scales up w.r.t the hardware and software
Highly scalable Scales up w.r.t the hardware and software
Client Platform
Windows 2000/NT/98
Windows 95/NT/2000
Windows 95/NT/2000
Server Platforms
Sun Solaris, AIX, HP-UNIX, Windows NT/2000
Windows NT ( Intel and Alpha Platforms ), UNIX AIX, HP-UX, Sun Solaris, COMPAQ Tru64. Data Stage XE 390 works on OS/390 platform.
Sun Solaris, AIX, HP-UNIX, Windows NT/2000
Which DBMS are supported for extraction and loading
For Extraction: DB/2, DB/2 /400,Flat Files, IMS, Informix, MS SQL Server, MS Access, Oracle, Sybase, UDB, VSAM, ODBC, Others
Targets: Informix DB/2 /400,MS SQL Server, MS Access, Oracle, PeopleSoft Enterprise Performance Management (EPM), SAP® Business Information Warehouse (BW), Sybase, UDB, Flat Files, Others
QSAM: Sequential flat files ISAM: VSAM: KSDS, RSDS, ESDS – support GROUPS, multi-level arrays, REDEFINES, and all PICTURE clauses. DB2, Adabas, Oracle OCI ( For releases 7 and 8 ) , Sybase Open Client , Informix CLI , OLE/DB for Microsoft SQL Server 7, ODBC.
Generic ODBC, HP NeoView, IBM DB2/UDB, Informix IDS, Microsoft SQL Server, mySQL, Netezza, Teradata, Oracle, Sybase Adaptive Server Enterprise (ASE), Sybase IQ. Native bulk loading supported for all major databases.
Support for ERP Sources
Provides PowerConnect modules for connecting to PeopleSoft, Siebel and SAP R/3. Informatica is coming out with an Open PowerConnect API for rest of the ERP systems. So customers can write interfaces using this module.
DataStage XE provides full integration with leading enterprise applications including SAP, Siebel, and PeopleSoft. The DataStage Extract PACKs for SAP R/3, Siebel and PeopleSoft, and the DataStage Load PACK for SAP BW enable warehouse developers to integrate this data with the organizations other data sources.
JD Edwards OneWorld and World, Oracle e-Business Suite (EBS), PeopleTools, SAP BI and BW Server. SAP ERP and R/3 via ABAP, BAPI, and IDOC, Siebel, SalesForce.com
Code Reusability capability within the product
Supports development of Mapplets which acts as library between Mappings and also can make transformations shareable across Mappings.
Permits the reuse of existing code through APIs thereby eliminating redundancy and retesting of established business rules
Supports Code reusablitiy, we can reuse Workflows, Dataflows, Tables,
Parallelism
Supports parallelism, one can run multiple mapping session on the same server.
Automatically distributes independent job flows across multiple CPU processes. This feature ensures the best use of available resources and speeds up overall processing time for the application.
Supports parallelsim, one can run multiple dataflows/work flows parallelly and run jobs parallelly
Code Generator
PowerCenter does not generate code, all the mappings developed will be inform of GUI interface.
Only DataStage XE/390 version automatically generates and optimizes native COBOL code and JCL scripts that run on the OS/390 mainframe.
DI Automatically generates an appropriate interface calls to access the data into the source systems. For most ERP applications DI generates optimized SQL for the specific target database systems (Oracle, DB2, SQL Server, and Informix)
Data Transformation Method (Engine Based ?)
PowerCenter is based on Hub &amp. Spoke architecture and has inbuilt Transformation engine.
Transformation is engine based – column-to-column mappings
Tranformation is engine based.
Building &amp. Managing Aggregates
Aggregation can be built using the built in transformation provided.
Enhances performance and reduces I/O with its built-in sorting and aggregation capabilities. The Sort and Aggregation stages of DataStage work directly on rows as they pass through the engine rather than depending on SQL and intermediate tables.
Aggregation can be built using the Query tranformation with the help of builit-n functions
Support for various data types
Supports most of the industry standard data types. This also depends on the kind of source system being used.
It supports most of the industry standard data types. It supports XML also.
It supports most of the industry standard data types. It supports XML also.
Data Quality Check functionality or feature
Does not have such feature. Needs to be handled prorgramatically.
Through Quality Manager it is possible to audit, monitor, and certify data quality at key points throughout the data integration lifecycle.
Supports Data quality check using different sets of transformations
Debugging and logging features
Does not a separate debugging Tool. The workaround is by setting the "verbose" property on each transformation. By this Informatica will create log files in the server, which can be used for further analysis.
Helps developers verify their code with a built-in debugger thereby increasing application reliability as well as reducing the amount of time developers spend fixing errors and bugs. Supports debugging on row-by-row basis using break points. DataStage immediately detects and corrects errors in logic or unexpected legacy data values using this. Highly useful for complex transformation, date conversions etc.
It does supports job execution in Debug mode
Exception Handling
Throws out the error records or rejected records into a log file
Supports exception handling.
Supports exception handling using Try, Catch blocks
How Tool Provides information about exception
Through log files stored in the server
Developers can closely observe the running jobs in the Monitor Window to provide run-time feedback on user-selected intervals. The powerful process viewer estimates rows-per-second and allows developers to pinpoint possible bottle-necks and/or points of failure. Using the Director, the developer can browse detailed log records as each step of a job completes. These date and time stamped log records include notes reported by the DataStage Server as well as messages returned by the operating environment or source and target database systems. DataStage highlights log records with colored icons (green for informational, yellow are warnings, red for fatal) for easy identification.
There are serveral exception categories available in DI. DI maintains 3 different logs (Trace, Error, Statistics) at the time of execution and these log files are stored in the server. Trace log displays information start &amp. endi time of the job, workflow, dataflow. Statistics log window displays row count, path name, state of the DI object(job, wf,df, tranformations) and elapsed &amp. absolute time. Error log displays the name of the object being executed, description &amp. type of error occured. Monitor window displays job status with colored icons (green, red and yellow)
Restarting an aborted ETL process
Support restarting of the mappings
Restart is possible. Can restart from the point of failure.
A Data Integrator feature that allows you to run unsuccessful jobs in recovery mode. Can restart from the point of failure
Memory (Minimum/ Recommended) requirement at client machine
Minimum 128 MB
64 MB
Min 256 MB
Memory (Minimum/ Recommended) requirement at Server machine
Minimum is 256 MB and depends on the nature of the mappings. Each session of the PowerCenter takes around 8 MB of memory so according to the load the memory is needed.
Minimum 256 MB
Pentium processor with a minimum of 256 MB but recommended 512 MB
RAM and 100 MB free disk space (memory-intensive jobs require more
free disk space).
Repository Backup and Recovery
PowerCenter comes with good features for backup and recovery of the repository. This can done through Repository Manager.
Supports distributed Repository – Remote sites can subscribe to a set of meta data objects within the warehouse application. These sites are notified via email when meta data changes occur within their subscription. DataStage XE offers version control such as table definitions, transformation rules, and source/target column mappings within a 2-part numbering scheme.
You can also export an entire repository to a file. When you export or import a repository, jobs and their objects (created in Data Integrator) are automatically exported or imported as well.
Source: ETL Tool Comparison, 2009
References
ETL Tool Comparison. (2009, April). Retrieved November 15, 2014, from dwhnotes.com: dwhnotes.com/wp-content/uploads/2009/04/etl_tool-comparision.xls
American Airlines. (2011, November). History of AMR Corporation and American Airlines. Retrieved November 15, 2014, from aa.com: http://www.aa.com/i18n/amrcorp/corporateInformation/facts/history.jsp
American Airlines. (n.d.). About American Airlines. Retrieved November 15, 2014, from aa.com: http://www.aa.com/i18n/aboutUs/corporateResponsibility/company/profile-and-fast-facts.jsp
Passionned Group. (2014). ETL Tools Comparison. Retrieved November 15, 2014, from etltool.com: http://www.etltool.com/etl-tools-comparison/
Sybase. (2014). American Airlines. Retrieved November 15, 2014, from sybase.com: http://www.sybase.com/detail?id=210272