Most of my company uses Informatica PowerCenter for Extract-Transform-Load (ETL) jobs that move data between databases.
However, the project I am on has a big custom Perl job, with some Java thrown in for good measure, to move data and trigger some other updates.
There is talk of rewriting the thing to use PowerCenter instead. What are people's experiences with such a project? Does it make sense? It seems like you trade away a lot of flexibility in going to such an "off-the-shelf" solution, but do the ETL tools buy you much in terms of productivity, for example?
Informatica is good for an operations team. It allows a non-technical person to monitor and re-run jobs. However, it makes almost any task more complicated than it needs to be. It's a huge piece of software that takes a while to learn, and it's limited in the transformations it can do without programming. I'd certainly use Perl or another programming language any day over an "Enterprise ETL" tool.
We had Informatica and Tibco starting in 2001, and people were able to pick up Informatica easily (for basic tasks), but Tibco was a pain. Tibco is now gone, Informatica's footprint has grown, and its code is now viewed even by Business Analysts.
Once you're proficient, you can get a lot done quickly (I built 3 fact tables and 12 dimensions from multiple sources in a week, for a Finance and PO data mart), and maintenance is easier when changing code, scheduling, or transitioning the work to another developer. Less time having fun, more time in meetings and on your org's paperwork.
We use it for data marts, data movements, and interfaces to ASPs.
It now has a Java transformation if you want to do something totally custom, instead of calling out to a compiled C program.
In your case I would convert to Informatica for two reasons: impact analysis (SLAs) and maintenance (monitoring, a single ETL tool). Reusability is another plus.
Specific to Informatica: impact analysis is a great feature. It prevents many emergency fixes and helps you keep up your SLAs; to me, improving SLAs outweighs flexibility. The monitoring features in Informatica are also very helpful.
In general: if your company is moving to a single ETL tool, converting that job will make it easier to maintain, and it is more efficient and reliable for support teams to monitor one tool. Hopefully your company tries to make objects reusable, which makes the conversion easier and increases future productivity (through new reusable objects you may create during the conversion).
A word of caution: conversion tasks are actually hard to estimate. Make this an independent project if you can (not a part of a larger deliverable).
Coding gives you great flexibility. Be it Perl, Python, C#, Java, SQL - whatever. You can quickly and easily build some data transformations. Why would anyone ever bother to even look at ETL software, right?
Let's assume you've got the whole solution ready. All your scripts in the language of your choice are there. Now a few questions:
- If the amount of data grows and you can't fit everything in memory, where is the cache created? Can you control it?
- How do you create all the logs? Have you created that?
- What about error handling? In case of errors (e.g. disk space issues, connectivity issues, etc.), is the root cause easily identified?
- How do you do the monitoring? Is there any dynamic dashboard?
- Is clustering possible with your solution?
- Is it possible to run some data transformation in multiple threads, to make it run faster?
- Failover: how do you handle failures? Is it possible to restart a job from the point of failure?
- Connectivity: a new data source shows up - say, Salesforce - how much time do you need to enhance the script to read/write to it?
- Can you connect to ERP systems like SAP?
- Can you get data lineage and impact analysis out of the scripts?
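To make the failover question concrete, here is a minimal sketch of the restart-from-point-of-failure plumbing a hand-rolled script has to provide itself (an ETL tool ships this out of the box). It is illustrative Python rather than Perl, and the names (`CHECKPOINT_FILE`, `run_job`, `process`) are hypothetical:

```python
import json
import os
import tempfile

# Hypothetical checkpoint location; a real job would use a durable path.
CHECKPOINT_FILE = os.path.join(tempfile.gettempdir(), "etl_job.checkpoint")

def load_checkpoint():
    """Return the number of the last committed batch, or 0 on a fresh run."""
    try:
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["last_batch"]
    except FileNotFoundError:
        return 0

def save_checkpoint(batch_no):
    """Write progress atomically so a mid-write crash can't corrupt it."""
    tmp = CHECKPOINT_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"last_batch": batch_no}, f)
    os.replace(tmp, CHECKPOINT_FILE)  # atomic rename on POSIX and Windows

def run_job(batches, process):
    """Process batches in order, skipping any committed before a failure."""
    done = load_checkpoint()
    for batch_no, batch in enumerate(batches, start=1):
        if batch_no <= done:
            continue  # already processed before the crash; don't redo it
        process(batch)  # the actual transformation/load step
        save_checkpoint(batch_no)
```

If `process` throws partway through, re-running `run_job` resumes after the last committed batch instead of reloading everything, which is the behavior the failover question is really asking about.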
All of those - and even more - you get when you use some decent ETL software. Someone has been working hard for years to deal with all those troubles and put a GUI on top. That's the difference.
Now: if you need something to load one file into a DB once in a while, anything will do. But if you plan on having a lot of those, ETL software is worth considering. And if Informatica is already there in your company, then why bother re-inventing the wheel?
ETL tools like Informatica buy you productivity (and pretty pictures) if you have people that can't code. It makes sense if there's nobody that can maintain the code. For someone that can code, it's like hiring a 500-pound gorilla to move a molehill.
See also: This post and this post in this thread
It is sort of nice for the automatic job logging (you don't have to think about what you want to log; it's all pretty much done for you) and the runtime monitoring tools (how far along is my workflow, and where did it fail?).
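For contrast, here is roughly the per-step logging boilerplate a hand-rolled script has to carry to get a fraction of that. A minimal illustrative sketch in Python; the `logged_step` decorator and the `load_orders` step are hypothetical names, not anything from the original job:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("etl")

def logged_step(fn):
    """Log start, duration, and failure for one job step."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        log.info("step %s started", fn.__name__)
        t0 = time.monotonic()
        try:
            result = fn(*args, **kwargs)
        except Exception:
            # Record where the workflow failed and how long it ran.
            log.exception("step %s FAILED after %.1fs",
                          fn.__name__, time.monotonic() - t0)
            raise
        log.info("step %s finished in %.1fs",
                 fn.__name__, time.monotonic() - t0)
        return result
    return wrapper

@logged_step
def load_orders():
    return ["order-1", "order-2"]  # stand-in for the real extract
```

Every step function needs the decorator applied (and someone to maintain it); in PowerCenter the equivalent session logs exist without writing any of this.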
Source: https://stackoverflow.com/questions/1202869/informatica-powercenter-vs-custom-perl-etl-job