This post is also available in: Español (Spanish)
Below is a comparison table between ETL tools: SQL Server Integration Services aka SSIS vs PowerCenter. What tool should I choose? What are the differences? What are the advantages and disadvantages? Do they meet my requirements and particular needs? These are some of the questions that you may have asked yourself at the start of any new project, and we will aim to answer them in this post by providing you with some key facts regarding two very potent tools: SQL Server Integration Services a.k.a. SSIS versus Informatica PowerCenter.
Working with the tools: Tabs or Windows
You will notice the first big difference as soon as you open these two applications. PowerCenter comprises several programmes, while SSIS has only one:
Microsoft Visual Studio but it also contains a few tabs and panels to carry out our work: SSIS allows you to work from a single screen, although identifying and placing each of the panels will require some organisation at the beginning. On the other hand, with PowerCenter you will need to open multiple windows in order to start working and browse to your working folder each time. This in addition to jumping between them very often, because you will be doing different jobs in each of them, which means that it is rather user friendly from the start.
Should I launch a query or several arrows?
This is the point where these ETL’s have separated themselves and taken different paths. Informatica’s tool has decided that we should spend several minutes going through a few errors while mapping with arrows. Although there is an option to use Layout > Autolink by Name/Position, given the standard naming convention, it will not be very applicable. SSIS they have simplified this step and with only a few clicks and joining a few fields we can solve most of the processes.
Managing variables and parameters is crucial in your day to day work, allowing us to be more flexible and speeding up our development. Informatica have gone with the option of creating a parameters file (plain text) which is used as a reference for each session, but it can get complicated when you have different variables for mapping, worklet, workflow and global variables, without mentioning having to move all that data between the mapping and the higher levels. In SSIS everything is simpler. Although parameters are a bit more hidden, variables are clearly visible and they can be assigned without many complications.
When loading dimensions using Power Center, we are assisted by the Sequence transformations to generate the Surrogate Key and Router in order to define the loading. Whereas with SSIS, we can either use the Slow Changing Dimension transformation wizard or a customised package including T-SQL Merge, Dimension Merge component or a tailored solution.
Below are some links from my colleagues so you can find out a lot more about dimensions loading
Loading the facts table will be the subject of another post in the PowerCenter vs SSIS series, so don’t miss it.
It’s debugging time!
For this part of our work, which consists in analysing what is going on half way through our executions, I find the Informatica PowerCenter’s Debugger option to be very flexible and practical, while in Integration Service this will be done using Breakpoints and Data Viewer.
SolidQ ABA Framework
We have now arrived at the end of our comparison post: “SSIS vs PowerCenter”, I hope you liked it and found it useful to give you a foundation when choosing which tool to use. In case of any queries, please leave a comment so we can check it out.
TRY THE TOOL WHICH CAN ANALYSE YOUR QUERY PERFORMANCE FOR FREE
At SolidQ we are ready to launch a new tool so we would like to give you the opportunity to try it before anyone else. It goes by the code name of TSQL-CSI-DW, and it’s no less than a datawarehouse to analyse your queries’ performance. It is based on a simple idea: using our template in order to activate an extended event trace which can then be processed by our Datawarehouse so you can use it later on with a Power BI tool that we will show you for this purpose.
Below are two different application uses:
- Discovering what query patterns are causing a performance regression in your server
- Have any server changes had any effect on its performance?
Following are some of the screens that you will be able to browse through freely using the Power BI tool (check out our earlier videos to find out how to use them):
The figures shown in the above tables are simulated values
Below are some of the key points for our first version:
- Compatible with all SQL Server editions
- And by all, we mean ALL. From SQL Server 2000 onwards, including Azure SQL Database. Both PaaS as well as IaaS.
- Multiple instances and DB
- Allows you to analyse several instances and databases from the same dashboard.
- Second by second granularity
- Everything can be analysed with a second by second granularity, so you can find out what happened in any specific second in the past.
- Filter by any field
- Not only by date and time. All shown dimensions can be used to filter data. You will be able to do complex searches by:
- User login that launched the query
- Application that launched the query
- Text included in the query
- Hostname that the query was launched from
- Databases affected by the query
- Extremely high compression ratios:
- We have managed to achieve a compression ratio of between 7.5 and 13.5 bytes per query (yes, that’s right :)), allowing us to store billions of requests with a second by second granularity, so you can see the full details of your queries for a long time.
- Workload comparisons
- You can check the consumption evolution from several different points of view (reads, writes, duration and CPU) in order to see the evolution in the performance of performance comparison databases queries.
- You can highlight workloads and compare improvement ratios easily in order to see performance differences between your brands.
- OnPremise vs Cloud
- PRO vs PRE
- Before change vs After change
- Data sources compatible with XEvent and profiler
- You can load data sources from XEvent and profiler into the same DW For example, this will allow you to analyse your SQL 2000 instances together with your Azure SQL Database instances.
- PaaS and IaaS
- You can use the information generated by your OnPremise instances together with those obtained from your Azure SQL Database instances.
- Fault detection
- Coming up…
Hello everybody, in this video we are going to show you how to use the database performance analysis tool created by SolidQ. We are looking at a Power BI which can be used to easily analyse what is happening in our server from a database query performance point of view. What you can see right now is the main dashboard with its evolution on a day by day basis. These are the last four days we have captured, during which we have undergone changes in our server which can be seen in a very simple and graphic way. For example, with this yellow line we can see and identify the amount of requests at the end, the transactional SQL queries launched against the server in order to compare what is going on. As a matter of example, if we look here we can see that on day 26 there have been 14.85 million queries, on day 27 there were 12.12 million queries and on day 28, we had 12.06 million queries. Perhaps the most singular issue, if we look here, is that we have had a gradual progression in regards to performance. I.e., in this brown bar we can see the KPI representing the advantages compared to the previous day’s duration. This means that we have carried out a specific change that has resulted in a 78% improvement in the duration, i.e., all our aggregated queries were working in 78% less time than on the previous day. Here we can see again how, in comparison with the previous day, we have still made an 80% performance improvement when compared to the previous day, i.e., the durations would be 80% more efficient than the previous day, on top of the already achieved increase of 78% in performance. However, what I am seeing here on day 28, is that something happened that resulted in a 29% decrease in duration performance as well as a 4.72% increase in CPU consumption when compared to the previous day, i.e., day 27.
We must remember that this is not aggregated data, which means that, despite this 29% decrease, we are still on the positive path. We can also compare whether we are referring to the same amount of events, i.e., we are referring to approximately 12.1 million events per day. We can see this if you look at the CPU level, as well as the writes level, which are represented by the red and the green bars respectively, in an aggregate manner. And down here we can see the same in regards to the average duration level, average CPU, all averages. Therefore, here we can see both the overall aggregated consumption as well as the execution averages. So let’s now have a look at what happened on day 28. If we go back to see day 28 alone, we can go into further detail and see exactly the same evolution, but in this case we can carry out a lot more filtering. Here we can see exactly what where the queries and how many events there have been for every query pattern. But now what we are after is to find out what happened on day 28, so we can go to this tab here, go to day 28 by ticking the relevant box and look at the top 15 queries which have had a regression when compared to day 27, which is exactly what we are trying to find out. Therefore, if we look by CPU consumption, for example, we can find out that this query no. 5037 has had a regression when compared to day 27, which means that we are now looking at the queries that have had a regression in their duration when compared to the previous day, so we will be able to find out what is this query no. 5037 is actually doing. We can then type 5037 in the search box here to obtain the query itself that we are discussing, which is this one. Great, so now we know that this particular query, when compared to the previous day’s performance, i.e., day 27, has had a rather high performance drop. So we’re going to find out what happened. If we wanted to see the details concerning this query, we would go to the query evolution tab by clicking on this tab here. We can type again the query no. 5037 to see what happened and how have the functions within that query evolved, either on a day by day basis, i.e., how has this query evolved over time. We can drill down and see it split into quarter of an hour sections, and again drill down and see it hour by hour, or click to increase the level to daily periods. So summing up, we can see the amount of events for that query throughout a period of time, on day 24, 25, 26, 28. Day 28 may seem odd because, if you look, the amount of events, which is represented here by the yellow line, has been lower, the amount of events in millions has decreased, but we can see that the duration has been longer. This can be observed from several different points of view such as the duration, also in regards to the CPU, and in regards to writes and reads. We can do a drill down and drill up to find out how this evolution has taken place. Very good, so now we are going to see what happened on day 28, because it seems obvious that something has happened between day 27 and day 28. We could go to day 28 and by looking at the same query, we could do exactly the opposite and drill down in order to see what happened on day 28. We can see that this query was launched at around 9am with a rather normal duration, a small duration, but that by 11am something had happened, and it resulted in a much longer duration. We could continue drilling down, however, I will go directly to the 11am section by ticking the 11am box, and then we will continue to drill down once inside the 11am section to go down to the quarters of an hour sections, 5 minute sections and second by second. And here we can effectively see that at 11am 45 minutes and 56 seconds they have launched this query with this duration that you can see here in milliseconds and that it was launched again at second 57 and second 59. So here we have found three different executions that were made nearly consecutively during that hour. And now I could go and find out what user or application carried out this operations. Let’s have a look. We could now come here and look at day 28 to see what happened with this query no. 5037, in order to find out if there is anything odd, for example, something concerning the host name. I could now go to the hostname and search for query 5037, remember, this is the query that we are looking for, no. 5037, and we filter to this query in order to find out if there is something odd there. On day 28, for example, we can see that almost all the durations that have taken place, both in regards to duration as well as CPU consumption and in regards to the reads, it would seem that this query, which is a select type of query, so it does not involve any writes, has been produced by this hostname 67. What would happen if I looked to see what had happened on day 27? Because if I click on day 27 and remove day 28, I would realise that the hostnames are completely different. In fact, if I go back even more, I will notice that none of the executions made on day 28 coincide with the rest of the days. This would mean that somebody from that hostname has launched this query that resulted in a performance regression. Who was it? Well, we could go to ApplicationName, type in the query number that we are discussing, i.e., query 5037, and see that it was launched on day 28 by this ApplicationName, ApplicationName 145, and we could also of course detect who has launched this query. If I go to the login name search and I type 5037 again, I will obtain the name of the person who launched this query at 11am in order to try and find the culprit for this query, who happens to be user 259.
And we now have a culprit, and we have the query so we can do some optimising.
Hello everybody, in this video we are going to see how to find out in a rather simple way whether two different workloads selected at our own discretion behave in a similar, identical or better way. For example, I could be carrying out a workload against my on-premise environment in pre production and compare exactly how it will operate during production or find out how our load against our on-premise environment would behave in our Azure SQL Database. In this case, since this is done at our entire discretion, I have simply chosen a point in time where I have designed a pre and post scenario and some changes to the database. Some of the changes I made were creating a lot more indexes than previously detected, deleting other indexes and changing specific queries in order to confirm whether these changes have resulted in a better or worse performance.
What I am looking at right now is precisely this analysis. In a very visual manner I can right now see that, following these changes I made previously, I now have a 57% decrease in CPU consumption and a 14% decrease in query duration, and the same value in regards to reads and, as a matter of fact, also in regards to writes. This is in line with the changes I made, because I increased the amount of indexes, so if there is a 13% increase in writes, it does not matter because my aim was to increase performance by reducing CPU usage, which has been notably achieved. I could do this in general terms, as we are seeing now, or I could apply several filters to see queries by specific users, applications or databases in order to be able to see, in a very simple way, these performance changes between each of the items that I can choose from in here. I could go even beyond and see what queries have improved or got worse as a result of those performance changes.
In this tab here, I could obviously apply any filters on any levels in a very precise way, but what I am looking at is that, at a glance, I have a query, namely query no. 142 which has achieved a 68500% increase in performance compared to the previous scenario, i.e., before I created those indexes and deleted others and made specific modifications to certain queries. Now let’s look at the implications: If I look at query no. 142, I can see in a rather simple way how it has behaved in terms of duration, CPU consumption, reads and writes in both scenarios, meaning the scenario I personally marked as Problem. We can see here that it had a high duration as compared to the FixApplied scenario that we see here for the same amount of events. This means that, if we look here, the events lines, which are represented in yellow, are very similar because the amount of events is similar, so this can be used to compare performance because I am comparing the same amount of executions. But there is clearly an increase in performance that is rather obvious for this query. Would it be possible to do exactly the same but in the opposite direction? Could we search for queries that have a regression? That’s right.
I could go here and, instead of sorting it by descending order, I could sort it by ascending order. So I could come here and sort the performance gain in ascending order. That means that we would be searching exactly backwards. Finding out which queries are doing worse. In this case, query no. 2483 is doing 99.8% worse in regards to reads. And I could be doing exactly the same thing, because I went there to find it, etc., so using this tool I can see what queries are doing better and which are doing worse after a change or comparing the workloads selected at my own discretion.
Latest posts by Isidro Blanco (see all)
- Integration Services (SSIS) vs. POWERCENTER. ETL tool comparation (I) - August 14, 2017