This is Part 3 of the transformation and cleanse videos. I'd like to go over the duplicates rule and the calculated column rule. So let's first send in a fresh set of data. And I'll talk about duplicates first. Now, they can be very common but complex tasks as duplicates can come in many forms. In this sample set, I can see several duplicate values in the first column, but in the third column, I have a couple of duplicates.
So these are valid entries, but what I want to do is I want to get rid of the duplicates that are from the third column. So I want to keep all the rows that have unique values in the first and third columns. So to do that, I'm going to open up the duplicate's editor. And I'm going to choose my first and third columns. And then I need to choose the matching method. Exact is what I actually want for this example, but I could choose a case insensitive, or there is a fuzzy logic matching.
The fuzzy logic option uses a slightly modified Double Metaphone algorithm where each word is passed through separately. This option is only available for English at this point. After I choose my key columns and the matching type, then I want to choose how do I want the rows processed? The first option is to include all unique rows and take one of the duplicate rows and get rid of the rest. The rows that are kicked out are arbitrarily picked, and the end result is a distinct set of rows for the key columns.
Another option is to exclude all of the duplicate rows. And if you choose that option, you might want to choose to export those duplicate rows to a file. And that way you can take a look at them and handle them later. The last option is to exclude all unique rows and just leave the duplicates. It just depends on what it is you want to accomplish.
When you apply the rule, you'll see a new tab that shows you the duplicates grouped together. And this allows you to understand the data set and kind of confirm the rule you created filters out the right duplicates. Remember now, all the sample rows are what we process when you preview the rule. And this option here of exporting only occurs when you do the final export, all right.
Now, let's look at the last rule, the calculated column. I call it the catch all rule. Here you can apply any SQL function or math function to manipulate the data. So let's go and get the calculated editor. Now, if you want to continue to use SQL for your transformations, make sure that this option here is selected. When you do that, you will get a list of categories of types of functions that we're supporting. And as I click on these categories, you're going to see them loaded here.
Now, generating transformation SQL is only available for Oracle and MySQL in the 3-7 release. But we do plan to add support for other database platforms in later releases. To build a calculated column, browse and choose a function, let's say I'm going to use the CONCAT function, double clicking on it will provide a template to guide you along. If you need more information about that function, you can read a quick description here or click the link, which will go to the vendors database site.
So here I want to replace this by going to the columns. I think I'll use landgrabber here. And maybe I'll just throw in base, and I'll apply this rule here. It will create a new column. If I wanted to though, I could have put the name of the column here, of the original column and replace that landgrabber column, that's possible too.
And if you'll notice, you'll see that this rule is in white letters with a dark background. This means that SQL was used to generate the rule. And you can also see the SQL here. But any time that you add a rule that does not use SQL, the rule will turn red, so that you know that all future steps are not using SQL.
Now, this is important to know so that if you're not using SQL, you can't export as a view or a snapshot. And if you're not using SQL, you can only export as a flat file or as a data set. So it's something to know. For example the duplicates is not using SQL. So when we add this rule, we can clearly see it turns red. So that means this is using SQL, and from here on out it will not be using SQL.
Also, if I go to the calculated column, and I decide, you know what? I don't want to use that sort of functions, I want to come down here, and I want to maybe use a different function. Let me just start with a new one there. But say I want to CONCAT, langrabber and base. I'm doing the same thing, but in this case, I am not using SQL, and the red tells me that I'm not. So there you have it. The new transformation and cleanse utility. Have fun using it and make sure to send us all your enhancement requests.