I mean to say it is up to us to decide. I think some iteration might be required for you to fully understand what is best for the context of your problem. Okay, so let me just run this, and then I will run this. This is a very important step; that's why I wanted to make sure that you fully understand it. Now, if you see the output, these are the words joined together with an underscore. Right now, the score that we have used in the aggregation is the sum. Let me now use some other score for the same thing, say max. Okay, before that, let us also look at the length. So what is the cardinality now?
Now the cardinality is reduced from 7000 to 2000 by doing this. Okay, let me look at the max, so I will change it to max and run it again. Now look at the string here, and compare it with what we had before. You can clearly see that there is a difference. It is because we have taken a different measure, a different score, to select the top words. So this might differ based on what you select, but essentially this is what we need to do.
Now you see the cardinality is reduced from 7000 to 3000, but it is higher than the cardinality that we achieved in the previous case, which was 2000. So now we have a cardinality of 3000. That means there are 1000 values which are now unique and which were not showing up as unique in the previous case. So I would say this is a hyperparameter, and you can play with it. That is what I have mentioned here: you can play with it, and depending on the context of your problem statement, this can be changed. Let me go back to the sum, because the cardinality is lower.
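To make the sum-versus-max idea concrete, here is a minimal sketch in plain Python. The session does not show how the word score is computed, so a simple TF-IDF style score is assumed here, and the function names `word_scores` and `top_words_label`, the `k=2` cutoff, and the sample descriptions are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def word_scores(descriptions, agg="sum"):
    """Score each word across all descriptions.

    Assumption: the per-description score is a simple TF-IDF value,
    aggregated per word with either sum or max.
    """
    docs = [d.lower().split() for d in descriptions]
    n_docs = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc))
    per_word = defaultdict(list)
    for doc in docs:
        for word, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[word]) + 1.0
            per_word[word].append(tf * idf)
    reducer = {"sum": sum, "max": max}[agg]
    return {w: reducer(values) for w, values in per_word.items()}

def top_words_label(description, scores, k=2):
    """Keep the k highest-scoring words and join them with an underscore."""
    words = sorted(set(description.lower().split()),
                   key=lambda w: (-scores[w], w))
    return "_".join(words[:k])

# Hypothetical product descriptions, for illustration only.
descriptions = ["steel bolt hex", "steel bolt long", "steel nut hex", "brass nut"]

for agg in ("sum", "max"):
    scores = word_scores(descriptions, agg=agg)
    labels = [top_words_label(d, scores) for d in descriptions]
    print(agg, labels, "cardinality:", len(set(labels)))
```

Switching `agg` changes which words survive, which is why the resulting cardinality differs between the two runs; both `agg` and `k` behave like hyperparameters to tune for the problem at hand.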
On purpose I'll go with the lower one, but it really depends on your business context. Okay, this is just to give you a snapshot of what it looks like after we have done this feature selection. Similarly, we will take the first three digits from the part number. That is what this does: it basically takes the first three digits. I think this is set to three. So we will run this. Okay, and now if you look at the string.
You will see only the first three digits. Earlier, each ID had six digits, and some of them also had a dash: six digits, a dash, and maybe a couple more digits. All of that we have reduced to three digits, and so we have reduced the cardinality here. Now again, it really depends on how your organization assigns the IDs, so it's always good to check with your engineering department how the IDs are created. Maybe there's a counter, or maybe they have some suffix and prefix to generate a particular ID for a product.
Understand what is under the hood of the ID generation logic, and based on that you can refine this logic. You don't really have to select the first three digits; you can change the logic so that you capture the maximum information and you don't lose any information. Okay, so it's now reduced to 470, which is quite a big reduction. Now what we're going to do is replace the product number and product description with these new values that we have arrived at.
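As an illustration of the truncation step, here is a small plain-Python sketch. It assumes the IDs look like six digits optionally followed by a dash and more digits, as described above; the function name `part_prefix`, the `n_digits` parameter, and the sample part numbers are hypothetical.

```python
import re

def part_prefix(part_number, n_digits=3):
    """Keep the first n_digits digits of a part number.

    Assumption: non-digit characters such as the dash carry no
    information and can be dropped before truncating.
    """
    digits = re.sub(r"\D", "", part_number)
    return digits[:n_digits]

# Hypothetical part numbers, for illustration only.
part_numbers = ["123456", "123789-01", "456123-22", "456999"]
prefixes = [part_prefix(p) for p in part_numbers]

print(prefixes)
print("cardinality before:", len(set(part_numbers)), "after:", len(set(prefixes)))
```

If your organization encodes meaning in a different part of the ID, the slice is the only thing that needs to change.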
That's what we're going to do here. Okay, now if you look at the structure, you will see that the product description has changed: it is now one word with underscores. The product number has changed as well, so that is what we have. Now, what we also have in this data, and I purposefully added this information, is the plant. So if I look at the plant, you can see these sort of dependencies. That means that the particular part or product is coming from three different plants. That's what it means.
These plants are separated with the pipe here. So you may also have this kind of data, where you have multiple locations that the part is coming from, which is fairly reasonable because you may have different plants producing the same part. You may have data like this, and that is what I have purposefully done here. So what we can do is clean this up as well and change it to the long form. What I mean is that if you look at the dimensions, we have 18,079 records.
If you clean up this plant feature and change it to the long form, then the number of records will actually increase, because we will have a new record added for every plant code once we clean up. So that's what we're going to do here. What we have done is split the data on the pipe delimiter and then unnested it, which means the size will increase. All that data is stored in a new feature called plant new, which is a handcrafted feature. Later on, I've renamed plant new to plant, because that is what we're using in the downstream process.
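The split-and-unnest step can be sketched in plain Python as follows. This is only an illustration under assumptions: records are represented as dictionaries, and the function name `to_long_form` and the sample rows are hypothetical. Here the split values are written straight back to the `plant` key, which has the same end result as creating a new feature and renaming it.

```python
def to_long_form(records, column="plant", sep="|"):
    """Emit one output record per delimited value in `column`."""
    long_rows = []
    for row in records:
        for value in str(row[column]).split(sep):
            new_row = dict(row)              # copy the other fields unchanged
            new_row[column] = value.strip()  # one plant per record
            long_rows.append(new_row)
    return long_rows

# Hypothetical records, for illustration only.
records = [
    {"part": "123", "plant": "P1|P2|P3"},
    {"part": "456", "plant": "P2"},
]
long_records = to_long_form(records)

print(len(records), "records before,", len(long_records), "after")
for row in long_records:
    print(row)
```

A part listed against three plants becomes three records, which is why the record count grows after the cleanup.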
It's like creating a new feature, then deleting the old feature, and renaming the new feature to the old name, so it's a little different. Okay, so now if you look at the dimensions, you will see that the count has increased from 18,000, which makes sense because new rows are added after splitting these records. Okay, so it looks good so far. Let's move on and do the setup for model building. What we're going to do here is build the classification model. So I'll extract the independent features, which are nothing but the X features, and the outcome, or the y feature, which is called outcome. That's what I'm doing here.
Now I will first show you the mean encoding, and then I will tell you how the one-hot encoding works, because mean encoding is more important; that's where all the action lies. Okay, so the mean encoding: how are we going to do it? We have 35 categories in this particular data, so we're going to create the one-hot encoding for those 35. Okay, so I think we may be overshooting the time here.
What we're going to do is create the mean encoding. For that, first let me create the one-hot encoding for the y vector, so that I can map the mean of every feature to each category. That's what I'm doing here, so let me just run this first. Okay, and then we create the test and train partition. So we have created the test and train sets, and this is where we're going to create the mean encoding.
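As a rough sketch of this setup, the following plain-Python code one-hot encodes an outcome vector and produces a train/test partition. The function names `one_hot` and `train_test_split_indices`, the 25% test fraction, the seed, and the sample labels are assumptions for illustration; the session's own split settings are not shown.

```python
import random

def one_hot(labels):
    """Return the sorted categories and one indicator row per label."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1
        rows.append(row)
    return categories, rows

def train_test_split_indices(n, test_fraction=0.25, seed=42):
    """Shuffle row indices and cut them into train and test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n * test_fraction))
    return indices[n_test:], indices[:n_test]

# Hypothetical outcome vector with three categories (the session has 35).
y = ["a", "b", "a", "c", "b", "a", "c", "a"]
categories, y_one_hot = one_hot(y)
train_idx, test_idx = train_test_split_indices(len(y))

print(categories)
print(y_one_hot[:3])
print("train rows:", len(train_idx), "test rows:", len(test_idx))
```

Each indicator column marks membership in one outcome category, which is what lets a per-category mean be computed for every feature value afterwards.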
This is the magic loop where we create the mean encoding. I told you that some data leakage can happen, so to overcome that we're going to add some noise. There is a general function which will help us add some noise, so that's what we're going to do here. Okay, so now it's calculating the mean encoding for the data set. And then we will remove the one-hot encoding that we had created to calculate the mean encoding; I think those columns are not necessary now.
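For reference, a minimal plain-Python sketch of mean encoding with noise might look like the following. The exact noise function used in the session is not shown, so Gaussian noise applied to the training rows only is an assumption, as are the function names `fit_mean_encoding` and `apply_mean_encoding`, the `noise_sd` value, and the sample data.

```python
import random
from collections import defaultdict

def fit_mean_encoding(feature_values, target_indicator):
    """Mean of a 0/1 target column for each category, from training rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, target in zip(feature_values, target_indicator):
        totals[value] += target
        counts[value] += 1
    prior = sum(target_indicator) / len(target_indicator)
    means = {value: totals[value] / counts[value] for value in counts}
    return means, prior

def apply_mean_encoding(feature_values, means, prior, noise_sd=0.0, seed=0):
    """Replace categories by their mean; unseen categories get the prior.

    Assumption: leakage is reduced by adding Gaussian noise, which
    should be switched on for training rows only.
    """
    rng = random.Random(seed)
    encoded = []
    for value in feature_values:
        mean = means.get(value, prior)
        if noise_sd > 0:
            mean += rng.gauss(0.0, noise_sd)
        encoded.append(mean)
    return encoded

# Hypothetical training data: a plant feature and one one-hot outcome column.
train_plant = ["P1", "P1", "P2", "P2", "P2"]
train_is_class_a = [1, 0, 1, 1, 0]
test_plant = ["P1", "P3"]

means, prior = fit_mean_encoding(train_plant, train_is_class_a)
train_encoded = apply_mean_encoding(train_plant, means, prior, noise_sd=0.05)
test_encoded = apply_mean_encoding(test_plant, means, prior)

print(means, prior)
print(train_encoded)
print(test_encoded)
```

In a multi-class setting this would be repeated once per one-hot outcome column, after which the one-hot columns themselves can be dropped.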