Comparing Feature Selection Scripts

February 5, 2019 by cbn

In this series about feature selection, the first three posts covered three WhizzML scripts that can help you with this task: Recursive Feature Elimination, Boruta, and Best-First Feature Selection. We explained how they work and which parameters each one needs, applying the scripts to the system failures in trucks dataset described in the first post.

Feature Selection Scripts

As we previously explained, this kind of script can help us deal with wide datasets by selecting the most useful features. They are an interesting alternative to dimensionality reduction algorithms such as Principal Component Analysis (PCA). Furthermore, they provide the advantage that you don’t lose any model interpretability because you are not transforming your features.

Feature Selection algorithms can work in two different ways:

  • They can start using all the fields from the dataset and, iteratively, remove the least important fields. This is how Recursive Feature Elimination and Boruta work.
  • They can start with 0 fields, and, iteratively, add the most important features. This is how Best-First Feature Selection works.
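The two strategies above can be sketched in a few lines of Python. This is a simplified illustration, not the WhizzML scripts themselves: the `score` callback, the toy weights, and the `n` and `min_gain` parameters are assumptions made for the example, and the forward pass is a plain greedy search rather than a full best-first search.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer: takes a list of field names, returns a quality score.
Score = Callable[[Sequence[str]], float]


def backward_elimination(fields: Sequence[str], score: Score, n: int) -> List[str]:
    """Start with every field and drop the least useful one until n remain."""
    selected = list(fields)
    while len(selected) > n:
        # The least useful field is the one whose removal leaves the best score.
        to_drop = max(selected, key=lambda f: score([g for g in selected if g != f]))
        selected.remove(to_drop)
    return selected


def forward_selection(fields: Sequence[str], score: Score, min_gain: float = 0.0) -> List[str]:
    """Start with no fields and add the most useful one while the score improves."""
    selected: List[str] = []
    remaining = list(fields)
    best = score(selected)
    while remaining:
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        new_score = score(selected + [candidate])
        if new_score - best <= min_gain:
            break  # no remaining field improves the score enough
        selected.append(candidate)
        remaining.remove(candidate)
        best = new_score
    return selected


# Toy scorer (an assumption for this example): each field adds a fixed amount.
WEIGHTS = {"a": 0.5, "b": 0.3, "c": 0.0, "d": 0.1}


def toy_score(subset: Sequence[str]) -> float:
    return sum(WEIGHTS[f] for f in subset)


backward_result = backward_elimination(list(WEIGHTS), toy_score, n=2)
forward_result = forward_selection(list(WEIGHTS), toy_score, min_gain=0.05)
```

In a real setting the scorer would train and evaluate a model on the given fields, which is why the forward approach, with one evaluation per candidate field per step, tends to be the more expensive of the two.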

Let’s compare the results from these three scripts. To that end, we ran them on a reduced version of the dataset mentioned previously. This reduced version, the same one we used in the Best-First post, has 29 fields and 15,000 rows.

In the table below, we can see a comparison between the scripts. We have recorded the execution times, the number of output fields, and the number of output fields in common between each pair of scripts. For each script's output dataset, we have created and evaluated an ensemble.
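As a rough illustration of how the "fields in common" figures for each pair of scripts could be computed, here is a minimal Python sketch. The field names are placeholders invented for the example; they are not the fields selected in this post.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Placeholder outputs (assumed for illustration, not the real results).
selected_fields: Dict[str, Set[str]] = {
    "Recursive Feature Elimination": {"f1", "f2", "f3", "f4"},
    "Boruta": {"f2", "f3", "f4", "f5"},
    "Best-First Feature Selection": {"f1", "f3", "f5"},
}


def common_field_counts(outputs: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Count the fields shared by every pair of scripts."""
    return {
        (a, b): len(outputs[a] & outputs[b])
        for a, b in combinations(sorted(outputs), 2)
    }


overlap = common_field_counts(selected_fields)
for (first, second), count in overlap.items():
    print(f"{first} / {second}: {count} fields in common")
```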

Notes on the comparison:

  • Using max-runs of 10 and min-gain of 0.01 (default parameters).
  • Using the same input parameters as in the previous post.
  • The phi-score with the 29-field dataset is 0.84.
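For readers unfamiliar with the metric, the sketch below assumes the phi-score here is the standard phi coefficient for binary classification (also known as the Matthews correlation coefficient), computed from a confusion matrix. The counts in the example are made up for illustration and are not taken from the evaluations in this post.

```python
from math import sqrt


def phi_coefficient(tp: int, fp: int, fn: int, tn: int) -> float:
    """Phi coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) to 1 (perfect prediction);
    0 means the predictions are no better than chance.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # A row or column of the confusion matrix is empty; treat as 0 by convention.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Illustrative counts only.
example_phi = phi_coefficient(tp=90, fp=10, fn=10, tn=90)
```

Because it uses all four cells of the confusion matrix, this metric stays informative on unbalanced datasets, where plain accuracy can look high for a model that ignores the minority class.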

From these tests, we extract some interesting conclusions:

  • Recursive Feature Elimination is a simple script that needs only a few parameters and runs extremely fast, all without sacrificing accuracy. Its results are consistent with those of the other scripts.
  • Boruta is a useful script with an interesting property: it is free from user bias because the n parameter, which represents the number of features to select, is not required.
  • Best-First Feature Selection is the most time-consuming of the scripts, so we should use it with smaller datasets or with a previously reduced one. However, it is the only one that starts with 0 fields, and the information from the very first iterations is useful for seeing which features of our dataset are the most important.
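To show why a Boruta-style approach does not need the number of features up front, here is a heavily simplified Python sketch of the general shadow-feature idea: each field competes against shuffled copies of the fields, and only fields that beat the best shuffled copy in most runs are kept. It is not the WhizzML script. Absolute correlation with the target stands in for a real model's field importance, and the `runs` and `seed` parameters, the majority rule, and the toy data are all assumptions of this example.

```python
import random
from math import sqrt
from typing import Dict, List, Sequence


def abs_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Absolute Pearson correlation; 0.0 when either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / sqrt(var_x * var_y)


def shadow_selection(
    columns: Dict[str, Sequence[float]],
    target: Sequence[float],
    runs: int = 10,
    seed: int = 0,
) -> List[str]:
    """Keep fields that beat the best shuffled (shadow) copy in most runs."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(runs):
        # Shadow features: shuffled copies that keep each field's distribution
        # but break any relationship with the target.
        best_shadow = 0.0
        for values in columns.values():
            shadow = list(values)
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, abs_correlation(shadow, target))
        for name, values in columns.items():
            if abs_correlation(values, target) > best_shadow:
                hits[name] += 1
    return [name for name, count in hits.items() if count > runs / 2]


# Toy data: one field that tracks the target and one constant field.
target = list(range(30))
columns = {"signal": [2 * t for t in target], "constant": [7] * 30}
kept = shadow_selection(columns, target)
```

The number of fields kept falls out of the comparison with the shadow copies, so the user never has to choose it.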

The system failures in trucks dataset proved difficult to work with. The large number of fields and their uninformative names made it hard to apply domain knowledge to it. These scripts helped us automatically obtain the most important features without losing modeling performance.

Now it’s your turn! Try out these new scripts and let us know if you have any feedback at support@bigml.com. What’s more, give WhizzML a try and create your own scripts that help automate your frequent tasks.
