Impact of Dataset Characteristics on Optimal Model Selection: A Comparative Analysis of Simulated and Real-World Data

Harald H. Rietdijk, Olayemi Shola Alabi, Patricia Conde-Cespedes, Talko B. Dijkhuis, Hilbrand K.E. Oldenhuis, Maria Trocan

Research output: Chapter in Book/Report/Conference proceeding, Chapter, Academic, peer-review

Abstract

In the rapidly evolving field of machine learning, selecting the most appropriate model for a given dataset is crucial. Understanding the characteristics of a dataset can significantly influence the outcomes of predictive modeling efforts, making the study of dataset properties an essential component of data science. This study investigates the possibilities of using simulated human data for personalized applications, specifically for testing clustering approaches. In particular, the study focuses on the relationship between dataset characteristics and the selection of the optimal classification model for clusters of datasets. The results of this study provide critical insights for researchers and practitioners in machine learning, emphasizing the importance of dataset characteristics and variability in building and selecting robust models for diverse data conditions. The use of human simulation data provides valuable insights but requires further refinement to capture the full variability of real-world conditions.
Translated title of the contribution: Impact van datasetkenmerken op optimale modelkeuze: een vergelijkende analyse van gesimuleerde en echte gegevens
Original language: English
Title of host publication: Proceedings - IEEE International Symposium on Circuits and Systems
Publisher: Institute of Electrical and Electronics Engineers
Pages: 1-5
Number of pages: 5
ISBN (Electronic): 979-8-3503-5683-0
ISBN (Print): 979-8-3503-5684-7
DOIs
Publication status: Published - 25 May 2025

Publication series

Series: Proceedings - IEEE International Symposium on Circuits and Systems

Keywords

  • data characteristics
  • machine learning
  • model selection
  • simulated data

Research Focus Areas Hanze University of Applied Sciences

  • Healthy Ageing
  • Entrepreneurship

Research Focus Areas Research Centre or Centre of Expertise

  • Technology and digitalization

Publinova themes

  • Technology
  • ICT and Media
  • Health
