Computer Science Faculty Publications and Presentations

Machine Learning Methods for Generating High Dimensional Discrete Datasets

Giuseppe Manco, ICAR-CNR
Ettore Ritacco, ICAR-CNR
Antonino Rullo, University of Calabria
Domenico Saccà, University of Calabria
Edoardo Serra, Boise State University

Document Type

Article

Publication Date

3-2022

Abstract

The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. When such datasets are not available, a possible solution is to synthesize datasets that reflect the patterns of real ones using a two-step approach: first, a real dataset X is analyzed to derive relevant patterns Z; then, these patterns are used to reconstruct a new dataset X' that preserves the main characteristics of X. This survey explores two possible approaches: (1) constraint-based generation and (2) probabilistic generative modeling. The former relies on inverse frequent itemset mining (IFM) techniques and consists of generating a dataset that satisfies given support constraints on the itemsets of an input set, typically the frequent ones. By contrast, for the latter approach, the survey explores recent developments in probabilistic generative modeling (PGM), which model generation as sampling from a parametric distribution, typically encoded as a neural network. The two approaches are compared through an overview of their instantiations for discrete data and a discussion of their pros and cons.
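To make the two-step pipeline concrete, the following is a minimal toy sketch, not the authors' method. It assumes a transactional dataset represented as a list of item sets. It extracts patterns Z as itemset supports from X by brute-force enumeration. It then generates X' by sampling from a deliberately simple parametric model in which each item appears independently with its marginal frequency. This model is a hypothetical stand-in for the neural PGMs the survey covers. Finally, it measures how far X' is from satisfying the support constraints in Z. All function names are illustrative. A true IFM approach would instead search for a dataset that satisfies the constraints directly.

```python
from itertools import combinations
import random


def itemset_support(dataset, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in dataset if itemset <= t) / len(dataset)


def frequent_itemsets(dataset, min_support, max_size=2):
    """Step 1 (patterns Z): brute-force enumeration of itemsets up to
    `max_size` with support >= min_support. Only suitable for toy data;
    real miners (e.g. Apriori, FP-growth) prune the search space."""
    items = sorted(set().union(*dataset))
    patterns = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = itemset_support(dataset, combo)
            if s >= min_support:
                patterns[frozenset(combo)] = s
    return patterns


def sample_independent_bernoulli(dataset, n, seed=0):
    """Step 2 (X'): hypothetical toy generative model in which each item is
    included independently with its marginal frequency in X. It ignores
    item correlations, unlike the PGMs discussed in the survey."""
    rng = random.Random(seed)
    items = sorted(set().union(*dataset))
    p = {i: itemset_support(dataset, [i]) for i in items}
    return [frozenset(i for i in items if rng.random() < p[i]) for _ in range(n)]


def max_support_violation(synthetic, patterns):
    """Largest absolute gap between the target supports in Z and the
    supports actually observed in the synthetic dataset X'."""
    return max(abs(itemset_support(synthetic, s) - target)
               for s, target in patterns.items())


# Usage on a tiny, made-up dataset X
X = [frozenset(t) for t in [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]]
Z = frequent_itemsets(X, min_support=0.5)
X_prime = sample_independent_bernoulli(X, n=1000)
gap = max_support_violation(X_prime, Z)
print(f"{len(Z)} patterns, max support violation in X': {gap:.3f}")
```

Because the independence model matches only single-item frequencies, it cannot, in general, reproduce pairwise supports such as that of {a, b}. This gap is what correlation-aware generators, whether constraint-based or neural, aim to close.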

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License

Publication Information

Manco, Giuseppe; Ritacco, Ettore; Rullo, Antonino; Saccà, Domenico; and Serra, Edoardo. (2022). "Machine Learning Methods for Generating High Dimensional Discrete Datasets". WIREs: Data Mining and Knowledge Discovery, 12(2), e1450. https://doi.org/10.1002/widm.1450

Included in

Computer Sciences Commons
