Data pre-processing to improve the mining of large feed databases

F. Maroto-Molina; A. Gómez-Cabrera; J.E. Guerrero-Ginel; A. Garrido-Varo; D. Sauvant; G. Tran; V. Heuzé; D.C. Pérez-Marín

Animal (Jan 2013)

Affiliations

F. Maroto-Molina: Servicio de Información sobre Alimentos, Universidad de Córdoba, Ctra. Nacional IV km. 396, 14014, Córdoba, Spain
A. Gómez-Cabrera: Departamento de Producción Animal, ETS Ingeniería Agronómica y de Montes, Universidad de Córdoba, Ctra. Nacional IV km. 396, 14014, Córdoba, Spain
J.E. Guerrero-Ginel: Departamento de Producción Animal, ETS Ingeniería Agronómica y de Montes, Universidad de Córdoba, Ctra. Nacional IV km. 396, 14014, Córdoba, Spain
A. Garrido-Varo: Departamento de Producción Animal, ETS Ingeniería Agronómica y de Montes, Universidad de Córdoba, Ctra. Nacional IV km. 396, 14014, Córdoba, Spain
D. Sauvant: UMR 791 Physiologie de la nutrition et de l'alimentation, AgroParisTech, 16 rue Claude Bernard, 75231, Paris, Cedex 05, France
G. Tran: Association Française de Zootechnie, AgroParisTech, 16 rue Claude Bernard, 75231, Paris, Cedex 05, France
V. Heuzé: Association Française de Zootechnie, AgroParisTech, 16 rue Claude Bernard, 75231, Paris, Cedex 05, France
D.C. Pérez-Marín: Departamento de Producción Animal, ETS Ingeniería Agronómica y de Montes, Universidad de Córdoba, Ctra. Nacional IV km. 396, 14014, Córdoba, Spain

Journal volume & issue: Vol. 7, no. 7, pp. 1128–1136

Abstract

The information stored in animal feed databases is highly variable, in terms of both provenance and quality; therefore, data pre-processing is essential to ensure reliable results. Yet, pre-processing at best tends to be unsystematic; at worst, it may even be wholly ignored. This paper sought to develop a systematic approach to the various stages involved in pre-processing to improve feed database outputs. The database used contained analytical and nutritional data on roughly 20 000 alfalfa samples. A range of techniques were examined for integrating data from different sources, for detecting duplicates and, particularly, for detecting outliers. Special attention was paid to the comparison of univariate and multivariate solutions. Major issues relating to the heterogeneous nature of data contained in this database were explored, the observed outliers were characterized and ad hoc routines were designed for error control. Finally, a heuristic diagram was designed to systematize the various aspects involved in the detection and management of outliers and errors.
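
The contrast between univariate and multivariate outlier detection mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the z-score screen, the squared Mahalanobis distance, both thresholds, the variable names `crude_protein` and `fibre`, and all values below are assumptions made for illustration, using synthetic data rather than the alfalfa database.

```python
from statistics import mean, stdev


def univariate_outliers(values, z_max=3.0):
    """Return indices whose absolute z-score exceeds z_max (assumed cut-off)."""
    m, s = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) / s > z_max]


def mahalanobis_outliers(xs, ys, d2_max=9.21):
    """Return indices whose squared Mahalanobis distance exceeds d2_max.

    Two-variable case only. The default 9.21 is roughly the 0.99 quantile
    of a chi-square distribution with 2 degrees of freedom (assumed cut-off).
    """
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    # Sample variances and covariance.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    flagged = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T, with S^-1 written out for the 2x2 case.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 > d2_max:
            flagged.append(i)
    return flagged


# Synthetic, hypothetical data: two negatively correlated constituents.
crude_protein = [15 + 0.1 * i for i in range(100)]
fibre = [70 - 1.5 * cp + (0.5 if i % 2 == 0 else -0.5)
         for i, cp in enumerate(crude_protein)]

# One extra sample whose two values are each unremarkable on their own
# but jointly break the correlation seen in the other samples.
crude_protein.append(20.0)
fibre.append(47.0)

flag_protein = univariate_outliers(crude_protein)
flag_fibre = univariate_outliers(fibre)
flag_joint = mahalanobis_outliers(crude_protein, fibre)
```

In this synthetic case the appended sample (index 100) passes both univariate screens and is flagged only by the joint screen, which is the kind of difference a comparison of univariate and multivariate solutions is meant to expose.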

Published in Animal

ISSN: 1751-7311 (Print); 1751-732X (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Agriculture: Animal culture
Website: https://www.journals.elsevier.com/animal/
