Indian Journal of Animal Sciences (Feb 2022)

An algorithm-based approach for identification of most relevant linear traits for selecting high producing Murrah buffaloes

  • Sunesh Balhara,
  • Ashok Kumar Balhara,
  • Naresh Dahiya,
  • Rishi Pal Singh,
  • AP Ruhil,
  • Himanshu .

DOI
https://doi.org/10.56093/ijans.v92i5.119061
Journal volume & issue
Vol. 92, no. 5

Abstract

Selection of high-producing dairy animals is important for dairy profitability and for future breeding stock. Farmers have relied on physical characteristics to judge the milk-producing ability of animals. In the present study, feature selection algorithms were implemented to identify the most relevant traits for predicting peak milk yield in buffaloes. Data were recorded from 259 lactating Murrah buffaloes on 14 body and udder conformation traits, viz. Body Length (BL), Height at Wither (HW), Heart Girth (HG), Body Depth (BD), Paunch Girth (PG), Navel-Udder Distance (NUD), Udder Depth (UD), Rear Udder Height (RUH), Fore Teat Distance (FTD), Rear Teat Distance (RTD), Fore Rear Teat Distance (FRTD), Teat Length (TL), Rump Width (RW) and Rear Udder Width (RUW). Descriptive statistical analysis revealed that the correlation with peak yield was highest for RUH, followed by RUW, lactation number (LN), NUD, FRTD, HG, RW, RTD, UD, TL, PG, BL, BD, HW and FTD. Correlation-based feature selection on the ‘WEKA’ software platform suggested that nine parameters have a high correlation with peak yield: UD, NUD, RTD, FRTD, TL, RW, RUW, RUH and TL. Multiple linear regression (MLR) was implemented using the linear regression function available under the function classifiers in WEKA. Two regression models were developed: Model 1 using all fifteen input parameters, and Model 2 using the subset of 9 input parameters suggested by feature selection. Both models were trained and validated with the 10-fold cross-validation method. The performance of the models in predicting peak milk yield was evaluated using the correlation coefficient and root mean squared error (RMSE). Comparison of these performance metrics revealed that Model 2, which requires fewer inputs, predicts peak yield adequately, with a correlation coefficient of 0.8429 and an RMSE of 2.16.
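
The workflow described in the abstract (rank traits by correlation, fit a linear regression on all inputs and on a reduced subset, then compare the two under 10-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' WEKA pipeline. The data are synthetic stand-ins, the function names are hypothetical, and a plain univariate Pearson ranking is used in place of WEKA's correlation-based subset evaluator, which scores whole feature subsets rather than single traits. For brevity the subset is chosen on the full data set, which a careful analysis would instead do inside each training fold.

```python
import numpy as np


def cross_validated_predictions(X, y, n_folds=10, seed=0):
    """Return out-of-fold predictions from ordinary least-squares regression."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))
    folds = np.array_split(indices, n_folds)
    predictions = np.empty(len(y), dtype=float)
    for test_idx in folds:
        train_idx = np.setdiff1d(indices, test_idx)
        # Prepend an intercept column and fit on the training folds only.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        predictions[test_idx] = A_test @ coef
    return predictions


def evaluate(y_true, y_pred):
    """Correlation coefficient and RMSE between observed and predicted values."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return r, rmse


# Synthetic stand-in data (assumption): 259 animals, 15 inputs, of which
# only the first 9 actually influence the simulated peak yield.
rng = np.random.default_rng(42)
n_animals, n_inputs, n_selected = 259, 15, 9
X = rng.normal(size=(n_animals, n_inputs))
true_weights = np.zeros(n_inputs)
true_weights[:n_selected] = rng.uniform(0.5, 1.5, size=n_selected)
y = X @ true_weights + rng.normal(scale=1.0, size=n_animals)

# Rank inputs by absolute Pearson correlation with the target, keep the top 9.
correlations = np.array(
    [np.corrcoef(X[:, j], y)[0, 1] for j in range(n_inputs)]
)
ranking = np.argsort(-np.abs(correlations))
subset = np.sort(ranking[:n_selected])

# Model 1: all inputs. Model 2: reduced subset.
pred_full = cross_validated_predictions(X, y)
pred_sub = cross_validated_predictions(X[:, subset], y)
r_full, rmse_full = evaluate(y, pred_full)
r_sub, rmse_sub = evaluate(y, pred_sub)

print(f"Model 1 (all {n_inputs} inputs): r={r_full:.4f}, RMSE={rmse_full:.2f}")
print(f"Model 2 ({n_selected} inputs):     r={r_sub:.4f}, RMSE={rmse_sub:.2f}")
```

Evaluating both models on out-of-fold predictions keeps the comparison honest: a model with more inputs always fits its own training data at least as well, so only held-out error shows whether the smaller model is genuinely sufficient.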

Keywords