Analysis of human behaviour to infer health and well-being information is one of the contemporary challenges raised by ageing in place. To this end, existing and newly developed machine learning methods need to be evaluated on annotated real-world data sets. However, the metrics used in performance evaluation are taken directly from the machine learning domain and do not necessarily reflect the specific needs of human behaviour analysis, such as recognizing the duration, start time and frequency of activities. Moreover, commonly used metrics such as accuracy or F-measure can be misleading in the presence of skewed class distributions, which are typical of human behaviour recognition. In this study, we evaluate the performance of two machine learning methods, a hidden Markov model and a time-windowed neural network, on five different real-world data sets from the perspective of human behaviour understanding for health assessment. According to the experimental results, standard metrics fail to reveal the actual behaviour recognition performance of the two compared methods. In contrast, the proposed evaluation mechanism, which considers three different activity categories, leads to a more realistic assessment of the overall performance.
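The following minimal Python sketch illustrates why accuracy and F-measure can be misleading under a skewed class distribution. It assumes a hypothetical day of 100 one-minute time slots, of which 90 are labelled "sleep" and 10 are labelled "eat" (split into two meals), and a trivial predictor that always outputs "sleep". All labels, data and helper names (`accuracy`, `per_class_f1`, `episodes`) are illustrative assumptions. This is not the evaluation mechanism proposed in the study, and it does not model its three activity categories.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of time slots whose predicted label matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1 per activity label; 0.0 when precision + recall is zero."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return scores


def episodes(labels, activity):
    """(start slot, duration in slots) of each contiguous run of `activity`."""
    runs, start = [], None
    for i, lab in enumerate(list(labels) + [None]):  # sentinel closes a final run
        if lab == activity and start is None:
            start = i
        elif lab != activity and start is not None:
            runs.append((start, i - start))
            start = None
    return runs


# Hypothetical skewed day: 90 "sleep" slots, two 5-slot "eat" episodes.
y_true = ["sleep"] * 40 + ["eat"] * 5 + ["sleep"] * 45 + ["eat"] * 5 + ["sleep"] * 5
y_pred = ["sleep"] * 100  # trivial majority-class predictor

f1 = per_class_f1(y_true, y_pred)
support = Counter(y_true)
macro_f1 = sum(f1.values()) / len(f1)
weighted_f1 = sum(support[lab] / len(y_true) * f1[lab] for lab in f1)

print(f"accuracy    = {accuracy(y_true, y_pred):.2f}")  # 0.90
print(f"macro F1    = {macro_f1:.2f}")                  # 0.47
print(f"weighted F1 = {weighted_f1:.2f}")               # 0.85
print("true 'eat' episodes:", episodes(y_true, "eat"))       # [(40, 5), (90, 5)]
print("predicted 'eat' episodes:", episodes(y_pred, "eat"))  # []
```

Accuracy and support-weighted F1 look high even though the predictor never detects a single meal. An episode-level view that checks the start time, duration and number of occurrences of each activity exposes the failure immediately.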