In this paper, we study how the concepts learned by a robot can be linked to the verbal concepts that humans use in language. Specifically, we develop a simple tapping behaviour on the iCub humanoid robot simulator and let the robot interact with a set of objects of different types and sizes to learn affordance relations in its environment. The robot records its perception, obtained from a range camera, as a feature vector before and after tapping an object. We compute effect features by subtracting the initial features from the final features, and cluster them using Kohonen self-organizing maps to generate a set of effect categories in an unsupervised fashion. We analyze the clusters using the types and sizes of the objects that fall into each effect cluster, as well as the success/failure labels manually attached to the interactions. The hand labellings and the clusters formed by the robot are found to match. We conjecture that the robot and humans therefore share the same "effect concepts", which could be used in human-robot communication, for example as verbs. Furthermore, we use the ReliefF feature selection method to determine the initial features that are related to the clustered effects, and train a multi-class support vector machine (SVM) classifier to learn the mapping between these relevant initial features and the effect categories. The results show that (1) despite the lack of supervision, the effect clusters tend to be homogeneous with respect to success/failure, (2) the relevant features consist mainly of shape features rather than size features, (3) the number of relevant features remains approximately constant with respect to the number of effect clusters formed, and (4) the SVM classifier can successfully learn the effect categories using the relevant features.
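
As an illustration of the first two steps described above (effect features obtained by subtraction, followed by unsupervised clustering with a Kohonen map), a minimal sketch could look as follows. It is not the implementation used in this work: the two-dimensional feature vectors, the function names, the one-dimensional map with two units, the linear initialisation and the decay schedules are all assumptions made for the example, and the ReliefF and SVM stages are omitted.

```python
import math

def effect_features(initial, final):
    """Effect vector = final features minus initial features."""
    return [f - i for i, f in zip(initial, final)]

def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda k: math.dist(weights[k], x))

def train_som(samples, n_units=2, epochs=100, lr0=0.5, sigma0=0.5):
    """Train a 1-D Kohonen map (n_units >= 2) and return its weight vectors."""
    dim = len(samples[0])
    lo = [min(s[d] for s in samples) for d in range(dim)]
    hi = [max(s[d] for s in samples) for d in range(dim)]
    # Assumed linear initialisation: units spread between per-feature min and max.
    weights = [[lo[d] + (hi[d] - lo[d]) * k / (n_units - 1) for d in range(dim)]
               for k in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)               # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3  # neighbourhood width shrinks
        for x in samples:
            bmu = best_matching_unit(weights, x)
            for k, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D grid of units.
                h = math.exp(-((k - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

# Hypothetical (initial, final) feature pairs, invented for illustration:
# the first three objects barely move when tapped, the last three roll away.
interactions = [
    ([0.40, 0.10], [0.40, 0.10]),
    ([0.42, 0.10], [0.43, 0.09]),
    ([0.38, 0.12], [0.40, 0.13]),
    ([0.40, 0.06], [0.70, 0.08]),
    ([0.41, 0.06], [0.69, 0.05]),
    ([0.39, 0.07], [0.72, 0.07]),
]
effects = [effect_features(i, f) for i, f in interactions]
som = train_som(effects)
labels = [best_matching_unit(som, e) for e in effects]  # effect category per interaction
```

In this toy setting each map unit plays the role of one effect category, so `labels` can be compared against hand-assigned success/failure labels in the same way as the cluster analysis described above.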