An audio fingerprinting system must address four challenges: robustness, reliability, compactness, and scalability. While preserving reliability and scalability, we explore the compactness and robustness aspects of audio fingerprinting systems and propose a description and storage model based on structural analysis of audio clips. The proposed method constructs the fingerprint from the most representative section of an audio clip. In contrast to similar studies, there is no need to construct and store a fingerprint for every frame in the database; one fingerprint per clip is sufficient. We make use of the Audio Spectrum Flatness (ASF) and Audio Signature (AS) features of the MPEG-7 standard, which are relatively recent additions to the audio feature family and have received less attention than other feature types. The fingerprints are stored as XML, which provides interoperability on a worldwide scale. An XML-based representation of fingerprints is particularly well suited to portable devices such as PDAs or mobile phones, for which transporting the fingerprint data is a concern. The proposed approach, based on the MPEG-7 features, is evaluated on a test bed of 540 musical clips. The well-known MFCC feature set is also considered in the experiments for the evaluation of features(1).
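As a rough illustration of the kind of measure behind the ASF feature, spectral flatness is commonly defined as the ratio of the geometric mean to the arithmetic mean of the power-spectrum coefficients in a frequency band. The sketch below is a minimal, assumed version of that ratio only: the function name `spectral_flatness`, the `eps` guard, and the sample bands are hypothetical, and the band layout, windowing, and other details of the MPEG-7 extraction are not reproduced here.

```python
import math


def spectral_flatness(power, eps=1e-12):
    """Ratio of geometric mean to arithmetic mean of power coefficients.

    `power` is a list of non-negative power-spectrum values for one band.
    `eps` is an assumed small constant that avoids log(0) and division by zero.
    """
    if not power:
        raise ValueError("band must contain at least one coefficient")
    n = len(power)
    # Geometric mean computed in the log domain for numerical stability.
    log_sum = sum(math.log(p + eps) for p in power)
    geometric_mean = math.exp(log_sum / n)
    arithmetic_mean = sum(power) / n + eps
    return geometric_mean / arithmetic_mean


# Hypothetical bands: one noise-like (flat), one dominated by a single peak.
flat_band = [1.0, 1.0, 1.0, 1.0]
tonal_band = [100.0, 0.01, 0.01, 0.01]

flat_value = spectral_flatness(flat_band)
tonal_value = spectral_flatness(tonal_band)
```

Under these assumptions, a noise-like band yields a value close to 1, while a band dominated by a single strong component yields a value close to 0.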