Thesis Type: Doctorate (PhD)
Institution: Middle East Technical University, Faculty of Engineering, Department of Electrical and Electronics Engineering, Turkey
Approval Date: 2013
Student: HÜSEYİN EMRAH TAŞLI
Advisor: ABDULLAH AYDIN ALATAN
Abstract: The wide availability of visual capture and display devices with increasing resolution and affordable prices has made visual data an indispensable part of our lives. The enormous amount of visual data produced every day is captured, stored and sometimes processed for further analysis. In this era of technological improvement, in which the number and capability of devices are increasing exponentially, researchers have focused on efficient and accurate ways to access, store, analyse and display the data for various purposes. On the capture side of the visual content, the number of cameras has increased rapidly, in close correlation with the number of mobile phones with built-in cameras. Along with the increase in quantity, the quality of the sensors has also improved in terms of resolution, color/brightness and noise performance. On the other side of the pipeline, there have been major changes on the display side over the last couple of decades. With the introduction of plasma and LCD (liquid-crystal display) screens, displays have become much thinner. This reduction in depth, together with lower power consumption, also made mobile displays possible. As a result, mobile devices with high-resolution displays can easily fit in our pockets. Another major stepping stone towards a richer visual experience is the introduction of 3D-capable displays of different sizes and resolutions. The popularity of 3D TVs has increased considerably in the last couple of years, and mobile devices with 3D capability have also been introduced to the market. However, the rapid progress on the display side has not been matched on the capture and broadcast side. Consequently, the popularity of 3D devices has been lower than expected. Various factors contribute to this slower response. These factors and possible solutions to the associated problems are presented in this thesis.
This thesis deals with various aspects of research in visual content analysis and display technologies. The author's previous experience in real-time processing of image/video data, human visual perspectives for objective/subjective quality analysis, stereoscopy and 3D perception, image understanding for object recognition, and image feature descriptors using low-, mid- and region-level visual cues has been extensively incorporated in this thesis. The proposed techniques have been applied to real-world scenarios, and the results are supported by performance evaluations using objective and subjective quality metrics. Superpixel extraction is proposed as an efficient image representation tool. It has been shown to offer computational efficiency with high segmentation performance. Superpixels are extracted using a color and spatial distance metric in which the weighting is defined as a trade-off parameter. Extensive comparative tests against the state of the art show that the proposed scheme is a remarkable alternative to current superpixel and supervoxel extraction methods, with faster execution times and competitive segmentation performance. The extracted superpixels are further utilized for user-assisted image segmentation. The user assists by drawing lines on representative parts of the image to define foreground and background regions. An energy minimization technique is then used to determine the most likely regions to be segmented. The acquired foreground segments can further be used to render the stereo pair of an image for 3D visualization. The same energy formulation is also extended to stereo and video footage for completeness. The segmented superpixel patches are also presented as mid-level information sources and applied to the image classification task.
Pixel-wise image descriptors are studied and extended using the proposed mid-level region descriptor in order to capture the complementary mid-level information present in the image. The experimental results support the proposal: classification scores increased considerably.
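
The abstract states that superpixels are extracted with a color and spatial distance metric whose weighting acts as a trade-off parameter, but it does not give the formula. The sketch below is only an illustration of that general idea, not the thesis's actual method: it assumes each pixel and each cluster center is a tuple of three color values followed by (x, y), combines the two Euclidean distances as a weighted sum, and assigns each pixel to its nearest center. The names `combined_distance`, `assign_labels` and `spatial_weight` are hypothetical.

```python
import math


def combined_distance(pixel, center, spatial_weight):
    """Weighted sum of color and spatial distance (illustrative assumption).

    pixel and center are tuples (c1, c2, c3, x, y): three color values
    followed by image coordinates. spatial_weight is the hypothetical
    trade-off parameter: 0 ignores position, larger values favor
    compact, spatially regular superpixels.
    """
    color_dist = math.dist(pixel[:3], center[:3])
    spatial_dist = math.dist(pixel[3:], center[3:])
    return color_dist + spatial_weight * spatial_dist


def assign_labels(pixels, centers, spatial_weight):
    """Return, for each pixel, the index of the closest cluster center."""
    return [
        min(
            range(len(centers)),
            key=lambda k: combined_distance(p, centers[k], spatial_weight),
        )
        for p in pixels
    ]


# A pixel that is close in color to center 1 but close in position to center 0.
centers = [(0, 0, 0, 0, 0), (100, 0, 0, 10, 0)]
pixel = (90, 0, 0, 1, 0)
labels_color_only = assign_labels([pixel], centers, spatial_weight=0.0)
labels_spatial_heavy = assign_labels([pixel], centers, spatial_weight=20.0)
```

With a weight of zero the pixel follows its color and joins center 1; with a large weight spatial proximity dominates and it joins center 0, which is the trade-off the abstract refers to.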