Big data, Information theory, Machine learning
- Author: Administrator
- Posted: 2017.09.14
ㅇ Title: Big data, Information theory, Machine learning
ㅇ Date and time: Tuesday, September 19, 2017, from 2:00
ㅇ Venue: KAIST, Room N1-117
ㅇ Speaker: Prof. Anders Host-Madsen (University of Hawaii)
ㅇ Abstract:
A central question in the era of 'big data' is what to do with the enormous amount of information available. One possibility is to characterize it through statistics, e.g., averages, or to classify it using machine learning, in order to understand the general structure of the data as a whole. The perspective in this talk is the opposite: in some applications, most of the value in the information lies in the parts that deviate from the average, that are unusual or atypical. We think of this as new knowledge, things that cannot be learned from ordinary data. Think of art: the valuable paintings or writings are those that deviate from the norms and break the rules, that are atypical. Or think of groundbreaking scientific discoveries, which find new structure in data. The aim of our approach is to extract such 'rare interesting' data from big data sets. A central question is what 'interesting' means. Universal approaches are required, since we do not know in advance what we are looking for; moreover, being rare is not sufficient for something to be interesting. We develop a measure of 'interestingness' based on Kolmogorov complexity, information theory, and description length, which we call Atypicality. We show that atypicality is optimal for anomaly detection and that it has other important theoretical properties. Atypicality can be seen as a complement to machine learning: we are looking for information that is not learned from prior data. During the talk we will discuss new methods for universal source coding and minimum description length (MDL), and we will show applications to the stock market, heartbeat signals, ocean acoustics, and genetics.
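To make the description-length idea concrete, here is a minimal sketch of one way to score a sequence, not the speaker's actual method. It assumes binary data, a "typical" model that is an i.i.d. Bernoulli(p) source with p learned beforehand, and the Krichevsky-Trofimov (KT) estimator as the universal source code. A sequence is flagged when coding it "by itself" with the universal code takes fewer bits than coding it with the typical-data code. The function names and the 1-bit threshold are hypothetical choices for illustration.

```python
import math


def typical_code_length(x, p):
    """Ideal code length (bits) of binary sequence x under a fixed i.i.d. Bernoulli(p) model."""
    return sum(-math.log2(p if b == 1 else 1.0 - p) for b in x)


def kt_code_length(x):
    """Code length (bits) of x under the Krichevsky-Trofimov sequential estimator,
    a universal code for i.i.d. binary sources that needs no prior knowledge of p."""
    n0 = n1 = 0
    bits = 0.0
    for b in x:
        # KT predictive probability of a 1, given the counts seen so far.
        p1 = (n1 + 0.5) / (n0 + n1 + 1.0)
        bits += -math.log2(p1 if b == 1 else 1.0 - p1)
        if b == 1:
            n1 += 1
        else:
            n0 += 1
    return bits


def atypicality(x, p_typical):
    """Bits saved by coding x with the universal code instead of the typical-data code."""
    return typical_code_length(x, p_typical) - kt_code_length(x)


def is_atypical(x, p_typical, threshold=1.0):
    # Hypothetical threshold: roughly one bit to signal which of the two codes was used.
    return atypicality(x, p_typical) > threshold


# Typical data is assumed to be fair coin flips (p = 0.5).
print(is_atypical([1] * 50 + [0] * 2, 0.5))  # heavily skewed sequence
print(is_atypical([0, 1] * 26, 0.5))         # balanced sequence
```

The skewed sequence compresses far better with the universal code than with the fair-coin code, so it is flagged; the balanced sequence is not, because for a Bernoulli(0.5) model the KT code can never beat the typical code on a sequence with equal counts. This also illustrates the abstract's point that rarity alone is not enough: a specific balanced sequence is just as improbable under the fair-coin model, yet it carries no structure the universal coder can exploit.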