The purpose of this tutorial is to present linguistic summarization of data. The increasing use of information systems by businesses and governmental agencies has created mountains of data that contain potentially valuable knowledge. Nowadays, mining summarized information from data sets is a topic of interest for researchers and practitioners. Summaries expressed as short quantified sentences of natural language simulate human reasoning, i.e. they provide summaries that are less terse than numbers. Furthermore, a linguistically summarized sentence can be read out by a text-to-speech synthesis system when the user's visual attention should not be distracted.
We can say that the linguistic summary is
a more or less accurate textual description (summary) of a data set
This simple definition hides many challenges: constructing fuzzy sets for summarizers, restrictions and quantifiers; selecting appropriate t-norms; ensuring sufficient coverage of data, simplicity and usefulness; and the like. When a linguistic summary (LS) is of good quality, the results can be used, for example, to generate fuzzy rules and to support the estimation of missing values.
The tutorial is divided into three main parts.
The main focus is on the classic protoforms divided into basic structures of LS, which express information about particular attributes on the whole database, and into structures with restriction, which express relational knowledge among attributes on the part of a database delimited by the flexible (or sharp) restriction. Other protoforms are outlined.
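As an illustration of the two classic protoforms, the following minimal sketch evaluates "most records have high income" (basic structure) and "most young records have high income" (structure with restriction). It assumes the usual relative-cardinality formulation, in which the quantifier is applied to a proportion of matching records. The membership functions, their parameters, the sample data and all function names are hypothetical and chosen only for this example.

```python
def ramp_up(x, a, b):
    """Increasing linear membership: 0 at or below a, 1 at or above b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def ramp_down(x, a, b):
    """Decreasing linear membership: 1 at or below a, 0 at or above b."""
    return 1.0 - ramp_up(x, a, b)


# Hypothetical fuzzy sets (parameters are assumptions, not from the tutorial).
def young(row):
    return ramp_down(row[0], 25, 35)      # row[0] = age in years


def high_income(row):
    return ramp_up(row[1], 30, 50)        # row[1] = income in thousands


def most(y):
    """Relative quantifier 'most' over a proportion y in [0, 1]."""
    return ramp_up(y, 0.5, 0.85)


def truth_basic(rows, summarizer, quantifier):
    """Truth value of 'Q records are S' on the whole data set."""
    proportion = sum(summarizer(r) for r in rows) / len(rows)
    return quantifier(proportion)


def truth_restricted(rows, restriction, summarizer, quantifier, t_norm=min):
    """Truth value of 'Q R records are S' on the part delimited by R."""
    r_degrees = [restriction(r) for r in rows]
    total_r = sum(r_degrees)
    if total_r == 0:
        return 0.0                        # no record satisfies the restriction
    joint = sum(t_norm(summarizer(r), d) for r, d in zip(rows, r_degrees))
    return quantifier(joint / total_r)


# (age, income) records, invented for the example.
rows = [(22, 55), (24, 60), (30, 45), (40, 20), (50, 52)]

v_basic = truth_basic(rows, high_income, most)
v_restricted = truth_restricted(rows, young, high_income, most)
print(round(v_basic, 3), round(v_restricted, 3))
```

On this sample the restricted summary receives a higher truth value than the basic one, because the restriction narrows the evaluation to the records that are young to some degree.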
A high truth value is not always a sufficient measure of quality. Hence, quality measures focused on data coverage, simplicity and outliers should not be neglected. All t-norms satisfy the same axioms but differ in their algebraic properties, and these differences can be exploited when developing quality measures. Finally, illustrative interfaces for managing summaries and quality measures are touched upon.
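The remark about t-norms can be made concrete with three well-known examples: the minimum, product and Łukasiewicz t-norms. The sketch below only checks a few points and is not a proof. It shows that all three share the neutral element 1 and commutativity (axiomatic properties), while they differ in idempotency and in the presence of zero divisors (algebraic properties). The function names are chosen for this example.

```python
def t_min(a, b):
    """Minimum t-norm."""
    return min(a, b)


def t_prod(a, b):
    """Product t-norm."""
    return a * b


def t_luk(a, b):
    """Lukasiewicz t-norm."""
    return max(0.0, a + b - 1.0)


T_NORMS = {"minimum": t_min, "product": t_prod, "Lukasiewicz": t_luk}

for name, t in T_NORMS.items():
    # Axiomatic properties, shared by all t-norms (spot check).
    neutral = t(1.0, 0.25) == 0.25
    commutative = t(0.25, 0.5) == t(0.5, 0.25)
    # Algebraic properties, which differ between t-norms.
    idempotent_at_half = t(0.5, 0.5) == 0.5
    zero_from_positive = t(0.5, 0.5) == 0.0
    print(name, neutral, commutative, idempotent_at_half, zero_from_positive)
```

Only the minimum t-norm returns 0.5 for two arguments of 0.5, and only the Łukasiewicz t-norm returns 0 for them, so the choice of t-norm changes how partial matches to a restriction and a summarizer are combined.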
Data are usually stored and used as numbers that pretend to be precise, but real data are frequently not available as precise numbers. When we want to keep information about vagueness, a fuzzy relational meta model is an option. In this way we are able to build linguistic summaries from fuzzy data by means of possibility and similarity measures, among others.
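To sketch how a fuzzy value can be matched against a summarizer, the example below uses the standard possibility measure, i.e. the supremum over the domain of the minimum of the two membership functions, approximated here on an integer grid. It is an assumption that this form is the one intended in the tutorial. The triangular fuzzy number, the summarizer "high", the grid and all names are hypothetical.

```python
def triangular(x, a, m, b):
    """Triangular fuzzy number with support (a, b) and peak at m."""
    if x <= a or x >= b:
        return 0.0
    if x <= m:
        return (x - a) / (m - a)
    return (b - x) / (b - m)


def high(x):
    """Hypothetical summarizer 'high': 0 at or below 30, 1 at or above 50."""
    if x <= 30:
        return 0.0
    if x >= 50:
        return 1.0
    return (x - 30) / 20


def possibility(mu_data, mu_summarizer, grid):
    """Approximate sup-min possibility that the fuzzy value matches the summarizer."""
    return max(min(mu_data(x), mu_summarizer(x)) for x in grid)


grid = range(0, 101)                      # assumed discretization of the domain


def about_45(x):
    """Stored fuzzy value 'about 45' (invented for the example)."""
    return triangular(x, 40, 45, 50)


degree = possibility(about_45, high, grid)
print(round(degree, 2))
```

The resulting degree can then take the place of the summarizer membership degree of a crisp value when the truth value of a summary is computed.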
The main target audience of the tutorial is researchers and practitioners working in the fields of fuzzy logic, data mining and business intelligence, as well as students searching for topics for their final exams. Knowledge of fuzzy logic and relational databases is recommended.
The topic of the tutorial corresponds to the topics covered by the conference. The call for papers states:
With today’s information overload, it has become increasingly difficult to analyze the huge amounts of data and to generate appropriate management decisions. Furthermore, the data are often imprecise and will include both quantitative and qualitative elements. For these reasons it is important to extend traditional decision making processes by adding intuitive reasoning, human subjectivity and imprecision.
Linguistic summaries simulate human intuitive reasoning when creating abstracts from observations; are understandable to a variety of decision makers, in contrast to traditional methods; and can operate with crisp data, but can also be adjusted to work with fuzzy data.
A tutorial focused on fuzzy databases, queries and inferences was given during the Fuzzy Logic & Applications Research Exchange Program between the University of Fribourg and the University of Economics in Bratislava, November 2014.
Miroslav Hudec is a researcher and assistant professor at the University of Economics in Bratislava, Faculty of Economic Informatics. Prior to joining the University of Economics in 2013, he was a researcher at the Institute of Informatics and Statistics, Bratislava. He received his Master's and PhD degrees from the University of Belgrade. His work is mainly focused on fuzzy logic, knowledge discovery, and information systems. M. Hudec is the author or co-author of approximately 45 scientific articles and three books. He is a member of the program committees of several international conferences and serves as an editorial board member of the Applied Soft Computing journal. He was the research leader of two work packages of an FP7 project focused on the modernization of data collection and analysis in official statistics. He was the representative of Slovakia at the UNECE/Eurostat/OECD Meeting on the Management of Statistical Information Systems in the years 2005-2009 and 2013.