Extended Abstract
The main goal of this talk is to put together theoretical results on intermediate quantifiers which were proposed in
several papers (see e.g. [1, 2, 3, 4]) with the Fuzzy GUHA method [5], and to introduce a linguistic characterization
of natural data using generalized intermediate quantifiers. The theory of intermediate quantifiers was introduced by
Novák in [3] and is now a constituent of the theory of Fuzzy Natural Logic (FNL), which is a mathematical counterpart
of the concept of Natural Logic introduced by Lakoff [6]. This theory is based on Łukasiewicz fuzzy type theory
(Ł-FTT) [4], which is one of the existing higher-order fuzzy logics.
Fuzzy GUHA is a special method for automated search of association rules from numerical data. Generally,
the obtained associations have the form A ∼ B, which means that the occurrence of A is associated with the occurrence
of B, where A and B are formulae created from objects’ attributes. As proposed by Hájek et al. [5], the original GUHA
method allowed only Boolean attributes to be involved. Some parts of their approach were independently reinvented
by Agrawal [7] many years later and are also known as association rule mining or market basket analysis.
The GUHA method is described in detail in the book [8], which presents various statistically justified associations between
attributes of given objects. Fuzzy GUHA is an extension of the classical GUHA method to fuzzy data. In this paper,
we work with associations in the form of IF-THEN rules composed of evaluative linguistic expressions, which allow
the quantities to be characterized with vague linguistic terms such as “very small”, “big”, “medium” etc.
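As a rough illustration of how the strength of a rule A ∼ B can be measured on fuzzy data, the following minimal sketch computes a fuzzy support and a fuzzy confidence from membership degrees. The function names, the choice of the minimum t-norm, and the sample degrees are assumptions made for this example only; the indices actually used in Fuzzy GUHA may differ.

```python
from typing import Sequence


def fuzzy_support(a: Sequence[float], b: Sequence[float]) -> float:
    """Average degree to which A and B hold together.

    Assumption: conjunction is modelled by the minimum t-norm.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    return sum(min(x, y) for x, y in zip(a, b)) / len(a)


def fuzzy_confidence(a: Sequence[float], b: Sequence[float]) -> float:
    """Degree of (A and B) relative to the degree of A alone."""
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    antecedent = sum(a)
    if antecedent == 0:
        return 0.0  # convention chosen for this sketch: no support for A
    return sum(min(x, y) for x, y in zip(a, b)) / antecedent


# Hypothetical membership degrees of four objects in A ("very small x")
# and in B ("big y"); these numbers are invented for illustration.
degrees_a = [1.0, 0.5, 0.0, 0.5]
degrees_b = [1.0, 1.0, 0.5, 0.0]

support = fuzzy_support(degrees_a, degrees_b)
confidence = fuzzy_confidence(degrees_a, degrees_b)
```

With crisp (0/1) degrees, both functions reduce to the usual support and confidence of association rule mining.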
To measure the interestingness of a rule, many numerical characteristics or indices have been proposed (see [9, 10]
for an overview). As a supplement to them, we use the theory of intermediate quantifiers to characterize
the intensity of association, which allows us to use linguistic characterizations such as “almost all”, “most”, “some”,
or “few”. As a result, we may automatically obtain the following sentences from numerical bio-statistical data:
Almost all people who suffer from atopic tetter, live in an area affected by heavy industry, and smoke also suffer from
asthma.
Most people who smoke and suffer from respiratory diseases also suffer from ischemic disease of the leg.
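To make the idea of a linguistic characterization concrete, the toy sketch below maps a numerical intensity of association (for instance, a confidence in [0, 1]) to one of the quantifier words mentioned above. The function name and all thresholds are hypothetical. In the theory itself, intermediate quantifiers are defined inside Ł-FTT rather than by crisp cut-offs, so this is only a simplified picture of the output, not the method.

```python
def describe_intensity(confidence: float) -> str:
    """Return a quantifier word for a confidence degree in [0, 1].

    Assumption: the crisp thresholds below are invented for illustration.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence >= 0.9:
        return "almost all"
    if confidence >= 0.6:
        return "most"
    if confidence >= 0.2:
        return "some"
    return "few"


def verbalize(antecedent: str, consequent: str, confidence: float) -> str:
    """Build a sentence of the form '<Quantifier> <antecedent> <consequent>.'"""
    quantifier = describe_intensity(confidence).capitalize()
    return f"{quantifier} {antecedent} {consequent}."


sentence = verbalize("people who smoke", "suffer from asthma", 0.7)
```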
In practice, some data are often unavailable, e.g. due to measurement errors, missing results,
or because the respondent is unwilling to answer or has no opinion on the given subject. We can completely remove the
cases with missing values to obtain clean data, but it can result in an excessive loss of information. Alternatively,
we can handle missing values by using fuzzy partial logics, which were proposed by Běhounek and Novák in [11].
They provide a formal apparatus for several types of missing information, such as an “unknown” or “undefined” (i.e. not
meaningful) value. Basically, the semantics of these logics, formed by algebras of truth values, is extended by a special
value “”.
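The effect of such an extra truth value on connectives can be sketched as follows. The example encodes the special value as Python's `None` and shows two ways of extending the Łukasiewicz conjunction: one in which the special value propagates (in the style usually attributed to Bochvar) and one in which it is ignored (in the style usually attributed to Sobociński). The encoding and the function names are assumptions of this sketch, and the precise definitions in [11] may differ.

```python
from typing import Optional

# A truth value is a degree in [0, 1], or None for the special (missing) value.
Truth = Optional[float]


def lukasiewicz_and(x: float, y: float) -> float:
    """Łukasiewicz conjunction on defined truth degrees."""
    return max(0.0, x + y - 1.0)


def bochvar_and(x: Truth, y: Truth) -> Truth:
    """The missing value propagates: any missing argument gives a missing result."""
    if x is None or y is None:
        return None
    return lukasiewicz_and(x, y)


def sobocinski_and(x: Truth, y: Truth) -> Truth:
    """The missing value is ignored: the result is the other argument."""
    if x is None:
        return y
    if y is None:
        return x
    return lukasiewicz_and(x, y)
```

Under the first style a single missing answer makes the whole conjunction missing, while under the second style the conjunction is evaluated from the available answers only.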