Abstract – The first part of this paper give a general view ofKnowledge Discovery in Database. The information system contains redundant attributes that will not aid Knowledge Discovery and mayin fact mislead the process. The tools -Rough Set and the Soft Set, deals withuncertainty of the information system in data mining.

In thesecond part the basic definitions in Rough and Soft Sets are included and atlast a decision making problem isaddressed. The problem is represented in the form of an information table andthen the reduction of the knowledge representation system in Rough Set theoryto define the Reduct-Soft Set of a Soft Set is employed. An algorithm isincluded to select an optimal choice object. The algorithm uses fewerparameters to select the optimal objects for a decision making problem. Keywords: Data mining, Soft Set, Reduct Soft Set, Rough Set, KnowledgeDiscovery in Databases 1. INTRODUCTION Datamining refers to extracting or “mining” knowledge (hidden information) fromlarge amounts of data. Data mining treats as synonym for another popularly usedterm, Knowledge Discovery in Databases or KDD 1.

KDD is the nontrivialprocess of identifying valid, novel, potentially useful and ultimatelyunderstandable patterns in data. Thegoal of KDD is to extract meaningful knowledge from the database. The key stepsin Knowledge Discovery cycle are Data cleaning, Data integration, Dataselection, Data transformation, Data mining, Pattern Evaluation, Knowledgepresentation. Data mining is not asingle technique, some other commonly used techniques are : statisticalmethods, case-based reasoning (CBR), Bayesian Belief Networks (BBN ) etc.

,. 1.1. SOFT SET Inpractice, the great deal of data involved in economy, engineering, environment,social science and medicine is not always vivid and these data include allkinds of uncertainty. But the most tools for modeling, reasoning andcalculation are certain or precise, which deal with certain problems ineconomy, engineering, environment and so on. At present there are threetheories considered as mathematical tools for dealing with uncertainties:theory of probability, theory of fuzzy sets, and theory of intervalmathematics. However, the three theories have respectively shortage. Theoryof probability based on full samples need to make a great lot of experiments inorder to gain check samples.

In practice, experiments on economy are difficultto do, not like engineering experiments. Theory of interval mathematics takes intoaccount calculation error and sets up an interval estimation when need precisesolutions. But theory of interval mathematics can’t well describe the smoothchange of information, uncertain information and contradiction information.Theory of fuzzy sets is fittest to describe fuzzy information, which is firstlyput forward by L. A. Zadeh. Fuzzy sets have great deal of an application inmany fields, but there is a piece of difficulty that how to establishappropriate subject function.

Since the character of subject function isextreme individuation, there is no uniform ways to set up subject function. Forexample every one understands F (x) = 0.7 by own way. Thereason why there exists deficient as above is lack of the theory of expressingparameters. The tools for making sure parameters are so poor that uncertaintyof parameters becomes the bottleneck of using these theories. To solve thisproblem, 1999 D.

Molodtsov set up the basic theory of soft sets which can dealwith uncertain fuzzy, unclear information. 1.2. ROUGH SET Themethod was first introduced by Zdzislaw Pawlak in 1982. Rough set analysis is afirst step in analyzing incomplete or uncertain information. Rough Set analysisuses only internal information and does not rely on additional modelassumptions as Fuzzy Set methods or probabilistic methods do.

The Rough Set methodologyprovides definitions and methods for finding which attributes separates oneclass or classification from another. Since inconsistencies are allowed andmembership in a set does not have to be absolute, the potential for handlingnoise gracefully is big. The results from a training phase when using the RoughSets approach will usually be a set of propositional rules which may be said tohave syntactic and semantic simplicity for a human. How the problems of dynamicdatabases and time and memory constraints are dealt with will be different foreach system using the Rough Set approach, but typically the time complexitywill be high and perhaps polynomial.

TheRough Sets 7 is one of the techniques for the identification and recognitionof common patterns in data, especially in the case of uncertain and incompletedata. The mathematical foundations of this method are based on the setapproximation of the classification space. Rough set theory was initially developed 7for a finite universe of discourse in which the knowledge base is a partition,which is obtained by any equivalence relation defined on the universe of discourse.In Rough Sets theory, the data is organized in a table called decision table.

Rows of the decision table correspond to objects, and columns correspond toattributes. In the data set, a class label to indicate the class to which eachrow belongs. The class label is called as decision attribute, the rest of theattributes are the condition attributes. Here, C is used to denote thecondition attributes, D for decision attributes, where C Ç D = ?, and tjdenotes the jth tuple of thedata table. Rough Sets theory defines three regions based on the equivalentclasses induced by the attribute values: lower approximation, upperapproximation, and boundary. Lower approximation contains all the objects,which are classified surely based on the data collected, and upperapproximation contains all the objects, which can be classified probably, whilethe boundary is the difference between the upper approximation and the lowerapproximation 8. Withinthe frame work of Rough Sets the term “classification ” describes thesubdivision of the universal set of all possible categories into a number ofdistinguishable categories called elementary sets.

Each elementary set can beregarded as a rule describing the object of the classification. Each object isthen classified using the elementary set of features which cannot be split upany further, although other elementary sets of features may exist(Reductsystem, 1993). In the Rough Set model the classification knowledge isrepresented by an equivalence relation IND defined on a certain universe ofobjects U. The pair of the universe objects U and the associated equivalencerelation IND forms an approximation space. The approximation space gives anapproximation description on any subset x of U.

2. PRELIMINARIES Definitionof Soft setLetU be an initial universal set and let E be a set of parameters. DEFINITION2.1 ( See 6.) A pair (F, E) is called a soft set over U if and only if F is amapping of E into the set of all subsets of the set U .i.

e. F: E , where P(U) is thepower set of U. Inother words, the soft set is a parameterizedfamily of subsets of the set U. Every set F(), for E.

from the familymay be considered as the set of elements of the softset (F,E) , or as the set of approximate elements of the soft set. Asan illustration, let us consider the following examples (1) A soft set (F, E) describes the features of thecomputer which Mr. X is going to buy. U = the set of different computer models underconstructionE = the set of parameters .Each parameter is a word ora sentence.E = {{expensive; cheap; high speed processor; normalspeed processor; multimedia keyboard; LCD ; CRT; free annual maintenancecontract}.

In this case to define a soft set means point out cheap ,high speed processor and so on . Itsworth noting that the sets F()may be empty for some E.(2) Zadeh ‘s fuzzyset may be considered as a special case of the soft set. Let A be A fuzzy set of U with membership ,i.e., is a mapping of U into 0,1. Let us consider the family of -level sets for the function given by F() = { () }, 0,1. If we know the family F, we can find the functions () by means of thefollowing formulae: () = sup ().

0,1. F() Thus , every Zadeh ‘sFuzzy set A may be considered as the soft set (F, 0,1). (3) Let (X,) be a Topological space, that is ,X is a set and is a Topology , inother words, is a family of subsets of X , called the open sets of X. Then, the familyof open neighbourhoods T of point, where T = {V: V}, may be considered as the soft set (T,). The way ofsetting ( or describing ) any object in the soft set theory principally differsfrom the way in which we use classical mathematics . In classical mathematics, weconstruct a mathematical model of a object and define the notion of the exactsolution of this model . Usually the mathematical model is so complicated andwe cannot find the exact solution .

So, in the second step, we introduce thenotion of approximate solution and calculate the solution In the softset theory we have the opposite approach to this problem. The initialdescription of the object has an approximate nature, and we do not need tointroduce the notion of exact solution The absenceof any restrictions on the appropriate description in soft set theory makesthis theory very convenient and easily applicable in practice. We can use anyparameterization we prefer: with the help of words and sentences , realnumbers, functions, mappings, and so on .It means that the problem of settingthe membership function or any similar problem does nit arise in the soft settheory DEFINITION2.2(See 7) A knowledge representation system can be formulated as follows : Knowledgerepresentation system is a pair S = (U,A), where U = a nonempty, finite setcalled the universe. A = a nonempty ,finite set ofprimitive attributes. Everyprimitive attribute a A is a total function U, Where is the set of values of , called the domain of . DEFINITION2.

3 (see 7)With every subset of attributes B A, we associate binaryrelation IND(B) called an indiscernibility relation , defined by IND(B) = { every }. Obviously, IND(B) is an equivalent relation and Everysubset B A will be called anattribute . If B is a single element set, then B is called primitive , otherwisethe attribute is said to be compound. Attribute B may be considered as a nameof the relation IND(B) , or in other words – as a name of knowledge representedby an equivalence relation IND(B). DEFINITION2.4 (See 7) Let be a family ofequivalence relations and let A R.

We will say that Ais dispensable in R if IND(R) = IND(R – {A}); otherwise A is indispensable inP. The family R is independent if each A R is indispensable inR; otherwise R is dependent. IfR is independent and P R. then P is alsoindependent.Q P is a reduct of P ifQ is independent and IND(Q) = IND(P). Intuitively, a reduct of knowledge is essential part, which suffices to define all basic concepts occurring in theconsidered knowledge ,whereas the core is in a certain sense its most importantpart . The set of all indispensable relations in P will be called the core of P , and will be denotedCORE(P) . Clearly CORE(P) = RED (P).

where RED (P)is the family of all reducts of P. Thenuse of the concept of the core is twofold. First it can be used as a basis forcomputation of all reducts, for the core is included in every reduct , and itscomputations straightforward. Secondly ,the core can be interpreted as the setof the most characteristic part of knowledge ,which cannot beeliminated when reducing the knowledge.

DEFINITION2.5 . DEPENDENCY OF KNOWLEDGE (see 7.) Formally the dependency can be defined as shown below: Let K = (U, R) be a knowledge base and let P,Q R. 1. Knowledge Q depends on knowledge P if IND(P) IND(Q).2. Knowledge P and Q are equivalent denoted as P Q, if P Q and Q P.

3. Knowledge P and Q are independent ,denoted as P ? Q , if Neither P Q nor Q P hold. Obviously , P Q, if IND(P) = IND(Q). DEFINITION 2.6 (See 5 .

) For two soft sets (F,A) and (G, B) over a common universe U, wesay that (F,A) is a soft subset of(G,B)if(i) A B, and(ii) A, F(). And G() are identical approximations. We write (F,A) (G,B) .(F,A) is said to be a soft super set of (G,B) , if(G,B) is a soft subset of (F,A). We denote it by (F,A) (G,B). 3. AN ILLUSTRATION FOR THE TOOLS DEALING WITHUNCERTAINTY Molodstov 6 presented some applications of the softset theory in several directions viz.,study of smoothness of functions game theory,operations research, Riemann-integration, Perron integration, probability ,theory of measurement etc.

In this section we present an application ofsoft set theory in a decision making problem with the help of rough approach7.The problem we consider asbelow. Let U = {m1,m2,m3,m4,m5,m6} , be a set of six models of computers ,E = {expensive; cheap; highspeed processor; normal speed processor; multimedia keyboard; LCD ; CRT; freeannual maintenance contract}, be a set of parameters. Consider the soft set (F,E) which describes the ‘features of the computers’ ,given by (F,E)= {expensive computers={m1,m4},cheap = {m2,m3,,m5,m6 }, highspeed processor ={m1,,m4,m5,m6}normalspeed processor = {m2,m3, }. multimediakey board= {m1,m4,m5,m6 }, LCD={m1,m2,,m4,m5,m6 },CRT= {m3,,m6 }, free annual maintenancecontract= { m1,m2,,m4,,m6 }}.

Suppose that , Mr. X is interested to buy a computeron the basis of his choice parameters ‘cheap; high speed processor; multimediakeyboard; LCD ;free annual maintenance contract’ etc., which constitute thesubset P = { cheap; high speedprocessor; multimedia keyboard; LCD; free annual maintenance contract } of theset E . That means, out of available models in U , he is to select the computermodel which qualifies with all (or with maximum number of) parameters of thesoft set P. Suppose that, another customer Mr.

Y wants to buy amodel on the basis of the sets of choice parameters Q E and Mr. Z wants tobuy a model on the basis of another set of parameters R E. The problem is to select model which is most suitablewith choice parameters of Mr. X. The model which is most suitable for Mr. X , neednot be most suitable for Mr.

Y or MR. Z as the selection is dependent upon the set of choice parameters of eachbuyer. To solve the problem , we do some theoreticalcharacterizations of the soft set theory of Molodtsov, which we present below. 3.1 Tabular representation of a soft setTabular representation of soft sets were done by Lin3and Yao 9 earlier. We present an almost analogous representation in the formof binary table . For this consider the soft set (F,P) above on the basis ofthe set P of choice parameters of Mr.

.X .We can represent this soft set in atabular form as shown below . This style of representation will be useful forstoring a soft set in a computer memory . If mi F() then mij= 1. otherwise mij = 0 where mij arethe entries in Table 1 Table1.

U e1 e2 e3 e4 e5 m1 m2 m3 m4 m5 m6 0 1 1 1 1 1 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 Thus soft set now can be viewed as a knowledgerepresentation system (Definition 2.2) where the set of attributes is to bereplaced by a set of parameters. 3.

2 Reduct – Table of a Soft Set Consider theSoft set (F,E). Clearly for any P E. (F, P ) is a softsubset of (F,E) . We will now define reduct-soft –set of the soft set (F,P). Consider the tabular representation of the soft set (F,P). If Q is a reduct of P, then the soft set (F,Q) is called the reuduct-soft-set of the soft set (F,P). Intuitively , a reduct –soft-set (F,Q) of the soft set(F,P) is the essential part, which suffices to describe al basic approximatedescriptions of the soft set (F,P). The core soft set of (F,P) is the soft set (F,C) .

where C is the CORE(P). 3.3 Choice value of an object miThe choice value of an object mi U is Ci given by Ci=ij; Where hij are the entries in the table ofthe reduct-soft-set. 3.4 Algorithm for Selection of the Computer The following algorithm may be followed by MR. X toselect the model of computer he wishes to buy1. input the soft set (F,E),2.

input the set P of choice parameters of Mr. X which isa subset of E.3. find all reduct –soft-sets of (F,P).4. Choose one reduct-soft-set say (F,Q) of (F,P).5.

find K, for which Ck= max ci .Then mk is the optimal choice object. If k has more than one value, then anyone of them could be chosen by Mr. X by using his option. Now we use the algorithm to solve our originalproblem.

Clearly from the table we see that (e1,e3,e4,e5},{ e1,e2,e4,e5},{ e1,e2,e3,,e5}. are the three reducts of P = { e1,e2,e3,e4,e5}. Choose any one say Q = { e1,e2,,e4,e5}. Incorporating the choice values, the reduct –soft-setcan be represented in Table 2 below. Table2. U e1 e2 e4 e5 Choice value m1 m2 m3 m4 m5 m6 0 1 1 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 1 1 1 1 C1 = 3 C2 = 3 C3 = 1 C4 = 3 C5 = 3 C6 = 4 Here max Ci = C6.Decision :Mr x is better to buy the model m6 4.

CONCLUSION The Soft Settheory of Molodtsov 6 and Rough Technique of Pawlak 8 offer a a generaltool for dealing with uncertaininformation. In this paper, we gave anapplication of Soft Set theory in a decision making problem by the roughtechnique of Pawlak8.