In this study, we propose a text mining approach for detecting
suspicious words and analyzing information, covering every stage from data
collection to visualization. The approach follows the steps of data collection,
natural language processing, text analysis and ontology, and presentation. We
focus mainly on the development of a domain-based ontology because it drives
text analysis functions such as topic extraction and classification.
The first step in the proposed text mining approach is to collect
data from web application sources, such as online content and social network
data. Many data collection methods exist, and the choice depends on the type of
data to be collected. We use several of these methods, including open data
APIs, database interfaces, and manual submission.
The crawler robot, of the kind used by search engines on web portal
sites, extracts the URLs to which a page points and places the extracted URLs
in a queue. An open API is an easy-to-understand protocol for accessing a
database system and capturing its data. In particular, social network services
such as Facebook and Twitter provide APIs for accessing and collecting their data.
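
The crawler's queue-based behavior can be illustrated with a minimal sketch.
This is not the implementation used in the study. It assumes a breadth-first
visiting order, and the names LinkExtractor, crawl, fetch, and PAGES are
hypothetical. An in-memory dictionary stands in for real HTTP requests so the
example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl. `fetch(url)` returns HTML text or None."""
    queue = deque([seed_url])  # URLs waiting to be visited
    seen = {seed_url}          # URLs already queued, to avoid repeats
    visited = []               # URLs fetched so far, in visit order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "site" (assumption, for illustration only).
PAGES = {
    "http://example.org/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.org/a": '<a href="/b">B</a>',
    "http://example.org/b": '<a href="/">home</a>',
}
visited = crawl("http://example.org/", PAGES.get)
```

In a real deployment, fetch would wrap an HTTP client, and the crawler would
also need to respect politeness rules such as robots.txt and rate limits.
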
Developments in natural language processing and ontology
After data collection, we check the condition of the collected
data and prepare it for the next stage. The collected data may contain too
much noise to analyze, so it must be filtered. The filtered data is then
preprocessed within the text mining approach. In this work, natural language
processing for data cleansing consists of parsing the text sentences, removing
stop words and noise (i.e., HTML tags, punctuation marks, numbers, and
emoticons), and transforming the result into count-based datasets such as a
term-document matrix or a term list.
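
The cleansing steps described above might look like the following sketch. It
is an illustration under stated assumptions rather than the study's actual
pipeline: the stop-word list is a tiny hypothetical sample, the emoticon
pattern covers only a few simple ASCII emoticons, and the function names clean
and term_document_matrix are invented for this example.

```python
import re
from collections import Counter

# Illustrative stop-word list (assumption); a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "at"}


def clean(text):
    """Return the lowercase tokens of `text` after removing noise."""
    text = re.sub(r"<[^>]+>", " ", text)  # drop HTML tags
    # Drop simple ASCII emoticons such as :) ;-( =D when they stand alone.
    text = re.sub(r"(?<!\w)[:;=][-']?[()DPp](?!\w)", " ", text)
    text = re.sub(r"[^A-Za-z\s]", " ", text)  # drop punctuation and digits
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]


def term_document_matrix(documents):
    """Return (terms, matrix), where matrix[i][j] counts term i in document j."""
    counts = [Counter(clean(d)) for d in documents]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for c in counts] for t in terms]
    return terms, matrix


docs = [
    "<p>The bomb is in the bag!! :(</p>",
    "Meet at the bag at 10:30 :)",
]
terms, matrix = term_document_matrix(docs)
```

Each row of the matrix corresponds to one term and each column to one
document, so the term list is simply the row labels.
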
Text mining relies on language resources, such as a taxonomy, an
ontology, and a sentiment dictionary, to obtain more accurate and efficient
results. An ontology helps analyze text and multimedia data with respect to
domain-specific knowledge, opinions, emotions, thoughts, and so on. Our
proposed methodology for developing a domain-specific ontology consists of
five steps: determining the scope, considering the reuse of existing
ontologies, extracting terms from data sources, defining the taxonomy, and
validating the preliminary ontology.
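
The last three steps (term extraction, taxonomy definition, and validation)
can be sketched as follows. This is only one possible reading of those steps:
the frequency threshold, the child-to-parent map used to represent the
taxonomy, and the names extract_terms, validate, and TAXONOMY are all
assumptions made for illustration.

```python
from collections import Counter


def extract_terms(token_lists, min_count=2):
    """Term extraction: keep tokens that occur at least `min_count` times."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    return {t for t, n in counts.items() if n >= min_count}


def validate(taxonomy, root, terms):
    """Validation: list the terms that are not linked to the root concept."""
    problems = []
    for term in sorted(terms):
        node, hops = term, 0
        # Walk up the parent links; the hop limit guards against cycles.
        while node != root and node in taxonomy and hops <= len(taxonomy):
            node = taxonomy[node]
            hops += 1
        if node != root:
            problems.append(term)
    return problems


# Taxonomy definition: hypothetical child -> parent map (assumption).
TAXONOMY = {
    "weapon": "threat",
    "bomb": "weapon",
    "gun": "weapon",
}

tokens = [["bomb", "bag"], ["bomb", "gun"], ["gun", "bag"]]
terms = extract_terms(tokens)
unplaced = validate(TAXONOMY, "threat", terms)
```

Terms reported by validate are candidates for either a new place in the
taxonomy or removal from the ontology's scope.
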
The next stage is text analysis. There are different kinds of text
analysis, for example, domain extraction, classification, clustering, sentiment
analysis, and time series analysis. Domain and buzz analysis, for example, are
mostly concerned with issues of interest within a domain and in society.
Suspicious message detection is a text classification technique that uses, for
example, a rule-based system or machine learning. Sentiment analysis tries to
read the specific opinions, emotions, and thoughts expressed about people,
events, products, or services, with the help of a sentiment dictionary or a
machine learning approach. In either case, language resources are essential.
Statistics serve to characterize the dataset, compare figures within the same
collection, reveal associations among variables, and predict future values.
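
A rule-based detector and a dictionary-based sentiment scorer can be sketched
in a few lines. The rules, labels, dictionary entries, and function names below
are hypothetical and are not taken from the study. They only illustrate how
language resources feed these two kinds of analysis.

```python
import re

# Hypothetical rules (assumption): each pattern maps to a category label.
RULES = [
    (re.compile(r"\b(bomb|explosive)s?\b", re.I), "weapon"),
    (re.compile(r"\b(attack|kidnap)\w*\b", re.I), "violence"),
]

# Hypothetical sentiment dictionary (assumption): word -> polarity score.
SENTIMENT = {"good": 1, "great": 2, "bad": -1, "terrible": -2}


def detect_suspicious(message):
    """Return the sorted labels of all rules that match the message."""
    return sorted({label for pattern, label in RULES if pattern.search(message)})


def sentiment_score(message):
    """Sum the polarity of every dictionary word found in the message."""
    words = re.findall(r"[a-z']+", message.lower())
    return sum(SENTIMENT.get(w, 0) for w in words)


labels = detect_suspicious("They plan to attack with a bomb")
score = sentiment_score("Great service and good food")
```

A machine learning classifier could replace either function, but it would
still depend on labeled data or dictionaries of the same kind.
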