CIVILICA We Respect the Science
(Specialized publisher of national conferences / Publishing license number from the Ministry of Culture and Islamic Guidance: 8971)

Using an Automatic Weighted Keywords Dictionary for Intelligent Web Content Filtering

Article title: Using an Automatic Weighted Keywords Dictionary for Intelligent Web Content Filtering
National article ID: JR_JACR-6-1_008
Published in: Volume 6, Issue 1, Winter 1393 (Solar Hijri)
Authors:

Najibeh Farzi Veijouyeh - Islamic Azad University of Shabestar Branch, Shabestar, Iran
Jamshid Bagherzadeh - Assistant Professor, Computer Science and Eng. Dept., Urmia University, Urmia, Iran

Abstract:
Filtering web pages with inappropriate content is one of the major issues in intelligent network security. Any country that wants to control users' access to the web needs an intelligent filtering method that is both accurate and fast, so the problem has been studied by many researchers. Representing web pages in a form that machines can process is one of the most important preprocessing steps. A lower-dimensional description of web pages would therefore be very effective, especially for deciding whether a page should be filtered out. In this paper, we propose an automatic method for detecting forbidden keywords in web pages. We then define a new vector representation of web pages, named RWSF, which consists of the weighted sum and the frequency of forbidden keywords in different parts of a page. For this purpose, a ranking dictionary of keywords, including forbidden keywords, is used. To evaluate the proposed method, 2643 pages were used: 1311 normal pages and 1332 forbidden pages. Of these, 1851 pages were used to train the system and 792 pages to evaluate it. The system was assessed with several classifiers: k-Nearest Neighbor, Support Vector Machines, Decision Tree, and Artificial Neural Networks. The evaluation results indicate that the proposed method is efficient and accurate with all of these classifiers.
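
As a rough illustration of the representation described in the abstract, the following minimal Python sketch builds, for each part of a page, the weighted sum and the frequency of forbidden keywords. It is not the authors' implementation: the abstract does not specify the dictionary contents, the weighting scheme, or which page parts are used, so the keyword weights, the part names (title, meta, body), the tokenizer, and the function names here are all assumptions made for the example.

```python
import re
from typing import Dict, List

# Hypothetical ranking dictionary: keyword -> weight (higher = ranked as more forbidden).
# The real dictionary is built automatically in the paper; these entries are invented.
KEYWORD_WEIGHTS: Dict[str, float] = {"casino": 3.0, "betting": 2.0, "poker": 1.0}

# Hypothetical page parts; the abstract only says "different parts of web pages".
PAGE_PARTS: List[str] = ["title", "meta", "body"]

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def page_vector(page: Dict[str, str],
                weights: Dict[str, float] = KEYWORD_WEIGHTS) -> List[float]:
    """Return [weighted_sum, frequency] for each page part, concatenated."""
    vector: List[float] = []
    for part in PAGE_PARTS:
        tokens = tokenize(page.get(part, ""))
        hits = [t for t in tokens if t in weights]
        weighted_sum = sum((weights[t] for t in hits), 0.0)
        frequency = float(len(hits))
        vector.extend([weighted_sum, frequency])
    return vector


# An invented example page, split into the assumed parts.
example_page = {
    "title": "Online casino and poker",
    "meta": "betting tips",
    "body": "Play poker tonight. Poker tables are open.",
}

print(page_vector(example_page))
```

With three parts, every page maps to a six-dimensional vector regardless of its length, which is the kind of low-dimensional input a classifier such as k-Nearest Neighbor or a decision tree could be trained on.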

Keywords:
Content based filtering, Forbidden keywords extraction, Ranking keywords, Web page representation

Dedicated article page and full-text download: https://civilica.com/doc/488459/