Development of filtering software and automatic rating tools for children at school in Japan

 

Akio Kokubu

Electronic Network Consortium and New Media Development Association

 

 

This paper presents the development of filtering software and automatic rating tools for harmful content on the Internet, based on operational experience with a label bureau that provides third-party ratings. It also describes efforts to deploy filtering capabilities at schools in Japan.

 

Summary

 

Since November 1996, the Electronic Network Consortium (ENC) has been promoting the use of filtering as a means of protecting children from inadequate information on the Internet. Together with the New Media Development Association (NMDA), a nonprofit foundation that functions as its secretariat, the ENC developed a filtering system conforming to the PICS specification and operated a database system (a label bureau complying with the PICS specification) for web page rating for three years. The rating system used for filtering is an extension of the RSACi rating system.

 

Following this work, the ENC released a second-generation label bureau and filtering system designed to protect children more effectively from inadequate information on the Internet, and began operating the new system in May 1999.

 

The features of the system are as follows:

 

• The system features semi-automatic rating tools for Internet content, which rate and label content efficiently by combining keyword analysis of text with a similar-image recognition method for photographs.

 

• The system can link with other label bureaus and convert labels based on different labeling standards. With this function, the proportion of Internet content covered by labels can be raised substantially.

 

• The system includes a server-type filtering system complying with the PICSRules specification, which is provided as a public proxy server on the Internet and distributed as free software. This makes the filtering system easier to install for users with a large number of computers, such as teachers at schools, and for users whose browsers do not comply with the PICS specification.

 

1.     Background and history

 

The Internet owes its present expansion to open and free use, and today comprises tremendous quantities of information and tens of millions of users throughout the world. Even now it continues to grow from strength to strength. However, by giving easy access to global information, it also allows the circulation of illegal and harmful content (hereinafter called “inadequate content”), causing serious social issues.

The filtering function enables Internet users to receive information selectively in accordance with their requirements. It is a user-friendly mechanism that guarantees users' right to know and at the same time protects their children from unwanted information, while respecting the rights and freedom of content providers.

 

(1) Who is the ENC?

The Electronic Network Consortium (ENC) is a trade organization for major Internet service providers in Japan and exists to solve problems that Internet service providers encounter when operating their services. It is a founding member of the Internet Content Rating Association (ICRA) and an associate member of the World Wide Web Consortium (W3C).

The ENC was established in October 1992 for the purpose of promoting online services in Japan. Its members comprise 73 corporations, including leading Internet service providers, computer manufacturers and software houses; 14 special individual members, including experts; and 24 local governments interested in public networks. The New Media Development Association (NMDA), a nonprofit foundation and an auxiliary organization of the Ministry of International Trade and Industry, acts as the consortium's secretariat.

Some of its work items are:

• Ethical guidelines for running services

• Recommended etiquette for users

• Guidelines for protecting personal data

• Filtering software and operation of a label bureau

• Privacy information management systems

 

(2) What has the ENC done about Internet content?

To address the social issues raised by Internet content, the ENC has since November 1996 been promoting the provision and dissemination of PICS-compatible filtering capabilities (http://www.w3.org/PICS/) based on an extension of the internationally adopted RSACi rating system (http://www.icra.org/) and on other rating systems. The ENC has also been working with Internet service providers to supply filtering software to users and to encourage content providers to rate their own content as a matter of good practice. Together with the New Media Development Association (NMDA), which functions as its secretariat, the ENC developed a filtering system that conforms to the PICS specification and in September 1997 began operating a database system (a label bureau complying with the PICS specification) for web page rating.

At that time, the label bureau's database held more than 20,000 URLs rated by third parties, and the filtering software, installed on personal computers, was used by more than 2,000 educational institutions, business enterprises, and other organizations for experimental use and evaluation. The released versions of the filtering software for Windows 95 and Macintosh are PICS-compliant and work with either Netscape or Internet Explorer. In addition to third-party rating and self-rating, the software supports rating by administrators, that is, teachers or parents.

The filtering software was distributed free of charge because the project was an experiment aimed at solving problems with Internet content and at encouraging educators who were obliged to learn about blocking capabilities. Authoring tools for Windows 95 and Macintosh were also distributed so that content providers could rate their own content.

 

(3) Requirements leading to development of the second-generation system

In the first-generation system, building the database required a page-by-page visual examination of web pages, which made it very difficult to keep up with the daily growth in inadequate information. Moreover, the client-type filtering software depended on the browser used and on the version of the operating system, which restricted the environments in which the filtering system could be used. In addition, in places where a large number of personal computers were used, such as schools, installing and setting up the filtering software on each computer was very time-consuming for teachers. Consequently, users strongly demanded a system that would resolve these issues.

The second-generation label bureau and filtering system was developed to resolve these issues and to provide more effective protection against inadequate information. The system consists of a label bureau with semi-automatic rating tools and a server-type filtering system.

 

2.     The label bureau with semi-automatic rating tools

 

(1) The label bureau compliant with the PICS specification

The label bureau consists of server software that operates in compliance with the PICS specification defined by the World Wide Web Consortium (W3C), together with a database of URLs (Uniform Resource Locators, i.e., web page addresses on the Internet) that are rated and labeled according to the rating system. The label bureau also provides functions for working with search engines, including clipping of search results and robot-based retrieval. In addition, it has a GUI (Graphical User Interface) for registering, retrieving, updating, and deleting labels. These functions facilitate the creation and management of a label bureau.

 

(2) The SafetyOnline rating system

The rating system used for labeling is an extension of RSACi, which the ENC calls SafetyOnline. The system tentatively includes an "Extension" category so that new items such as drugs and gambling may be added to extend coverage. In each category, a value from 0 to 4 represents the rating. The file that describes this rating system has been distributed on the Internet.
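As a rough illustration of such a label, the sketch below models a rating as one value from 0 to 4 per category. The category names are assumptions: the first four follow the RSACi categories, "extension" stands for the Extension category mentioned above, and none of the identifiers are taken from the actual SafetyOnline description file.

```python
from dataclasses import dataclass, field

# Illustrative category names (assumed, not from the SafetyOnline file).
CATEGORIES = ("nudity", "sex", "violence", "language", "extension")
MIN_RATING, MAX_RATING = 0, 4


@dataclass
class Label:
    """A rating label for one URL: one value from 0 to 4 per category."""
    url: str
    ratings: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject unknown categories and values outside the 0..4 range.
        for category, value in self.ratings.items():
            if category not in CATEGORIES:
                raise ValueError(f"unknown category: {category}")
            if not MIN_RATING <= value <= MAX_RATING:
                raise ValueError(f"rating out of range for {category}: {value}")

    def value(self, category):
        # Assumption: a category without an explicit rating counts as 0.
        return self.ratings.get(category, MIN_RATING)
```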

In actual rating work, cases may arise in which a rating is difficult to determine. For such cases, a more detailed instruction manual has been prepared with reference to the self-regulatory rating systems of the Japanese package media industry. Furthermore, in cases that might give rise to two totally opposite evaluations (for example, some might appreciate a nude photo as art while others might despise it as obscene), the rating is not based on such subjective impressions but on the objective procedure of measuring the extent of body exposure.

 

(3) The database construction with semi-automatic rating tools

The database is constructed by applying the semi-automatic rating tools to each page returned by searches on major search engines in Japan.

Two groups of major keywords (in Japanese and in English) have been selected for these searches:

(a) a keyword group concerning sex, such as "adult," "sex," "nude," "vice," "Lolita," or "SM," which are usually among the top twenty search keywords at search engines

(b) a keyword group concerning violence, such as "violence," "murder," or "corpse," which are rarely among the top keywords but are considered to affect juveniles.

 

The semi-automatic rating tools combine keyword analysis of text with a similar-image recognition method for photographs to make rating and labeling work more efficient. Textual information contained in web pages is compared against weighted keyword lists classified by category (at present, about 6,000 inadequate words and about 26,000 relevant words) and analyzed to determine a rating value. The degree of automation can be raised by updating the keyword lists based on the results of visual confirmation.
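The keyword analysis can be pictured with the following minimal sketch. The keyword weights, the score thresholds, and the function names are all hypothetical; the actual lists and scoring rules used by the ENC tools are not described here, and Japanese text would additionally need word segmentation before lookup.

```python
import re

# Hypothetical weighted keyword list for one category (real lists are far larger).
SEX_KEYWORDS = {"adult": 1.0, "nude": 2.0, "sex": 2.0}

# Hypothetical thresholds: a score at or above the threshold yields that rating.
THRESHOLDS = [(8.0, 4), (5.0, 3), (3.0, 2), (1.0, 1)]


def keyword_score(text, weights):
    """Sum the weights of all listed keywords that occur in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(weights.get(word, 0.0) for word in words)


def rating_from_score(score):
    """Map a keyword score onto a rating value from 0 to 4."""
    for threshold, rating in THRESHOLDS:
        if score >= threshold:
            return rating
    return 0


score = keyword_score("Nude adult photos", SEX_KEYWORDS)
suggested = rating_from_score(score)
```

A value suggested in this way would still be subject to the visual confirmation described above.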

To rate images on web pages efficiently, a rating value is first calculated by searching a database of similar images (currently about 2,000 images), and the images are then reduced to thumbnails (smaller versions of the original images that can be checked at a glance). Finally, a rating value is determined by visual examination, using the calculated rating values and the thumbnails. The quality of automatic rating can be improved by adding more images to the database.
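One way to picture the similar-image search is a nearest-neighbour lookup over image feature vectors, sketched below. This is an assumption for illustration only: the text does not say which features or similarity measure the ENC tools use, so cosine similarity, the feature vectors, and the min_similarity parameter are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_rating(features, reference_db, min_similarity=0.9):
    """Return (rating, similarity) for the closest reference image.

    reference_db is a list of (feature_vector, rating) pairs for images that
    have already been rated. If no reference image is similar enough, the
    rating is None and the image is left entirely to visual examination.
    """
    best_rating, best_similarity = None, 0.0
    for ref_features, rating in reference_db:
        similarity = cosine_similarity(features, ref_features)
        if similarity > best_similarity:
            best_rating, best_similarity = rating, similarity
    if best_similarity < min_similarity:
        return None, best_similarity
    return best_rating, best_similarity
```

In this picture, adding more rated images to reference_db makes it more likely that a new image finds a close match, which corresponds to the improvement in automatic rating quality noted above.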

To keep the label database from becoming obsolete, the URLs for which labels are stored are accessed regularly to check their modification dates and any change in size. If a page has changed, its URL can be resubmitted to the automatic rating tools.
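The refresh check can be sketched as follows. The PageSnapshot type and the fetch_snapshot callable are hypothetical names introduced for illustration; how the actual system retrieves the modification date and size of a page is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageSnapshot:
    """What is remembered about a page at the time it was rated."""
    last_modified: str  # e.g. the value of a Last-Modified header
    size: int           # page size in bytes


def needs_rerating(stored, current):
    """A page is re-rated if its modification date or its size has changed."""
    return (stored.last_modified != current.last_modified
            or stored.size != current.size)


def select_for_rerating(stored_snapshots, fetch_snapshot):
    """Return the URLs whose pages changed since they were last rated.

    fetch_snapshot(url) is a caller-supplied function that returns the
    current PageSnapshot, or None if the page could not be retrieved.
    """
    stale = []
    for url, stored in stored_snapshots.items():
        current = fetch_snapshot(url)
        if current is not None and needs_rerating(stored, current):
            stale.append(url)
    return stale
```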

 

(4) Linkage with other label bureaus

It is not possible for a single label bureau to cover the enormous number of web pages on the Internet throughout the world. When an inquiry is made about the rating of a web page for which the label bureau operated by the ENC has no data, the bureau can respond by referring to other label bureaus that comply with the PICS specification. Moreover, it can convert labels that are based on a different rating standard. Such cooperation with other label bureaus will become important in the near future from the viewpoint of international cooperation.
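The fallback and conversion behaviour described above might look like the sketch below. The function names, the three-level foreign scale, and its mapping onto the 0 to 4 scale are hypothetical; a real deployment would query the other bureaus over the PICS label bureau protocol rather than through in-memory lookups.

```python
def convert_three_level(foreign_label):
    """Hypothetical conversion from a 0..2 scale onto the 0..4 scale."""
    scale = {0: 0, 1: 2, 2: 4}
    return {category: scale[value] for category, value in foreign_label.items()}


def lookup_label(url, local_db, other_bureaus):
    """Answer from the local database first, then from other bureaus.

    other_bureaus is a list of (query, convert) pairs: query(url) returns
    that bureau's label or None, and convert translates the label into the
    local rating standard.
    """
    label = local_db.get(url)
    if label is not None:
        return label
    for query, convert in other_bureaus:
        foreign_label = query(url)
        if foreign_label is not None:
            return convert(foreign_label)
    return None  # no bureau has rated this page
```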

 

3.     The server-type filtering system

 

To meet the needs of schools and relatively large organizations, the second-generation filtering system is a server-type filtering system (SFS). It is easy to install and use where large numbers of personal computers are deployed, such as at schools, and it serves users whose browsers do not comply with the PICS specification. The main features of the filtering system are as follows.

 

(1) Filtering by proxy server software

•  To enable schools and relatively large organizations to implement filtering easily, filtering is performed by proxy server software. With this system, there is no need to set up and manage filtering software separately on each PC.

•  The SFS is written in Java and can run in any computing environment in which the Java Development Kit (JDK) is available. It has been confirmed to run on Linux, Solaris, and Windows NT. (It will also run on Windows 98.)

•  The SFS filters not only web page viewing but also file transfers and newsgroups.

 

(2) Provision of a profile bureau for sharing and installing profiles

Because the SFS conforms to the PICSRules specification defined by the W3C (http://www.w3.org/TR/REC-PICSRules/), access administrators, such as teachers and parents, need to set up profiles that describe the filtering values and policy. Writing such a profile, however, is not easy because it is a kind of programming. For this reason, a profile bureau for sharing and installing profiles is provided in addition to profile editors. The profile bureau offers model profiles classified by age and application, which access administrators can download for their own use.
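The idea behind an age-based model profile can be illustrated with the simplified sketch below. It is not PICSRules syntax: real profiles are written in the rule language defined by that specification. The profile names, category names, and limits shown here are hypothetical and model only the notion of a maximum allowed rating per category.

```python
# Hypothetical model profiles: maximum allowed rating value per category.
PROFILES = {
    "elementary_school": {"nudity": 0, "sex": 0, "violence": 1, "language": 1},
    "high_school": {"nudity": 1, "sex": 1, "violence": 2, "language": 2},
}


def is_allowed(label, profile):
    """Allow a page only if no rated category exceeds the profile's limit.

    label maps category names to rating values from 0 to 4; a category
    missing from the label is treated as 0 (an assumption of this sketch).
    """
    return all(label.get(category, 0) <= limit
               for category, limit in profile.items())


page_label = {"violence": 2}
allowed_for_young = is_allowed(page_label, PROFILES["elementary_school"])
allowed_for_older = is_allowed(page_label, PROFILES["high_school"])
```

An administrator who downloads a model profile would then only need to adjust the limits rather than write the rules from scratch.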

 

(3) Rating by administrators, third parties and content providers

The SFS accepts ratings from the following three kinds of raters: (a) SFS administrators (for instance, teachers), (b) third parties (for instance, the ENC's label bureau), and (c) content providers themselves (for instance, a page on playboy.com). When two or more ratings exist for a page, they are applied in the priority order (a), (b), (c).
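The priority rule can be expressed in a few lines. The rater keys and the function name below are hypothetical and chosen only for illustration.

```python
# Priority order from the text: (a) administrator, (b) third party, (c) self-rating.
RATER_PRIORITY = ("administrator", "third_party", "self")


def effective_label(labels_by_rater):
    """Pick the label from the highest-priority rater that supplied one.

    labels_by_rater maps a rater key to that rater's label, or to None
    (or omits the key) when that rater has not rated the page.
    """
    for rater in RATER_PRIORITY:
        label = labels_by_rater.get(rater)
        if label is not None:
            return rater, label
    return None, None  # the page is unrated


rater, label = effective_label({
    "third_party": {"violence": 3},
    "self": {"violence": 1},
})
```

Here the third-party label is used because no administrator rating exists, even though the content provider has also rated the page.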

 

4.     Current status and future plans

 

(1) Expansion of label database and assistance in constructing label bureaus

The total number of web pages worldwide is estimated at more than 100 million. It is therefore impossible to identify and block every web page that might include inadequate content. The database, however, is considered sufficient to block most of the objectionable Japanese-language content that is accessible through ordinary searches.

The ENC is expanding its database by regularly rating inadequate information with the semi-automatic rating function it has developed. With this system, a blacklist of more than 440,000 URLs (Uniform Resource Locators, i.e., web page addresses) has been compiled to block inadequate material on the Internet for children at school and at home.

Rating itself depends on human values and is based on a standard derived from a subjective system of values. It is therefore desirable to have several label bureaus that follow different value systems, and private organizations in Japan may launch label bureaus in the future. By supplying software and other technical assistance, the ENC will continue its efforts so that many label bureaus based on different value systems can be established and provide services.

 

(2) Provision of a public SFS and the SFS as free software

A public server-type filtering system (SFS) is provided as a proxy server on the Internet (http://pops.pics.enc.or.jp:8180) for browsers that do not support the PICS specification for third-party rating. (Browsers that do support it, such as Internet Explorer, can connect to the label bureau directly without the SFS.)

In addition, to accelerate adoption of the system in educational institutions and other organizations that urgently need it, the ENC distributes as free software (http://www.nmda.or.jp/enc/rating/eindex.html) both the SFS software, which is intended to run as a proxy server, and tools (redirectors) designed to prevent users from changing the proxy settings of their browsers.

 

(3) Linkage with other label bureaus and international cooperation

Because the quantity of information on the Internet is so large, it is impossible for a single label bureau to rate information throughout the world. It will therefore be necessary to establish an internationally distributed system similar to the Domain Name System. For this purpose, the ENC will call on overseas label bureaus for cooperation.

Furthermore, it is necessary to develop a common international rating standard. The Internet Content Rating Association (ICRA), which inherited the intellectual property of the Recreational Software Advisory Council (RSAC), is preparing a global rating standard. It will permit objective content description free from the cultural values of particular countries and will enable users in different countries to apply their own values in selecting content on the Internet. As a founding member of the ICRA, the ENC will endeavor to contribute to the establishment of a rating standard not only for Japan but also for other countries in the Asian region.