
Full name: KOPI Online Plagiarism Search and Information Portal
Start date: 2003. 03. 03.
End date: 2004. 02. 28.
Participants:
Cost: 16 million HUF contributed by IHM (http://en.ihm.gov.hu/)
Project homepage: http://dsd.sztaki.hu/projects/past/kopi/en/

1. The goal of the project and the use of the results after finishing the project

The goal of the project is to develop an online plagiarism-search portal that helps digital libraries protect their documents and helps teachers and professors find copied work or publications. The portal would also provide information on the Hungarian legal issues relevant to this field and would include a discussion forum.

Such a service is not yet available to the Hungarian Internet community, and foreign services are limited in both number and functionality. By pushing back illegal copying, the portal will foster Internet publication and the spread of digital libraries. It makes no sense to copy a digital document if the copying of all or part of it can be detected within minutes.

Plagiarism search would of course be the main part of this free portal, but some other services would be included as well:

  • Comparing the documents of an uploaded set against one another
  • Information on the relevant laws
  • A forum

1.1. Description of the problem

In the age of the information society, it is becoming more and more important to protect data, the basis of our knowledge. One effective means of protection is plagiarism search, which finds documents similar to a given one.

With the evolution of computer technology, creating written essays is easier than ever, and publishing them has become fast and easy as well. The invention of the World Wide Web is often compared to Gutenberg's invention of printing, because the Internet has made publishing an essay easy, fast and very cheap. The result is the largest collection of intellectual works ever seen.

On the other hand, digital storage makes it extremely simple to copy these essays or parts of them, and therefore makes plagiarism simple too. Naturally, the owners of intellectual works (pictures, music, publications, ...) are cautious about publishing their works in digital libraries or on the Internet.

There is another important field where plagiarism search could be used: education, above all university education, where more and more theses and publications are copied from digital sources. This damages the reputation of the school concerned and even the value of its diplomas.

1.2. The academic, social, technical and economic aims of the project

The project requires research and development of text-comparison algorithms, database structures and queries; in particular, the runtime system needs to be optimized for large databases. As Hungary is about to join the European Union, another important aspect is multilingualism. All algorithms need to be implemented in a language-independent way so that searching works on texts written in any language.

An efficient plagiarism detection system could tell within minutes whether a text is original or plagiarized. If such a system were freely accessible to everyone, it would greatly increase the risk of plagiarism being discovered, and so could reduce the number of plagiarism cases.

The two most important fields for document comparison are digital libraries and higher education (including research and development). They are not independent of each other, so one common service could help in both areas. What do people working in these two areas require? Digital libraries want to protect their documents against plagiarism, while in higher education and research the problem is caused by copies of the documents stored in digital libraries. A solution that serves both parties would therefore register the documents of digital libraries automatically using mobile agents, and would let everybody else compare their own documents against the registered ones. The program should implement a standard interface that is used worldwide, so that digital libraries do not have to invest much in implementing their side of it.

The only way to convince a wide range of people to use the system is to make it easy to use. This can be achieved with an architecture that completely hides the internal mechanisms and presents a user interface that can be understood and used with basic computer skills.

Easy and fast use is a very important aspect of the system, because documents as information sources are very valuable in the information society. If document protection is not available to everybody, the growth of electronic publication could stop or even reverse. Another aspect vital to the success of the proposed system is that the planned portal should be completely free, especially given the low Internet penetration in Hungary and the unwillingness of users to pay for such services on the Internet.

1.3. The innovative parts of the project

The goal of the project is to test and develop the three main steps of document comparison:

  1. chunking
  2. compressing
  3. database management

These tests are important for the program. Similar initiatives would also benefit from the project, because the test results will be published and freely accessible to everyone.
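
To make the three steps listed above more concrete, here is a minimal Python sketch of how chunking, compressing and database management could fit together. It is an illustration under stated assumptions, not the algorithm actually used by the project: the chunks are assumed to be overlapping runs of five whitespace-separated words, "compressing" is assumed to mean hashing each chunk to a short fingerprint (BLAKE2b from the standard library is used here), and the database is replaced by an in-memory dictionary. The names chunk, compress and FingerprintIndex are hypothetical.

```python
import hashlib
from collections import defaultdict


def chunk(text, size=5):
    """Step 1 (assumed form): overlapping chunks of `size` words."""
    words = text.lower().split()
    if not words:
        return []
    if len(words) < size:
        return [" ".join(words)]
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def compress(chunk_text, digest_bytes=8):
    """Step 2 (assumed form): reduce a chunk to a short fixed-size fingerprint."""
    data = chunk_text.encode("utf-8")
    return hashlib.blake2b(data, digest_size=digest_bytes).hexdigest()


class FingerprintIndex:
    """Step 3 stand-in: an in-memory map from fingerprint to document ids."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, doc_id, text):
        """Store the fingerprints of every chunk of a document."""
        for c in chunk(text):
            self._index[compress(c)].add(doc_id)

    def overlap(self, text):
        """Return {doc_id: fraction of the query's distinct chunks found in it}."""
        prints = {compress(c) for c in chunk(text)}
        if not prints:
            return {}
        hits = defaultdict(int)
        for p in prints:
            for doc_id in self._index.get(p, ()):
                hits[doc_id] += 1
        return {doc_id: n / len(prints) for doc_id, n in hits.items()}
```

A caller would register each library document once with register() and then pass a suspect text to overlap(); a value close to 1.0 for some document means that most of the suspect text's chunks also occur in it. Splitting on whitespace is only a rough approximation of language independence, since it does not work for languages written without spaces between words.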

The Hungarian Internet community would also gain a portal, hopefully widely used, that would fight plagiarism and help spread digital libraries and online publications. People working in education could use this service as well, and conference organizers could use it to detect plagiarized submissions.
