Sriram Rajagopalan
Sriram Rajagopalan has experience facilitating the design, development, and maintenance of large-scale Learning Management Systems and Content Management Systems for various content-intensive projects. With expertise in both the technical and design aspects of these projects, he has executed several conversion and migration projects.


    October 21, 2020

    “Official” data on how much content is plagiarized around the world varies, because the term “plagiarism” itself has no commonly agreed-upon definition. Even with the limited statistics available, plagiarism is an indisputable threat to the intellectual property (IP) of original writers. As technology advances, the ways in which information is shared have multiplied in more ways than we know. With these advancements, not only have instances of plagiarism increased, but the tools available to combat plagiarism have likewise grown. This article discusses the various ways in which original content is plagiarized and the tools Lumina Datamatics has developed to help clients protect their content.

    What Is Plagiarism?

    Most people know that copying word for word is plagiarism, but some may not be aware (or may not want to admit) that reusing content without proper attribution to the original author is also considered plagiarism. That is, plagiarism is not just duplicating work created by somebody else. Failing to cite sources when reusing content, and rewording someone else's work without attribution, are also considered intentional plagiarism. Most intentional plagiarism is found among students completing assessments and preparing research work. One study reports, “A poll conducted by US News and World Reports found that 90% of students believe that cheaters are either never caught or have never been appropriately disciplined.” Another form of plagiarism, prevalent in educational writing and online courses, involves writing about an idea without realizing that somebody else has already written similar content; although unintentional, this is still plagiarism. In any form, plagiarism can carry legal consequences and damages the reputation of both the author and the publisher.

    Penalties for Plagiarism

    Apart from academic penalties, plagiarism that violates copyright law is illegal and can result in prosecution and punishment, depending on its severity. The copyright owner can initiate legal proceedings in federal court for copyright infringement. Under US federal law, any work created and published on or after March 1, 1989, is protected by copyright even if no copyright notice is attached. Beyond the legal repercussions, the fallout from such cases can cause irreparable damage to the professional reputations of the institutions or individuals publishing the content.

    How Is Your Content Plagiarized?

    As the content provider, you have to be vigilant about everything authors submit, checking it carefully for intentional and unintentional plagiarism before publishing it under your name. Second, because your content is the basis of your revenue, it is important to keep monitoring whether it is redistributed illegally after publication. Here are two interesting recent case studies (names changed for privacy):

    Case Study 1

    Andrew Mason, a college freshman taking an algebra course, was trying to solve f(x) = x² − 6x + 5 as part of his assignment. His teacher, Lisa, was surprised not by Andy’s ability to work out a solution, but by the fact that the problem had been solved using calculus derivatives he hadn’t yet been taught. While trying to resolve the mystery, Lisa found several homework websites and smartphone apps where students can upload images of solved problems.

    If this happens with homework, could other educational content (which costs thousands of dollars to write and publish) be similarly available? The shocking answer is yes! Every time content is published, a small percentage of people leak it online, where it then circulates freely.

    Case Study 2

    John Mackenzie, a respected atomic physics author who has been writing for over 20 years, received a legal notice warning him that his content was similar to that of another publication he wasn’t even aware of. He had apparently written the content independently, unaware that similar material had already been published.

    Combating Plagiarism

    Although the plagiarism was intentional in the first case study and unintentional in the second, both are illegal and can be penalized. But how can publishers and educational providers realistically check all newly authored or revised content by hand? The simple answer is that they can’t. No one can, given the massive volume of content released or updated every year.

    Enter automation. By leveraging our industry experience and technology, Lumina has designed a system to handle instances like these without relying on laborious, time-consuming manual checks.

    Our plagiarism checks run either as writers create their work within a content management system (CMS) or when content authored in Microsoft Word or other word-processing software is ingested into a CMS. The submitted content is matched against material published across the internet, and any similarities are reported. This catches issues immediately, before the content moves to the next stage. The service integrates easily into a publisher’s CMS, with checks running seamlessly in the background without adding time to the schedule.
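    To illustrate what “matching” a submission against published material can mean, here is a minimal sketch that scores overlap using word n-grams (“shingles”), a common textbook technique. It is not necessarily how Lumina’s engine works; the function names and the choice of three-word shingles are assumptions made for illustration.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def percent_matched(submitted: str, published: str, n: int = 3) -> float:
    """Percentage of the submitted text's shingles that also occur in the published text."""
    sub = shingles(submitted, n)
    if not sub:
        return 0.0  # too short to form a single shingle
    pub = shingles(published, n)
    return 100.0 * len(sub & pub) / len(sub)


if __name__ == "__main__":
    published = "In the experiment a subnuclear particle moves in a circular arc of radius"
    print(percent_matched("A subnuclear particle moves in a circular arc", published))  # prints 100.0
```

    Comparing overlapping word sequences rather than single words keeps common words such as “the” or “a” from inflating the score, while still catching passages copied verbatim.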

    But preventing authors from intentionally or unintentionally plagiarizing is only one side of the fight. Once unique content is published, the content owners themselves are at risk of exposure and of having their own content plagiarized and reused without permission!

    Luckily, Lumina’s anti-plagiarism engine is not only a pre-publication tool. It can also protect published content, through scheduled scans of the internet at frequent intervals and reports that flag similarities between the original content and copies shared online. Project managers can then contact the owners of the plagiarizing websites and issue takedown notices.

    Lumina recently worked with a major higher education publisher to determine how its assessment content had been leaking onto several ghostwriting websites. Although the content on the institution’s website was encrypted in databases on its servers and accessible only to students enrolled in the courses, a significant amount of it was found illegally published on homework websites. An audit of the leak, conducted with Lumina’s help, revealed that some authorized student users had taken cellphone photos of the assessment content and posted them across the ghostwriting websites. These sites work with subject matter experts (SMEs) who solve assessment questions and publish the solutions. When Lumina’s plagiarism check flagged this content, we immediately brought it to the publisher’s attention and helped them issue takedown notices to the offending websites.

    In this particular case, the plagiarism audit system was custom-designed to run at specific intervals, generating reports on plagiarized content and identifying the titles in the publisher’s repository where the content appeared (to prove the publisher’s ownership). This isn’t the only possible setup, however; our system can be customized to fit the nuanced needs of each client.
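    Identifying which titles in a repository contain a leaked passage is essentially a reverse lookup. One plausible way to do it, sketched below under our own assumptions (the shingle size, function names, and sample titles are hypothetical, not details of the publisher’s system), is an inverted index from word n-grams to the titles that contain them.

```python
import re
from collections import defaultdict


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(repository: dict[str, str], n: int = 3) -> dict[tuple[str, ...], set[str]]:
    """Map each shingle to the set of repository titles whose text contains it."""
    index: dict[tuple[str, ...], set[str]] = defaultdict(set)
    for title, text in repository.items():
        for sh in shingles(text, n):
            index[sh].add(title)
    return index


def find_source_titles(leaked: str, index: dict, n: int = 3) -> list[tuple[str, int]]:
    """Rank titles by how many of the leaked passage's shingles they contain."""
    counts: dict[str, int] = defaultdict(int)
    for sh in shingles(leaked, n):
        for title in index.get(sh, ()):
            counts[title] += 1
    # Most shared shingles first; ties broken alphabetically for a stable report.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

    Building the index once lets each leaked passage found online be traced back to its source titles without rescanning the whole repository.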

    How Does It Work?

    Ghostwriting websites and other online entities want traffic, so they want their content to be discoverable through Google searches. To appear in Google search results, pages must be crawled by Googlebot (Google’s web-crawler software) and added to Google’s search index; site owners can also ask Google to crawl their pages, for example by submitting sitemaps. Once the content is indexed, it appears in user searches. Lumina’s plagiarism engine queries Google to find locations where original content has been republished. The engine is also programmed to handle valid sites (exceptions) that are allowed to publish the content, such as websites owned and operated directly by the publisher or its affiliates.
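    Handling exceptions amounts to filtering search hits against an allow-list before they reach a report. The sketch below shows one way to do this by hostname; the domain names are hypothetical placeholders, and how the engine obtains the URLs from Google is not shown.

```python
from urllib.parse import urlparse

# Hypothetical allow-list: domains operated by the publisher or its affiliates.
ALLOWED_DOMAINS = {"publisher.example", "affiliate.example"}


def is_exception(url: str, allowed: set[str] = ALLOWED_DOMAINS) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Require a dot before the domain so "notpublisher.example" is not treated as allowed.
    return any(host == d or host.endswith("." + d) for d in allowed)


def flag_matches(urls: list[str]) -> list[str]:
    """Drop matches found on allowed sites and keep the rest for review."""
    return [u for u in urls if not is_exception(u)]
```

    Matching on the parsed hostname, rather than searching for the domain anywhere in the URL string, avoids accidentally whitelisting look-alike sites.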

    The acceptable percentage of similarity differs by discipline and by how the content is used. For example, math problems tend to share more wording than psychology case studies. Likewise, definitions of common terms often overlap, and this is generally considered acceptable rather than plagiarism, simply because there are only so many ways an author can define a generally understood term. Configuration files let users adjust what percentage of similarity is considered fair use and what percentage is considered illegal.
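    As an illustration of per-discipline settings, here is a minimal sketch of a configuration file and the check it drives. The JSON layout, the discipline names, and the threshold values are all hypothetical and chosen only to mirror the math-versus-psychology example above; they are not recommended limits.

```python
import json

# Hypothetical configuration contents: the maximum similarity (in percent)
# treated as acceptable for each discipline. Values are illustrative only.
THRESHOLDS_JSON = """
{
  "default": 25,
  "math": 50,
  "psychology": 15
}
"""


def load_thresholds(raw: str) -> dict[str, float]:
    """Parse the configuration and check that every threshold is a valid percentage."""
    thresholds = {name: float(value) for name, value in json.loads(raw).items()}
    if "default" not in thresholds:
        raise ValueError("configuration needs a 'default' threshold")
    for name, value in thresholds.items():
        if not 0 <= value <= 100:
            raise ValueError(f"threshold for {name!r} must be between 0 and 100")
    return thresholds


def needs_review(percent_matched: float, discipline: str, thresholds: dict[str, float]) -> bool:
    """Flag a match for editorial review when it exceeds the discipline's threshold."""
    limit = thresholds.get(discipline, thresholds["default"])
    return percent_matched > limit
```

    Keeping the limits in a data file rather than in code lets editors tune them per discipline without redeploying the tool.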

    What Does the Tool Report?

    Lumina’s plagiarism engine provides customized reports to suit the needs of each client. In addition to counting identical words, the tool can report content with minor changes (for example, a change of tense) and related words (words modified slightly to read differently but with the same meaning). Below is an example of a standard report for an assessment question that was run through the plagiarism checker.

    Sample report: a table with columns for URL of publication, page title, section name, % matched, and number of identical words, with percentage matches ranging from 33% to 100%. Content submitted: “In a high-energy physics experiment, a subnuclear particle moves in a circular arc of 3.47.”
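    One simple way to separate identical words from minor changes and related words is to compare the two texts at three levels of normalization: raw words, stemmed words, and stemmed words mapped through a synonym table. The sketch below assumes this approach; the crude suffix stripping stands in for a real stemmer or lemmatizer, and the tiny synonym table is hypothetical. It is not a description of Lumina’s engine.

```python
import re
from collections import Counter

# Hypothetical synonym table keyed by stems; a real tool would use a proper lexical resource.
SYNONYMS = {"round": "circular", "rapid": "fast"}


def words(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def crude_stem(word: str) -> str:
    """Strip a few inflectional endings so that, e.g., 'moves' and 'moved' compare equal."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(word: str) -> str:
    """Stem the word, then map the stem to a canonical synonym if one is known."""
    stem = crude_stem(word)
    return SYNONYMS.get(stem, stem)


def overlap(a: list[str], b: list[str]) -> int:
    """Tokens shared by two word lists, counting repeats (multiset intersection)."""
    return sum((Counter(a) & Counter(b)).values())


def similarity_report(submitted: str, published: str) -> dict[str, int]:
    s, p = words(submitted), words(published)
    identical = overlap(s, p)
    stemmed = overlap([crude_stem(w) for w in s], [crude_stem(w) for w in p])
    related = overlap([normalize(w) for w in s], [normalize(w) for w in p])
    return {
        "identical": identical,
        "minor_changes": stemmed - identical,  # e.g. a change of tense
        "related": related - stemmed,          # e.g. a synonym substitution
    }
```

    For “The particle moved in a round arc” against “The particle moves in a circular arc”, this reports 5 identical words, 1 minor change (moved/moves), and 1 related word (round/circular). Because each level only merges words that the previous level kept apart, the counts never go negative.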

    Once the report is delivered, experienced editorial personnel, whether at the educational content provider or at Lumina, review it and assess whether the content can be used as is (with appropriate attribution), can be used with modifications, or violates copyright law and must therefore be rewritten.


    In short, the risk of plagiarism (both publishing plagiarized content and having your own published content reused without permission) has always existed and will likely remain an issue for the foreseeable future, given the growing volume of digitally published content. Today’s content owners are fighting an uphill battle to protect their copyrighted content.

    But by the same token, new technologies such as those provided by Lumina have emerged, and as a result, more instances of plagiarism are being flagged. These innovative technologies can help put content owners on more even ground, allowing them not only to avoid publishing unoriginal content (even inadvertently) but also to protect themselves from theft of their own original content.

    Interested in learning more about how Lumina can help support your team’s efforts against plagiarism? Would you like to talk to the author of this post? We want to hear from you! Email the Lumina team or learn more about Lumina Datamatics.
