
VroniPlag, Giffey and Numbers that Mislead

A Column by Michael Seadle

VroniPlag has posted an analysis of the dissertation of Franziska Giffey (née Süllke), claiming to have documented plagiarism on 76 of the 205 pages with content, or 37.1%.

"Bisher (8. Mai 2019, 09:20:57 (UTC+2)) wurden auf 76 von 205 Seiten Plagiatsfundstellen dokumentiert. Dies entspricht einem Anteil von 37.1% aller Seiten. Davon enthalten 11 Seiten 50%-75% Plagiatstext und 1 Seiten mehr als 75% Plagiatstext." (VroniPlag, 2019)

"Up to now (8 May 2019, 09:20:57 (UTC+2)), plagiarism findings have been documented on 76 of 205 pages. This corresponds to 37.1% of all pages. Of these, 11 pages contain 50%-75% plagiarised text and 1 page more than 75%." [my translation]

The figures on VroniPlag are misleading because they give the impression that 37.1% of the whole text is plagiarised, when in fact problems (according to their definition) occurred on 37.1% of individual pages, regardless of whether only a few lines on a page were involved. By VroniPlag's own standards the overall percentage is significantly lower, if one uses the percentages linked to their own colour-coding:

  • Black is up to 50%

  • Dark Red is 50% to 75%

  • Red is 75% to 100%

If one multiplies the number of pages of each colour by the maximum percentage for that colour, the results are as follows:

  • 64 pages are coloured black: 64 * 50% (the maximum value for black) = 32 pages' worth of possible plagiarism.

  • 11 pages are coloured dark red: 11 * 75% (the maximum value for dark red) = 8.3 pages' worth of possible plagiarism.

  • 1 page is coloured red: 1 * 100% (the maximum value for red) = 1 page's worth of possible plagiarism.

  • The total for all three colours using the maximum percentages is: 41.3 pages or 20.1% of the 205 pages with content.

Since the percentages associated with the colours are the upper limits of each range, the midpoint of each range may give a more realistic picture:

  • 64 pages are coloured black: 64 * 25% (the midpoint between 0% and 50%) = 16 pages' worth of possible plagiarism.

  • 11 pages are coloured dark red: 11 * 62.5% (the midpoint between 50% and 75%) = 6.9 pages' worth of possible plagiarism.

  • 1 page is coloured red: 1 * 87.5% (the midpoint between 75% and 100%) = 0.9 pages' worth of possible plagiarism.

  • The total for all three colours using the midpoint percentages is: 23.8 pages or 11.6% of the 205 pages with content.
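The arithmetic in the two lists above can be reproduced in a few lines of Python. This is a minimal sketch: the colour bands and page counts are taken from the figures quoted in this column, while the names `BANDS`, `page_equivalents`, `upper` and `midpoint` are illustrative only. Each colour band is reduced to a single share of plagiarised text per page (either its upper limit or its midpoint), multiplied by the number of pages in that band, and summed.

```python
# Colour bands from VroniPlag's colour-coding: (lowest share, highest share,
# number of pages with that colour), using the page counts quoted in this column.
BANDS = {
    "black": (0.00, 0.50, 64),
    "dark red": (0.50, 0.75, 11),
    "red": (0.75, 1.00, 1),
}
PAGES_WITH_CONTENT = 205


def upper(low, high):
    """Treat every page in a band as if it reached the band's maximum share."""
    return high


def midpoint(low, high):
    """Treat every page in a band as if it sat in the middle of the band."""
    return (low + high) / 2


def page_equivalents(bands, estimate):
    """Sum pages * estimated share over all bands, giving pages' worth of text."""
    return sum(count * estimate(low, high) for low, high, count in bands.values())


if __name__ == "__main__":
    flagged = sum(count for _, _, count in BANDS.values())
    print(f"pages with any finding: {flagged} = {flagged / PAGES_WITH_CONTENT:.1%}")
    for name, estimate in [("upper bound", upper), ("midpoint", midpoint)]:
        pages = page_equivalents(BANDS, estimate)
        print(f"{name}: {pages:.2f} pages = {pages / PAGES_WITH_CONTENT:.1%}")
```

Run as a script, it prints 37.1% for pages with any finding, 41.25 pages (20.1%) for the upper limits and 23.75 pages (11.6%) for the midpoints; the lists above round the page totals to 41.3 and 23.8.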


The VroniPlag figures need to be understood in context. VroniPlag's mission is to find plagiarism, and they put the worst possible interpretation on their results. There is in fact a big difference between 37.1% and 11.6%. If all of the marked passages were genuine plagiarism, one could argue that even 11.6% is too much, but the numbers need to be presented in a more balanced and less misleading way, which VroniPlag fails to do. Systems like iThenticate report a percentage of words, not of pages with any potential plagiarism. Interestingly, VroniPlag itself admits, deep in a section on "Statistics" (p. 18 in a PDF version), that they estimate the amount of plagiarism at about 6%. The basis of this estimate is not explained. [Note: I am thankful to Sven Schroder for bringing this relatively hidden number to my attention.] (VroniPlag Statistics, 2019)

VroniPlag publishes their criteria for plagiarism, which is commendable, but their criteria reflect rigid rules that are not universal in academic practice. The rules also take no account of legitimate choices about which source to cite or about the context within a work. In a literature review, for example, it is almost impossible not to reuse words from the articles being discussed.

A set of standards that measures the number of overlapping words in a particular spatial context (that is, the number of words in a sentence or paragraph that overlap with another text) gives a more nuanced and more accurate view. An example can be found in my book "Quantifying Research Integrity" (Seadle, 2016). The results of this kind of "greyscale analysis" are not designed for capturing headlines, but for judging fairly.
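The general idea of counting overlapping words within a spatial context can be illustrated with a short Python sketch. It is not the procedure described in Seadle (2016); it simply assumes that a word "overlaps" when it falls inside a run of at least three consecutive words that also occurs in the source text, and it reports that count sentence by sentence. The three-word threshold, the crude tokenising and sentence splitting, and the names `MIN_RUN` and `sentence_overlap` are hypothetical choices for illustration.

```python
import re

MIN_RUN = 3  # hypothetical: shortest shared word sequence that counts as overlap


def words(text):
    """Lower-case word tokens (a deliberately crude tokeniser)."""
    return re.findall(r"\w+", text.lower())


def sentences(text):
    """Split after ., ! or ? followed by whitespace (an assumption, not a standard)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def ngrams(tokens, n):
    """All runs of n consecutive tokens, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def sentence_overlap(candidate, source, n=MIN_RUN):
    """Return (sentence, overlapping words, total words) for each candidate sentence.

    A word overlaps if it lies inside some n-word run that also occurs in source.
    """
    source_runs = ngrams(words(source), n)
    report = []
    for sentence in sentences(candidate):
        tokens = words(sentence)
        covered = [False] * len(tokens)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) in source_runs:
                for j in range(i, i + n):
                    covered[j] = True
        report.append((sentence, sum(covered), len(tokens)))
    return report


if __name__ == "__main__":
    # Invented example texts, not taken from the dissertation or its sources.
    source = "The Commission wants to involve civil society in European policy making."
    candidate = ("The Commission wants to involve civil society in its decisions. "
                 "This chapter reviews the literature.")
    for sentence, shared, total in sentence_overlap(candidate, source):
        print(f"{shared}/{total} words overlap: {sentence}")
```

With the invented texts, the first sentence reports 8/10 overlapping words and the second 0/5: a shade per sentence rather than a yes/no flag per page.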

REFERENCES

Seadle, Michael, Dec 2016. Quantifying Research Integrity. Morgan & Claypool. Available online.

VroniPlag Wiki, 2019. Eine kritische Auseinandersetzung mit der Dissertation von Dr. Franziska Giffey (geb. Süllke): Europas Weg zum Bürger - Die Politik der Europäischen Kommission zur Beteiligung der Zivilgesellschaft. Available online.

VroniPlag Wiki Statistics, 2019. Eine kritische Auseinandersetzung mit der Dissertation von Dr. Franziska Giffey (geb. Süllke): Europas Weg zum Bürger - Die Politik der Europäischen Kommission zur Beteiligung der Zivilgesellschaft. p. 18. Available online.
