I'm picking apart my client's codebase for refactoring. As expected, there are a lot of duplicates.
But duplicates are easy. fdupes lists them quickly. The real problem is near-duplicates: files that differ by a couple of lines but are otherwise identical. How do I find them? Is there a tool for that?
@drq czkawka works great for images and some other files, not sure about text tho
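For plain text, here is a rough sketch that needs only Python's standard library: compare every pair of files line by line with `difflib.SequenceMatcher` and keep the pairs above a similarity cutoff. The function name `near_duplicates` and the 0.9 default threshold are made up for illustration, not taken from any tool mentioned here, and the pairwise loop is quadratic in the number of files, so it suits a single codebase rather than a whole disk.

```python
import difflib
import itertools
from pathlib import Path


def read_lines(path):
    """Read a file as text lines; undecodable bytes are replaced, not fatal."""
    return Path(path).read_text(encoding="utf-8", errors="replace").splitlines()


def near_duplicates(paths, threshold=0.9):
    """Return (ratio, path_a, path_b) for each pair at or above threshold.

    Hypothetical helper: ratio is difflib's 2*M/T line similarity,
    where 1.0 means the two files have identical lines.
    """
    contents = {p: read_lines(p) for p in paths}
    matches = []
    for a, b in itertools.combinations(paths, 2):
        matcher = difflib.SequenceMatcher(
            None, contents[a], contents[b], autojunk=False
        )
        # quick_ratio() is a cheap upper bound on ratio(), so pairs that
        # cannot reach the threshold are skipped before the costly compare.
        if matcher.quick_ratio() < threshold:
            continue
        ratio = matcher.ratio()
        if ratio >= threshold:
            matches.append((ratio, a, b))
    # Most similar pairs first.
    return sorted(matches, key=lambda m: m[0], reverse=True)
```

Something like `near_duplicates([str(p) for p in Path("src").rglob("*.py")])` would then list candidate pairs, with `src` and the `*.py` pattern standing in for whatever the codebase actually uses. Exact duplicates show up with a ratio of 1.0, so running fdupes first keeps the output shorter.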