simil - the better 'comm' command

GitHub: https://github.com/Criomby/simil

Comparing text files can be tedious, especially when you want to find lines that are similar rather than strictly identical.
The standard comm command on Linux/Unix systems handles basic line matching, but it expects sorted input and only matches lines exactly, which is often too rigid for the nuances of code.

What is simil?

simil is a command-line tool built for precisely this task: comparing two code files and highlighting similarities, with a focus on accuracy and configurability. It’s designed to handle the complexities of code, allowing you to ignore whitespace, line prefixes, and other stylistic differences that would otherwise throw off a basic comparison.

How does it compare to comm?

The standard comm command is a powerful tool for finding common lines between files. However, it operates on a purely textual level. Here’s a breakdown of the key differences:

  • comm: Compares two sorted files line by line and reports the lines unique to each file and the lines common to both. It’s great for basic file synchronization or finding common sections, but a match must be exact, so differences in indentation or formatting hide otherwise identical lines.
  • simil: Takes a more flexible approach. You can tell it to ignore whitespace, specific prefixes, and more, so lines that differ only in style still count as matches.
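
To make the difference concrete, here is a minimal Python sketch contrasting exact line matching (roughly what comm reports as common lines) with matching after leading and trailing whitespace has been stripped. It is not simil's implementation; the sample lines and the function names exact_common and trimmed_common are made up for illustration.

```python
# Illustrative only: hypothetical sample data, not simil's actual code.
file_a = ["def add(a, b):", "    return a + b", "print(add(1, 2))"]
file_b = ["def add(a, b):", "\treturn a + b  ", "print(add(3, 4))"]


def exact_common(a, b):
    """Lines of `a` that appear character-for-character in `b`."""
    seen = set(b)
    return [line for line in a if line in seen]


def trimmed_common(a, b):
    """Lines of `a` that match a line of `b` once surrounding whitespace is removed."""
    seen = {line.strip() for line in b}
    return [line.strip() for line in a if line.strip() in seen]


exact = exact_common(file_a, file_b)
trimmed = trimmed_common(file_a, file_b)
print(exact)    # ['def add(a, b):']
print(trimmed)  # ['def add(a, b):', 'return a + b']
```

The second line of each file differs only in indentation and trailing spaces, so exact matching misses it while the trimmed comparison finds it.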

Why simil is better for code comparison

Here’s where simil truly shines:

  • Configurable Matching: This is the biggest advantage. You can customize how simil analyzes files. Want to ignore whitespace? Just tell it to. Need to ignore specific keywords or prefixes (like else or import)? You can define them in a configuration file.
  • Configuration Options: simil uses a simil.toml configuration file to define ignore patterns.
  • Whitespace Handling: The --trim option removes leading and trailing whitespace before lines are compared, which is crucial for accurate comparisons.
  • Ignore Patterns: You can define specific lines/patterns to ignore entirely.
  • Detailed Output: Unlike comm, simil reports where each match is found within the files, including line numbers. The lack of this information was a major pain point for me when using comm on large code files.
  • Flexibility: It can compare any type of text file, not just code.
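
The ideas in the list above (trimming, ignoring certain lines, and reporting line numbers) can be sketched in a few lines of Python. This is only an illustration of the general technique, not simil's code or configuration format: the names normalize and similar_lines, the ignore_prefixes parameter, and the choice to skip a line entirely when it starts with an ignored prefix are all assumptions made for the example.

```python
# Illustrative sketch only; names and behavior are assumptions, not simil's API.
from collections import defaultdict


def normalize(line, trim=True, ignore_prefixes=()):
    """Return the comparison key for a line, or None if the line is skipped."""
    text = line.strip() if trim else line
    if not text:
        return None  # blank lines carry no information
    if any(text.startswith(prefix) for prefix in ignore_prefixes):
        return None  # e.g. skip every line starting with "import"
    return text


def similar_lines(lines_a, lines_b, trim=True, ignore_prefixes=()):
    """Return (line_no_a, line_no_b, text) for each matching pair (1-based)."""
    # Index the second file by normalized text so lookups are cheap.
    index_b = defaultdict(list)
    for no_b, line in enumerate(lines_b, start=1):
        key = normalize(line, trim, ignore_prefixes)
        if key is not None:
            index_b[key].append(no_b)

    matches = []
    for no_a, line in enumerate(lines_a, start=1):
        key = normalize(line, trim, ignore_prefixes)
        for no_b in index_b.get(key, []):
            matches.append((no_a, no_b, key))
    return matches


a = ["import os", "", "  x = compute()", "else:", "    y = 1"]
b = ["import sys", "x = compute()", "y = 1", "import os"]
matches = similar_lines(a, b, ignore_prefixes=("import", "else"))
for no_a, no_b, text in matches:
    print(f"a:{no_a} b:{no_b} {text}")
# a:3 b:2 x = compute()
# a:5 b:3 y = 1
```

Because each match carries both line numbers, the output points directly at the relevant place in each file, which is the information comm does not provide.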