Git on Steroids

There is a universal law that anyone in the tech world should know: if
you ask a programmer to do something, they will do it their own way.

Even though creativity, abstract thinking, and putting your signature on
your source code are a fundamental part of programming, they can also
become a barrier to maintainability, readability, and usability when
code is not standardized.

Many efforts have been made to help programmers follow best practices in
their code, Python’s PEP 8 and Ruby’s Rubocop Style Guide being some of
the most popular examples. Linting tools are also essential for this
task, as they give programmers an automated way to check whether their
code follows all the established conventions.

So far, it seems like standardization problems can be easily solved, and
never again will a bunch of crazy programmers make a mess out of
something.

Figure 1. It’s all good

But wait a minute… Is there any other activity involving a creative
process that programmers frequently do? Oh yes, we have Git commit
messages.

Most Git repositories have messy logs

Thanks to the Continuous Hacking
service we offer at Fluid Attacks, we get to see Git repositories
from different organizations in several countries. A vast majority of
these repositories have messy logs.

This problem is easy to explain: teams do not stick to a specific
standard for commit messages and Git logging in general. Programmers
just write their commit messages the way they personally think is best.

You may say:

Ok, logs may be a little messy, but should I really care that much
about them?

The short answer is: Yes, you definitely should.

For the long answer, I will first list the cons of having messy Git
logs.

Cons

  1. Logs fail to accomplish their purpose:

    Vague or incomplete commit messages usually mean that a commit log
    can only be understood by whoever wrote it (just as happens with
    source code), which makes the log useless.

  2. Reverting becomes a headache:

    Without a clear repository history, reverting to a previous version
    when needed is always painful, as you first have to decipher that
    history to find the point you want to go back to.

  3. Knowing who did what becomes harder:

    Good commit messages express, from a high-level perspective, what
    changes were made. Not-so-good commit messages usually have two or
    three words and do not actually tell the reader what was changed,
    forcing them to look at the source code.

Lastly, let me bring it down to a specific example of how a Git
repository history looks when programmers do whatever they want:

Figure 2. Bad Git Log

Notice how these commit messages say nothing about the changes that
were made; they are basically trash logs.

Pros

Besides solving all the cons mentioned above, standardized logs bring
two extra pros:

  1. Data Analytics for the repo becomes possible:

    When you have one hundred developers working simultaneously on a
    repository, you definitely want to know things like:

    • What percentage of time do they spend fixing bugs?

    • What percentage of the programmers focus on creating new
      features?

    • What are the main causes of reverts in the repo?

    These are just a few of the countless questions whose answers could
    be incredibly useful both for business insights and for improving
    development performance and quality within the company. With a
    defined commit message syntax, such answers can be found using Data
    Analytics techniques.

  2. Logs (partially) become your documentation:

    By standardizing commit messages, you will no longer have to create
    huge documentation manuals explaining all the details of how a
    product internally works. Each file within the repo will have its
    related history with detailed information about how it currently
    works and worked in the past. Some documentation will probably still
    be needed (user documentation, for example), but we have seen the
    amount of required documentation shrink dramatically thanks to good
    logs.

After setting a standard syntax for commit messages, your repo history
will start to look like this:

Figure 3. Good Git Log

I want you to notice the three most important details to begin with:

  1. Titles include keywords like feat, front, fix, doc… These
    tags will be essential for the Data Analytics efforts, as they
    allow algorithms to know the purpose of each commit.

  2. The body section is usually itemized, with every item describing
    relevant changes made in the commit.

  3. An issue (#N.N) is always referenced with the purpose of
    understanding what the programmer was trying to do with the commit.
    This is useful for version control services with issue trackers like
    Gitlab, Github, etc.

Implementing a commit message Syntax

If by this point you are convinced about the importance of having a
standard commit message syntax, and are interested in implementing it,
make sure to keep reading, as I will show the syntax we use, the tools
that support it, and the benefits we get from it.

At Fluid Attacks, we use the AngularJS Commit Message Conventions
syntax, with a few modifications inspired by our specific needs.

Syntax

The syntax is as follows:

  [type]([scope]): #[issue-number]{.issue-part} [title]  // This is the commit title
                                                         // This blank line separates the commit title from the commit body
  [body]                                                 // This is the commit body. It CAN have multiple lines

Where:

  • [variable] are required variables that must be replaced in
    a final commit message ([] symbols must be removed).

  • {variable} are optional variables that must be replaced or
    removed in a final commit message ({} symbols must be removed).

  • // Comment are comments that must be removed in a final commit
    message.
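To make the template more concrete, here is a minimal Python sketch of how a title written in this syntax could be split into its parts. The regular expression, the `Title` tuple, and `parse_title` are hypothetical names chosen for illustration. The sketch assumes that issue numbers and parts are plain integers (as in #5.1). It is not how Commitlint or Fluid Attacks actually parse messages.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for: [type]([scope]): #[issue-number]{.issue-part} [title]
# Assumes issue numbers and parts are plain integers, e.g. "#5" or "#5.1".
TITLE_RE = re.compile(
    r"^(?P<type>[a-z]+)"       # [type]
    r"\((?P<scope>[a-z]+)\)"   # ([scope])
    r": #(?P<issue>\d+)"       # ": #" followed by [issue-number]
    r"(?:\.(?P<part>\d+))?"    # optional {.issue-part}
    r" (?P<title>\S.*)$"       # a space, then the free-text [title]
)


class Title(NamedTuple):
    type: str
    scope: str
    issue: int
    part: Optional[int]
    title: str


def parse_title(line: str) -> Optional[Title]:
    """Split a commit title into its parts; return None if it does not match."""
    match = TITLE_RE.match(line)
    if match is None:
        return None
    part = match.group("part")
    return Title(
        type=match.group("type"),
        scope=match.group("scope"),
        issue=int(match.group("issue")),
        part=int(part) if part is not None else None,
        title=match.group("title"),
    )
```

For instance, `parse_title("feat(back): #12.3 add login endpoint")` yields type `feat`, scope `back`, issue 12, part 3, and title `add login endpoint`. A free-form title like `fixed stuff` yields `None`.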

In the following sections I will explain in detail what this syntax can
help us achieve.

Types

Types offer a high-level perspective of the commit purpose. They are
explicitly defined in a closed list that covers most (if not all)
possible scenarios.

In the syntax, the [type] variable has to be one of the following:

  rever  // Revert to a previous commit in history
  feat   // New feature
  perf   // Improves performance
  fix    // Bug fix
  refac  // Neither fixes a bug nor adds a feature
  test   // Adding missing tests or correcting existing tests
  style  // Does not affect the meaning of the code (formatting, etc)

Notice how types are short words that can be easily read by anyone, but
also represent keywords that can be processed by a machine.

Scopes

Scopes provide specific detail about what part of the repository/system
was modified in the commit. Just like types, they are explicitly defined
in a closed list.

In the syntax, the [scope] variable has to be one of the
following:

  front  // Front-End change
  back   // Back-End change
  infra  // Infrastructure change
  conf   // Configuration files change
  build  // Build system, CI, compilers, etc (scons, webpack...)
  job    // Asynchronous or scheduled tasks (backups, maintenance...)
  cross  // Mix of two or more scopes
  doc    // Documentation only changes

Notice how scopes, just like types, are also human-readable and
machine-readable.
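Because both lists are closed, checking a commit's keywords reduces to a plain set lookup. The minimal Python sketch below illustrates this. `TYPES`, `SCOPES`, and `check_keywords` are hypothetical names, and the keyword sets are copied from the type and scope lists above.

```python
from typing import List

# Closed lists, copied from the type and scope lists above.
TYPES = frozenset({"rever", "feat", "perf", "fix", "refac", "test", "style"})
SCOPES = frozenset(
    {"front", "back", "infra", "conf", "build", "job", "cross", "doc"}
)


def check_keywords(commit_type: str, scope: str) -> List[str]:
    """Return one error message per keyword that is not in its closed list."""
    errors = []
    if commit_type not in TYPES:
        errors.append(f"unknown type: {commit_type!r}")
    if scope not in SCOPES:
        errors.append(f"unknown scope: {scope!r}")
    return errors
```

With these lists, Angular-style keywords are rejected. For example, `check_keywords("chore", "back")` returns `["unknown type: 'chore'"]`.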

Other important rules

Articles like this one and
learning on the fly helped us to define other rules for improving
general commit message quality:

  1. A Commit title must exist.

  2. A Commit title must not contain the ‘:’ character aside
    from the one specified in the syntax.

  3. A Commit title must have 50 characters or less.

  4. A Commit title must be lower case.

  5. A Commit title must not end with a period (‘.’).

  6. A Commit title must reference an issue.

  7. A Commit title must be meaningful. Avoid using things like
    feat(build): #5.1 feature.

  8. A blank line between commit title and commit body must exist.

  9. A commit body must exist.

  10. Lines in commit body must be 72 characters or less.

  11. Try to itemize your commit body.

  12. Do not use the word ‘part’ for splitting commits for a
    single issue. Use #[issue-number]{.issue-part} instead, as
    specified in the syntax.
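Most of these rules are mechanical, so a program can check them. The Python sketch below is a hypothetical stand-alone checker; `check_message` and `TITLE_RE` are illustrative names, not Commitlint rules. It covers rules 1 to 6, 8 to 10, and 12. Rules 7 and 11 are left to human review because they depend on judgement. It also checks only the general shape of the title, not the closed type and scope lists.

```python
import re
from typing import List

# General shape of the title: [type]([scope]): #[issue-number]{.issue-part} [title]
TITLE_RE = re.compile(r"^[a-z]+\([a-z]+\): #\d+(\.\d+)? \S.*$")


def check_message(message: str) -> List[str]:
    """Return the rules a commit message breaks (an empty list means compliant)."""
    lines = message.rstrip("\n").split("\n")
    title = lines[0]
    if not title.strip():
        return ["rule 1: commit title must exist"]
    errors = []
    if title.count(":") > 1:
        errors.append("rule 2: only one ':' allowed in the title")
    if len(title) > 50:
        errors.append("rule 3: title longer than 50 characters")
    if title != title.lower():
        errors.append("rule 4: title must be lower case")
    if title.endswith("."):
        errors.append("rule 5: title must not end with '.'")
    if not TITLE_RE.match(title):
        errors.append("rule 6: title must follow the syntax and reference an issue")
    if len(lines) < 2 or lines[1] != "":
        errors.append("rule 8: blank line required after the title")
    body = lines[2:]
    if not any(line.strip() for line in body):
        errors.append("rule 9: commit body must exist")
    if any(len(line) > 72 for line in body):
        errors.append("rule 10: body lines must be 72 characters or less")
    if re.search(r"\bpart\b", title):
        errors.append("rule 12: use #issue.part instead of the word 'part'")
    return errors
```

With this checker, a message such as `Fixed stuff.` breaks rules 4, 5, 6, 8, and 9. The design choice of returning every violation, rather than stopping at the first one, mirrors the way a linter reports all problems at once.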

Explaining combinations

Below is a table explaining all the possible combinations between types
and scopes for a commit message (Types are columns, scopes are rows):

|       | rever | feat | perf | fix | refac | test | style |
|-------|-------|------|------|-----|-------|------|-------|
| front | Revert front-end to a previous version | Add new feature to front-end | Improve perf in front-end | Fix something in front-end | Change something in front-end | Add tests for front-end | Change front-end code style |
| back | Revert back-end to a previous version | Add new feature to back-end | Improve perf in back-end | Fix something in back-end | Change something in back-end | Add tests for back-end | Change back-end code style |
| infra | Revert infra to a previous version | Add new feature to infra | Improve perf in infra | Fix something in infra | Change something in infra | Add tests for infra | Change infra code style |
| conf | Revert config files to a previous version | Add new feature to config files | NA | Fix something in config files | Change something in config files | NA | Change config files code style |
| build | Revert building tools to a previous version | Add new feature to building tools or add a new building tool | Improve building perf | Fix something in building tools | Change something in building tools | Add tests for building tools | Change building tools code style |
| job | Revert jobs to a previous version | Add new feature to jobs or add a new job | Improve jobs perf | Fix something in jobs | Change something in jobs | Add tests for jobs | Change jobs code style |
| cross | Revert several scopes to a previous version | Add new feature for several scopes | Improve perf in several system parts | Fix something in several system parts | Change something in several system parts | Add tests for several system parts | Change code style in several system parts |
| doc | Revert doc to a previous version | Add new doc | NA | Fix something in doc | Change something in doc | NA | Change doc style |

Where:

  • perf is performance.

  • infra is infrastructure.

  • config is configuration.

  • doc is documentation.

  • NA is not applicable.
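The NA cells can be enforced in the same way as the closed lists. The minimal sketch below assumes that the type and scope have already been validated against those lists. `NOT_APPLICABLE` and `combination_allowed` are hypothetical names, and the pairs come from the NA cells of the table above.

```python
# Type/scope pairs marked NA in the table above (types are columns, scopes rows).
NOT_APPLICABLE = frozenset({
    ("perf", "conf"), ("test", "conf"),
    ("perf", "doc"), ("test", "doc"),
})


def combination_allowed(commit_type: str, scope: str) -> bool:
    """Return True unless the table marks this type/scope pair as NA."""
    return (commit_type, scope) not in NOT_APPLICABLE
```

Under this check, `perf(doc)` or `test(conf)` would be rejected, while `feat(doc)` and `perf(build)` are accepted.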

Differences with pure AngularJS syntax

In this section I will talk about the changes we made to the original
AngularJS syntax and the reasons behind them.

Types

  1. Instead of using a special syntax just for reverts, as specified
    in the AngularJS document, we decided, for the sake of simplicity,
    to create a rever type that follows the same syntax as everything
    else.

  2. The docs type was renamed to doc and turned into a scope.
    The reason for this change was to make doc commits more
    informative by allowing programmers to specify the purpose of the
    documentation change. For example:

    • feat(doc): documenting new feature.

    • fix(doc): fixing documentation.

    • style(doc): changing its style.

  3. We added a perf type for performance changes, in order to identify
    which commits have performance improvement as their main objective
    and to reduce the number of commits that would otherwise fall under
    the refac type.

  4. The chore type was removed, as any maintenance commit can be
    expressed as a perf, refac, or fix commit.

  5. In order to have shorter commit titles, we shortened types like
    docs to doc, revert to rever, refactor to
    refac.

Scopes

When it comes to scopes, the difference is that we use a closed list of
keywords, while AngularJS allows programmers to specify any scope they
want.

According to the AngularJS document, “Scope can be anything specifying
place of the commit change. For example $location, $browser,
$compile, $rootScope, ngHref, ngClick, ngView, etc…”.

We, on the other hand, consider that this information should go in the
commit [title].

By moving the “place of the commit change” to the [title], we get
to define some generic scopes that allow us to make commit messages more
informative.

Scopes like front, back, build, etc, although not as precise
as ngClick, are machine-readable and still provide information about
where the change was made. This, combined with allowing the programmer
to be more specific in the [title], is why we decided to create a
closed list for scopes.

Other differences

The last big difference between the AngularJS syntax and ours is the
#[issue-number]{.issue-part} part, whose purpose is to force
commits to always reference an issue in order to be able to track what
motivated such commit.

Other minor differences, like making a maximum of 50 characters for the
commit title mandatory, are based either on our own opinion of what
makes a commit message look better or on rules from other sources that
we preferred over the AngularJS ones.

Make syntax usage a reality

You may be thinking:

Ok, we just defined a huge commit message syntax with a ton of rules.
But, how are we actually going to make programmers follow it in a
pragmatic way?

Enter the savior:
Commitlint.

Commitlint is an incredible tool that can check all the syntax we just
defined. Not only does it run all the checks instantly, but it also
works as a Git hook, which means that it runs the checks right after a
programmer runs a git commit command on their local machine. It can
even fail the commit attempt if the message the programmer just provided
does not comply with the syntax.
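Commitlint itself is a Node.js tool, and the sketch below does not reproduce it. Instead, it illustrates the Git hook mechanism that Commitlint relies on. Git runs an executable `.git/hooks/commit-msg` and passes the path of the file holding the proposed message as its only argument. A non-zero exit status aborts the commit. The checks performed by `violations` are a small hypothetical subset of the rules. Dropping lines that start with `#` assumes Git's default comment character.

```python
#!/usr/bin/env python3
"""Hypothetical commit-msg hook (not Commitlint).

Save as .git/hooks/commit-msg and make it executable. Git passes the path
of the file holding the proposed commit message as the only argument; a
non-zero exit status aborts the commit.
"""
import re
import sys
from typing import List

# General shape of the title: [type]([scope]): #[issue-number]{.issue-part} [title]
TITLE_RE = re.compile(r"^[a-z]+\([a-z]+\): #\d+(\.\d+)? \S.*$")


def violations(message: str) -> List[str]:
    """Check a small subset of the rules: syntax, title length, blank line, body."""
    # Drop the comment lines Git adds to the editor template
    # (assumes the default core.commentChar, '#').
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    if not lines or not lines[0].strip():
        return ["commit title must exist"]
    title, errors = lines[0], []
    if not TITLE_RE.match(title):
        errors.append("title must follow [type]([scope]): #[issue-number]{.issue-part} [title]")
    if len(title) > 50:
        errors.append("title must have 50 characters or less")
    if len(lines) < 3 or lines[1] != "" or not any(line.strip() for line in lines[2:]):
        errors.append("a blank line and a body must follow the title")
    return errors


def main(argv: List[str]) -> int:
    with open(argv[1], encoding="utf-8") as message_file:
        errors = violations(message_file.read())
    for error in errors:
        print(f"commit-msg: {error}", file=sys.stderr)
    return 1 if errors else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Because the hook runs locally, a developer can still skip it (for example with `git commit --no-verify`). This is one reason to also repeat the checks on the server side.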

Proof of concept

We won’t be setting up Commitlint for this specific syntax as it would
make the article too technical. Nevertheless, I will show you how it
currently works in one of our repos:

Bad commit message

Figure 4. Failed commit

The commit message we entered was:

Figure 5. Bad Commit message

The Commitlint output was:

Figure 6. Failed Commitlint output

Notice how Commitlint tells the programmer exactly which rules their
commit message is breaking, making the fix process a lot easier.
Commitlint also stopped the commit attempt, as it does not let commits
pass unless their message complies with the syntax.

Good commit message

Figure 7. Passed commit

The commit message we entered was:

Figure 8. Good Commit message

The Commitlint output was:

Figure 9. Passed Commitlint output

Notice how Commitlint tells the programmer that all checks passed and
proceeds to accept the commit.

A little bit of Data Analytics

Now that we have defined a commit message syntax, I would like to show
you a few very simple Chartio charts we are
now able to generate from our
Integrates repository:

Pie chart of monthly commit types for February, 2019

Figure 10. Monthly commit types for February, 2019

This chart allows us to know the percentage of commits for every type in
a month.

One possible interpretation can be:

During February, 2019, out of 132 commits, 39.4% of them (52) had
developing a new feature as their purpose.
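Percentages like these can be computed directly from commit titles. The minimal Python sketch below assumes that the titles were exported with a command such as `git log --since=2019-02-01 --until=2019-03-01 --format=%s`, which prints one subject line per commit. `type_shares` is a hypothetical name. Titles that do not follow the syntax, such as merge commits, are skipped.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Captures [type] from a title such as "feat(back): #12 add login endpoint".
TYPE_RE = re.compile(r"^([a-z]+)\([a-z]+\):")


def type_shares(titles: Iterable[str]) -> Dict[str, float]:
    """Percentage of syntax-compliant commits per type, rounded to one decimal."""
    matches = (TYPE_RE.match(title) for title in titles)
    counts = Counter(m.group(1) for m in matches if m)
    total = sum(counts.values())
    return {commit_type: round(100 * n / total, 1) for commit_type, n in counts.items()}
```

As a sanity check on the figure above, 52 feat commits out of 132 give 52 / 132 ≈ 0.394, or 39.4%. The scope chart's 44 out of 132 works out to 33.3% in the same way.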

Pie chart of monthly commit scopes for February, 2019

Figure 11. Monthly commit scopes

This chart allows us to know the percentage of commits for every scope
in a month.

One possible interpretation can be:

During February, 2019, out of 132 commits, 33.3% of them (44) were
focused on the back-end of the site.

Heatmap chart of monthly commit types vs scopes for February, 2019

Figure 12. Monthly commit heatmap

This chart allows us to know:

  • Which type/scope combinations were used the most.

  • Which type/scope combinations were never used.

  • In general, we can get insights about what programmers did and where
    they did it.

One possible interpretation can be:

During February, 2019, out of 132 commits, 20 of them were feat(back),
leading us to think that programmers spent a considerable amount of
their time programming new functionalities in the back-end of the site.
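The heatmap cells are simply counts of (type, scope) pairs. The minimal Python sketch below uses the hypothetical names `type_scope_counts` and `unused_combinations`. The type and scope lists passed to the second function are up to the caller, and titles that do not follow the syntax are ignored.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Captures [type] and [scope] from a syntax-compliant title.
PAIR_RE = re.compile(r"^([a-z]+)\(([a-z]+)\):")


def type_scope_counts(titles: Iterable[str]) -> Counter:
    """Count (type, scope) pairs, i.e. the cells of a type-vs-scope heatmap."""
    matches = (PAIR_RE.match(title) for title in titles)
    return Counter((m.group(1), m.group(2)) for m in matches if m)


def unused_combinations(
    counts: Counter, types: Iterable[str], scopes: Iterable[str]
) -> List[Tuple[str, str]]:
    """Type/scope pairs that never appear in the counted titles."""
    scopes = list(scopes)  # iterated once per type
    return sorted((t, s) for t in types for s in scopes if counts[(t, s)] == 0)
```

`counts.most_common()` then answers the "used the most" question, and `unused_combinations` lists the cells that stayed empty.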

It does not end here

Now that we have a commit message syntax and our Git logs are growing
every day, it is up to us to decide what we want to do with the
normalized data we are getting.

Many different charts and analyses can be made; it is just a matter of
asking ourselves:

  1. What do we want to know?

  2. Can we know it with the data we currently have?

If so, let's make it happen!

Conclusion

In this article, we have covered everything from the importance of Git
logs to the benefits we can obtain by standardizing our commit message
syntax and enforcing it with tools like Commitlint, Data Analytics being
the most important of those benefits.

I invite you to check the commit history of our public repository:

Want to get more technical?

Make sure to visit our Commit Message Documentation Page.
There you will be able to find more detailed information regarding our
commit message syntax and other interesting topics like expanding syntax
checks to Gitlab Merge Requests.

That was it! Have a good one!


*** This is a Security Bloggers Network syndicated blog from Fluid Attacks RSS Feed authored by Daniel Salazar. Read the original post at: https://fluidattacks.com/blog/git-steroids/