Setting up Python Projects: Part V | by Johannes Schmidt


Mastering the Art of Python Project Setup: A Step-by-Step Guide


Whether you’re a seasoned developer or just getting started with 🐍 Python, it’s important to know how to build robust and maintainable projects. This tutorial will guide you through the process of setting up a Python project using some of the most popular and effective tools in the industry. You will learn how to use GitHub and GitHub Actions for version control and continuous integration, as well as other tools for testing, documentation, packaging and distribution. The tutorial is inspired by resources such as Hypermodern Python and Best Practices for a new Python project. However, this is not the only way to do things and you might have different preferences or opinions. The tutorial is intended to be beginner-friendly but also cover some advanced topics. In each section, you will automate some tasks and add badges to your project to show your progress and achievements.

The repository for this series can be found at github.com/johschmidt42/python-project-johannes

This part was inspired by this blog post:

Semantic release with Python, Poetry & GitHub Actions 🚀

  • OS: Linux, Unix, macOS, Windows (WSL2 with e.g. Ubuntu 20.04 LTS)
  • Tools: python3.10, bash, git, tree
  • Version Control System (VCS) Host: GitHub
  • Continuous Integration (CI) Tool: GitHub Actions

It is expected that you are familiar with the version control system (VCS) git. If not, here’s a refresher for you: Introduction to Git

Commits will be based on best practices for git commits & Conventional Commits. There is a Conventional Commit plugin for PyCharm and a VSCode extension that help you write commits in this format.

Overview

Structure

  • Git Branching Strategy (GitHub flow)
  • What is a release? (zip, tar.gz)
  • Semantic Versioning (v0.1.0)
  • Create a release manually (git tag, GitHub)
  • Create a release automatically (conventional commits, semantic releases)
  • CI/CD (release.yml)
  • Create a Personal Access Token (PAT)
  • GitHub Actions Flow (Orchestrating workflows)
  • Badge (Release)
  • Bonus (Enforce conventional commits)

Releasing software is an important step in the software development process as it makes new features and bugfixes available to users. One key aspect of releasing software is versioning, which helps to track and communicate the changes made in each release. Semantic versioning is a widely used standard for versioning software, which uses a version number in the format of Major.Minor.Patch (e.g. 1.2.3) to indicate the level of changes made in a release.

Conventional Commits is a specification for adding human- and machine-readable meaning to commit messages. It’s a way to format commit messages in a consistent manner, which makes it easy to determine the type of change made. Conventional commits are commonly used in conjunction with semantic versioning, as the commit messages can be used to automatically determine the version number of a release. Together, semantic versioning and conventional commits provide a clear and consistent way to track and communicate the changes made in each release of a software project.

There are many different branching strategies out there for git. Many people gravitate towards GitFlow (or variants), Three Flow, or Trunk based Flows. Some do strategies in between these, such as this one. I’m using the very simple GitHub flow branching strategy, where all bug fixes and features have their own separate branch, and when complete, each branch is merged to main and deployed. Simple, nice and easy.

GitHub Flow branching strategy

Whatever your strategy might be, in the end you merge a pull request and (probably) create a release.

In short, a release is packing up code of a version (e.g. zip) and pushing it to production (whatever this might be for you).

Release management can be messy. Therefore, you (and others) need a concise convention that defines what a release means and what changes between one release and the next. If you don’t track the changes between releases, you probably won’t understand what has changed in each release, and you can’t identify problems that might have been introduced with new code. Without a changelog, it can be difficult to understand how the software has evolved over time. It can also make it difficult to roll back changes if necessary.

Semantic Versioning is a numbering scheme and standard practice in the industry for software development. It indicates the level of changes between this version and the previous one. There are three parts to a semantic version number, such as 1.8.42, that follow the pattern:

MAJOR.MINOR.PATCH

Each one of them means a different degree of change. A PATCH release indicates bug fixes or trivial changes (e.g. from 1.0.0 to 1.0.1). A MINOR release indicates new functionality added in a backwards-compatible manner (e.g. from 1.0.0 to 1.1.0). A MAJOR release indicates backwards-incompatible (breaking) changes, such as removed or changed functionality (e.g. from 1.0.0 to 2.0.0).
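To make the three levels concrete, here is a minimal sketch of a version bump in Python. The function names `parse_version` and `bump` are my own for illustration and are not taken from any library; pre-release and build metadata suffixes are ignored.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version: str, level: str) -> str:
    """Return the next version for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = parse_version(version)
    if level == "major":
        # A breaking change resets MINOR and PATCH.
        return f"{major + 1}.0.0"
    if level == "minor":
        # A new feature resets PATCH.
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown level: {level!r}")


print(bump("1.0.0", "patch"))  # 1.0.1
print(bump("1.0.0", "minor"))  # 1.1.0
print(bump("1.0.0", "major"))  # 2.0.0
```

Note how a higher-level bump resets the lower parts to zero, which is why 1.8.42 becomes 1.9.0 and not 1.9.42 after a MINOR release.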

I recommend a talk by Mike Miles if you want a visual introduction to releases with semantic versioning. It’s a summary of what releases are and how semantic versioning with git tags allows us to create releases.

About git tags: There are lightweight and annotated tags in git. A lightweight tag is just a pointer to a specific commit whereas an annotated tag is a full object in git.

Let’s create a release manually first and then automate it.

If you remember, our example_app’s __init__.py file contains the version

# src/example_app/__init__.py

__version__ = "0.1.0"

as well as the pyproject.toml file

# pyproject.toml

[tool.poetry]
name = "example_app"
version = "0.1.0"
...

So the first thing we must do is to create an annotated git tag v0.1.0 and add it to the latest commit in main:

> git tag -a v0.1.0 -m "version v0.1.0"

Please note that if no commit hash is specified at the end of the command, then git will use the current commit you are on.

We can get a list of tags with:

> git tag

v0.1.0

and if we want to delete it again:

> git tag -d v0.1.0

Deleted tag 'v0.1.0'

and get more information about the tag with:

> git show v0.1.0

tag v0.1.0

Tagger: Johannes Schmidt <johannes.schmidt.vik@gmail.com>
Date: Sat Jan 7 12:55:15 2023 +0100
version v0.1.0
commit efc9a445cd42ce2f7ddfbe75ffaed1a5bc8e0f11 (HEAD -> main, tag: v0.1.0, origin/main, origin/HEAD)
Author: Johannes Schmidt <74831750+johschmidt42@users.noreply.github.com>
Date: Mon Jan 2 11:20:25 2023 +0100
...

We can push the newly created tag to origin with

> git push origin v0.1.0

Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), 171 bytes | 171.00 KiB/s, done.
Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
To github.com:johschmidt42/python-project-johannes.git
* [new tag] v0.1.0 -> v0.1.0

so that this git tag is now available on GitHub:

Let’s manually create a new release in GitHub with this git tag:

We click on Create a new release, select our existing tag (that is already bound to a commit) and then generate release notes automatically by clicking on the Generate release notes button before we finally publish the release with the Publish release button.

GitHub will automatically create a tar and a zip (assets) for the source code, but will not build the application! The result will look like this:

To summarise, the steps for a release are:

  • create a new branch from your default branch (e.g. feature or fix branch)
  • make changes and increase the version (e.g. pyproject.toml and __init__.py)
  • merge the feature/bug fix into the default branch (probably through a Pull Request)
  • add an annotated git tag (semantic version) to the commit
  • publish the release on GitHub with some additional information

As programmers, we don’t like to repeat ourselves, and there are plenty of tools that make these steps easy for us. Here, I will introduce Python Semantic Release (python-semantic-release), a tool built specifically for Python projects.

It’s a tool which automatically sets a version number in your repo, tags the code with the version number and creates a release! And this is all done using the contents of Conventional Commit style messages.

Conventional Commits

What is the connection between semantic versioning and conventional-commits?

Certain commit types can be used to automatically determine a semantic version bump!

  • A fix commit is a PATCH.
  • A feat commit is a MINOR.
  • A commit with BREAKING CHANGE or ! is a MAJOR.

Other types, e.g. build, chore, ci, docs, style, refactor, perf, test generally don’t increase the version.
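The mapping above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the regular expression, `bump_for_commit` and `highest_bump` are hypothetical names of my own, and the real parser in python-semantic-release is more thorough.

```python
import re
from typing import Iterable, Optional

# type, optional "(scope)", optional "!", then ": description"
HEADER = re.compile(r"^(?P<type>\w+)(\((?P<scope>[^)]+)\))?(?P<bang>!)?: .+")

# Used to pick the most significant bump among several commits.
ORDER = {None: 0, "patch": 1, "minor": 2, "major": 3}


def bump_for_commit(message: str) -> Optional[str]:
    """Return 'major', 'minor', 'patch' or None for one commit message."""
    header = message.splitlines()[0] if message else ""
    match = HEADER.match(header)
    if match is None:
        return None  # not a conventional commit
    if match["bang"] or "BREAKING CHANGE:" in message:
        return "major"
    if match["type"] == "feat":
        return "minor"
    if match["type"] == "fix":
        return "patch"
    return None  # docs, chore, ci, ... do not bump in this sketch


def highest_bump(messages: Iterable[str]) -> Optional[str]:
    """The most significant bump of all commits decides the next version."""
    return max(
        (bump_for_commit(message) for message in messages),
        key=ORDER.__getitem__,
        default=None,
    )


print(highest_bump(["docs: update readme", "fix: nasty bug", "feat: new cool feature"]))  # minor
```

The important point is that only the most significant change since the last release counts: ten fix commits and one feat commit still result in a single MINOR bump.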

Check out the bonus section at the end to find out how to enforce conventional commits in your project!

Automatic semantic releases (locally)

We can add the library with:

> poetry add --group semver python-semantic-release

Let’s go through the configuration settings that allow us to automatically generate change-logs and releases. In the pyproject.toml, we can add semantic_release as a tool:

# pyproject.toml

...
[tool.semantic_release]
branch = "main"
version_variable = "src/example_app/__init__.py:__version__"
version_toml = "pyproject.toml:tool.poetry.version"
version_source = "tag"
commit_version_number = true # required for version_source = "tag"
tag_commit = true
upload_to_pypi = false
upload_to_release = false
hvcs = "github" # gitlab is also supported

  • branch: specifies the branch that the release should be based on, in this case the “main” branch.
  • version_variable: specifies the file path and variable name of the version number in the source code. In this case, the version number is stored in the __version__ variable in the file src/example_app/__init__.py.
  • version_toml: specifies the file path and variable name of the version number in the pyproject.toml file. In this case, the version number is stored in the tool.poetry.version variable of the pyproject.toml file
  • version_source: Specifies the source of the version number. In this case, the version number is obtained from the tag (instead of commit)
  • commit_version_number: This parameter is required when version_source = "tag". It specifies whether the version number should be committed to the repository or not. In this case, it is set to true, which means that the version number will be committed.
  • tag_commit: Specifies whether a new tag should be created for the release commit. In this case, it is set to true, which means that a new tag will be created.
  • upload_to_pypi: Specifies whether the package should be uploaded to the PyPI package repository. In this case, it is set to false, which means that the package will not be uploaded to PyPI.
  • upload_to_release: Specifies whether the package should be uploaded to the GitHub release page. In this case, it is set to false, which means that the package will not be uploaded to GitHub releases.
  • hvcs: Specifies the hosting version control system of the project. In this case, it is set to “github”, which means that the project is hosted on GitHub. “gitlab” is also supported.

The version of the project/module is defined in two files, and both are updated on release: version_variable points to variables in regular files and version_toml to keys in .toml files. The version_source defines the source of truth for the current version. Because a git tag is created automatically with every release (tag_commit is set to true), the version in these two files always matches the latest tag, so we can use tag as the source instead of the default value commit, which looks for the last version in the commit messages. To update the files and commit the changes, we set the commit_version_number flag to true. Because we don’t want to upload anything to the Python Package Index (PyPI), upload_to_pypi is set to false, and for now we don’t want to upload anything to our releases either. The hvcs is set to github (the default); gitlab is the other supported value.
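To get a feeling for what a setting like version_variable = "src/example_app/__init__.py:__version__" implies, here is a rough sketch of locating and rewriting the `__version__` assignment in a file's text. The helpers `read_version` and `write_version` are hypothetical and only handle double-quoted values; this is not how python-semantic-release is implemented.

```python
import re

# Matches a line such as: __version__ = "0.1.0"
VERSION_LINE = re.compile(
    r'^(__version__\s*=\s*)"(?P<version>[^"]+)"', re.MULTILINE
)


def read_version(source: str) -> str:
    """Return the version string assigned to __version__."""
    match = VERSION_LINE.search(source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match["version"]


def write_version(source: str, new_version: str) -> str:
    """Return the source text with the __version__ value replaced."""
    return VERSION_LINE.sub(
        lambda m: f'{m.group(1)}"{new_version}"', source, count=1
    )


init_py = '# src/example_app/__init__.py\n\n__version__ = "0.1.0"\n'
print(read_version(init_py))            # 0.1.0
print(write_version(init_py, "0.2.0"))
```

The same idea applies to version_toml, except that the value is addressed by its key path (tool.poetry.version) instead of a variable name.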

We can test this locally by running a few commands, that I will add directly to our Makefile:

# Makefile

...

##@ Releases

current-version: ## returns the current version
	@semantic-release print-version --current

next-version: ## returns the next version
	@semantic-release print-version --next

current-changelog: ## returns the current changelog
	@semantic-release changelog --released

next-changelog: ## returns the next changelog
	@semantic-release changelog --unreleased

publish-noop: ## publish command (no-operation mode)
	@semantic-release publish --noop

With the command current-version we get the version from the last git tag in the git tree:

> make current-version

0.1.0

If we add a few commits in conventional commit style, e.g. feat: new cool feature or fix: nasty bug, then the command next-version will compute the version bump for that:

> make next-version

0.2.0

Right now, we don’t have a CHANGELOG file in our project, so when we run:

> make current-changelog

the output will be empty. But based on the commits we can create the upcoming changelog with:

> make next-changelog

### Feature
* Add releases ([#8](https://github.com/johschmidt42/python-project-johannes/issues/8)) ([`5343f46`](https://github.com/johschmidt42/python-project-johannes/commit/5343f46d9879cc8af273a315698dd307a4bafb4d))
* Docstrings ([#5](https://github.com/johschmidt42/python-project-johannes/issues/5)) ([`fb2fa04`](https://github.com/johschmidt42/python-project-johannes/commit/fb2fa0446d1614052c133824150354d1f05a52e9))
* Add application in app.py ([`3f07683`](https://github.com/johschmidt42/python-project-johannes/commit/3f07683e787b708c31235c9c5357fb45b4b9f02d))
### Documentation
* Add search bar & github url ([#6](https://github.com/johschmidt42/python-project-johannes/issues/6)) ([`3df7c48`](https://github.com/johschmidt42/python-project-johannes/commit/3df7c483eca91f2954e80321a7034ae3edb2074b))
* Add badge pages.yml to README.py ([`b76651c`](https://github.com/johschmidt42/python-project-johannes/commit/b76651c5ecb5ab2571bca1663ffc338febd55b25))
* Add documentation to Makefile ([#3](https://github.com/johschmidt42/python-project-johannes/issues/3)) ([`2294ee1`](https://github.com/johschmidt42/python-project-johannes/commit/2294ee105b238410bcfd7b9530e065e5e0381d7a))
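The grouping seen in this output can be approximated with a short sketch: commits are sorted into sections by their conventional commit type. The function `render_changelog`, the section titles and the example commit headers are assumptions for illustration; the real tool also adds links to issues and commits and handles scopes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Commit types that get their own section (an illustrative subset).
SECTION_TITLES = {"feat": "Feature", "fix": "Fix", "docs": "Documentation"}


def render_changelog(commits: List[Tuple[str, str]]) -> str:
    """Group (short_hash, header) pairs into Markdown sections by type."""
    sections: Dict[str, List[str]] = defaultdict(list)
    for short_hash, header in commits:
        commit_type, _, description = header.partition(": ")
        title = SECTION_TITLES.get(commit_type)
        if title is None or not description:
            continue  # e.g. chore, ci, scoped types or non-conventional commits
        entry = description[0].upper() + description[1:]
        sections[title].append(f"* {entry} (`{short_hash}`)")

    lines: List[str] = []
    for title in SECTION_TITLES.values():
        if sections.get(title):
            lines.append(f"### {title}")
            lines.extend(sections[title])
    return "\n".join(lines)


print(render_changelog([
    ("5343f46", "feat: add releases"),
    ("3df7c48", "docs: add search bar & github url"),
    ("0000000", "chore: tidy up"),  # hypothetical commit, skipped
]))
```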

If we push new commits (directly to main or through a PR) we could now publish a new release with:

> semantic-release publish

The publish command will do a sequence of things:

  1. Update or create the changelog file.
  2. Run semantic-release version.
  3. Push changes to git.
  4. Run build_command and upload the distribution file to your repository.
  5. Run semantic-release changelog and post to your vcs provider.
  6. Attach the files created by build_command to GitHub releases.

Every step can, of course, be configured or deactivated!

Let’s build a CI pipeline with GitHub Actions that runs the publish command of semantic-release with every commit to the main branch.

While the overall structure remains the same as in lint.yml, test.yml or pages.yml, a few changes need to be mentioned. In the step Checkout repository, we add a new token that is used to check out the branch. That is because the default GITHUB_TOKEN does not have the required permissions to operate on protected branches. Therefore, we must use a secret (GH_TOKEN) that contains a Personal Access Token with the necessary permissions. I will show later how the Personal Access Token can be generated. We also define fetch-depth: 0 to fetch all history for all branches and tags.

with:
  ref: ${{ github.head_ref }}
  token: ${{ secrets.GH_TOKEN }}
  fetch-depth: 0

We install only the dependencies that are required for the semantic-release tool with:

- name: Install requirements
  run: poetry install --only semver

In the last step, we change some git configurations and run the publish command of semantic-release:

- name: Python Semantic Release
  env:
    GH_TOKEN: ${{ secrets.GH_TOKEN }}
  run: |
    set -o pipefail
    # Set git details
    git config --global user.name "github-actions"
    git config --global user.email "github-actions@github.com"
    # run semantic-release
    poetry run semantic-release publish -v DEBUG -D commit_author="github-actions <action@github.com>"

By changing the git config, the user that commits will be “github-actions”. We run the publish command with DEBUG logs (stdout) and set the commit_author to “github-actions” explicitly. As an alternative to this command, we could use the GitHub Action from semantic-release directly, but running the publish command takes very few setup steps, and the action uses a Docker container that needs to be pulled every time. Because of that, I prefer a simple run step instead.

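Because the publish command makes a commit, you might worry that we end up in an endless loop of workflow runs. GitHub prevents this for pushes authenticated with the default GITHUB_TOKEN: they do not trigger new workflow runs. Pushes authenticated with a Personal Access Token, as used here, can trigger workflows again, which is why the version commit is marked with [skip ci] in the section on the flow of actions further below.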

Personal access tokens are an alternative to using passwords for authentication to GitHub Enterprise Server when using the GitHub API or the command line. Personal access tokens are intended to access GitHub resources on behalf of yourself. To access resources on behalf of an organization, or for long-lived integrations, you should use a GitHub App. For more information, see “About apps.”

In other words: we can create a Personal Access Token and have GitHub Actions store and use that secret to perform certain operations on our behalf. Keep in mind that if the PAT is compromised, it could be used to perform malicious actions on your GitHub repositories. It is therefore recommended to use GitHub OAuth Apps & GitHub Apps in organisations. For the purposes of this tutorial, we will be using a PAT to allow the GitHub Actions pipeline to operate on our behalf.

We can create a new access token by navigating to the Settings section of your GitHub user and following the instructions summarised in Creating a Personal Access Token. This will give us a window that will look like this:

Personal Access Token of an admin account with push access to the repos.

By selecting the scopes, we define what permissions the token will have. For our use case, we need push access to the repositories, which is why the new PAT GH_TOKEN should have the repo permissions scope. That scope would authorise pushes to protected branches, given you don’t have Include administrators set in the protected branch’s settings.

Going back to the repository overview, in the Settings menu, we can add either an environment secret or a repository secret under the Secrets section:

Repository secrets are specific to a single repository (and all environments used in there), while environment secrets are specific to an environment. The GitHub runner can be configured to run in a specific environment which allows it to access the environment’s secrets. This makes sense when thinking of different stages (e.g. DEV vs PROD) but for this tutorial I’m fine with a repository secret.

Now that we have a few pipelines (linting, testing, releasing, documentation), we should think about the flow of actions with a commit to main! There are a few things we should be aware of, some of them specific to GitHub.

Ideally, a commit to main creates a push event that triggers the Testing and Linting workflows. If these are successful, we run the release workflow, which is responsible for detecting whether there should be a version bump based on conventional commits. If so, the release workflow pushes directly to main, bumping the versions, adding a git tag and creating a release. A published release should then, for example, update the documentation by running the documentation workflow.

Expected flow of actions

Problems & considerations

  1. If you read the last paragraph carefully or looked at the flowchart above, you might have noticed that there are two commits to main: an initial one (i.e. from a PR) and a second one for the release. Because our lint.yml and test.yml react to push events on the main branch, they would run twice! We should avoid running them twice to save resources. To achieve this, we can add the [skip ci] string to our version commit message. A custom commit message can be defined in the pyproject.toml file for the tool semantic_release.
# pyproject.toml

...

[tool.semantic_release]
...
commit_message = "{version} [skip ci]" # skip triggering ci pipelines for version commits
...

2. The workflow pages.yml currently runs on a push event to main. Updating the documentation could be something that we only want to do if there is a new release (We might be referencing the version in the documentation). We can change the trigger in the pages.yml file accordingly:

# pages.yml

name: Documentation

on:
  release:
    types: [published]

Building the documentation will now require a published release.

3. The Release workflow should depend on the success of the Linting & Testing workflow. Currently we don’t have defined dependencies in our workflow files. We could have these workflows depend on the completion of defined workflow runs in a specific branch with the workflow_run event. However, if we specify multiple workflows for the workflow_run event:

on:
  workflow_run:
    workflows: [Testing, Linting]
    types:
      - completed
    branches:
      - main

only one of the workflows needs to complete! This is not what we want. We expect that all workflows must be completed (and successful). Only then should the release workflow run. This is in contrast to what we get when we define dependencies between jobs in a single workflow. Read more about this inconsistency and shortcoming here.

As an alternative, we could use a sequential execution of pipelines:

The big downside of this idea is that a) it does not allow parallel execution and b) we won’t be able to see the dependency graph in GitHub.

Solution

Currently, the only way I see to deal with the above mentioned problems is to orchestrate the workflows in an orchestrator workflow.

Let’s create this workflow file:

The orchestrator is triggered when we push to the branch main.

Only if both workflows, Testing & Linting, are successful is the release workflow called. This is defined with the needs keyword. If we want to have more granular control over job executions (workflows), consider using the if keyword as well. But be aware of the confusing behaviour as explained in this article.
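The difference between the needs keyword and the workflow_run trigger discussed earlier can be modelled in a few lines of Python. This is a toy model with hypothetical job names, not GitHub's scheduler: with needs, every dependency has to succeed, whereas workflow_run with types: completed fires as soon as any one of the listed workflows completes.

```python
from typing import Dict, List

# Hypothetical job names; "release" depends on both quality checks.
NEEDS: Dict[str, List[str]] = {"release": ["testing", "linting"]}


def runs_with_needs(job: str, conclusions: Dict[str, str]) -> bool:
    """`needs`: every listed job must have concluded with success."""
    return all(conclusions.get(dep) == "success" for dep in NEEDS[job])


def runs_with_workflow_run(watched: List[str], completed: str) -> bool:
    """`workflow_run`: fires when any one watched workflow completes,
    regardless of what the other watched workflows did."""
    return completed in watched


conclusions = {"testing": "success", "linting": "failure"}
print(runs_with_needs("release", conclusions))                    # False
print(runs_with_workflow_run(["Testing", "Linting"], "Testing"))  # True
```

In the failing-lint scenario above, the orchestrated release is skipped, while a release triggered through workflow_run would already have started once Testing completed.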

To make our workflows lint.yml, test.yml & release.yml callable by another workflow, we need to update the triggers:

# lint.yml

---
name: Linting

on:
  pull_request:
    branches:
      - main
  workflow_call:

jobs:
...

# test.yml

---
name: Testing

on:
  pull_request:
    branches:
      - main
  workflow_call:

jobs:
...

# release.yml

---
name: Release

on:
  workflow_call:

jobs:
...

Now the new workflow (Release) should only run if the workflows for quality checking, in this case the linting and testing, succeed.

To create a badge, this time, I will use the platform shields.io.

It’s a website that generates badges for projects, which display information such as version, build status, and code coverage. It offers a wide range of templates and allows customization of appearance and creation of custom badges. The badges are updated automatically, providing real-time information about the project.

For a release badge, I selected GitHub release (latest SemVer) :

The badge markdown can be copied and added to the README.md:
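As a rough sketch, the Markdown for such a badge could be assembled like this. I am assuming the shields.io path /github/v/release/<user>/<repo> with the sort=semver query parameter; please verify the exact URL on shields.io, since the badge routes may change. The helper name `release_badge_markdown` is hypothetical.

```python
from urllib.parse import quote


def release_badge_markdown(user: str, repo: str) -> str:
    """Build README Markdown for a 'latest release' badge (assumed URL scheme)."""
    url = (
        "https://img.shields.io/github/v/release/"
        f"{quote(user)}/{quote(repo)}?sort=semver"
    )
    return f"![GitHub release (latest SemVer)]({url})"


print(release_badge_markdown("johschmidt42", "python-project-johannes"))
```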

The landing page of our GitHub repository now looks like this ❤ (I’ve cleaned up a little and provided a description):


