Popper¶
Popper is a container-native workflow engine for seamlessly running workflows locally and on CI services.
Getting Started¶
Popper is a container-native workflow execution engine. A container-native workflow is one where all steps contained in it are executed in containers. Before going through this guide, you need to have the Docker engine installed on your machine (see installation instructions here), as well as a Python 3.6+ installation capable of adding packages via pip or virtualenv.
Installation¶
We provide a pip package for Popper. To install, simply run:
pip install popper
Depending on your Python distribution or specific environment configuration, using pip might not be possible (e.g. you might need administrator privileges), or using pip directly might incorrectly install Popper. We highly recommend installing Popper in a Python virtual environment using virtualenv. The following installation instructions assume that virtualenv is installed in your environment (see here for more). Once virtualenv is available in your machine, we proceed to create a folder where we will place the Popper virtual environment:
# create a folder for storing virtual environments
mkdir $HOME/virtualenvs
We then create a virtualenv for Popper. This will depend on the method with which virtualenv was installed:
# 1) if virtualenv was installed via package, e.g.:
# - apt install virtualenv (debian/ubuntu)
# - yum install virtualenv (centos/redhat)
# - conda install virtualenv (conda)
# - pip install virtualenv (pip)
virtualenv $HOME/virtualenvs/popper
# OR
#
# 2) if virtualenv installed via Python 3.6+ module
python -m venv $HOME/virtualenvs/popper
NOTE: in the case of conda, we recommend creating a new environment before virtualenv is installed, in order to avoid issues with packages that might have been installed previously.
We then load the environment we just created above:
source $HOME/virtualenvs/popper/bin/activate
Finally, we install Popper in this environment using pip:
pip install popper
To test all is working as it should, we can show the version we installed:
popper version
And to get a list of available commands:
popper --help
NOTE: given that we are using virtualenv, once the shell session ends (when we close the terminal window or tab), the environment gets unloaded, and newer sessions (new window or tab) will not have the popper command available in the PATH variable. In order to have the environment loaded again, we need to execute the source command shown above. In the case of conda, we need to load the Conda environment (conda activate command).
Create a Git repository¶
Create a project repository (if you are not familiar with Git, look here):
mkdir myproject
cd myproject
git init
echo '# myproject' > README.md
git add .
git commit -m 'first commit'
NOTE: if you run on macOS, make sure the myproject/ folder is in a folder that is shared with the Docker engine. By default, Docker for Mac shares the /Users folder, so putting the myproject/ folder in any subfolder of /Users/<USERNAME>/ should suffice. Otherwise, if you want to put it in a folder other than /Users, you will need to modify the Docker for Mac settings so that this other folder is also shared with the underlying Linux VM.
Create a workflow¶
We create a small, pre-defined workflow by running:
popper scaffold
The above generates an example workflow that you can use as the starting point of your project. This minimal example illustrates two distinct ways in which a container image can be used in a workflow (by pulling an image from a registry, or by building one from a Dockerfile stored in a public repository). To show the content of the workflow:
cat wf.yml
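The output will look roughly like the following sketch (the exact content generated by popper scaffold may differ; the image references below follow the two forms described above):
version: '1'
steps:
# a step that pulls an image from a registry
- uses: docker://alpine:3.9
  args: ["ls", "-la"]
# a step that builds an image from a Dockerfile stored in a public repository
- uses: popperized/bin/sh@master
  args: ["ls"]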
For each step in the workflow, an image is created (or pulled) and a container is instantiated. For a more detailed description of how Popper processes a workflow, take a look at the “Workflow Language and Runtime” section. To learn more about how to modify this workflow to fit your needs, take a look at this tutorial or at some examples.
Before we go ahead and test this workflow, we first commit the files to the Git repository:
git add .
git commit -m 'Adding example workflow.'
Run your workflow¶
To execute the workflow you just created:
popper run -f wf.yml
You should see the output printed to the terminal that informs of the three main tasks that Popper executes for each step in a workflow: build (or pull) a container image, instantiate a container, and execute the step by invoking the specified command within the container.
TIP: type popper run --help to learn more about other available options and to get a high-level description of what this command does.
Since this workflow consists of two steps, there were two corresponding containers that were executed by the underlying container engine, which is Docker in this case. We can verify this by asking Docker to show the list of existing containers:
docker ps -a
You should see the two containers from the example workflow being listed.
To obtain more detailed information of what this command does, you can pass the --help flag to it:
popper run --help
NOTE: all Popper subcommands accept the --help flag, which prints more information about what the command does.
Link to GitHub repository¶
Create a repository on GitHub. Once your GitHub repository has been created, register it as a remote of your local repository:
git remote add origin git@github.com:<user>/<repo>
where <user> is your username and <repo> is the name of the repository you have created. Then, push your local commits:
git push -u origin master
Continuously Run Your Workflow on Travis¶
For this, we need to log in to Travis CI using our GitHub credentials. Once this is done, we activate the project so it is continuously validated.
Generate the .travis.yml file:
popper ci travis
And commit the file:
git add .travis.yml
git commit -m 'Adds TravisCI config file'
Trigger an execution by pushing to github:
git push
Go to the TravisCI website to see your experiments being executed.
Next Steps¶
For a detailed description of how Popper processes workflows, take a look at the “Workflow Language and Runtime” section. To learn more on how to modify workflows to fit your needs, take a look at this tutorial or at some examples.
CLI features¶
New workflow initialization¶
Create a Git repository:
mkdir mypaper
cd mypaper
git init
echo '# mypaper' > README.md
git add .
git commit -m 'first commit'
Initialize the popper repository and add the configuration file to git:
popper init
git add .
git commit -m 'adds .popper.yml file'
Initialize a workflow:
popper scaffold
Show what this did:
ls -l
Commit the scaffolded workflow:
git add .
git commit -m 'adding my first workflow'
Executing a workflow¶
To run the workflow:
popper run
or to execute all the workflows in a project:
popper run --recursive
Customizing container engine behavior¶
By default, Popper instantiates containers in the underlying engine by
using basic configuration options. When these options are not suitable
to your needs, you can modify or extend them by providing
engine-specific options. These options allow you to specify
fine-grained capabilities, bind-mounting additional folders, etc. In
order to do this, you can provide a configuration file to modify the
underlying container engine configuration used to spawn containers.
This file is a Python script that defines an ENGINE dictionary with custom options and is passed to the popper run command via the --conf flag.
For example, to make Popper spawn Docker containers in privileged mode, we can write the following options:
ENGINE = {
'privileged': True
}
Assuming the above is stored in a file called settings.py, we pass it to Popper by running:
popper run --conf settings.py
NOTE:
- Currently, the --conf option is only supported for the docker engine.
- The settings.py file must contain a dict variable named ENGINE, as shown above.
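As a sketch of what a slightly richer configuration could look like, the snippet below extends the privileged example. The volumes and cap_add entries are hypothetical and assume that the options in the ENGINE dictionary are forwarded as-is to the Docker engine; consult the documentation of your Popper version for the options it actually supports:
ENGINE = {
    'privileged': True,
    # hypothetical: bind-mount an additional host folder into spawned containers
    'volumes': {'/path/to/data': {'bind': '/data', 'mode': 'ro'}},
    # hypothetical: grant an extra capability to spawned containers
    'cap_add': ['SYS_PTRACE'],
}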
Environment Variables¶
Popper defines a set of environment variables (see Environment Variables section) that are available for all steps in a workflow. To see the values assigned to these variables, run the following workflow:
- uses: popperized/bin/sh@master
args: env
To define new variables, the env keyword can be used (see Attributes for more).
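For instance, a step that defines its own variable in addition to the predefined ones could look like the following sketch (MYVAR is just an illustrative name):
- uses: popperized/bin/sh@master
  args: env
  env:
    MYVAR: somevalue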
Reusing existing workflows¶
Many times, when starting an experiment, it is useful to be able to use an existing workflow as a scaffold for the one we wish to write. The popper-examples repository contains a list of example workflows and actions for the purpose of both learning and serving as a starting point. Other examples can be found in GitHub’s official actions organization.
Once you have found a workflow you’re interested in importing, you can use the popper add command to obtain it. For example:
cd myproject/
mkdir myworkflow
popper add https://github.com/popperized/popper-examples/workflows/cloudlab-iperf-test
Downloading workflow cloudlab-iperf-test as cloudlab-iperf-test...
Workflow cloudlab-iperf-test has been added successfully.
This will download the contents of the workflow and all its dependencies to your project tree.
Searching for actions¶
The popper CLI is capable of searching for premade actions that you can use in your workflows.
You can use the popper search command to search for actions based on a search keyword. For example, to search for npm-based actions, you can simply run:
$ popper search npm
Matched actions :
> popperized/npm
Additionally, when searching for an action, you may choose to include the contents of the readme in your search by using the --include-readme flag.
Once popper search runs, it caches all the metadata related to the search. So, to get the latest releases of the actions, you might want to update the cache using the --update-cache flag.
By default, popper searches for actions from a list present here. To help the list keep growing, you can add GitHub organization names or repository names (org/repo) and send a pull request to the upstream repository.
To get the details of a searched action, use the popper info command. For example:
popper info popperized/cmake
An action for building CMake projects.
Continuously validating a workflow¶
The ci subcommand generates configuration files for multiple CI systems. The syntax of this command is the following:
popper ci <service-name>
Where <service-name> is the name of the CI system (see popper ci --help to get a list of supported systems). In the following, we show how to link GitHub with some of the supported CI systems. In order to do so, we first need to create a repository on GitHub and upload our commits:
# set the new remote
git remote add origin <your-github-repo-url>
# verify the remote URL
git remote -v
# push changes in your local repository up to github
git push -u origin master
TravisCI¶
For this, we need an account at Travis CI.
Assuming our Popperized repository is already on GitHub, we can enable it on TravisCI so that it is continuously validated (see here for a guide).
Once the project is registered on Travis, we proceed to generate a .travis.yml file:
cd my-popper-repo/
popper ci travis
And commit the file:
git add .travis.yml
git commit -m 'Adds TravisCI config file'
We then can trigger an execution by pushing to GitHub:
git push
After this, you can go to the TravisCI website to see your workflows being executed. Every new change committed to a public repository will trigger an execution of your workflows. To avoid triggering an execution for a commit, include a line with [skip ci] as part of the commit message.
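For example, assuming you only want to update documentation without triggering a CI build, a commit created as follows would be skipped by TravisCI:
git commit -m 'update readme [skip ci]'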
NOTE: TravisCI has a 2-hour time limit, after which the build is terminated and marked as failed.
CircleCI¶
For CircleCI, the procedure is similar to what we do for TravisCI (see above):
Sign in to CircleCI using your GitHub account and enable your repository.
Generate config files and add them to the repo:
cd my-popper-repo/
popper ci circle
git add .circleci
git commit -m 'Adds CircleCI config files'
git push
GitLab-CI¶
For GitLab-CI, the procedure is similar to what we do for TravisCI and CircleCI (see above), i.e. generate config files and add them to the repo:
cd my-popper-repo/
popper ci gitlab
git add .gitlab-ci.yml
git commit -m 'Adds GitLab-CI config file'
git push
If CI is enabled on your instance of GitLab, the above should trigger an execution of the pipelines in your repository.
Jenkins¶
For Jenkins, generating a Jenkinsfile is done in a similar way:
cd my-popper-repo/
popper ci jenkins
git add Jenkinsfile
git commit -m 'Adds Jenkinsfile'
git push
Jenkins is a self-hosted service and needs to be properly configured in order to be able to read a GitHub project with a Jenkinsfile in it. The easiest way to add a new project is to use the Blue Ocean UI. A step-by-step guide on how to create a new project using the Blue Ocean UI can be found here. In particular, the "New Pipeline from a Single Repository" option has to be selected (as opposed to "Auto-discover Pipelines").
Visualizing workflows¶
While workflow files are relatively simple to read, it is nice to have a way of quickly visualizing the steps contained in a workflow. Popper provides the option of generating a graph for a workflow. To generate a graph, execute the following:
popper dot
The above generates a graph in .dot format. To visualize it, you can install the graphviz package and execute:
popper dot | dot -T png -o wf.png
The above generates a wf.png file depicting the workflow. Alternatively, you can use the http://www.webgraphviz.com/ website to generate a graph by copy-pasting the output of the popper dot command.
Workflow Syntax and Execution Runtime¶
This section introduces the YAML syntax used by Popper, describes the workflow execution runtime and shows how to execute workflows in alternative container engines.
NOTE: Popper also supports the now-deprecated HCL syntax that was introduced in the alpha version of GitHub Actions workflows. We strongly recommend the use of Popper’s own YAML syntax.
Syntax¶
A Popper workflow file looks like the following:
version: '1'
steps:
- uses: docker://alpine:3.9
args: ["ls", "-la"]
A workflow specification contains one or more steps in the form of a YAML list named steps. Each item in the list is a dictionary containing at least a uses attribute, which determines the Docker image being used for that step.
Workflow steps¶
The following table describes the attributes that can be used for a step. All attributes are optional with the exception of the uses attribute.
Attribute | Description |
---|---|
uses | The Docker image that will be executed for that step. For example, uses: docker://node:10. See the "Referencing images in a step" section below for more examples. |
runs | Specifies the command to run in the Docker image. If runs is omitted, the command specified in the Dockerfile's ENTRYPOINT instruction will execute. Use the runs attribute when the Dockerfile does not specify an ENTRYPOINT or you want to override the ENTRYPOINT command. The runs attribute does not invoke a shell by default. Using runs: "echo $VAR" will not print the value stored in $VAR, but will instead print the literal string $VAR. To use environment variables with the runs instruction, you must include a shell to expand the variables, for example: runs: ["sh", "-c", "echo $VAR"]. If the value of runs refers to a local script, the path is relative to the workspace folder (see "The workspace" section below). |
args | The arguments to pass to the command. This can be a string or an array. If you provide args as a string, the string is split around whitespace. For example, args: "--flag --arg value" or args: ["--flag", "--arg", "value"]. If the value of args refers to a local script, the path is relative to the workspace folder (see "The workspace" section below). |
env | The environment variables to set inside the container's runtime environment. If you need to pass environment variables into a step, make sure it runs a command shell to perform variable substitution. For example, if your runs attribute is set to ["sh", "-c"], the value of args will be passed to sh -c and executed in a command shell. Alternatively, if your Dockerfile uses an ENTRYPOINT to run the same command ("sh -c"), args will execute in a command shell as well. See ENTRYPOINT for more details. |
secrets | Specifies the names of the secret variables to set in the runtime environment, which the container can access as environment variables. For example, secrets: ["SECRET1", "SECRET2"]. |
id | Assigns an identifier to the step. By default, steps are assigned a numeric id corresponding to the order of the step in the list, with 1 identifying the first step. |
needs | Identifies steps that must complete successfully before this step will be invoked. It can be a string or an array of strings. |
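To illustrate, here is a small hypothetical workflow that exercises several of these attributes (the step ids, file name, and variable name are for illustration only):
version: '1'
steps:
- id: create-file
  uses: docker://alpine:3.9
  runs: ["sh", "-c", "echo $GREETING > ./greeting.txt"]
  env:
    GREETING: "hello"
- id: show-file
  needs: [create-file]
  uses: docker://alpine:3.9
  args: ["cat", "./greeting.txt"]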
Referencing images in a step¶
A step in a workflow can reference a container image defined in a
Dockerfile
that is part of the same repository where the workflow
file resides. In addition, it can also reference a Dockerfile
contained in public Git repository. A third option is to directly
reference an image published a in a container registry such as
DockerHub. Here are some examples of how you can refer to an
image on a public Git repository or Docker container registry:
Template | Description |
---|---|
./path/to/dir | The path to the directory that contains the Dockerfile. This is a relative path with respect to the workspace directory (see "The workspace" section below). Example: ./path/to/myimg/. |
{url}/{user}/{repo}@{ref} | A specific branch, ref, or SHA in a public Git repository. If url is omitted, github.com is used by default. Example: https://bitbucket.com/popperized/ansible@master. |
{url}/{user}/{repo}/{path}@{ref} | A subdirectory in a public Git repository at a specific branch, ref, or SHA. Example: git@gitlab.com:popperized/geni/build-context@v2.0. |
docker://{image}:{tag} | A Docker image published on Docker Hub. Example: docker://alpine:3.8. |
docker://{host}/{image}:{tag} | A Docker image in a public registry other than DockerHub. Note that the container engine needs to be properly configured to access the referenced registry in order to download from it. Example: docker://gcr.io/cloud-builders/gradle. |
It’s strongly recommended to include the version of the image you are using by specifying a SHA or Docker tag. If you don’t specify a version and the image owner publishes an update, it may break your workflows or have unexpected behavior.
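For example, the following sketch pins each image reference to a specific tag or ref (the local directory path is hypothetical):
version: '1'
steps:
- uses: docker://alpine:3.9.4
  args: ["ls"]
# hypothetical local directory containing a Dockerfile
- uses: ./containers/mytool
  args: ["ls"]
# repository reference pinned to a branch
- uses: popperized/bin/sh@master
  args: ["ls"]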
In general, any Docker image can be used in a Popper workflow, but keep in mind the following:
- When the runs attribute for a step is used, the ENTRYPOINT of the image is overridden.
- The WORKDIR is overridden and /workspace is used instead (see "The workspace" section below).
- The ARG instruction is not supported, thus building an image from a Dockerfile (public or local) only uses its default values.
- While it is possible to run containers that specify a USER other than root, doing so might cause unexpected behavior.
Referencing private Github repositories¶
You can reference Dockerfiles located in private GitHub repositories by defining a GITHUB_API_TOKEN environment variable that the popper run command reads and uses to clone private repositories. The repository referenced in the uses attribute is assumed to be private and, to access it, an API token from GitHub is needed (see instructions here). The token needs to have permissions to read the private repository in question. To run a workflow that references private repositories:
export GITHUB_API_TOKEN=access_token_here
popper run -f wf.yml
If the access token doesn’t have permissions to access private repositories, the popper run command will fail.
Execution Runtime¶
This section describes the runtime environment where a workflow executes.
The workspace¶
When a step is executed, a folder in your machine is bind-mounted (shared) to the /workspace folder inside the associated container. By default, the folder being bind-mounted is $PWD, that is, the working directory from where popper run is being invoked. If the -w (or --workspace) flag is given, then the value for this flag is used instead.
For example, let’s look at a workflow that writes to a myfile in the workspace:
version: '1'
steps:
- uses: docker://alpine:3.9
args: [touch, ./myfile]
Assuming the above is stored in a wf.yml file, the following writes to the current working directory:
cd /tmp
popper run -f /path/to/wf.yml
In the above, /tmp/myfile is created. If we provide a value for -w, the workspace path changes and thus the file is written to that location:
cd /tmp
popper run -f /path/to/wf.yml -w /path/to
The above writes /path/to/myfile. And, for completeness, the above is equivalent to:
cd /path/to
popper run -f wf.yml
Filesystem namespaces and persistence¶
As mentioned previously, for every step Popper bind-mounts (shares) a
folder from the host (the workspace) into the /workspace
folder in
the container. Anything written to this folder persists. Conversely,
anything that is NOT written in this folder will not persist after the
workflow finishes, and the associated containers get destroyed.
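The following sketch illustrates this: the first file lands in the bind-mounted workspace and survives the run, while the second is written to the container's own filesystem and disappears once the step's container is removed (both file names are illustrative):
version: '1'
steps:
- uses: docker://alpine:3.9
  runs: ["sh", "-c", "echo persisted > /workspace/kept.txt && echo ephemeral > /tmp/lost.txt"]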
Environment variables¶
A step can define, read, and modify environment variables. A step defines environment variables using the env attribute. For example, you could set the variables FIRST, MIDDLE, and LAST using this:
version: '1'
steps:
- uses: "docker://alpine:3.9"
args: ["sh", "-c", "echo my name is: $FIRST $MIDDLE $LAST"]
env:
FIRST: "Jane"
MIDDLE: "Charlotte"
LAST: "Doe"
When the above step executes, Popper makes these variables available to the container and thus the above prints to the terminal:
my name is: Jane Charlotte Doe
Note that these variables are only visible to the step defining them, and any modifications made by the code executed within the step are not persisted between steps (i.e., other steps do not see these modifications).
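A minimal sketch of this behavior: the variable exported by the first step below is not visible to the second one (the variable name is illustrative):
version: '1'
steps:
- uses: docker://alpine:3.9
  runs: ["sh", "-c", "export MYVAR=42 && echo first step sees $MYVAR"]
- uses: docker://alpine:3.9
  runs: ["sh", "-c", "echo second step sees ${MYVAR:-nothing}"]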
Exit codes and statuses¶
Exit codes are used to communicate a step’s status. Popper uses the exit code to set the workflow execution status, which can be success, neutral, or failure:
Exit code | Status | Description |
---|---|---|
0 | success | The step completed successfully and other steps that depend on it can begin. |
78 | neutral | The configuration error exit status (EX_CONFIG) indicates that the step terminated but did not fail. For example, a filter step can use a neutral status to stop a workflow if certain conditions aren't met. When a step returns this exit status, Popper terminates all concurrently running steps and prevents any future steps from starting. The associated check run shows a neutral status, and the overall check suite will have a status of success as long as there were no failed or cancelled steps. |
All other | failure | Any other exit code indicates the step failed. When a step fails, all concurrent steps are cancelled and future steps are skipped. The check run and check suite both get a failure status. |
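For example, a filter-like step can stop the rest of a workflow without marking it as failed by exiting with code 78. In the sketch below, the second step only runs when the (illustrative) data.csv file is present in the workspace:
version: '1'
steps:
- uses: docker://alpine:3.9
  runs: ["sh", "-c", "test -f ./data.csv || exit 78"]
- uses: docker://alpine:3.9
  args: ["cat", "./data.csv"]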
Container Engines¶
By default, steps in Popper workflows run in Docker. In addition, Popper can execute workflows in other runtimes by interacting with their corresponding container engines. The --engine <engine> flag of the popper run command is used to invoke alternative engines, where <engine> is one of the supported engines. When no value for this flag is given, Popper executes workflows in Docker. Below we briefly describe each supported container engine (besides Docker), and lastly describe how to customize their configuration.
Supported engines¶
Singularity¶
Popper can execute a workflow in systems where Singularity 3.2+ is available. To execute a workflow in Singularity containers:
popper run --engine singularity
Limitations¶
- The use of ARG in Dockerfiles is not supported by Singularity.
- The --reuse flag of the popper run command is not supported.
Vagrant¶
While technically not a container engine, executing workflows inside a VM allows users to run workflows in machines where a container engine is not available. In this scenario, Popper uses Vagrant to spawn a VM provisioned with Docker. It then executes workflows by communicating with the Docker daemon that runs inside the VM. To execute a workflow in Vagrant:
popper run --engine vagrant
Limitations¶
Only one workflow can be executed at a time with the Vagrant runtime, since Popper assumes that there is only one VM running at any given point in time.
Host¶
There are situations where a container runtime is not available and cannot be installed. In these cases, a step can be executed directly on the host, that is, on the same environment where the popper command is running. This is done by making use of the special sh value for the uses attribute. This value instructs Popper to execute the command or script given in the runs attribute. For example:
version: '1'
steps:
- uses: "sh"
runs: ["ls", "-la"]
- uses: "sh"
runs: "./path/to/my/script.sh"
args: ["some", "args", "to", "the", "script"]
In the first step above, the ls -la command is executed in the workspace folder (see "The workspace" section). The second one shows how to execute a script. Note that the command or script specified in the runs attribute is NOT executed in a shell. If you need a shell, you have to explicitly invoke one, for example:
version: '1'
steps:
- uses: sh
runs: [bash, -c, 'sleep 10 && true && exit 0']
The obvious downside of running a step on the host is that, depending on the command being executed, the workflow might not be portable.
Custom engine configuration¶
Other than bind-mounting the /workspace folder, Popper runs containers with the default configuration provided by the underlying engine. However, the --conf flag of the popper run command can be used to specify custom options for the underlying engine in question (see the "Customizing container engine behavior" section above for more).
Guides¶
This is a list of guides related to several aspects of working with GitHub Actions (GHA) workflows.
Creating a new action¶
You can create actions in a repository you own by adding a Dockerfile. To share GitHub Actions with the GitHub community, your repository must be public. All actions require a Dockerfile. An action may also include an entrypoint.sh file, to execute arguments, and any other files that contain the action’s code. For example, an action called action-a might have this directory structure:
|-- hello-world (repository)
| |__ main.workflow
| |__ action-a
| │__ Dockerfile
| │__ README.md
| |__ entrypoint.sh
|
To use an action in your repository, refer to the action in your .workflow using a path relative to the repository directory. For example, if your repository had the directory structure above, you would use this relative path to use action-a in a workflow for the hello-world repository:
action "action a" {
uses = "./action-a/"
}
Every action should have a README.md file in the action’s subdirectory that includes this information:
- A detailed description of what the action does.
- Environment variables the action uses.
- Secrets the action uses. Production secrets should not be stored in the API during the limited public beta period.
- Required arguments.
- Optional arguments.
See the "Creating a Docker container" section below to learn more about creating a custom Docker container and how you can use entrypoint.sh.
Choosing a location for your action¶
If you are developing an action for other people to use, GitHub recommends keeping the action in its own repository instead of bundling it with other application code. This allows you to version, track, and release the action just like any other software. Storing an action in its own repository makes it easier for the GitHub community to discover the action, narrows the scope of the code base for developers fixing issues and extending the action, and decouples the action’s versioning from the versioning of other application code.
Using shell scripts to create actions¶
Shell scripts are a great way to write the code in GitHub Actions. If you can write an action in under 100 lines of code and it doesn’t require complex or multi-line command arguments, a shell script is a great tool for the job. When writing actions using a shell script, follow these guidelines:
- Use a POSIX-standard shell when possible. Use the #!/bin/sh shebang to use the system’s default shell. By default, Ubuntu and Debian use the dash shell, and Alpine uses the ash shell. Using the default shell requires you to avoid using bash or shell-specific features in your script.
- Use set -eu in your shell script to avoid continuing when errors or undefined variables are present (see the sketch after this list).
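Putting both guidelines together, an entrypoint script for an action might start like the following sketch (the echo is just a placeholder for the action's actual logic):
#!/bin/sh
# stop on errors and on references to undefined variables
set -eu
# placeholder for the action's actual logic
echo "running with arguments: $*"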
Hello world action example¶
You can create a new action by adding a Dockerfile to the directory in your repository that contains your action code. This example creates a simple action that writes arguments to standard output (stdout). An action declared in a main.workflow would pass the arguments that this action writes to stdout. To learn more about the instructions used in the Dockerfile, check out the official Docker documentation. The two files you need to create an action are shown below:
./action-a/Dockerfile
FROM debian:9.5-slim
ADD entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
./action-a/entrypoint.sh
#!/bin/sh -l
sh -c "echo $*"
Your code must be executable. Make sure the entrypoint.sh file has execute permissions before using it in a workflow. You can modify the permissions from your terminal using this command:
chmod +x entrypoint.sh
This action echoes the arguments you pass to the action. For example, if you were to pass the arguments "Hello World", you’d see this output in the command shell:
Hello World
Creating a Docker container¶
Check out the official Docker documentation.
Implementing a workflow for an existing set of scripts¶
This guide exemplifies how to define a GitHub Actions (GHA) workflow for an existing set of scripts. Assume we have a project in a myproject/ folder and a list of scripts within the myproject/scripts/ folder, as shown below:
cd myproject/
ls -l scripts/
total 16
-rwxrwx--- 1 user staff 927B Jul 22 19:01 download-data.sh
-rwxrwx--- 1 user staff 827B Jul 22 19:01 get_mean_by_group.py
-rwxrwx--- 1 user staff 415B Jul 22 19:01 validate_output.py
A straightforward workflow for wrapping the above is the following:
workflow "co2 emissions" {
resolves = "validate results"
}
action "download data" {
uses = "popperized/bin/sh@master"
args = ["scripts/download-data.sh"]
}
action "run analysis" {
needs = "download data"
uses = "popperized/bin/sh@master"
args = ["workflows/minimal-python/scripts/get_mean_by_group.py", "5"]
}
action "validate results" {
needs = "run analysis"
uses = "popperized/bin/sh@master"
args = [
"workflows/minimal-python/scripts/validate_output.py",
"workflows/minimal-python/data/global_per_capita_mean.csv"
]
}
The above runs every script within a Docker container whose image is the one associated with the popperized/bin/sh action (see the corresponding GitHub repository here). As you would expect, this workflow fails to run since the popperized/bin/sh image is a lightweight one (it contains only Bash utilities), and the dependencies that the scripts need are not available in this image. In cases like this, we need to either use an existing action that has all the dependencies we need, or create an action ourselves.
In this particular example, these scripts depend on CURL and Python. Thankfully, actions for these already exist, so we can make use of them as follows:
workflow "co2 emissions" {
resolves = "validate results"
}
action "download data" {
uses = "popperized/bin/curl@master"
args = [
"--create-dirs",
"-Lo workflows/minimal-python/data/global.csv",
"https://github.com/datasets/co2-fossil-global/raw/master/global.csv"
]
}
action "run analysis" {
needs = "download data"
uses = "jefftriplett/python-actions@master"
args = [
"workflows/minimal-python/scripts/get_mean_by_group.py",
"workflows/minimal-python/data/global.csv",
"5"
]
}
action "validate results" {
needs = "run analysis"
uses = "jefftriplett/python-actions@master"
args = [
"workflows/minimal-python/scripts/validate_output.py",
"workflows/minimal-python/data/global_per_capita_mean.csv"
]
}
The above workflow runs correctly anywhere a GitHub Actions workflow can run.
NOTE: the download-data.sh script contained just one line invoking CURL, so we make that call directly in the action block and remove the bash script.
When no container runtime is available¶
In scenarios where a container runtime is not available, the special sh value for the uses attribute of action blocks can be used. This value instructs Popper to execute actions directly on the host machine (as opposed to executing in a container runtime). The example workflow above would be rewritten as:
workflow "co2 emissions" {
resolves = "validate results"
}
action "download data" {
uses = "sh"
args = [
"curl", "--create-dirs",
"-Lo workflows/minimal-python/data/global.csv",
"https://github.com/datasets/co2-fossil-global/raw/master/global.csv"
]
}
action "run analysis" {
needs = "download data"
uses = "sh"
args = [
"workflows/minimal-python/scripts/get_mean_by_group.py",
"workflows/minimal-python/data/global.csv",
"5"
]
}
action "validate results" {
needs = "run analysis"
uses = "sh"
args = [
"workflows/minimal-python/scripts/validate_output.py",
"workflows/minimal-python/data/global_per_capita_mean.csv"
]
}
The obvious downside of running actions directly on the host is that dependencies assumed by the scripts might not be available in other environments where the workflow is re-executed. Since there are no container images associated with actions that use sh, this will likely break the portability of the workflow. In this particular example, if the workflow above runs on a machine without CURL or with only Python 2.7 available, it will fail.
NOTE: the uses = "sh" special value is not supported by the GitHub Actions platform. This workflow will fail to run on GitHub’s infrastructure and can only be executed using Popper.
Other Resources¶
- Official Github Actions documentation.
- A list of example workflows can be found at https://github.com/popperized/popper-examples. Other examples can be found on GitHub’s official actions organization.
- Awesome-actions list.
- Self-paced hands-on tutorial.
FAQ¶
How can I create a virtual environment to install Popper?¶
The following creates a virtual environment in a $HOME/venvs/popper folder:
# create virtualenv
virtualenv $HOME/venvs/popper
# activate it
source $HOME/venvs/popper/bin/activate
# install Popper in it
pip install popper
The first step is only done once. After closing your shell, or opening another tab of your terminal emulator, you’ll have to reload the environment (the "activate it" line above). For more on virtual environments, see here.
How can we deal with large datasets? For example, I have to work with hundreds of GB of data; how would this be integrated into Popper?¶
For datasets that are large enough that they cannot be managed by Git, solutions such as a PFS (parallel file system), Git LFS, Datapackages, CKAN, among others, exist. These tools and services allow users to manage large datasets and version-control them. From the point of view of Popper, this is just another tool that will get invoked as part of the execution of a pipeline. As part of our documentation, we have examples on how to use datapackages, and another on how to use data.world.
How can Popper capture more complex workflows? For example, automatically restarting failed tasks?¶
A Popper pipeline is a simple sequence of “containerized bash scripts”. Popper is not a replacement for scientific workflow engines; instead, its goal is to capture the topmost workflow: the human interaction with a terminal.
Can I follow Popper in computational science research, as opposed to computer science?¶
Yes, the goal for Popper is to make it a domain-agnostic experimentation protocol. See the https://github.com/popperized/popper-examples repository for examples.
How to apply the Popper protocol for applications that take large quantities of computer time?¶
The popper run command takes an optional action argument that can be used to execute a workflow up to a certain step. Run popper run --help for more.
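For instance, assuming the scaffolded example from the Getting Started section, and assuming the step identifier is given as a positional argument (an assumption; check popper run --help for the exact form in your version), an invocation that executes the workflow up to the first step could look like:
popper run -f wf.yml 1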
Contributing¶
Code of Conduct¶
Anyone is welcome to contribute to Popper! To get started, take a look at our contributing guidelines, then dive in with our list of good first issues and open projects.
Popper adheres to the code of conduct posted in this repository. By participating or contributing to Popper, you’re expected to uphold this code. If you encounter unacceptable behavior, please immediately email us.
Install from source¶
To install Popper in “development mode”, we suggest the following approach:
cd $HOME/
# create virtualenv
python -m virtualenv $HOME/virtualenvs/popper
# load virtualenv
source $HOME/virtualenvs/popper/bin/activate
# clone popper
git clone git@github.com:systemslab/popper
cd popper
# install popper from source
pip install -e cli
The -e flag passed to pip tells it to install the package from the source folder; if you modify the logic in the Popper source code, you will see the effects when you invoke the popper command. So with the above approach you have both (1) Popper installed on your machine and (2) an environment where you can modify Popper and test the results of such modifications.
NOTE: the virtual environment created above needs to be reloaded every time you open a new terminal window (source command), otherwise the popper command will not be found by your shell.
Contributing CLI features¶
To contribute new CLI features:
- Add a new issue describing the feature.
- Fork the official repo and implement the issue on a new branch.
- Add tests for the new feature. We test the popper CLI command using Popper itself. The Popper pipeline for testing the popper command is available here.
- Open a pull request against the master branch.
Contributing example pipelines¶
We invite anyone to implement and document GitHub Actions workflows. To add an example, you can fork and open a PR on the https://github.com/popperized/popper-examples repository.