Evilham

Evilham.com

Phabricator: batch audit

Introduction

At work I use phabricator to collaborate with my employees, usually I know what they are working on and have a rough idea that things are going fine, so we currently don’t rely on pre-commit code reviews, which are a wonderful tool, but instead rely on semi-regular audits. (Check this for an overview on phabricator reviews and audits)

This means, that every once in a while, I have to go through my audit queue and check that indeed things are going mostly alright. I raise concerns about particular changes seldom, because of a fluid communication, so I mostly just read and accept changes to keep having a reasonably up-to-date overview.

The issue is: phabricator‘s UI is very slow for this workflow.

The good news is that it has a very good API that allows automation of many things.

This is about how I implemented phabaudit.sh, a batch audit workflow for phabricator that works-for-me(tm).

Table of Contents

Conduit: phabricator‘s API

Even before swagger was a thing, phabricator had a built-in page documenting the conduit API and allowing users to explore the API in a somewhat interactive fashion.

This is found at https://phabricator_instance/conduit/.

Guessing the API endpoint

Since I know audits are associated with commits, I searched for commit-related API endpoints. Indeed, diffusion.commit.search is a good place to start and it contains a simple example, which satisfies my current needs:

{
  ...
  "queryKey": "active",
  ...
}

Where "queryKey": "active" means that I want to list active audits only. There is also a mention of the order parameter which can be oldest to list entries in chronological order.

Let’s test that.

Choosing the interaction tool

The examples section in the conduit page offers curl and arc call-conduit examples, both have pros and cons.

curl is a mostly standard tool nearly guaranteed to be available, which means that it can be mostly tested directly from the shell with:

$ curl https://phabricator_instance/api/diffusion.commit.search \
    -d api.token=api-token \
    -d order=oldest \
    -d queryKey=active > audits.json

But there is another option: arc call-conduit, that is phabricator‘s very well thought out Command Line Interface.

It enables many workflows and takes care of managing the API tokens (which I don’t have to pass int he command line every time). Audits though are a workflow that is currently not supported, but call-conduit enables arbitrary API calls.

I happen to use arc, amongst others, to contribute to FreeBSD, which does use pre-commit code reviews, that enables me and the project members and community to review changes before they land, raise concerns, suggest improvements, ensure nothing fishy is trying to be committed, …

Since I already have arc, I will use that to forget about authentication.

$ echo '{"queryKey": "active", "order": "oldest"}' | \
    arc call-conduit --conduit-uri https://phabricator_instace/ \
        diffusion.commit.search > audits.json

Parsing the output

The output of both tools is JSON piped to an audits.json file which can be “easily” manipulated with jq. I write “easily” because it takes some getting used to, but using jq is certainly doable.

This gives me the information I need to an audit_commits.json file:

$ jq -c '.response.data[] | {"commit": .fields.identifier, "repository": .fields.repositoryPHID}' audits.json > audit_commits.json

Getting the diff (introduced changes)

In order to get the diff, so I can audit it locally, we need a couple more API calls that can also be guessed from the https://phabricator_instance/conduit/ page:

$ echo '{"commit": COMMIT, "repository": REPOSITORY}' | \
    arc call-conduit --conduit-uri https://phabricator_instance/ \
        diffusion.rawdiffquery

{
  "error": null,
  "errorMessage": null,
  "response": {
    "tooSlow": false,
    "tooHuge": false,
    "filePHID": "PHID-FILE-00000000"
  }
}

And with that filePHID, we can get the actual diff data:

$ echo '{"phid": "PHID-FILE-00000000"}' | \
    arc call-conduit --conduit-uri https://phabricator_instance/ \
        file.download

{
  "error": null,
  "errorMessage": null,
  "response": BASE_64_STUFF
}

Which is base64 encoded, so we need to decode it. By checking man 1 b64decode I realise I need the -r flag, so getting the diff file to test.diff in the hard drive looks like this:

$ echo '{"phid": "PHID-FILE-00000000"}' | \
    arc call-conduit --conduit-uri https://phabricator_instance/ \
        file.download | \
    jq -r .response | \
    b64decode -r > test.diff

Approving the changes

After opening test.diff in my favourite editor, I decide that the change is valid and I want to mark it as approved.

Again, since audits are associated with commits, this will be in diffusion.commit.edit:

$ echo '{"objectIdentifier": COMMIT_PHID,
         "transactions": [{"type": "accept", "value": true}]}' | \
    arc call-conduit --conduit-uri https://phabricator_instance/ \
        diffusion.commit.edit

Putting it all together

Now that I have all the pieces, I can finish automating it with some POSIX shell as follows (download the full script here):

#!/bin/sh
#
# Find the latest version at:
#   https://evilham.com/en/blog/2020-phabricator/batch-audit/phabaudit.sh
# With the accompanying post:
#   https://evilham.com/en/blog/2020-phabricator/batch-audit/
# 
# Copyright 2020 Evilham <code@evilham.com>
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice,
#    this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
#    this list of conditions and the following disclaimer in the documentation
#    and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

#
# Process environment variables.
#
# User may provide CONDUIT_ARGS or PHABRICATOR_INSTANCE
# If PHABRICATOR_INSTANCE is passed, CONDUIT_ARGS will be adapted so that
# arc call-conduit always uses: --conduit-uri ${PHABRICATOR_INSTANCE}
#
# If CONDUIT_ARGS is passed, it will be used verbatim (modulo appending
# --conduit-uri as explained before).
if [ -n "${PHABRICATOR_INSTANCE}" ]; then
    CONDUIT_ARGS="${CONDUIT_ARGS} --conduit-uri '${PHABRICATOR_INSTANCE}'"
fi

# Temp dir for the audit diffs
TMP_DIR=$(mktemp -d -t audits)

get_audits()
{
    echo '{"queryKey": "active", "order": "oldest"}' | \
        arc call-conduit ${CONDUIT_ARGS} diffusion.commit.search
}

filter_commit_data()
{
    jq -c '.response.data[] | 
        {
            "phid": .phid,
            "commit": .fields.identifier,
            "repository": .fields.repositoryPHID
        }'
}

get_diff_phid()
{
    jq -c '{"commit": .commit, "repository": .repository}' | \
        arc call-conduit ${CONDUIT_ARGS} diffusion.rawdiffquery | \
        jq -r .response.filePHID
}

save_diff()
{
    diff_phid="${1}"
    # Download the diff to disk
    echo "{\"phid\": \"${diff_phid}\"}" | \
        arc call-conduit ${CONDUIT_ARGS} \
            file.download | \
        jq -r .response | \
        b64decode -r > "${TMP_DIR}/${diff_phid}.diff"
    # Return path to diff in stdout
    echo "${TMP_DIR}/${diff_phid}.diff"
}

get_phid_url()
{
    jq -c '{"names": [.phid]}' | \
        arc call-conduit ${CONDUIT_ARGS} \
            phid.lookup | \
        jq -r '.response | to_entries | .[] | .value.uri'
}

accept_changes()
{
    jq -c '{
        "objectIdentifier": .phid,
        "transactions": [{"type": "accept", "value": true}]
    }' | \
        arc call-conduit ${CONDUIT_ARGS} \
            diffusion.commit.edit
}

for audit in $(get_audits | filter_commit_data); do
    audit_url="$(echo "${audit}" | get_phid_url)"
    printf "\n\nAudit URL: ${audit_url}\n"
    diff_phid="$(echo "${audit}" | get_diff_phid)"
    diff_file="$(save_diff "${diff_phid}")"
    # Open in editor for easy review
    ${EDITOR} "${diff_file}"
    # Ask for feedback as necessary
    echo -n "Do you accept this change? [Y/n] "
    read accept
    if [ -z "${accept}" ] || [ "${accept}" != "${accept#[Yy]}" ]; then
        # Close audit 
        echo "${audit}" | accept_changes
    fi
done

Conclusion

Now that I’m done designing a workflow that is reasonable for me, I better start doing some auditing /o\ and real-world testing of phabaudit.sh.

If this happens to be useful to you, let me know!