GitHub Repo Search Done Right

Former Member · 11-13-2014

Reflections from the ReviewNinja team at SAP on the implementation of an open source code-review application. The team enjoys a close working relationship with GitHub, including frequently crashing their office. (Thanks for putting up with us!)

Update: Since this post was written, GitHub has announced changes to its organization permissions. The change will include a unified endpoint for all of a user's repos regardless of repo type (personal or organizational). I am excited to integrate this endpoint into ReviewNinja in the near future and encourage anyone interested to check it out!

Development tools built on top of GitHub are becoming ubiquitous in developers' daily lives. Travis CI, Waffle.io, and Coveralls are just a few of the many tools we interact with every day to develop, build, and ship our software.

A common workflow for these tools looks something like this.

An “administrator” sets up an account with the third-party tool (often using GitHub’s OAuth service), enables the service for the desired repositories, and then configures the selected repos to fit each project’s unique needs. Step one involves the familiar handshake with GitHub in which we grant (or deny) the application certain permissions. Step three is application-specific; often we interact with a settings page or include a simple dotfile in the root of our repos. The middle step, for reasons mostly unknown to end users, varies widely from application to application.

Let me explain why this common task is handled so inconsistently, and how we developed a new approach for ReviewNinja, an open source code review tool for GitHub.

Those familiar with GitHub will know that GitHub has two types of repositories:

Personal repos
Organizational repos

This distinction is reflected in the API as well. Say you want to retrieve all repositories for which a user has administrative permissions (a common task for most developer tools). To do this, you must first retrieve all of the user’s repos, then all organizations to which the user belongs, and finally all repos of each organization. This need to traverse a user’s organizations breadth-wise, followed by a depth-wise traversal of each organization’s repos (without forgetting the personal repos), is the source of the inconsistency among third-party tools.
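The three-step traversal can be sketched as follows. This is a minimal illustration, not code from any of the tools mentioned: the paths follow GitHub’s REST API (`/user/repos`, `/user/orgs`, `/orgs/{org}/repos`), while the `get` callable is a hypothetical stand-in for an authenticated HTTP client, and pagination is omitted. Admin rights are read from the `permissions` object GitHub includes on repos returned to an authenticated user.

```python
from typing import Any, Callable, Dict, List

# Hypothetical helper type: takes an API path such as "/user/repos" and
# returns the decoded JSON list. A real client would also send the OAuth
# token and follow pagination, which this sketch omits.
GetJson = Callable[[str], List[Dict[str, Any]]]


def admin_repos(get: GetJson) -> List[str]:
    """Collect full names of repos on which the user has admin rights."""
    repos: List[Dict[str, Any]] = []

    # 1. Personal repos of the authenticated user.
    repos.extend(get("/user/repos"))

    # 2. Breadth-wise: every organization the user belongs to.
    orgs = get("/user/orgs")

    # 3. Depth-wise: every repo of each organization (one call per org).
    for org in orgs:
        repos.extend(get(f"/orgs/{org['login']}/repos"))

    # Keep only repos whose permissions grant admin access, de-duplicating
    # in case a repo appears in more than one listing.
    seen = set()
    result = []
    for repo in repos:
        name = repo["full_name"]
        if repo.get("permissions", {}).get("admin") and name not in seen:
            seen.add(name)
            result.append(name)
    return result


# Usage with canned responses instead of live API calls.
fake = {
    "/user/repos": [{"full_name": "alice/dotfiles", "permissions": {"admin": True}}],
    "/user/orgs": [{"login": "acme"}],
    "/orgs/acme/repos": [
        {"full_name": "acme/api", "permissions": {"admin": True}},
        {"full_name": "acme/web", "permissions": {"admin": False}},
    ],
}
print(admin_repos(fake.__getitem__))  # ['alice/dotfiles', 'acme/api']
```

Note that the number of requests grows with the number of organizations, which is what makes this step slow for users in many orgs.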

Here are a few common approaches that third-party applications take to this task.

The sync with GitHub approach

This is the approach used by Travis CI. User and organization repos are periodically synced with GitHub and (presumably) stored locally, avoiding the frequent recursive API calls it takes to populate this data.

Once implemented, this is a solid approach. However, the setup (local persistence and a syncing mechanism) may prove more cumbersome than beneficial. That said, it is a good solution if you need to account for every repo on GitHub and want the nice touch of client-side filtering.
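The sync approach described above can be reduced to a cache with a time-to-live. This is a hypothetical sketch, not Travis CI’s implementation: an in-memory list stands in for the local persistence layer, and `fetch_all` is an assumed callable that performs the full user/org traversal and returns `owner/name` strings.

```python
import time
from typing import Callable, List, Optional


class RepoCache:
    """Sketch of the "sync with GitHub" approach (names are hypothetical)."""

    def __init__(self, fetch_all: Callable[[], List[str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_all = fetch_all
        self._ttl = ttl_seconds
        self._clock = clock
        self._repos: List[str] = []
        self._synced_at: Optional[float] = None

    def _sync_if_stale(self) -> None:
        now = self._clock()
        if self._synced_at is None or now - self._synced_at >= self._ttl:
            # Expensive step: walk all user and org repos, then store locally.
            self._repos = self._fetch_all()
            self._synced_at = now

    def filter(self, text: str) -> List[str]:
        """Client-side filtering on the local copy; no API call while fresh."""
        self._sync_if_stale()
        needle = text.lower()
        return [r for r in self._repos if needle in r.lower()]


# Usage with a fake traversal and a controllable clock.
calls = []

def fetch_all() -> List[str]:
    calls.append(1)
    return ["alice/dotfiles", "acme/api", "acme/web"]

now = [0.0]
cache = RepoCache(fetch_all, ttl_seconds=3600, clock=lambda: now[0])
print(cache.filter("acme"))  # ['acme/api', 'acme/web']
print(cache.filter("dot"))   # ['alice/dotfiles']
print(len(calls))            # 1
now[0] = 4000.0              # past the TTL, so the next call resyncs
cache.filter("web")
print(len(calls))            # 2
```

The trade-off discussed above is visible here: filtering is instant, but results can be stale for up to `ttl_seconds`, and a real tool must also persist and schedule the sync.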

The search approach

This approach usually pre-populates some user and organization repos, while also letting the user search for a specific repo. It is often unclear where client-side filtering ends and the API search begins. It is also unclear whether search queries should take the form username/repo or simply repo, which leads to confusion. But possibly the most frustrating thing about this approach is the inability to find that repo whose name you just can’t quite remember.
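One way a tool might accept both query forms is to normalize the input into a GitHub repository search string. The sketch below is hypothetical; it assumes GitHub’s `in:name` and `user:` search qualifiers. It also shows the weakness mentioned above: a bare `repo` query searches names across all of GitHub, and a query can only match a name you already partly remember.

```python
def build_search_query(text: str) -> str:
    """Turn "repo", "owner/repo", or "owner/" into a repo search query.

    Hypothetical helper; relies on the `in:name` and `user:` qualifiers
    of GitHub repository search.
    """
    text = text.strip()
    if "/" in text:
        owner, _, name = text.partition("/")
        parts = [name, "in:name"] if name else []
        if owner:
            parts.append(f"user:{owner}")
        return " ".join(parts)
    # No owner given: the name is matched across every repo on GitHub.
    return f"{text} in:name"


print(build_search_query("acme/api"))  # api in:name user:acme
print(build_search_query("api"))       # api in:name
```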

The ReviewNinja approach

When we developed ReviewNinja we quickly realized the inadequacy of both of these approaches, especially from a UX perspective, so we set out to create the best experience possible. Although our solution is not ideal (I address this below), we believe it offers a good alternative to the solutions outlined above.

ReviewNinja offers a simple textbox with a prompt to enter either username/ or organization/ to find a repository.

With this approach all repos are accounted for: by narrowing the search to a single owner, we can find any repo regardless of the size of your organization. We achieve this with no local persistence of our own and no syncing. Furthermore, that repo whose name you just can’t quite remember can be found naturally by searching for the owner under which it resides.
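The owner-first lookup can be sketched like this. It is an illustration of the idea, not code taken from the ReviewNinja source: `list_owner_repos` is a hypothetical callable returning the repo names under one user or organization (it could be backed by GitHub’s `/users/{owner}/repos` or `/orgs/{org}/repos` endpoints), and substring matching is an assumed filtering rule.

```python
from typing import Callable, List


def owner_scoped_search(query: str,
                        list_owner_repos: Callable[[str], List[str]]) -> List[str]:
    """Find repos by first naming their owner ("owner/" or "owner/partial")."""
    owner, sep, partial = query.partition("/")
    if not sep or not owner:
        # No owner yet: nothing to narrow on, so keep prompting for "owner/".
        return []
    # Only one owner's repos are fetched, so the cost does not depend on
    # how many organizations the user belongs to.
    names = list_owner_repos(owner)
    needle = partial.lower()
    # An empty partial lists every repo of the owner, which is how a repo
    # with a forgotten name can still be found by browsing.
    return [f"{owner}/{n}" for n in names if needle in n.lower()]


# Usage with an in-memory stand-in for the API.
repos = {"acme": ["api", "web", "legacy-billing"]}

def lookup(owner: str) -> List[str]:
    return repos.get(owner, [])

print(owner_scoped_search("acme/", lookup))      # ['acme/api', 'acme/web', 'acme/legacy-billing']
print(owner_scoped_search("acme/bill", lookup))  # ['acme/legacy-billing']
print(owner_scoped_search("api", lookup))        # []
```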

The cost of this approach is that the user must provide the repo owner; we believe this is a reasonable trade-off for the benefits it provides.

You can check it out at http://review.ninja, or on GitHub at https://github.com/reviewninja/review.ninja.

The ideal solution

A single, unified API endpoint returning all repos on which a user has the specified access permissions, regardless of owner (personal or organizational), would be the building block for a better solution. It would let application developers add nice touches such as autocomplete and client-side filtering, while freeing them from the org/repo traversal that is currently required.
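With such an endpoint, autocomplete becomes a purely client-side operation over one flat list. The sketch below assumes that list of `owner/name` strings has already been fetched in a single request; the ranking rule (repo-name prefix matches first, then any other substring match) is an illustrative choice, not something GitHub or ReviewNinja prescribes.

```python
from typing import List


def autocomplete(all_repos: List[str], typed: str, limit: int = 5) -> List[str]:
    """Rank "owner/name" strings against what the user has typed so far."""
    needle = typed.lower()
    prefix: List[str] = []
    other: List[str] = []
    for full_name in all_repos:
        name = full_name.split("/", 1)[-1].lower()
        if name.startswith(needle):
            prefix.append(full_name)       # best match: repo name starts with input
        elif needle in full_name.lower():
            other.append(full_name)        # weaker match: owner or mid-name hit
    return (prefix + other)[:limit]


repos = ["alice/dotfiles", "acme/api", "acme/web", "acme/legacy-api"]
print(autocomplete(repos, "api"))   # ['acme/api', 'acme/legacy-api']
print(autocomplete(repos, "acme"))  # ['acme/api', 'acme/web', 'acme/legacy-api']
```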
