

GitHub User Location Search

Posted on July 14th, 2018 in GrowthHack by Brandon

I travel a lot, and I like networking with developers in the city I'm visiting to get a feel for the local tech scene. So many people are working on so many cool projects, and it's an absolute blast to learn about them!

I thought I would try to scale my networking by growth hacking a process that increases the number of people I can connect with before I get to my destination. Eager to try out this idea, and with an upcoming trip to Prague, I wrote a small script in Node.js over a weekend and called it GitHub Geo User Search (you can find the repo here). I’m interested in how people will respond to a soft reach-out.

Using the GitHub user search API (documentation found here), I can find developers who list a given location on their profile, and the GitHub user API (documentation found here) provides a user’s public details (email, website and a short bio).

In testing the GitHub API, I quickly ran into a few imposed caps. GitHub limited the number of search calls to 30 per minute and limited the accessible search results to the first 1000 returned records. I also found that the sort and order parameters on the user search API did not work as specified in the documentation.

With these limitations in mind, I needed a way to extract the largest possible dataset for any region I was travelling to (Prague being my first attempt). First, I tried one-letter searches on GitHub login names by location (https://api.github.com/search/users?q=a+type:user+in:login+location:prague), which yielded no results. I then tried two-letter searches on GitHub login names by location (https://api.github.com/search/users?q=ab+type:user+in:login+location:prague), which produced a dataset. A quick look at the dataset showed that GitHub returns all users whose login starts with ‘ab’.
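As a rough sketch of how those search URLs could be built for any prefix and city (this is not the code from the repo; `buildSearchUrl` is a hypothetical helper, and the `per_page` and `page` values are my own assumption, using GitHub's standard pagination parameters):

```javascript
// Hypothetical helper, not taken from the repo: builds a user search URL
// for a login prefix and a location, mirroring the URLs shown above.
function buildSearchUrl(prefix, location, page = 1) {
  // Only the values are encoded; the qualifiers (type:, in:, location:)
  // are kept literal so the URL matches the format used in this post.
  const query =
    `${encodeURIComponent(prefix)}+type:user+in:login` +
    `+location:${encodeURIComponent(location)}`;
  // per_page=100 is an assumption: it asks for the largest page GitHub allows.
  return `https://api.github.com/search/users?q=${query}&per_page=100&page=${page}`;
}

console.log(buildSearchUrl('ab', 'prague'));
```

With 100 results per page, the first 1000 accessible records would span at most 10 pages per search term.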

The result was perfect: I could now construct an array of the characters allowed in a GitHub username. (I made an educated guess at the allowed characters. A quick Google search, where I literally only looked at the first page, produced no obvious answer, and I didn’t really care if I missed a non-alphanumeric character.) I then looped through that array to create a search array of every possible two-character combination.
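Building the search array can be sketched in a few lines. The character set below (lowercase letters, digits and the hyphen) is an assumption for illustration, in the same spirit as the educated guess described above, and `twoCharCombos` is a hypothetical name:

```javascript
// Assumed character set: lowercase letters, digits and the hyphen.
const CHARS = 'abcdefghijklmnopqrstuvwxyz0123456789-'.split('');

// Returns every ordered two-character combination of the given characters.
function twoCharCombos(chars) {
  const combos = [];
  for (const first of chars) {
    for (const second of chars) {
      combos.push(first + second);
    }
  }
  return combos;
}

const searchTerms = twoCharCombos(CHARS);
console.log(searchTerms.length); // 37 * 37 = 1369
```

With 37 characters that is 1369 search terms, so at one call every 2 seconds a single pass over the array takes roughly 46 minutes.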

I could now recursively call the GitHub search API with each entry in my search array. Each two-character search narrowed the results enough to keep the result set under the 1000-record limit. I also needed to limit my API calls to 30 per minute (one every 2 seconds) so as not to trigger GitHub’s rate limits.
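The pacing itself is simple to sketch. The version below uses a plain sequential loop rather than recursion, and `runThrottled`, `sleep` and `DELAY_MS` are hypothetical names, not taken from the repo:

```javascript
// 30 calls per minute works out to one call every 2000 ms.
const DELAY_MS = 2000;

// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task` once per item, in order, pausing between consecutive calls
// so the overall rate stays at or below one call per `delayMs`.
async function runThrottled(items, task, delayMs = DELAY_MS) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    results.push(await task(items[i]));
    // No need to wait after the final item.
    if (i < items.length - 1) {
      await sleep(delayMs);
    }
  }
  return results;
}
```

Here `task` would be whatever function performs one search call for a two-character term. Because each call is awaited before the pause starts, the real rate ends up slightly under 30 per minute.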

After the search calls completed, I had a complete dataset of all developers who have set their location to Prague. Now all I needed to do was call the GitHub user API to retrieve each user’s publicly available information. Voila, I now had a dataset large enough to make soft introductions to.
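A minimal sketch of that last step, assuming Node 18 or later (for the built-in `fetch`) and search responses shaped like the search API's JSON, with an `items` array of users. The helper names are hypothetical. The de-duplication is a precaution in case the same login comes back for more than one search term:

```javascript
// Collects the unique logins from a list of search API responses.
function uniqueLogins(searchResponses) {
  const logins = new Set();
  for (const response of searchResponses) {
    for (const user of response.items || []) {
      logins.add(user.login);
    }
  }
  return [...logins];
}

// Keeps only the public details mentioned in this post. In the user API
// response the website is stored in the `blog` field.
function toContact(user) {
  return {
    login: user.login,
    email: user.email,
    website: user.blog,
    bio: user.bio,
  };
}

// Fetches one user's public profile from the GitHub user API.
async function fetchUser(login) {
  const res = await fetch(
    `https://api.github.com/users/${encodeURIComponent(login)}`,
    { headers: { Accept: 'application/vnd.github+json' } }
  );
  if (!res.ok) {
    throw new Error(`GitHub returned ${res.status} for ${login}`);
  }
  return toContact(await res.json());
}
```

Many profiles leave `email` empty, so a null value there is to be expected. These user calls count against GitHub's rate limits as well, so they would need pacing just like the search calls.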

I will write a follow-up in a week or so with an update on the responses, which I hope will be positive!

 

© 2018 Brandon Caruana. All rights reserved