- Airflow Scheduler with AWS Redshift and Spectrum
- Provisioning with Kubernetes
- Python, SQL, Parquet files
- I joined Memrise as the first dedicated data engineer to support forming a data team. As for most data teams in 2018, the first focus was GDPR, which was coming into force within four months. My approach was to educate the company on what this meant for how they handled and used data. Once I had the stakeholders on board, I began to reshape how data was consumed, stored and used within the company.
- The data team grew quickly, which resulted in a ratio of one data engineer to seven data scientists/analysts. With this ratio, I made it a priority to make it easy and safe for the team to commit code to the Airflow repository. This was accomplished by enforcing pull requests with integrated code linting and a required review. To help reduce the number of bugs introduced, I deployed a staging server so features could be tested in an environment similar to production before going live.
Once the development process was working, I made it easy to set up the Airflow application locally and added a framework for creating automated end-to-end tests that run on the CI server.
- Memrise were building a CRM (Customer Relationship Management) team, and as part of this I was brought in to help evaluate and then integrate a CRM application. See details below in the projects section.
- Luigi Scheduler
- AWS: Redshift, ECS, S3, EC2
- Python & SQL
- Improved ETL pipeline success rate from 50% to 98%.
- Led a validation project to improve the accuracy of our finance reporting. As a result, the marketing team now has accurate breakdowns of profitability down to the ad level.
- Implemented a unit testing framework for developing SQL in a test-driven cycle. This has sped up development and reduced the risk of breaking changes.
- Ruby on Rails Backend
- React / Node Frontend
- A project manager and I started the team to tackle issues the business were seeing with customer post-purchase behaviours.
- Coached the product manager and other developers in the team on running Agile sprints.
- Curated an internal meetup for developers to share ideas and work. I started the meetup to solve a problem I was seeing with solo developers feeling isolated.
Memrise were building a CRM (Customer Relationship Management) team, and as part of this I was brought in to help evaluate and then integrate a CRM application.
After working with the CRM team to evaluate their needs, I created a customer model using our existing user base. The model had two objectives. The first was to surface the key attributes and metrics needed within the CRM. The second was to avoid importing the whole user base, which would have included a large number of stale customers who were unlikely to open emails and would have harmed the email reputation building required when launching a new CRM.
Another layer I rolled into the model used email bounce/failure logs from our transactional system to mark users who would likely fail to receive or interact with our emails.
Now that we had a functional model, the next step was to create a system to maintain a sync history between the data pipeline and the CRM. This was important because the Braze service contract was billed on the updates we sent, so sending the same update twice meant being billed twice. The solution also needed to be low-maintenance and avoid introducing many new technologies, as I was still a one-person team. The chosen solution used SQL to create a diff between the last synced table and the most recent version of the model. SQL was chosen because the equivalent diff in Python would not have been as performant.
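As an illustration only (not the production code), the diff idea can be sketched with a set-difference query. The table names `customer_model` and `last_synced`, the columns, and the use of SQLite in place of the real warehouse are all assumptions made for this sketch.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with hypothetical model and sync tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE last_synced (user_id INTEGER, email TEXT, plan TEXT);
        CREATE TABLE customer_model (user_id INTEGER, email TEXT, plan TEXT);
        INSERT INTO last_synced VALUES
            (1, 'a@example.com', 'free'),
            (2, 'b@example.com', 'free');
        INSERT INTO customer_model VALUES
            (1, 'a@example.com', 'free'),
            (2, 'b@example.com', 'pro'),
            (3, 'c@example.com', 'free');
        """
    )
    return conn


def rows_to_sync(conn):
    """Return model rows that are new or changed since the last sync.

    EXCEPT keeps rows of the current model that have no identical row
    in the last synced snapshot, so unchanged users are never re-sent.
    """
    query = """
        SELECT user_id, email, plan FROM customer_model
        EXCEPT
        SELECT user_id, email, plan FROM last_synced
        ORDER BY 1
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    # User 1 is unchanged, user 2 changed plan, user 3 is new.
    for row in rows_to_sync(build_demo()):
        print(row)
```

Only the changed and new users come back, so each billable update is sent once. After a successful sync, the snapshot table would be replaced with the current model so the next diff starts from a clean baseline.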
Wonderbly, like the majority of tech startups, drives sales through online marketing, so making marketing spend performant was an early priority.
This was done by tracking each interaction a customer had with us and then using this data to build attribution models that surface marketing performance.
An example of an attribution model is last-click attribution, where the last interaction takes all the credit for a conversion.
The attribution models were live when I joined the project, but they suffered from a mixture of software bugs and customer behaviour that the models did not account for. Once I had resolved the software issues, the next step was to refactor and document the processes to make this part of the data pipeline more accessible for future contributors.
Now that attribution was functioning correctly, I paired up with a marketing analyst to find shortcomings of the model, make improvements and finally monitor the changes to see if they were successful.
The final part of the project involved implementing a new model using the Time Decay technique, with the aim of better understanding the behaviours of repeat purchase customers.
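To make the two models mentioned above concrete, here is a minimal sketch of last-click and time-decay attribution. It is not the project's implementation: the function names, the touch format of `(channel, timestamp)` pairs, and the 7-day half-life are assumptions chosen for illustration.

```python
from datetime import datetime


def last_click(touches):
    """Give all credit to the most recent touch before the conversion."""
    channel, _ = max(touches, key=lambda touch: touch[1])
    return {channel: 1.0}


def time_decay(touches, converted_at, half_life_days=7.0):
    """Split credit so that recent touches earn more than older ones.

    A touch's weight halves for every `half_life_days` between the touch
    and the conversion (the half-life value is an assumption). Weights
    are normalised so the credits sum to 1.
    """
    weights = {}
    for channel, touched_at in touches:
        age_days = (converted_at - touched_at).total_seconds() / 86400
        weight = 0.5 ** (age_days / half_life_days)
        weights[channel] = weights.get(channel, 0.0) + weight
    total = sum(weights.values())
    return {channel: weight / total for channel, weight in weights.items()}


if __name__ == "__main__":
    touches = [
        ("email", datetime(2018, 1, 1)),
        ("search", datetime(2018, 1, 8)),
        ("social", datetime(2018, 1, 15)),
    ]
    converted_at = datetime(2018, 1, 15)
    print(last_click(touches))
    print(time_decay(touches, converted_at))
```

With these sample touches, last click credits only the final channel, while time decay still gives the earlier email and search touches a smaller share. That difference is what makes time decay useful for studying repeat purchase journeys with several interactions.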
The project was to overhaul a Rails application which had initially been built on Rails 2.x and was currently running one of the initial 3.0 versions. This meant a lot of patterns and syntax didn't follow current best practices. The application had nearly no unit or end-to-end tests, which made an update to Rails 5 likely to introduce bugs into what was a very stable application.
My approach was to create a testing plan for the major functions of the application so that I knew what the key features were and their expected outputs. With this plan I created high-level black box specs. I used this approach so that I didn't need to make any changes to the codebase for the specs to work.
Finally, once I had all of these tests I could start updating the application one major version at a time while rereading the security notifications for that version and checking if there was any vulnerable code.
After 4 major version updates and many fixes, I had the application running on the latest Rails version. At this point there was a round of manual testing, and then finally we shipped. To date, there have been no new bugs reported, which is marvellous.
I organise a meetup for around 30 people to learn how algorithms work. Each session we select an algorithm, watch a short presentation and then start hacking on some challenges. My main aim is making the group approachable to newbies and vets alike.
King & Mcgaw is one of the largest online art resellers in Europe, with different regional brands but mostly the same underlying content. I was brought in to work with the King & Mcgaw engineering team to advise on the best way to re-architect their web infrastructure, which consisted of Perl and PHP apps that people were afraid to touch.
Along the way I paired with the whole team to get them up to speed on Ruby and working in Agile Sprints.
The regional sites' logic was duplicated across all of the sites, which made maintenance hard. The solution was to create slim Rails apps for each of the websites and then create a SOA (Service Oriented Architecture) to allow the art catalogue to be shared by these apps. We made it super easy to spin up these sites by gemifying the model layer of the app, so the consuming application developer could use an ActiveRecord-like API to query the catalogue.
Due to the number of items they sold, we elected to use Elasticsearch to create a faceted search index, which allowed customers to search by facets like primary colours or size.
During my last year at university, I teamed up with a friend and started offering Apple server installation and support. The business grew, and after I finished university we both decided to see how far we could go with the business in 6 months.
We quickly decided to move from hardware to web development, and Josh showed me a language called Ruby which was gaining popularity. Over the next 6 years, we grew the company to 5 people, doing work for clients ranging from one-person outfits wanting a quick MVP to companies like Tech City and Tottenham Hotspur, who worked with us on longer projects.