Data Science: A multi-disciplinary field in which scientific methodology is applied to data analysis in order to find useful insights and promote evidence-based decision making.

thewiringundertheboard Blog

Visualising the Tonal Characteristics of Sonic Youth

10th December 2018

Born out of New York’s late-1970s avant-garde “No Wave” scene, Sonic Youth were an enigma to me as a teenager. I could play songs I liked by various bands on the guitar, all except theirs. The problem was that they used a number of alternate guitar tunings which, in the pre-web days, were a complete mystery to me. These tunings made it very difficult, if not impossible, to play their songs using the standard guitar tuning of E Standard.

In this blog post, I will take tuning data from the official Sonic Youth website and apply data science techniques to visualise and explore these alternate tunings, looking at how they developed and how they contributed to the band’s overall tonal characteristics.

[Read more]

Probability - The Birthday Problem, and a Couple of Extras

22nd October 2018

The birthday problem is a famous problem in probability, and plenty has already been written about it, so this will not be the most original blog post. The solution does, however, contain a couple of interesting probability details, so as well as explaining how to solve the problem itself I’ll elaborate on these details.
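
As a taster, here is a minimal sketch of the standard calculation, assuming 365 equally likely birthdays and ignoring leap years. The function name is my own choice for illustration and is not necessarily what the full post uses. It works out the probability that everyone has a different birthday, then takes the complement.

```python
def birthday_collision_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday.

    Assumes every day is equally likely and birthdays are independent.
    """
    if n > days:
        # Pigeonhole principle: more people than days guarantees a match.
        return 1.0
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for n in (10, 23, 50):
        print(n, round(birthday_collision_probability(n), 4))
```

With these assumptions, 23 is the smallest group size for which the probability of a shared birthday passes one half.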

[Read more]

Sampling Methods for Data Science

17th September 2018

We are in the era of big data, where we can collect, store, and process huge amounts of data like never before. With so much data at our fingertips, the question arises: do we still need to use sampling? It’s a fair question, but we will not always have complete data for an entire population, in which case big data is itself a sample. Sampling is as relevant now as it has ever been.

In this blog post I’ll detail some of the most common sampling methods along with implementations in both R and Python. I have also provided two Jupyter notebooks to download containing all the code examples, one for R and one for Python.
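
To give a flavour of what such implementations can look like, here is a small sketch of three common methods (simple random, systematic, and stratified sampling) using only the Python standard library. The function names and the toy population are assumptions made for this illustration; the notebooks themselves may well use different libraries and data.

```python
import random
from collections import defaultdict


def simple_random_sample(population, k, seed=None):
    """Draw k items without replacement, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def systematic_sample(population, step, seed=None):
    """Take every step-th item, starting from a random offset."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]


def stratified_sample(population, key, fraction, seed=None):
    """Sample the same fraction from each stratum defined by key(item)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        # Take at least one item so that small strata are still represented.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Toy population (hypothetical): 100 records, 75 in group "A" and 25 in group "B".
population = [{"id": i, "group": "A" if i % 4 else "B"} for i in range(100)]

if __name__ == "__main__":
    print(len(simple_random_sample(population, 10, seed=1)))
    print(len(systematic_sample(population, 10, seed=1)))
    print(len(stratified_sample(population, lambda r: r["group"], 0.2, seed=1)))
```

The stratified version keeps the 3:1 ratio between the two groups in the sample, which a simple random sample only achieves on average.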

[Read more]

Machine Learning Metrics - Precision, Recall, and Beyond

15th July 2018

When I was first taught machine learning, I was given some calculations for metrics. Accuracy was kind of obvious, but others, such as Precision and Recall or Sensitivity and Specificity, less so. These names didn’t seem to have much meaning to me, and I frequently mixed them up at the beginning because I had no real context with which to associate the names with their function. In this blog post I intend to present them with some real-world context, to give them more meaning and make them easier to understand.
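
For reference, the calculations themselves are short. The sketch below computes them from the four cells of a binary confusion matrix; the function name and the example counts are made up for illustration and are not taken from the full post.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute common binary classification metrics from confusion-matrix counts."""

    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 rather than raising when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    return {
        # Share of all predictions that were correct.
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        # Of everything predicted positive, how much really was positive?
        "precision": ratio(tp, tp + fp),
        # Of everything really positive, how much did we find?
        # Recall and sensitivity are two names for the same quantity.
        "recall": ratio(tp, tp + fn),
        # Of everything really negative, how much did we correctly reject?
        "specificity": ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts: 40 true positives, 10 false positives,
    # 20 false negatives, 930 true negatives.
    for name, value in classification_metrics(40, 10, 20, 930).items():
        print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is 0.97 while recall is only about 0.67, which shows why accuracy alone can mislead when one class is rare.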

[Read more]

Visualising Brighton GP patient numbers

10th June 2018

With increasing frequency in the local Brighton news, I read about GP surgeries closing. This became very apparent to me in January 2017 when I was reading about another closure, only to suddenly realise that it was my own doctor’s surgery. On top of that, I was unable to register at the nearest surgery to my home, only three hundred metres away, because they were not accepting any new NHS patients. With the number of surgeries closing, a total of nine since 2015, I was wondering whether any new surgeries have opened to counter this, at which surgeries the displaced patients have registered, and whether this has caused any overload. In this blog post I will visualise some geospatial data and explore these questions.

[Read more]

Brighton data science meetups

27th April 2018

Over the last few weeks I have been to two data science meetups in the Brighton area, BrightonPy and the Sussex Data Science meetup. Here’s a short rundown of both of them.

[Read more]

Web Scraping with Scrapy

2nd April 2018

At some point in a data scientist’s work there will be a requirement to scrape some data from a website. For example, it came up right away in my first ever data science project. Back then I used the Python library Beautiful Soup, but at present my tool of choice is Scrapy, an open-source web scraping framework. In this post I will discuss the installation and coding of a Scrapy web spider, then demonstrate it on an example website.

[Read more]

Soft Skills in Data Science

17th March 2018

As I mentioned in my introductory post, I have noticed a large focus on technical skills in data science articles, with the greatest emphasis being on programming. I imagine this is because programming is clear-cut and it can be easier to teach and write about, but this could also give the wrong impression that data science is all about programming. In this post I would like to give credit to soft skills which I feel do not receive enough attention.

[Read more]

First blog post - what is this all about?

3rd February 2018

It is going to be predominantly about data science and the skills, techniques and tools required to practise it. Data science is an emerging multi-disciplinary field that provides a wide subject area with a wealth of topics to discuss.

Strangely enough, it has only been recently that I have started to call myself a data scientist, despite having completed my first data science project in 2012. I have a background in programming and I called myself a programmer when I landed my first programming job, so why the hesitation to take on a new title?

[Read more]