17th September 2018
We are in the era of big data, where we can collect, store, and process data in volumes like never before. With so much data at our fingertips, the question arises: do we still need to use sampling? It’s a fair question, but we will not always have complete data for an entire population, in which case big data is itself a sample. Sampling is as relevant now as it ever has been.
In this blog post I’ll detail some of the most common sampling methods along with implementations in both R and Python. I have also provided two Jupyter Notebooks to download with all the code examples below, one for R and one for Python.[Read more]
15th July 2018
When I was first taught machine learning, I was given some calculations for metrics. Accuracy was kind of obvious, but others, such as Precision and Recall or Sensitivity and Specificity, were less so. These names didn’t seem to have much meaning to me, and I frequently mixed them up at the beginning because I didn’t have any real context with which to associate the names with their function. In this blog post I intend to present them with some real-world context, in order to give them more meaning and make them easier to understand.[Read more]
10th June 2018
With increasing frequency in the local Brighton news, I read about GP surgeries closing. This became very apparent to me in January 2017 when, reading about another closure, I suddenly realised that it was my own doctor’s. On top of that, I was unable to register at the nearest surgery to my home, only three hundred metres away, because they were not accepting any new NHS patients. With the number of surgeries closing, a total of nine since 2015, I wondered whether any new surgeries had opened to counter this, at which surgeries the displaced patients had registered, and whether this had caused any overload. In this blog post I will visualise some geospatial data and explore these questions.[Read more]
27th April 2018
[Read more]
2nd April 2018
At some time in a data scientist’s work there will be a requirement to scrape some data from a website. For example, it came up right away in my first ever data science project. Back then I used the Python library Beautiful Soup but at present my tool of choice is Scrapy, an open-source web scraping framework. In this post I will discuss the installation and coding of a Scrapy web spider, then demonstrate it on an example website.[Read more]
17th March 2018
As I mentioned in my introductory post, I have noticed a large focus on technical skills in data science articles, with the greatest emphasis being on programming. I imagine this is because programming is clear-cut and it can be easier to teach and write about, but this could also give the wrong impression that data science is all about programming. In this post I would like to give credit to soft skills which I feel do not receive enough attention.[Read more]
3rd February 2018
This blog is going to be predominantly about data science and the skills, techniques and tools required to practise it. Data science is an emerging multi-disciplinary field that provides a wide subject area with a wealth of topics to discuss.
Strangely enough, it has only been recently that I have started to call myself a data scientist, despite having completed my first data science project in 2012. I have a background in programming and I called myself a programmer when I landed my first programming job, so why the hesitation to take on a new title?[Read more]