The number mill

Thoughts, summaries around data science, software stuff, and other random bits and pieces of knowledge I come across

Processing UK 2021 census open data with python tools

Introduction: The census provides a rich set of demographic information that could be useful for various data science tasks, such as geomarketing, house price analysis, etc. With the results of the 2021 census having been published recently, this notebook demonstrates how to manipulate this data with FOSS Python tools to build a dataset that can be used for downstream tasks. For this demo, I will be focusing on output areas (OA) of London:...

February 1, 2023 · 9 min · Wendy Mak

Geo raster data parsing with rasterio

There are two main ways geospatial data are stored: rasters, where each ‘pixel’ stores data values (this corresponds to files such as GeoTIFFs), and vectors, where information is stored more like a table, and each row has a ‘geometry’ field which stores information that allows you to recreate a geographical feature, such as a point, a polygon, etc. A vector file might also store the coordinate reference system so you can correctly place the shapes on a map. This set of notes corresponds to handling raster data....

January 14, 2023 · 4 min · Wendy Mak

How to check if your deep learning library is actually using the GPU

If you’ve got a GPU on your system that you want to run your deep learning model on, you’d probably want to check that the library is able to access the GPU. Installation issues, incorrect setups, etc. can mean that it’s actually inaccessible. I have googled ‘is tensorflow/pytorch accessing the GPU’ far too many times, so I’m putting this down here so I don’t have to go through the same Stack Overflow posts again and again 😅...

November 13, 2021 · 2 min · Wendy Mak

How to count occurrences in BigQuery Array/Repeated Fields

In BigQuery, there is the concept of repeated fields and arrays. I was trying to figure out how to count how many entries in a table contain a certain value in that repeated column. And since the syntax [value] IN some_array is not valid, one extra step is needed. Suppose we have some data about a freemium podcast platform. On this platform, users can pay for an upgraded service, or listen for free....

November 11, 2021 · 1 min · Wendy Mak

Blog Setup With Hugo and GitHub Pages

So, I’ve decided to revamp my blog, and rather than going with Medium, I thought it would be a bit nicer to use my own site, as the content remains under my control and I can use markdown. GitHub has a good set of documentation on setting up a GitHub page, but as I am using Hugo as the static site generator, there are a few extra things I need to add. These are the steps I used to get it all up and running....

November 5, 2021 · 4 min · Wendy Mak