Automating Knife Classification with Machine Learning
After a few thought-provoking Twitter conversations, this week I launched the first prototype of our Knife Classification tool, built using FastAI and Streamlit (mirrored here).
Dressed up in a nifty package, the prototype currently:
- Lets users upload pictures into a gorgeous web-app created in Streamlit, which works great on desktop or mobile
- Classifies pictures into 1 of 5 knife “categories”, powered by a FastAI deep learning neural network
- Asks for user feedback, using new pictures for model re-training
Over the next few months, the Police Rewired community and I will be working to add functionality, improve accuracy, and generally turn this into something usable. Pop into our Discord if you’d like to get involved!
As it’s been something of a learning project, I thought it would be useful to explain what we built, and why. I’ll be breaking this into three parts, covering:
- Why and how it works (for non-technical people!) (what you’re reading now)
- Training the model with FastAI (coming soon)
- Creating a front-end with Streamlit, hosting and production (coming soon)
UK policing loves showing off knife seizures. From the very-nearly-innocent craft knife to the giant stacks of machetes and combat knives straight out of “Rambo”, if you follow UK policing Twitter accounts, you’ve probably seen them all.
So what do we actually do with all those pictures? Not that much, it turns out. A few years ago, Gavin Hales started the #theresthatknifeagain hashtag, highlighting the ubiquity of a certain model of Anglo-Arms hunting knife — if he hadn’t noticed this particular trend, would anybody have picked up on it? And how many other trends have we missed along the way, that might help us identify specific knife-crime hotspots, criminogenic behaviours or problem retailers?
Policing problems are solved by intelligence: you figure out why a certain criminal opportunity has become so tempting, easy or pervasive, and which systems make it possible, and find a way of disrupting them… but pictures of knives aren’t intelligence. Unless they’re codified and classified, they’re just junk data that looks good on Twitter.
Categorising knives seems easy… right? Surely we can just ask cops to do it! You’ve got kitchen knives, combat knives, zombie knives… butterfly knives, flick knives, Stanley knives, pocket knives, Swiss army knives… Oh, and I guess… swords? Are swords knives? What about daggers? Is a big dagger a sword? Is a small dagger a knife? You see where I’m going. Turns out, it isn’t that easy. It’s certainly not easy to do retrospectively, at scale, through hundreds of online pictures.
Enter machine learning, which is rather good at extracting identifying information from pictures and using it to make classifications.
It’s worth highlighting that I am not a computer-vision expert… frankly, I specifically describe myself as a quantitative crime-scientist rather than a data-scientist! But the barrier to entry into this field has all but disappeared in the last few years: results that would have been competition-winning a few years ago are now pretty easy to achieve thanks to libraries like FastAI. I won’t go into the detail of how FastAI works, but suffice to say that “deep learning” neural networks, whether built with PyTorch (which FastAI sits on top of) or TensorFlow, all work on similar principles. They are exceedingly good at identifying obscure patterns in data and using them to classify or make predictions, and they’re especially good at computer vision problems, quickly picking out the “features” that make an object identifiable.
No matter what sort of model you’re training, you’ll need a clean and well-labelled “training set” to work on. We used DuckDuckGo to download around 300 images. The resulting dataset is certainly not clean, and very far from perfect (a lot of the images had to be manually cleaned and labelled), but it’s sufficient as a quick hack. We selected a few categories for our initial model:
- Butterfly Knives
- Folding Pocket Knives
- Combat and Bayonet Knives
- Kitchen Knives
These probably aren’t quite suitable for operational purposes, but have the benefit of being discrete, easily recognised categories that help us prove the concept. After a bit of tweaking and re-labelling, our model was around 80% accurate on a test set: not perfect, but certainly good enough for a proof of concept.
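For anyone curious what "accurate on a test set" means in practice, here is a minimal sketch in plain Python. It is not our actual evaluation code (FastAI reports this for you during training); the function name `accuracy_report` and the five example results are made up purely for illustration. The idea is simply to compare the model's guesses against the known labels for pictures it never saw during training, both overall and per category.

```python
from collections import Counter


def accuracy_report(true_labels, predicted_labels):
    """Return (overall accuracy, accuracy per category) as fractions between 0 and 1."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be the same length")
    if not true_labels:
        raise ValueError("need at least one labelled example")

    # How many test pictures each category has, and how many of those were guessed correctly.
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)

    overall = sum(correct.values()) / len(true_labels)
    per_category = {category: correct[category] / totals[category] for category in totals}
    return overall, per_category


# Hypothetical test-set results, invented for this example only.
truth = [
    "Kitchen Knives",
    "Kitchen Knives",
    "Butterfly Knives",
    "Folding Pocket Knives",
    "Combat and Bayonet Knives",
]
preds = [
    "Kitchen Knives",
    "Combat and Bayonet Knives",  # a kitchen knife mistaken for a combat knife
    "Butterfly Knives",
    "Folding Pocket Knives",
    "Combat and Bayonet Knives",
]

overall, per_category = accuracy_report(truth, preds)
print(f"Overall accuracy: {overall:.0%}")
for category, score in per_category.items():
    print(f"{category}: {score:.0%}")
```

The per-category breakdown is often more useful than the headline figure: a model can look respectable overall while consistently confusing two similar-looking categories.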
To take machine learning into the real world, it needs to be “productionised”: put on a website, app, portal or similar, where users can submit queries and obtain results. We used Streamlit, a really nifty web framework for hosting data-science and machine learning applications. There are a million different options here, but Streamlit’s ability to produce rapid, pretty, easily customised web-apps is wondrous. I hosted the final project on Render, which I’ve recently fallen in love with as an intuitive, inexpensive platform for hosting web-apps (in sharp contrast to places like AWS and Azure, which seem to offer to do everything, never tell me how much I’ll be paying, and leave me confused at every turn).
The last step was adding a way for our model to improve: machine learning models shouldn’t be static, but should keep learning from their mistakes as they go. I added a feedback system, whereby every user can say whether the classification was correct, and submit their picture to our dataset for the model to be retrained. Every week, the model will be re-trained on our newly cleaned and enhanced dataset, hopefully meaning it won’t make the same mistake twice.
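To make the feedback loop a little more concrete, here is a rough sketch of one way it could be recorded, using only Python's standard library. This is an illustration under my own assumptions rather than the code running in the prototype: the column names, the function names `record_feedback` and `retraining_examples`, and the file names are all hypothetical. Each submission is logged as a row, and only rows with a usable label are handed on to the weekly retraining.

```python
import csv
import io

# Hypothetical column layout for the feedback log.
FIELDS = ["image_name", "predicted", "user_says_correct", "label"]


def record_feedback(log_file, image_name, predicted, user_says_correct, user_label=""):
    """Append one row of user feedback to an open CSV file."""
    # If the user confirms the prediction, the prediction becomes the label.
    # Otherwise we keep whatever correction they gave (possibly nothing).
    label = predicted if user_says_correct else user_label
    csv.writer(log_file).writerow(
        [image_name, predicted, "yes" if user_says_correct else "no", label]
    )


def retraining_examples(log_file):
    """Return (image_name, label) pairs for every row that has a usable label."""
    reader = csv.DictReader(log_file, fieldnames=FIELDS)
    return [(row["image_name"], row["label"]) for row in reader if row["label"]]


# An in-memory file stands in for a real log on disk.
log = io.StringIO()
record_feedback(log, "upload_001.jpg", "Kitchen Knives", True)
record_feedback(log, "upload_002.jpg", "Kitchen Knives", False, "Combat and Bayonet Knives")
record_feedback(log, "upload_003.jpg", "Butterfly Knives", False)  # wrong, but no correction given

log.seek(0)
examples = retraining_examples(log)
print(examples)
```

Rows like the third one, where the user says the model was wrong but doesn't offer a correction, are left out here; those are the pictures that still need a human to clean and label them before retraining.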
With that, our prototype is live! So what’s next?
First, we’ll be adding an API, to make sure it’s as open and inter-connected as possible. To make sure the tool is operationally relevant, we’ll be tweaking our categories and codification. To retrain our model in a way that makes sense, we’re hoping to build a platform to leverage our community and crowd-source accurate labelling, using a dataset of historic knife pictures scraped from Twitter accounts. With that done, we should have a tool that can be easily plugged in to existing operational workflows, and a platform to help us work on more complex ML challenges in the future. Exciting times!