Lessons from Scraping a DarkNet Market

What I Learnt Studying a Dying Market

Andreas Varotsis
Towards Data Science

A Note on Ethics

While writing this, I’ve tried to adhere to a few principles to balance the benefits and risks of sharing some of this information. I’ve listed them below:

  • This article will not teach you how to purchase drugs. I’ve purposefully not described in detail any steps to access any DNMs. Illegal drugs are dangerous, bad, and you won’t learn how to get them here.
  • I have anonymised the data to remove usernames that could be linked to other platforms or people: seller usernames have been replaced with unique numbers. Other than that, all the data and code I used are accessible and linked at the bottom of this post.
  • All the data I used was accessible to anybody browsing Pax Romana; I haven’t used any covert methodology or fancy code to access anything.

I appreciate some people may see this differently — I feel these discussions are best held in the open, and obscurity isn’t a particularly good means of keeping people safe.

On the 20th of April, I decided to try to flex my rusty data-science tool-set by scraping and analysing the contents of a dark-net market (DNM). I was hoping to analyse the impact of fluctuating drug prices as social-distancing measures were relaxed (for instance, the much-touted link to violence), but, as will soon become clear, circumstances conspired to ruin my cunning plan.

Screenshot of Pax Romana from darknetstats.com

This isn’t a journal article, I’m not a statistician or data scientist, and my data-set may not be representative of the rest of the dark web. I’m sure I’ve made plenty of mistakes, so please tell me (constructively) about them in the comments. Still, it’s been an interesting experience that I hope is worth sharing.

Accessing DNMs isn’t technically complex, but it is obtuse and unintuitive: obscurity and friction are the real barriers to access. You have to be running Tor, and you can’t just click on the first Google link you find, but it won’t take much more effort than that either, as there are plenty of sites that aggregate information on existing markets and how to access them. After exploring a few options, I settled on Pax Romana, described as a “new dark web Marketplace with lots of innovative features to reduce phishing and exit scams”. It had apparently only been around since early March, but had accrued a significant customer base thanks to its novice-friendly processes, its security features, and the reassuring picture of Marcus Aurelius that greeted you at login.

It was also well structured, with a defined “drugs” category containing around 200 pages of products. After a bit of tweaking of my proxy settings, I had Web Scraper busily crawling through all two hundred or so pages, which I ran daily from the 21st until the 25th of April.

Current Status of Pax Romana from darknetstats.com

On the 25th of April, Pax Romana was hacked. When I tried to log on on the 26th, I got a 404 error, and the website never came back.

My plan to analyse shifting drug prices had been ruined, but I’d also inadvertently recorded the last days of Rome.

User comment from darknetstats.com

Aggregating and Cleaning

With that in mind, I started cleaning and analysing my data-set: 10,930 rows, each representing one advert as it appeared on one day (split across daily CSV files), covering 3,149 unique adverts.
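Stitching the daily files together comes first. Here is a minimal sketch using only Python’s standard library, assuming (hypothetically) one CSV per scrape day named by date, such as 2020-04-21.csv, with an advert_id column identifying each advert; the real file and column names in the author’s notebook may well differ.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_daily_scrapes(folder: str) -> List[Dict[str, str]]:
    """Combine every daily CSV in `folder` into one list of rows.

    Hypothetical layout: one file per scrape day named YYYY-MM-DD.csv,
    where each row is one advert as seen on that day.
    """
    rows: List[Dict[str, str]] = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["scrape_date"] = path.stem  # tag each row with its day, e.g. "2020-04-21"
                rows.append(row)
    return rows


def count_unique_adverts(rows: List[Dict[str, str]]) -> int:
    """Number of distinct adverts, keyed on the hypothetical advert_id column."""
    return len({row["advert_id"] for row in rows})
```

Applied to a full scrape, len(rows) is the advert-day count and count_unique_adverts(rows) the number of distinct adverts.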

My main challenge was assigning each advert to a specific substance type. Pax Romana didn’t categorise products beyond “drugs”, so I had to be a little creative to obtain useful data. I used the FuzzyWuzzy Python library to iterate over each word in the advert, looking for close matches to common drug names.

Extract from my drug dictionary
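The word-by-word fuzzy matching described above might look roughly like this. To stay dependency-free, the sketch swaps FuzzyWuzzy for the standard library’s difflib.SequenceMatcher, scaled to 0 to 100 in the spirit of FuzzyWuzzy’s fuzz.ratio. The small dictionary is purely illustrative rather than the author’s, and the cut-off mirrors the “over 85% similar” rule mentioned later.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Illustrative dictionary (not the author's): spelling variant -> substance category.
DRUG_DICTIONARY = {
    "xanax": "Benzodiazepine",
    "alprazolam": "Benzodiazepine",
    "cocaine": "Cocaine",
    "mdma": "MDMA",
    "xtc": "MDMA",
    "oxycodone": "Prescription opioid",
}
THRESHOLD = 85  # a match must score over this (0 to 100) to be accepted


def similarity(a: str, b: str) -> float:
    """Similarity on a 0 to 100 scale, from difflib's matching-blocks ratio."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def categorise(title: str) -> Optional[str]:
    """Best-matching substance category for an advert title, or None if nothing is close enough."""
    best_score, best_category = 0.0, None
    for word in re.findall(r"[a-z]+", title.lower()):  # compare each word separately
        for name, category in DRUG_DICTIONARY.items():
            score = similarity(word, name)
            if score > best_score:
                best_score, best_category = score, category
    return best_category if best_score > THRESHOLD else None
```

Matching per word rather than per title keeps long, shouty titles from diluting the score; misspellings such as “cocain” still clear the threshold.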

I then used various painful regular expressions to extract views, sales, country of origin and destination, price, and seller details: pretty much everything else I could crowbar out without tearing my hair out too much.
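To illustrate the kind of regular expressions involved, here is a sketch that pulls labelled fields out of a listing’s visible text. The labels (Views:, Sales:, Ships from: and so on) are hypothetical stand-ins, not the actual Pax Romana page layout.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical label patterns; the real Pax Romana markup is not reproduced here.
PATTERNS = {
    "views": re.compile(r"Views:\s*([\d,]+)"),
    "sales": re.compile(r"Sales:\s*([\d,]+)"),
    "ships_from": re.compile(r"Ships from:\s*([^\n]+)"),
    "ships_to": re.compile(r"Ships to:\s*([^\n]+)"),
    "price_usd": re.compile(r"Price:\s*([\d,]+(?:\.\d+)?)\s*USD"),
}
# Fields converted from text to numbers, after stripping thousands separators.
NUMERIC = {"views": int, "sales": int, "price_usd": float}

Field = Optional[Union[int, float, str]]


def parse_listing(text: str) -> Dict[str, Field]:
    """Extract each known field from a listing's text; missing fields become None."""
    fields: Dict[str, Field] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            fields[name] = None
            continue
        value = match.group(1).strip()
        fields[name] = NUMERIC[name](value.replace(",", "")) if name in NUMERIC else value
    return fields
```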

I learnt my first lesson right here: drug markets are an absolute mess. Pax Romana was actually quite helpfully designed, with functionality to list price per unit weight, enabling customers to quickly identify worthwhile offers.

Spoilers: most drug dealers don’t like clear pricing structures.

In the vast majority of listings, instead of using this functionality, sellers packed the information about the product into the title, leading to eclectic and nigh-on unintelligible quantities, descriptions, and prices. I’ve highlighted a few particularly egregious examples.

£15 RELAUNCH SALE! 20 x 3mg Xanax GREEN HULKS (+20% FREE) CHEAPEST ON PAX!

Single adverts for large quantities of pills were also quite common. Have fun measuring price per weight when you’re accounting for both the number of pills and the weight per pill. I imagine this is what trying to understand cricket is like.

‘’SALESALE’’ 2000 x ‘XTC’BENTLEY PILLS ‘’240 MG’’ TRACKED

As best I could, I tried to extract pricing information, weight, and pill quantities from all of these adverts. Eventually, I had 6,193 entries, drawn from 1,800 unique adverts, where I was confident in the drug name (defined as over 85% similar to one of my predefined names) and had a price per weight (ideally as stated, or, failing that, extracted from the description).
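For pill adverts like the two titles above, the price-per-weight step can be sketched as: find a price, a pill count and a per-pill dose, then divide. These patterns are illustrative assumptions, much simpler than real titles demand, and the result is price per milligram in whatever currency the title uses (no conversion is attempted).

```python
import re
from typing import Optional

# Illustrative patterns only; real titles are far messier than this handles.
PRICE = re.compile(r"[£$€]\s*(\d+(?:\.\d+)?)")                   # first currency amount, e.g. "£15"
COUNT = re.compile(r"(\d+)\s*x(?![a-z])", re.IGNORECASE)          # pill count, e.g. "20 x" or "20x"
DOSE_MG = re.compile(r"(\d+(?:\.\d+)?)\s*mg\b", re.IGNORECASE)    # per-pill dose, e.g. "3mg", "240 MG"


def price_per_mg(title: str) -> Optional[float]:
    """Price per milligram implied by a title, or None if the price or dose is missing.

    Ignores currency conversion and promotional extras such as "(+20% FREE)".
    """
    price = PRICE.search(title)
    dose = DOSE_MG.search(title)
    if price is None or dose is None:
        return None
    count = COUNT.search(title)
    pills = int(count.group(1)) if count else 1  # assume a single unit when no count is given
    return float(price.group(1)) / (pills * float(dose.group(1)))
```

For the Xanax title this gives £15 / (20 × 3 mg) = £0.25 per mg; the BENTLEY title states no price, so it returns None.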

Analysing

I first wanted to get a feel for just how much activity an advert would see, so I calculated the mean number of views per day an advert could expect, for various substances.

Just under ten views, per day, per advert. Considering the number of adverts on the website, that suggests a decent customer base. How much of that customer base then converts into a sale, though?

Not that many, apparently — the average advert had fewer than 0.1 sales per day.
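Both per-day averages depend on how the raw counters are treated. Assuming (the original doesn’t say) that each advert displays a running total of views, a per-day rate can be estimated from each advert’s first and last scrape and then averaged by substance. A sketch with hypothetical (advert_id, substance, day_index, cumulative_views) records:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical record: (advert_id, substance, day_index, cumulative_views).
# Assumption: the scraped counter is a running total, so the daily rate is
# the change between scrapes divided by the days elapsed.
Record = Tuple[str, str, int, int]


def mean_daily_views_by_substance(records: List[Record]) -> Dict[str, float]:
    history: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)
    for advert_id, substance, day, views in records:
        history[(advert_id, substance)].append((day, views))

    rates: Dict[str, List[float]] = defaultdict(list)
    for (_, substance), points in history.items():
        points.sort()  # chronological order by day_index
        (first_day, first_views), (last_day, last_views) = points[0], points[-1]
        if last_day > first_day:  # need at least two scrapes to measure a rate
            rates[substance].append((last_views - first_views) / (last_day - first_day))
    return {substance: mean(values) for substance, values in rates.items()}
```

Passing cumulative sales counts instead of views yields a sales-per-day figure in the same way.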

In a simple linear OLS regression of sales on views, views seem to be quite a strong predictor of sales (an R² of 0.6, meaning views account for around 60% of the variance in sales), but the coefficient is only around 0.015, suggesting it takes around 66 additional views to generate one additional sale.
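For one predictor, the closed-form OLS estimates are short enough to write out. This is a dependency-free sketch (the original notebook may well use a statistics library instead), fitting sales = intercept + slope × views and computing R² as one minus the ratio of residual to total sum of squares.

```python
from typing import Sequence, Tuple


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot
```

The slope is extra sales per extra view, so 1 / slope is the number of views per additional sale: with a slope of about 0.015, that is 1 / 0.015 ≈ 66.7.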

Lesson 2? It looks pretty damn hard to sell drugs on the internet. That said, averages mask extremes, and I’m sure we’ve all wondered who buys the overwhelming majority of junk available on the internet…if you have enough adverts up, who cares if the conversion rate is low?

With that in mind, I next examined individual sellers: how many ads does your friendly online neighbourhood dealer have, and how many actually make sales?

Turns out, just like in the physical world, most people dealing drugs don’t really make that much cash. Of 95 sellers on the platform, 50% had not made a single sale — no matter what price they set. A few big players were competing for the big bucks, while everyone else fought over the scraps.

A second interesting finding was that most dealers did not sell only one product; in fact, some sold upwards of ten different substances. This isn’t even different strains of cannabis: these are sellers offering medical opioids, amphetamines, benzos, steroids… there are some seriously industrious actors here. Meanwhile, remember how I highlighted that averages hide extremes? The 3D scatter chart above highlights some of the most successful sellers, who, even if they aren’t selling in significant numbers, have a very significant turnover. For instance, one seller had made 62 sales at an average price of 5,407 USD, while another had made 136 sales at around 1,730 USD.
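The per-seller figures above (adverts, distinct substances, sales, average price, and the share of sellers with no sales at all) reduce to a group-by. A minimal sketch, assuming hypothetical per-advert records with the anonymised seller number, a substance label, the advert’s latest sales count and its price in USD. The average here is a plain mean over adverts, which may differ from how the original chart averaged prices.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Set


@dataclass
class Advert:
    seller: int        # anonymised seller number
    substance: str
    sales: int         # sales count on the advert's latest scrape (assumed cumulative)
    price_usd: float


@dataclass
class SellerSummary:
    adverts: int = 0
    substances: Set[str] = field(default_factory=set)
    sales: int = 0
    prices: List[float] = field(default_factory=list)

    @property
    def mean_price(self) -> float:
        return mean(self.prices)  # unweighted mean over this seller's adverts


def summarise_sellers(adverts: List[Advert]) -> Dict[int, SellerSummary]:
    summaries: Dict[int, SellerSummary] = defaultdict(SellerSummary)
    for ad in adverts:
        summary = summaries[ad.seller]
        summary.adverts += 1
        summary.substances.add(ad.substance)
        summary.sales += ad.sales
        summary.prices.append(ad.price_usd)
    return dict(summaries)


def share_without_sales(summaries: Dict[int, SellerSummary]) -> float:
    """Fraction of sellers whose adverts recorded no sales at all."""
    return sum(1 for s in summaries.values() if s.sales == 0) / len(summaries)
```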

Where was this money coming from, and where was it going? I don’t have the capability (or intent) to track users, but plenty of adverts listed where the product would be sent from. Although we can’t confirm how accurate these are, they paint an interesting picture.

The USA, the Netherlands, Germany, the UK and Spain represent the vast majority of sales, and given that sellers have the option of obfuscating their location by listing “WorldWide”, these are probably mostly accurate. Interestingly, even as all these countries went deeper into lockdown, supply shot up over the course of the week, growing by nearly a third.
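One simple way to measure supply is the number of live adverts per origin country on each scrape day. Assuming that definition (the original doesn’t spell it out) and hypothetical (date, country) pairs, one per advert per day, the counting and the first-to-last-day growth can be sketched as:

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def daily_supply(rows: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count live adverts per origin country for each scrape date."""
    by_day: Dict[str, Counter] = {}
    for date, country in rows:
        by_day.setdefault(date, Counter())[country] += 1
    return by_day


def percent_growth(by_day: Dict[str, Counter], country: Optional[str] = None) -> float:
    """Percentage change in advert count from the first to the last scrape day.

    With country=None the whole market is used. Raises ZeroDivisionError if the
    starting count is zero.
    """
    days = sorted(by_day)  # ISO dates sort chronologically
    first, last = by_day[days[0]], by_day[days[-1]]
    if country is None:
        start, end = sum(first.values()), sum(last.values())
    else:
        start, end = first[country], last[country]
    return 100 * (end - start) / start
```

A market-wide result of around 33 would correspond to growth of about a third; passing a country gives the per-country trends.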

Once again, though, aggregates mask individual trends. Many countries saw supply drop at the start of the week, plateau mid-week, then climb as the weekend approached. In the UK, overall supply shrank, although without more data it’s impossible to say whether this is natural variation or the lockdown affecting the wider market. Overall, however, supply increased massively. I assume this is due to seasonal variation (perhaps the run-up to the end of the month or the weekend), though it’s possible social distancing is playing a part.

Breaking the market down by substance suggests this increase isn’t driven by any one product: while it isn’t uniform, multiple products see an increase in supply over the course of the week.

To try and understand these shifts, I examined individual country markets in more detail.

I was expecting cocaine, marijuana or heroin to be our drug of choice, but cocaine was just pipped to the post by various benzodiazepines (often prescribed to treat anxiety or panic attacks, and sold under commercial names such as Xanax).

In the US graph, prescription opioids are an obvious stand-out: the size of the supply here is staggering, especially compared with other drugs.

Looking at adverts from the Netherlands and Germany, I can’t help but think these differences might be driven by large sellers: despite the quite large volumes, the swings seem too stark to be natural variance.

Next, I examined drug pricing. As I alluded to above, this data is incredibly messy, and I suspect it is distorted by plenty of adverts intended to drive traffic elsewhere rather than make real sales (for example, WhatsApp numbers given in the description). That said, by focusing only on adverts that had made at least one sale, I could restrict the price analysis to the 760 or so unique adverts that had actually sold something at the stated price.

Once again, we see that incredible variance and the influence of large-scale players: while the majority of sales tend to be at or around the $100 mark, some extreme sales are being made far closer to the $1000 price point. I’d hoped that plotting the price per gram would make the data clearer, but frankly it just adds to the confusion.
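Restricting to adverts with at least one sale, then comparing the median with the mean, is one simple way to see how strongly a few high-value sales pull the average. A sketch assuming hypothetical (sales, price_usd) pairs, one per unique advert:

```python
from statistics import mean, median
from typing import Dict, List, Tuple


def prices_of_selling_adverts(adverts: List[Tuple[int, float]]) -> List[float]:
    """Keep the price of every advert that recorded at least one sale."""
    return [price for sales, price in adverts if sales >= 1]


def summarise_prices(prices: List[float]) -> Dict[str, float]:
    """Count, median and mean; a mean far above the median signals a few extreme prices."""
    return {"n": len(prices), "median": median(prices), "mean": mean(prices)}
```

With prices clustered near $100 and one sale at $1,000, the median stays near the cluster while the mean is dragged well above it, which is why the median is the safer summary for data this skewed.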

That said, a fairly consistent average does emerge, and I’m confident that with more data we could have plotted pricing variations over time.

So, what are my main takeaways?

  • Online drug markets are intentionally opaque and messy — the vast majority of adverts don’t drive sales, and may in fact be used to drive traffic to more effective platforms.
  • Like dealers in the physical space, the majority probably aren’t making much cash. That said, life is good at the top, with significant players likely raking in vast sums, and dealing in several different drugs.
  • Pricing and quantity purchased vary enormously at the extremes, even amongst similar products. That said, most sales are conducted in the smaller price ranges, and a larger data-set may be able to produce real predictions around drug-market fluctuations.

If you’d like to know more about the code behind these visualisations, or the data itself, they’re available in Jupyter notebook format on my GitHub. GitHub doesn’t render the visualisations particularly well, so I’ve also uploaded them to Google Colab below.

quantitative crime science @ MPS | Coordinator @ Police Rewired | My (personal) thoughts on crime, data, and economics | https://andreasthinks.me/