Introduction to Snowplow Analytics

Many of you have probably heard of Snowplow Analytics recently. It's making big waves in this brave new world of Big Data. But what does it do, and why should you care about it? Managing your own analytics stack sounds hard. Why bother?

In this post, I'm going to try to answer that as best I can!

In other Snowplow Analytics tutorials, I'm going to go step-by-step into how to set up a Snowplow Analytics stack through Terraform. Terraform is a technology I talk about here.

What is Snowplow?

Snowplow is an open-source standard (and a set of tools to implement that standard) for collecting, processing, and storing your own user analytics.

What does Snowplow do?

It lets you do a number of things without relying on an outside service, both by defining a standard data format and by giving you the tools to use that standard.

Collect the Data

This step of the process sounds deceptively simple. Just throw a POST request at an API endpoint. Anyone could do it, right?
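The naive version really is only a few lines. Here is a minimal sketch of it in Python, using only the standard library. The collector hostname, the app id, and the helper names are made up for illustration. The endpoint path, the envelope schema string, and the short field names reflect my reading of the Snowplow tracker protocol and should be treated as assumptions to check against the protocol docs. In practice one of the official trackers would build this for you.

```python
import json
import time
import urllib.request
import uuid

# Hypothetical collector address; the path is an assumption based on the
# Snowplow tracker protocol and should be checked against your deployment.
COLLECTOR_URL = "https://collector.example.com/com.snowplowanalytics.snowplow/tp2"

# Assumed envelope schema for a batch of events; verify the version you need.
PAYLOAD_SCHEMA = "iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4"


def build_page_view(page_url: str, app_id: str) -> dict:
    """Build one page-view event as a flat dict of string fields."""
    return {
        "e": "pv",                            # event type: page view
        "url": page_url,                      # page being viewed
        "aid": app_id,                        # application identifier
        "p": "web",                           # platform
        "eid": str(uuid.uuid4()),             # unique event id
        "dtm": str(int(time.time() * 1000)),  # client timestamp in ms
    }


def build_request(events: list) -> urllib.request.Request:
    """Wrap events in a JSON envelope and prepare (but do not send) a POST."""
    body = json.dumps({"schema": PAYLOAD_SCHEMA, "data": events}).encode("utf-8")
    return urllib.request.Request(
        COLLECTOR_URL,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


request = build_request([build_page_view("https://example.com/pricing", "my-site")])
# Sending it would be: urllib.request.urlopen(request, timeout=5)
```

Notice everything this sketch does not handle: retries, batching, offline queues, schema validation, or a collector that is down.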

Here are some of the things that can get in the way of that simplicity:

  • What if the collection API crashes?

Snowplow has a set of collectors built for stability and high traffic.

  • What data format is the data going to be sent in?

Snowplow has a set standard for data that solves most analytics needs.

  • What languages do we need to access the API from?

Snowplow has a set of analytics trackers coded in many popular languages, ready for production use.

  • If we want clickstream or page view data, what do we build to support that?

Snowplow's JavaScript tracker supports click tracking and page view tracking out of the box.

  • How do we keep track of the identity of the user between sessions/devices?

Snowplow has everything a normal application needs for identity stitching.
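To make "identity stitching" a bit more concrete, here is a toy sketch of the idea in Python. It is not how Snowplow implements it. The `device_id`, `user_id`, and `page` field names are hypothetical, chosen just for this example. The idea: once a device has been seen with a logged-in user, all events from that device are attributed to that user, including the ones from before the login.

```python
from collections import defaultdict


def stitch_identities(events: list) -> dict:
    """Group page views by person, resolving device ids to user ids.

    Each event is a dict with `device_id` and `page` keys and an optional
    `user_id` key (all hypothetical names for this sketch).
    """
    # First pass: learn which devices belong to which known user.
    device_to_user = {}
    for event in events:
        user_id = event.get("user_id")
        if user_id:
            device_to_user[event["device_id"]] = user_id

    # Second pass: attribute every event to the resolved identity,
    # falling back to the device id when the user was never identified.
    stitched = defaultdict(list)
    for event in events:
        device_id = event["device_id"]
        identity = device_to_user.get(device_id, device_id)
        stitched[identity].append(event["page"])
    return dict(stitched)


events = [
    {"device_id": "laptop-1", "page": "/home"},
    {"device_id": "laptop-1", "user_id": "alice", "page": "/account"},
    {"device_id": "phone-7", "user_id": "alice", "page": "/pricing"},
    {"device_id": "tablet-3", "page": "/home"},
]
print(stitch_identities(events))
# {'alice': ['/home', '/account', '/pricing'], 'tablet-3': ['/home']}
```

The anonymous `/home` visit on the laptop ends up under `alice` because the same device logged in later, while the tablet stays anonymous.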

Process and Enrich the Data

Do you want to process your analytics in real time? How about through a SQL-backed dashboard? Is batch processing through Elastic MapReduce more your style?

Snowplow has you covered, whatever your data processing needs. Snowplow also comes equipped with a wealth of enrichment add-ons that can do things like IP-based location tracking and referral decomposition.

No matter how you want to process your data, now or in the future, Snowplow is flexible enough to get you there!
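As a rough illustration of what an enrichment like referral decomposition does, here is a simplified Python sketch. It is not Snowplow's enrichment code. The lookup table, the function name, and the output keys are all made up for this example, and a real enrichment would use a much larger, maintained list of referrers.

```python
from urllib.parse import parse_qs, urlparse

# Tiny hypothetical lookup table: host -> (medium, source, search parameter).
KNOWN_REFERRERS = {
    "www.google.com": ("search", "Google", "q"),
    "www.bing.com": ("search", "Bing", "q"),
    "twitter.com": ("social", "Twitter", None),
}


def decompose_referrer(referrer_url: str) -> dict:
    """Split a raw referrer URL into host, medium, source, and search term."""
    parsed = urlparse(referrer_url)
    host = parsed.netloc.lower()
    medium, source, term_param = KNOWN_REFERRERS.get(host, ("unknown", None, None))

    # Only search referrers carry a search term in their query string.
    term = None
    if term_param:
        values = parse_qs(parsed.query).get(term_param)
        term = values[0] if values else None

    return {"host": host, "medium": medium, "source": source, "term": term}


print(decompose_referrer("https://www.google.com/search?q=snowplow+analytics"))
# {'host': 'www.google.com', 'medium': 'search', 'source': 'Google', 'term': 'snowplow analytics'}
```

The raw event only carries an opaque URL. After enrichment you can ask questions like "how much of our traffic comes from search?" directly.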

Store the Data

You can store your enriched data in a number of storage locations:

Why is Snowplow better than competitors…?

There are a ton of tools out there for monitoring analytics. Google Analytics, Mixpanel, Segment… Why use Snowplow?

They are limiting

In this new world of Big Data, where startups are popping up left and right offering machine learning, data mining, and automated inbound marketing, competition in data science and predictive technology is more aggressive than ever.

The companies that can dig into their analytics and use them to figure out what their customers want can beat their competition by building much more useful products.

These pre-built services are very impressive technology in their own right. But they're not made with your company or your customers in mind. They're not made to solve the problem you're solving or to spot the trends you're looking for. They're often solving the general case instead of your case.

They are sticky

When you leave an analytics tool, your data often stays with that analytics tool. If you switch to a new one, your historical data can be lost. This makes switching analytics tools very tough once a couple of years' worth of data is locked in.

Or… They are expensive

Services that fill a niche use case and don't lock you in are usually very expensive. They charge you for the amount of data going through your system, whether it's useful or not, and for the amount of data they store for you. These flexible tools also often end up being about as much work as managing your own data.

Snowplow is open

Snowplow is completely open source. You can read it, modify it, and contribute to it. You can build tools on top of it or use the wealth of community-made tools that already exist. You store your data in any format you choose and can access it through any UI you can think of.

Snowplow is non-binding

When using Snowplow, your data is your own. You can do with it what you please and keep it for however long you'd like. If you later choose to move to another analytics platform, your data doesn't go anywhere and you may even be able to load historical data into the new platform.

Snowplow is free

Being completely open source, Snowplow is free. The only things you pay for are the (pretty lightweight) servers you host it on, the data store you choose to use, and the time of whoever manages it. You can manage petabytes of analytics information with a relatively cheap software stack. Or you can use the streaming infrastructure and throw away any data that's too old for your purposes. The choice is yours.

Bonus: Real-time Predictive Analytics

Ever wanted to set up your analytics so you can not only see what's happened in the past but also predict what will happen in the future? You can do that if you have the right experts managing it and reading the data. It's a new field, and there aren't many services that can boast of being effective at it.

I'm convinced. What's next?

Get started by looking into Snowplow!

In my next post, I detail how to set up a Snowplow Streaming Collector in Terraform!

In future posts, I'm going to go step-by-step into how to set up a Snowplow Analytics stack through Terraform. Terraform is a technology I talk about here.

If you'd like to be kept up to date as I write new things about Snowplow and Terraform, subscribe to this blog!

Intro to Snowplow - The Snowplow Collector

Abstraction through modules in Terraform
