By Orly Amsalem, data scientist at Cisco
Who is Stealing Credentials?
Imagine a hacker. You’re probably visualizing a shadowy figure in a white mask or a black hoodie, typing lines of code, determined to break into secure sites, steal credit cards, and collect valuable, previously private data.
Now imagine a video pirate. You’re probably thinking of the same type of shady character, stealing video content and sharing it illegally for profit or in the name of a major cause, right? Well, guess what: there are other forms of video piracy that occur in the general population. It could be the neighbor next door; it could happen among friends, work colleagues, or family members. It could actually happen to me and you.
It goes like this: you sign up for an over-the-top (OTT) video service and get your own login and password. Later, you share your credentials with a friend. Isn’t that OK?
Research shows that the sharing phenomenon is widespread. For example, a Reuters/Ipsos poll cited in a recent Reuters article found that one-fifth of young adults who access IP video services use the credentials of someone who does not live with them. Password sharing for TV streaming apps can turn into a profitable yet illegal business, since illicit accounts can be purchased for one-tenth the cost of a legitimate account. The revenue loss to service providers is huge.
Data is our Weapon
So, what can we do? How can we fight theft that seems so innocuous?
Here at Cisco, we decided to tackle this problem by bringing Data Science into the picture.
IP-based video delivery systems, like our Infinite Video Platform (IVP), collect masses of data. Terabytes of data stream into our systems and are stored in our data centers, creating a treasure trove of information. Using Machine Learning, we are helping our customers turn data into desired business results.
Our goal: To detect illegal credential sharing. Our weapon: Data.
What Are We Looking for?
We started by taking raw subscriber account logs. Each log documents all actions generated by all of the accounts. In the logs, we’re looking at information such as account identification, a unique identifier for the device, the time of action, the content that was consumed, and a few more data points. It may sound like very little, but for the goal we set out to achieve, that’s all it takes.
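A single log entry of the kind described might be modeled as follows. This is a minimal sketch; the field names are illustrative, not the actual schema used by the platform.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical layout for one subscriber-account log entry.
# Field names are illustrative, not the real IVP schema.
@dataclass
class LogEntry:
    account_id: str      # account identification
    device_id: str       # unique identifier for the device
    timestamp: datetime  # time of the action
    content_id: str      # the content that was consumed
    location: str        # coarse access location (e.g. a city)

entry = LogEntry("acct-42", "dev-7",
                 datetime(2017, 9, 1, 20, 15), "movie-123", "Tel Aviv")
```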
We then laid out some initial assumptions on how illicit password sharing behavior could be detected in the data. For example:
- The same account being accessed from two distant locations in a short timeframe
- A marked change in genre preferences over time, or
- A larger than reasonable number of devices used to access a single account
But, a deep investigation into the data showed us that one feature is not enough—for best results, we need to consider a large set of features and the way they interact.
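The first two assumptions above can be sketched as simple per-account flags. This is a toy illustration, assuming events are already grouped per account and sorted by time; the device cap and the two-hour travel window are made-up thresholds, not production values.

```python
from datetime import datetime, timedelta

# Illustrative thresholds, not the actual deployed values.
MAX_DEVICES = 6                        # e.g. 2-3 devices per resident
MIN_TRAVEL_TIME = timedelta(hours=2)   # plausible time to change location

def flag_account(events):
    """events: list of (timestamp, device_id, location) tuples, time-sorted.
    Returns the set of heuristic flags raised for this account."""
    flags = set()
    devices = {device for _, device, _ in events}
    if len(devices) > MAX_DEVICES:
        flags.add("too_many_devices")
    # Same account seen in two distant locations within a short timeframe.
    for (t1, _, loc1), (t2, _, loc2) in zip(events, events[1:]):
        if loc1 != loc2 and (t2 - t1) < MIN_TRAVEL_TIME:
            flags.add("distant_locations_short_timeframe")
    return flags
```

As the paragraph above notes, any single flag on its own is a weak signal; the value comes from combining many such features.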
The first phase in any data science task is to do some cleaning. We cleansed, transformed and enriched the data, and extracted more features that we found relevant for the analysis. Now our data was ready.
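The transformation step might look roughly like the following: collapsing raw log rows into one feature vector per account. The input keys and the chosen features are assumptions for illustration only.

```python
from collections import defaultdict

def account_features(logs):
    """logs: iterable of dicts with keys account_id, device_id,
    location, genre (hypothetical keys). Returns one small feature
    dict per account; the features chosen here are illustrative."""
    acc = defaultdict(lambda: {"devices": set(), "locations": set(),
                               "genres": set(), "events": 0})
    for row in logs:
        a = acc[row["account_id"]]
        a["devices"].add(row["device_id"])
        a["locations"].add(row["location"])
        a["genres"].add(row["genre"])
        a["events"] += 1
    # Reduce the raw sets to numeric features for modeling.
    return {acct: {"n_devices": len(v["devices"]),
                   "n_locations": len(v["locations"]),
                   "n_genres": len(v["genres"]),
                   "n_events": v["events"]}
            for acct, v in acc.items()}
```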
Next, we modeled the behavior of each account using the processed data, and set a threshold for what counts as “normal” behavior. For example, it is reasonable that a given household will have 2-3 devices per resident. If a household exceeds that number, it is flagged as suspicious.
We used a normalization of the Mahalanobis distance to determine the level of deviation from normal behavior. This is our “Sharing Score”, ranging from 0 to 1000. The bigger the deviation, the bigger the Sharing Score, and the greater the likelihood of sharing behavior.
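For intuition, here is a two-feature version of the idea. The Mahalanobis distance itself is standard; the saturating mapping onto the 0-1000 range is an assumption, since the article does not specify the exact normalization used.

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance for a 2-feature vector x, given the
    population mean and 2x2 covariance matrix cov."""
    dx = (x[0] - mean[0], x[1] - mean[1])
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = ((cov[1][1] / det, -cov[0][1] / det),
           (-cov[1][0] / det, cov[0][0] / det))
    d2 = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
          + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(d2)

def sharing_score(distance, scale=3.0):
    """Map a distance to a 0-1000 Sharing Score. The exponential
    squashing and the scale constant are illustrative assumptions."""
    return round(1000 * (1 - math.exp(-distance / scale)))
```

With an identity covariance this reduces to ordinary Euclidean distance, and a zero deviation maps to a Sharing Score of 0.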
The next phase was to refine the results and categorize accounts based on the extent of the sharing occurring. We differentiated between casual sharing, which occurs within a family or between friends, and business sharing, which is used to turn a profit.
We expanded our model to support this enhancement. Motivated by graph theory, we created dynamic graphs, which model the account behavior over time. These graphs show us, for example, if an account is activated from different locations repeatedly over time, which is suspicious behavior, or if it’s an occasional single access from a remote location that likely signifies service access during a trip. We created a decision tree based on the graphs, which outputs a prediction as to whether a certain account is demonstrating casual sharing behavior, business sharing behavior, or no sharing at all.
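A decision tree of the kind described might reduce, in spirit, to rules like the following over graph-derived features. The features and cutoffs here are invented for illustration; the deployed model is learned from data.

```python
def classify_sharing(n_remote_sessions, n_distinct_remote_locations,
                     weeks_observed):
    """Toy decision rules over hypothetical graph-derived features:
    remote sessions, distinct remote locations, and observation window.
    All cutoffs are illustrative, not the trained tree's values."""
    if n_remote_sessions == 0:
        return "no sharing"
    remote_rate = n_remote_sessions / weeks_observed
    if remote_rate < 1:
        # Occasional remote access, e.g. watching during a trip.
        return "no sharing"
    if n_distinct_remote_locations <= 2:
        # Repeated access from one or two other households.
        return "casual sharing"
    # Many repeated remote locations suggests accounts sold for profit.
    return "business sharing"
```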
Armed with all these insights on account sharing behavior, service providers can take control and define their own policy of action. For starters, they can define their own Sharing Score threshold windows that determine which sharing accounts to address. They can also establish specific response actions: for example, sending a challenge question to verify the authenticity of the user, offering a promotional package to entice sharing users into becoming subscribers, forcing a password change, or suspending the account.
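Such a policy might be expressed as a simple mapping from score windows to actions. The thresholds and actions below are examples only; each provider would choose its own.

```python
# Illustrative policy: Sharing Score windows (0-1000) mapped to the
# escalating response actions mentioned above. Thresholds are examples;
# each service provider defines its own.
POLICY = [
    (900, "suspend account"),
    (700, "force password change"),
    (500, "send challenge question"),
    (300, "offer promotional package"),
]

def action_for(score):
    """Return the response action for the first threshold the score meets."""
    for threshold, action in POLICY:
        if score >= threshold:
            return action
    return "no action"
```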
Detecting credential sharing is challenging. There’s a thin line between finding a real sharer and harassing a customer. Our model is an effective tool that uses science to differentiate between casual “here’s my password,” and not-so-casual “let’s share this account password among many people and make a hefty profit.”
Want to hear more? Join us in the IBC Paper Session: Novel Ideas and Cutting Edge Technologies on Sunday, September 17 at 10:15. Or, come see us at booth #1.A71.
Read more about TV streaming app credential sharing:
- Article: On solving for the vast gray area that is credential sharing
- E-book: The silent threat: Protect your business from the risks posed by credential sharing