Can ISPs’ NetFlow data be used to track traffic going through VPNs?
Basic
Solène Rapenne
Introduction
This privacy guide will help you understand what information your Internet Service Provider (ISP) can view regarding your network activity and the implications if you are using a Virtual Private Network (VPN). Many ISPs use NetFlow, a protocol developed by Cisco, to record data about the traffic they route throughout the day.
NetFlow allows the storage and efficient processing of network information including:
- Date and Time with millisecond resolution
- Source IP address
- Destination IP address
- IP protocol number (most common protocols are TCP and UDP)
- Source port
- Destination port
- IP field “Type of Service”
A NetFlow dataset does not include any packet capture data; it simply presents a list of routing information, including the source and destination addresses and ports, and when the routing occurred.
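To make the shape of such a record concrete, here is a minimal Python sketch of one flow entry holding only the fields listed above. The class name `FlowRecord`, the helper `describe` and the example addresses (taken from the documentation ranges) are illustrative assumptions; this is not the actual NetFlow export format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FlowRecord:
    """Illustrative flow record: metadata only, no packet payload."""
    timestamp: datetime  # date and time, millisecond resolution
    src_ip: str          # source IP address
    dst_ip: str          # destination IP address
    protocol: int        # IP protocol number (6 = TCP, 17 = UDP)
    src_port: int
    dst_port: int
    tos: int             # IP "Type of Service" field


# Names for the two most common protocol numbers.
PROTOCOL_NAMES = {6: "TCP", 17: "UDP"}


def describe(record: FlowRecord) -> str:
    """Render a record as one human-readable line."""
    proto = PROTOCOL_NAMES.get(record.protocol, str(record.protocol))
    stamp = record.timestamp.isoformat(timespec="milliseconds")
    return (f"{stamp} {proto} {record.src_ip}:{record.src_port}"
            f" -> {record.dst_ip}:{record.dst_port}")


# Hypothetical example: a UDP flow from a customer to a remote server.
example = FlowRecord(
    timestamp=datetime(2024, 1, 15, 20, 30, 12, 345000, tzinfo=timezone.utc),
    src_ip="192.0.2.10",
    dst_ip="198.51.100.7",
    protocol=17,
    src_port=51820,
    dst_port=51820,
    tos=0,
)
print(describe(example))
```

Nothing in this structure carries the content of the communication, yet each line still says who talked to whom, when, and over which protocol and ports.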
Is a VPN vulnerable to NetFlow analysis?
As you may be aware, a VPN creates an encapsulated connection between your VPN client and the VPN server. All the network traffic between these two machines travels over the Internet in encrypted form, and your ISP can’t use classic techniques such as Deep Packet Inspection to snoop on the VPN content.
Nonetheless, it’s crucial to note that, even though the data is encrypted, your ISP obtains a lot of information about your VPN from the NetFlow data. As VPN service providers’ IP ranges are well known, your ISP can easily figure out that you are using a VPN, in addition to knowing the time you connect, the amount of data you transfer over the VPN and the location of the remote VPN server.
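As a rough illustration of how little effort this takes, the Python sketch below matches flow destinations against a list of VPN provider ranges and summarizes what it finds. The ranges are placeholder documentation prefixes, the function names are hypothetical, and the per-flow byte count is an assumed extra field that is not in the list of NetFlow fields given earlier.

```python
import ipaddress

# Placeholder ranges standing in for a VPN provider's published prefixes.
KNOWN_VPN_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_vpn_endpoint(ip: str) -> bool:
    """Return True if the address falls inside a known VPN range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in KNOWN_VPN_RANGES)


def summarize_vpn_usage(flows):
    """Summarize VPN-bound flows.

    `flows` is an iterable of (iso_timestamp, dst_ip, byte_count) tuples.
    Timestamps are ISO 8601 strings in one consistent format, so plain
    string comparison orders them chronologically.
    """
    matching = [f for f in flows if is_vpn_endpoint(f[1])]
    if not matching:
        return None
    return {
        "first_seen": min(f[0] for f in matching),
        "last_seen": max(f[0] for f in matching),
        "total_bytes": sum(f[2] for f in matching),
        "endpoints": sorted({f[1] for f in matching}),
    }


# Hypothetical flows for a single customer over one day.
sample_flows = [
    ("2024-01-15T08:00:00", "198.51.100.7", 1000),
    ("2024-01-15T09:00:00", "192.0.2.5", 500),
    ("2024-01-15T22:00:00", "198.51.100.7", 2500),
]
print(summarize_vpn_usage(sample_flows))
```

The summary contains exactly the items described above: that a VPN is in use, when it was used, how much data went through it, and which server it went to.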
Although this information may seem insignificant, it can be exploited. For instance, it’s easy to determine the timing of your device usage, potentially the number of people in your house, and gather insights about how these people use the Internet.
Please note that it’s impossible to hide your network activity from your ISP, as they are the ones who provide your connection to remote servers, but if you use a VPN, your ISP will only see a single encrypted connection.
The Internet is a giant puzzle
The Internet could be compared to a vast puzzle composed of many pieces, each representing an ISP. Every ISP has knowledge of its own part of the puzzle and the connections to other pieces.
If your VPN service provider is located in a different part of the puzzle than your own ISP, your ISP cannot accurately determine what you do through the VPN. Similarly, the ISP of the VPN server has no way of identifying you using only your IP address; they would need to collaborate with your ISP to do so.
Worldwide NetFlow database
Unfortunately, a 2022 article from Vice revealed that a US-based private company has been collecting NetFlow exports from many ISPs worldwide in exchange for Threat Intelligence analysis. According to the article, the number of ISPs involved suggests that the data may represent roughly ninety percent of global Internet traffic. Information about Team Cymru, the company that sells access to the consolidated NetFlow database, remains limited. Their website contains a list of facts and myths about their services, though their claims cannot be verified. Nevertheless, it is evident that they are working on NetFlow aggregation.
Using the puzzle analogy again, Team Cymru has access to most of the puzzle pieces. While a single piece doesn’t hold enough information in the context of using a VPN, having many of them could potentially expose your Internet usage if they receive NetFlow exports from both your ISP and your VPN provider’s ISP. For example, traffic correlation based on packet timing becomes a lot easier when you know the delay between the user and their VPN provider acting as a proxy.
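The idea behind timing correlation can be sketched in a few lines of Python. This is a deliberately naive illustration, not a description of how any real analysis is performed: flow timestamps from both sides are grouped into fixed time windows, the VPN-side timestamps are shifted back by an assumed known delay, and the overlap between active windows is scored. All names and values are hypothetical.

```python
def active_buckets(timestamps, bucket_seconds=1.0, shift=0.0):
    """Map timestamps (in seconds) to the set of time windows they fall in."""
    return {int((t - shift) // bucket_seconds) for t in timestamps}


def overlap_score(client_side, exit_side, delay=0.0, bucket_seconds=1.0):
    """Jaccard overlap between the active windows of two flow sets.

    `client_side` holds timestamps seen between the user and the VPN,
    `exit_side` holds timestamps seen between the VPN and a remote server.
    `delay` is the assumed latency added by the VPN acting as a proxy.
    Returns a value between 0.0 (no overlap) and 1.0 (identical windows).
    """
    a = active_buckets(client_side, bucket_seconds)
    b = active_buckets(exit_side, bucket_seconds, shift=delay)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical timestamps, in seconds since the start of the observation.
user_to_vpn = [0.1, 5.2, 9.7, 30.4]
vpn_to_site_a = [t + 0.05 for t in user_to_vpn]  # same rhythm, 50 ms later
vpn_to_site_b = [2.5, 14.1, 21.8, 40.0]          # unrelated activity

score_a = overlap_score(user_to_vpn, vpn_to_site_a, delay=0.05)
score_b = overlap_score(user_to_vpn, vpn_to_site_b, delay=0.05)
print(score_a, score_b)
```

A VPN server carries traffic for many users at once, so a real correlation would be far noisier than this toy example, but the principle is the same: an observer who sees both sides and knows the delay can line up the two activity patterns.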
In 2024, the NSA told a U.S. senator that it was buying NetFlow exports from ISPs, as long as they involve traffic to or from the United States.
NetFlow and anonymization
It is not possible to say which ISPs share their NetFlow data.
For European-based ISPs, GDPR compliance dictates that personal data should not be shared. It is not our place to discuss whether NetFlow datasets qualify as personal data, but GDPR compliance implies that ISPs must not permit any third party to associate a NetFlow export with personal information such as names, addresses, or phone numbers.
Two possibilities exist regarding NetFlow exports:
- ISPs sharing anonymized NetFlow datasets.
- ISPs sharing customer information.
In the first case, the network activity for a VPN user would appear as follows: [anonymous IP A] connected to [anonymous IP B] web server on [date] via a VPN of type [protocol] on [anonymous IP C].
In the second case, the network activity would be much more specific: [identified person A] connected to [identified company B]’s web server on [date] through a VPN of type [protocol] on [identified company C].
Possible mitigation
With someone able to view most of the global Internet traffic, as previously mentioned, a VPN alone would be insufficient to protect your privacy. Does this render VPNs useless? VPNs are effective at protecting against data snooping on public networks, bypassing firewalls, or preventing your ISP from knowing what you use the Internet for, but their efficacy can be limited against a state-level actor.
In the worst-case scenario of a NetFlow analysis, the data passing through the VPN remains encrypted and unusable, but it may be possible to reveal which servers you connected to and the protocol you used (HTTPS, email, etc.), and maybe to infer the websites you visited.
However, using a VPN service offering multiple hops passing through different countries/ISPs can still protect your privacy, but only if the ISPs do not all share their NetFlow data. To increase resistance to NetFlow analysis, at the cost of both higher latency and reduced bandwidth, it’s possible to chain multiple VPNs from different VPN providers, on the condition that the VPN providers are trustworthy and that their servers aren’t all part of the NetFlow exports.
To protect your privacy effectively, it’s important that you define your threat model and check whether a given mitigation suits your needs.
An alternative mitigation would be to use the I2P protocol, although its usage is a bit restrictive, contrary to Tor. I2P is intended to be used as a “network layer on top of the Internet”, rather than a substitute for a VPN. For more information, visit the official project website.
A more realistic mitigation would be the use of a mix network, but at the time of writing, those available on the market are complicated to use and require blockchain tokens to work. We also lack feedback about their efficiency in real-world usage.
Exercise: monitor your own network activity
For our readers with some network skills, here is a simple experiment to understand what your ISP can observe from your VPN usage. You can assess your own VPN activity by monitoring the network traffic on your local VPN interface using software such as Wireshark (a graphical tool), ntopng (web-based, mostly used on routers) or tcpdump (a command line tool). These tools are available on most operating systems (Windows, macOS, Linux, Android, dd-wrt, *BSD); however, their usage is not within the scope of this guide.
IVPN Privacy Guides