Data Source Problems
Public Sources of Statistics
There are various publicly available sources of data on the DFZ and the networks inside it. The problem is that everyone has a different view of the DFZ: there is no single DFZ, and thus everyone has a slightly different idea of the topology of the rest of the Internet.
Let’s take a well-connected network, AS13335 (Cloudflare), as an example and look at what different public providers of connectivity stats report. At the time of writing:
bgp.he.net shows the following:
BGP Peers Observed (all): 1,597
BGP Peers Observed (v4): 1,527
BGP Peers Observed (v6): 744
bgp.tools shows the following:
| Peers | Upstreams | Downstreams |
|---|---|---|
| 2237 | 290 | 748 (Cone: 749) |
asrank.caida.org shows the following:
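| Category | Metric | Value |
|---|---|---|
| Customer cone | ASNs | 728 |
| Customer cone | Prefixes | 15255 |
| Customer cone | Addresses | 18762102 |
| AS degree | Global | 1361 |
| AS degree | Transit | 1043 |
| AS degree | Provider | 282 |
| AS degree | Peer | 352 |
| AS degree | Customer | 727 |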
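As we can see, there are some similarities but also some large discrepancies. The upstream and downstream counts from bgp.tools and CAIDA are close (290 upstreams vs. 282 providers, 748 downstreams vs. 727 customers), but the peer counts are not: bgp.tools lists 2237 peers, CAIDA lists 352 (out of an AS degree of 1361), and bgp.he.net reports 1,597 observed BGP peers (v4 and v6 combined). This makes it tricky to extract the information needed from these sources.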
Public MRT Sources
Another option is to download the MRT files from route collectors that have BGP sessions with the networks of interest. This brings a different problem: the majority of networks in the world do not peer with public route collectors like RouteViews and RIS, or with private collectors like bgp.tools.
Using BGPKIT, one can match ASNs to the route collectors they feed.
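As a rough Python sketch of that matching, assume the response of the BGPKIT peers endpoint (`https://api.bgpkit.com/v3/broker/peers`, the same one fetched with curl further down) holds a list of peer records carrying `asn`, `collector`, `num_v4_pfxs` and `num_v6_pfxs` fields. Those field names, the function names and the full-table thresholds are all assumptions, so check them against a real response before relying on the output.

```python
import json

# Hypothetical cut-offs for what counts as a "full table"; these are
# assumptions, not values from BGPKIT, and need adjusting as the DFZ grows.
FULL_V4_PREFIXES = 900_000
FULL_V6_PREFIXES = 180_000


def load_peers(path):
    """Load a saved BGPKIT peers response from disk."""
    with open(path) as f:
        doc = json.load(f)
    # Assumption: peer records are wrapped in a top-level "data" list;
    # a bare list of records is accepted as well.
    return doc["data"] if isinstance(doc, dict) else doc


def full_table_collectors(peers, asn):
    """Return the collectors receiving a full v4 and/or v6 table from asn.

    Assumes each peer record has "asn", "collector", "num_v4_pfxs" and
    "num_v6_pfxs" fields.
    """
    result = {"v4": set(), "v6": set()}
    for peer in peers:
        if peer.get("asn") != asn:
            continue
        # v4 and v6 may arrive over separate sessions (e.g. route-views6),
        # so each address family is checked independently.
        if peer.get("num_v4_pfxs", 0) >= FULL_V4_PREFIXES:
            result["v4"].add(peer["collector"])
        if peer.get("num_v6_pfxs", 0) >= FULL_V6_PREFIXES:
            result["v6"].add(peer["collector"])
    return {family: sorted(cols) for family, cols in result.items()}
```

For example, `full_table_collectors(load_peers("raw_data/downstreams/bgpkit.peers.json"), 174)` would list the collectors holding a full Cogent feed, if the field assumptions hold. Checking each address family separately matters because several collectors receive only v4 or only v6 from a given network.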
The following table lists the ASNs of interest and which route collector receives a full table feed from each ASN (some ASNs might provide a full table feed to multiple collectors, but only one is listed):
| ASN | Name | Tier | Collector | Notes |
|---|---|---|---|---|
| 174 | Cogent | 1 | RRC25 | |
| 701 | Verizon/UUNET | 1 | None | route-views6 receives only v6 routes, no v4. |
| 1273 | Vodafone | 2 | None | RRC03 receives v6 only. |
| 1299 | Arelion | 1 | RRC01 | |
| 2635 | Automattic | 2 | None | Not peering with RIS or RouteViews. |
| 2914 | NTT | 1 | RRC01 | |
| 3257 | GTT | 1 | RRC01 | |
| 3320 | DTAG | 1 | RRC01 | |
| 3356 | Lumen/Colt | 1 | None | Not peering with RIS. route-views receives v4 only (no v6) but doesn’t export it to MRT. |
| 3491 | PCCW/Console Connect | 1 | RRC01 | |
| 4637 | Telstra | 2 | None | RIS and RouteViews receive a partial table only. |
| 5511 | FT/Orange | 1 | None | RIS and RouteViews receive a partial table only. |
| 6453 | TATA | 1 | RRC03 | |
| 6461 | Zayo | 1 | RRC01 | |
| 6762 | TI Sparkle/Seabone | 1 | RRC12 | |
| 6830 | Liberty Global | 1 | RRC01 | |
| 6939 | Hurricane Electric | 1* | RRC01 | |
| 7018 | AT&T | 1 | RRC00 | |
| 7195 | EdgeUno | 2 | None | RIS and RouteViews receive a partial table only. |
| 7473 | Singapore Telecommunications | 2 | None | Not peering with RIS or RouteViews. |
| 9002 | RETN | 2 | RRC01 | |
| 9498 | Bharti Airtel | 2 | None | Not peering with RIS or RouteViews. |
| 12389 | Rostelecom | 2 | None | Not peering with RIS or RouteViews. |
| 12956 | Telxius/Telefonica | 1 | RRC03 | |
| 13335 | Cloudflare | 2 | None | RIS and RouteViews receive a partial table only. |
| 13786 | Seaborn (Seabras USA) | 2 | RRC15 | |
| 14840 | BR.DIGITAL | 2 | RRC25 | |
| 20473 | The Constant Company / Vultr | 2 | None | RIS and RouteViews receive a partial table only. |
| 23911 | China Next Generation Internet Beijing | 2 | None | Not peering with RIS or RouteViews. |
| 24482 | SG.GS | 2 | RRC00 | |
| 35280 | F5 Networks | 2 | RRC15 | |
| 36236 | NetActuate | 2 | RRC25 | |
| 37468 | Angola Cables | 2 | napafrica | |
| 37721 | Virtual Technologies & Solutions | 2 | None | RRC00 receives a full v6 table but only 857k v4 routes. |
| 38255 | China Education and Research Network | 2 | None | Not peering with RIS or RouteViews. |
| 39120 | Convergenze | 2 | amsix | |
| 49544 | i3D.net | 2 | RRC00 | |
| 52320 | GlobeNet Cabos Submarinos Colombia | 2 | RRC16 | |
| 57463 | NetIX Communications | 2 | None | route-views2 has a partial v4 table and no v6. route-views6 has no v4 but a full v6 table. RRC10 receives a partial table only. |
| 64289 | Macarne | 2 | sfmix | |
| 199524 | G-Core | 2 | RRC03 | |
The table shows that not all of the data needed can be pulled from direct peerings with public route collectors.
Proxy ASNs
For some of the ASNs of interest there is no data available via public MRT archives: some of them don’t peer with any RouteViews or RIPE RIS collectors, some peer only with collectors that don’t export MRT dumps, and some peer with a collector but don’t send it their full BGP table. In these cases it might be possible to get the data for those ASNs another way.
The idea is to find another ASN (referred to here as a “proxy ASN”) that sends a full BGP table to a public BGP collector, is a downstream of the ASN of interest, and has no other upstream (in other words, it’s single-homed to the ASN of interest). The full table exported by the proxy ASN is then essentially a full table export of the ASN of interest.
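A minimal Python sketch of the single-homed test, assuming the AS relationships are already available as (provider, customer) pairs. `single_homed_downstreams` is a hypothetical helper, not the author’s `find_single_homed_asns.py`, and it only checks the single-homed condition; whether a candidate actually sends a full table to a collector has to be checked separately.

```python
from collections import defaultdict


def single_homed_downstreams(provider_customer_edges, target_asn):
    """Return the customers of target_asn that have no other provider.

    provider_customer_edges is an iterable of (provider_asn, customer_asn)
    pairs. A customer qualifies only if target_asn is its sole provider.
    """
    providers_of = defaultdict(set)
    for provider, customer in provider_customer_edges:
        providers_of[customer].add(provider)
    return sorted(
        customer
        for customer, providers in providers_of.items()
        if providers == {target_asn}
    )


# Toy relationships using documentation ASNs (RFC 5398), not real data.
edges = [
    (64496, 64497),  # 64497 has a single provider: 64496
    (64496, 64498),  # 64498 is multi-homed to 64496 and 64499
    (64499, 64498),
]
print(single_homed_downstreams(edges, 64496))  # [64497]
```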
The following is the list of ASNs of interest for which there is no MRT full table data available:
| ASN | Name |
|---|---|
| 701 | Verizon/UUNET |
| 1273 | Vodafone |
| 2635 | Automattic |
| 3356 | Lumen/Colt |
| 4637 | Telstra |
| 5511 | FT/Orange |
| 7195 | EdgeUno |
| 7473 | Singapore Telecommunications |
| 9498 | Bharti Airtel |
| 12389 | Rostelecom |
| 13335 | Cloudflare |
| 20473 | The Constant Company / Vultr |
| 23911 | China Next Generation Internet Beijing |
| 37721 | Virtual Technologies & Solutions |
| 38255 | China Education and Research Network |
| 57463 | NetIX Communications |
Finding Proxy ASNs via CAIDA
CAIDA publishes AS relationship data which can be used to find suitable proxy ASNs.
mkdir -p raw_data/downstreams
asns="701 1273 2635 3356 4637 5511 7195 7473 9498 12389 13335 20473 23911 37721 38255 57463"
Download CAIDA’s list of AS relationships between all ASNs in the DFZ:
# You must sign the following form if you plan to publish any research based on CAIDA's data:
# https://www.caida.org/catalog/datasets/request_user_info_forms/as_relationships/
wget -O raw_data/downstreams/20241101.as-rel2.txt.bz2 https://publicdata.caida.org/datasets/as-relationships/serial-2/20241101.as-rel2.txt.bz2
bunzip2 raw_data/downstreams/20241101.as-rel2.txt.bz2
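The serial-2 file is plain text: comment lines start with `#`, and each data line has the form `<as1>|<as2>|<relationship>|<source>`, where a relationship of `-1` means as1 is a provider of as2 and `0` means the two are peers. A hedged Python sketch of a parser follows; `parse_as_rel2` is a hypothetical name, and the sample lines are made up using documentation ASNs.

```python
def parse_as_rel2(lines):
    """Parse CAIDA serial-2 AS relationship lines.

    Returns (provider_customer, peers): provider_customer holds
    (provider, customer) pairs (rel == -1), peers holds peer pairs (rel == 0).
    Blank lines and "#" comment lines are skipped.
    """
    provider_customer = []
    peers = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Only the first three fields are needed; the source field is ignored.
        as1, as2, rel = (int(field) for field in line.split("|")[:3])
        if rel == -1:
            provider_customer.append((as1, as2))
        elif rel == 0:
            peers.append((as1, as2))
    return provider_customer, peers


sample = [
    "# toy example using documentation ASNs",
    "64496|64497|-1|bgp",
    "64496|64499|0|bgp",
    "64499|64498|-1|bgp",
]
p2c, p2p = parse_as_rel2(sample)
print(len(p2c), len(p2p))  # 2 1
```

On the real file, `with open("raw_data/downstreams/20241101.as-rel2.txt") as f: p2c, p2p = parse_as_rel2(f)` yields the provider/customer pairs needed for the single-homed check.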
For each ASN of interest, find all the downstreams that are single-homed to that ASN:
for asn in $asns
do
./find_single_homed_asns.py raw_data/downstreams/20241101.as-rel2.txt $asn
done
Get a list of peers for all RouteViews and RIS collectors:
# Pipe through jq to pretty-print the JSON
curl -s https://api.bgpkit.com/v3/broker/peers | jq . > raw_data/downstreams/bgpkit.peers.json
For each single-homed ASN, check whether it exports a full table feed to a collector:
for asn in $asns
do
echo "Checking single homed downstreams of $asn"
jq -c '.[]' "raw_data/downstreams/${asn}_sh_downstreams.json" | while read -r downstream_asn
do
./scripts/find_collector_mrt_for_asn.py raw_data/downstreams/bgpkit.peers.json "$downstream_asn"
done | jq ".|select(.[]!={})"
echo ""
done
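The manual selection step that follows can be sketched as a join between the two sets of results. This Python sketch assumes both outputs have already been collected into plain dicts (single-homed downstreams per ASN of interest, and full-table collectors per downstream); `pick_proxies` and that dict layout are hypothetical, since the output format of the scripts above isn’t shown.

```python
def pick_proxies(sh_downstreams, full_table_collectors):
    """List (proxy_asn, collector) candidates for each ASN of interest.

    sh_downstreams maps an ASN of interest to its single-homed downstreams;
    full_table_collectors maps an ASN to the collectors that receive a full
    table from it. An empty candidate list means no usable proxy was found.
    """
    candidates = {}
    for target, downstreams in sh_downstreams.items():
        candidates[target] = [
            (downstream, collector)
            for downstream in sorted(downstreams)
            for collector in sorted(full_table_collectors.get(downstream, []))
        ]
    return candidates


# Toy input using documentation ASNs, not real results.
sh = {64496: [64498, 64497], 64500: [64501]}
ft = {64497: ["rrc00"], 64498: []}
print(pick_proxies(sh, ft))
# {64496: [(64497, 'rrc00')], 64500: []}
```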
After looking through the results and choosing proxy ASNs which are exporting a full table to one of the public MRT sources, the table now looks like this:
| ASN | Name | Proxy ASN | Proxy Collector |
|---|---|---|---|
| 701 | Verizon/UUNET | 54316 | RRC25 |
| 1273 | Vodafone | 12969 | RRC01 |
| 2635 | Automattic | None | |
| 3356 | Lumen/Colt | 13830 | RRC00 |
| 4637 | Telstra | None | |
| 5511 | FT/Orange | None | |
| 7195 | EdgeUno | 64116 | RRC24 |
| 7473 | Singapore Telecommunications | None | |
| 9498 | Bharti Airtel | None | |
| 12389 | Rostelecom | None | |
| 13335 | Cloudflare | None | |
| 20473 | The Constant Company / Vultr | None | |
| 23911 | China Next Generation Internet Beijing | None | |
| 37721 | Virtual Technologies & Solutions | None | |
| 38255 | China Education and Research Network | None | |
| 57463 | NetIX Communications | None | |
Proxy ASN Problems
It’s hard to know the true relationship between two ASNs. There are problems with the CAIDA data used to infer AS relationships. For example, it states that AS9002 (RETN) is single-homed behind AS3356 (Lumen), which is incorrect. At the time of writing, AS1299 (Arelion) and AS3257 (GTT) both provide transit to AS9002 (checked from a router CLI with full table feeds from AS1299 and AS3257). There are many mistakes like this.
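One way to cross-check a single-homed claim is to look at the AS paths in full tables, as done here from a router CLI, and collect every AS that appears immediately before the ASN in question. A hedged Python sketch, assuming the paths are already extracted as lists of integers (AS_SETs are not handled) and using a hypothetical `left_neighbors` helper: more than one distinct neighbor is a strong hint that a single-homed label is wrong, but not proof on its own, because some of those neighbors may be peers rather than providers.

```python
def left_neighbors(as_paths, target_asn):
    """Return the ASes seen immediately before target_asn in AS paths.

    Each path is a list of ASNs, leftmost being the AS closest to the
    observer. Consecutive duplicates (AS-path prepending) are collapsed first.
    """
    neighbors = set()
    for path in as_paths:
        collapsed = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        for left, right in zip(collapsed, collapsed[1:]):
            if right == target_asn:
                neighbors.add(left)
    return neighbors


# Toy paths using documentation ASNs, as if read from a router's full-table feeds.
paths = [
    [64496, 64500],
    [64497, 64500, 64500],         # prepended by the origin
    [64498, 64499, 64500, 64501],  # 64500 seen in transit
]
print(sorted(left_neighbors(paths, 64500)))  # [64496, 64497, 64499]
```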
Even with proxies, 12 of the 16 ASNs without MRT data still have no source, and given the errors in the CAIDA data it’s not clear the method is reliable even where a proxy was found. Therefore, the proxy ASN method isn’t suitable for “filling in the blanks”.
PCH Collectors
In addition to the RouteViews and RIS collectors, which provide public MRT dumps, PCH also operates a large number of collectors. These provide BGP table dumps in a rather horrible, unstructured format that isn’t even consistent across collectors!
But again, PCH alone doesn’t have feeds from all the networks of interest.