Towards IP geolocation with intermediate routers based on topology discovery
Contents
- Network measurement-based
- Data mining-based
- Summary
A number of IP geolocation methods have been proposed since the topic was first openly discussed by Padmanabhan and Subramanian (2001). They assume that IP addresses within the same Autonomous System (AS) or with low latencies between them are geographically close to one another. This assumption is the prerequisite of the three methods they proposed: GeoTrack, GeoPing and GeoCluster. GeoCluster extracts Border Gateway Protocol (BGP) data from public routers and pins all hosts in a subnet to the location of the organization that owns the corresponding AS. GeoTrack and GeoPing use traceroute and ping to measure network constraints (delay and topology) and convert them to geographical constraints. Inspired by these techniques, IP geolocation methods are divided into two groups: network measurement-based and data mining-based.
Network measurement-based
CBG. Gueye et al. (2006) propose constraint-based geolocation (CBG) based on GeoPing. GeoPing constructs a latency vector to the target host using vantage points and pins the target to the location of the landmark with the nearest latency vector. Instead of latency vectors and pinning, CBG uses geographical distances and multilateration to locate the target host. The idea of CBG extends geolocation results from discrete landmarks to continuous geographical space. CBG uses a "bestline" to reduce the error introduced by inflated latencies and indirect routes when converting network constraints to geographical distances. However, the bestline estimation is still too loose (Katz-Bassett et al. 2006), even compared with the speed-of-light constraint.
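As a rough illustration of the bestline idea, the sketch below fits, for a single vantage point, the tightest line that stays below all (distance, delay) calibration points and uses it to turn a new delay measurement into a distance upper bound. All measurements are fabricated for illustration, and the small linear program is a simplified stand-in for CBG's exact formulation:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical calibration data for one vantage point: great-circle distance
# to each landmark (km) and the measured RTT (ms).
dist = np.array([120.0, 480.0, 900.0, 1500.0, 2200.0])
rtt  = np.array([4.1,   9.8,  14.5,  22.0,   33.0])

# Bestline: rtt >= m * dist + b for every landmark, with slope m no smaller
# than the physical baseline (~2/3 speed of light in fiber, round trip) and
# intercept b >= 0. CBG picks the line closest to the cloud of points, i.e.
# minimizing the total slack sum(rtt - m*dist - b), a small linear program.
baseline = 2.0 / 199.86  # ms of RTT per km at 2/3 c
res = linprog(c=[-dist.sum(), -len(dist)],
              A_ub=np.column_stack([dist, np.ones_like(dist)]),
              b_ub=rtt,
              bounds=[(baseline, None), (0, None)])
m, b = res.x

# A new RTT measurement to the target now converts to a distance upper bound.
target_rtt = 18.0
print(f"target is within {(target_rtt - b) / m:.0f} km of this vantage point")
```

Intersecting these per-vantage-point disks yields the continuous confidence region that distinguishes CBG from landmark pinning.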
TBG. Katz-Bassett et al. (2006) observe that measurement results vary with the network environment, so they introduce topology constraints and propose topology-based geolocation (TBG). TBG combines network topology with latency constraints and computes the locations of the target and intermediate routers simultaneously with a global optimization algorithm. TBG shows that topology improves geolocation accuracy. However, the method requires more computing time because it takes into account all nodes that appear on the paths.
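The sketch below illustrates the joint-optimization idea on a toy planar instance: landmark positions are fixed, and the positions of an intermediate router and the target are solved together under per-link distance intervals derived from delays. The node layout, the interval values, and the penalty-plus-Nelder-Mead solver are all assumptions for illustration, not TBG's actual formulation:

```python
import numpy as np
from scipy.optimize import minimize

# Toy instance: nodes 0,1 are landmarks with known planar coordinates (km);
# node 2 is an intermediate router and node 3 the target, both unknown.
landmarks = np.array([[0.0, 0.0], [500.0, 0.0]])

# Per-link distance intervals (km) derived from delay measurements:
# (node_i, node_j, lower_bound, upper_bound). Values are illustrative.
links = [(0, 2, 250.0, 300.0), (1, 2, 300.0, 350.0),
         (2, 3, 100.0, 150.0), (0, 3, 380.0, 420.0)]

def loss(x):
    """Penalty for any link whose length falls outside its delay-derived interval."""
    pos = np.vstack([landmarks, x.reshape(-1, 2)])
    total = 0.0
    for i, j, lo, hi in links:
        d = np.linalg.norm(pos[i] - pos[j])
        total += max(0.0, lo - d) ** 2 + max(0.0, d - hi) ** 2
    return total

# Solve for the router and target coordinates jointly, as TBG does globally.
res = minimize(loss, x0=np.array([200.0, 100.0, 300.0, 200.0]),
               method="Nelder-Mead")
print("router:", res.x[:2], "target:", res.x[2:])
```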
Octant. Wong et al. (2007) propose a general framework, called Octant, that combines latency measurement, topology calculation and hostname comprehension. Similar to TBG, Octant locates intermediate nodes on the path to the target with multilateration and introduces these nodes as secondary landmarks to help locate subsequent nodes. Octant extends CBG's multilateration with negative constraints and convex hulls, which leads to better geolocation precision. Octant achieves the lowest geolocation error among methods that use network measurements only, but it faces the same problem as TBG: both take all nodes into account, and they depend on sufficient active hosts to geolocate target hosts.
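The distinctive addition over CBG is the negative constraint: a high latency can rule out the area close to a vantage point, not just bound the distance from above. A minimal sketch, assuming planar coordinates and illustrative radii (Octant itself bounds the region with convex-hull-like areas rather than a grid):

```python
import math

def dist_km(p, q):
    """Planar Euclidean distance (km); a simple stand-in for great-circle distance."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Each constraint is a disk around a vantage point: "pos" means the target is
# no farther than r km (low latency), "neg" means it is no closer than r km
# (latency too high for the target to be nearby). Values are illustrative.
constraints = [
    ((0.0, 0.0), "pos", 400.0),
    ((500.0, 0.0), "pos", 450.0),
    ((250.0, 300.0), "neg", 150.0),
]

def feasible(point):
    for center, kind, r in constraints:
        d = dist_km(point, center)
        if (kind == "pos" and d > r) or (kind == "neg" and d < r):
            return False
    return True

# A coarse grid scan approximates the intersection of all constraint regions.
region = [(x, y) for x in range(0, 501, 10) for y in range(-400, 401, 10)
          if feasible((x, y))]
cx = sum(p[0] for p in region) / len(region)
cy = sum(p[1] for p in region) / len(region)
print(f"{len(region)} feasible cells, centroid ({cx:.0f}, {cy:.0f})")
```

Once an intermediate router's region is narrowed this way, Octant reuses it as a secondary landmark for the nodes behind it.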
Instead of direct distance constraints, some statistical methods based on network measurement have been proposed. Youn et al. (2009) use maximum likelihood estimation over distance vectors to estimate the target location, while Eriksson et al. (2010) choose Naive Bayes classification instead.
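In the classification view, geolocation becomes the problem of assigning a measured delay vector to one of a discrete set of regions. A minimal sketch using scikit-learn's Gaussian Naive Bayes with fabricated delay vectors (Eriksson et al.'s actual likelihood model and feature set differ):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical training set: delay vectors (ms, one entry per vantage point)
# measured to hosts whose city is already known.
X_train = np.array([
    [12.0, 35.0, 48.0],   # host in city A
    [14.0, 33.0, 50.0],   # host in city A
    [40.0, 10.0, 22.0],   # host in city B
    [42.0,  9.0, 25.0],   # host in city B
    [30.0, 28.0,  8.0],   # host in city C
])
y_train = np.array(["A", "A", "B", "B", "C"])

clf = GaussianNB().fit(X_train, y_train)

# The target's delay vector is assigned to the region with highest posterior.
target = np.array([[13.0, 34.0, 47.0]])
print(clf.predict(target), clf.predict_proba(target))
```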
Gill et al. (2010) attack delay-based geolocation systems by controlling network properties. The authors reveal the limitations of existing measurement-based geolocation techniques against an adversarial target. They find that the more advanced and accurate topology-aware techniques are more susceptible to covert tampering than the simpler delay-based techniques.
Data mining-based
Structon. Guo et al. (2009) find it feasible to collect numerous landmarks using web mining. The authors propose a method that mines geographical information from webpages and associates the IP addresses of the websites with these data. Structon pins other hosts without geographical information to landmarks in a way similar to GeoCluster, so most results are still coarse-grained. Though Structon only geolocates hosts at city level, it is a motivation for us to collect a large number of landmarks.
SLG. Wang et al. (2011) present a fine-grained geolocation method that combines web mining and network measurement. The authors argue that the precision of IP geolocation depends heavily on the density of landmarks. SLG uses multilateration (just like CBG) to narrow the confidence region to around 100 km, which is convincing because delay is a hard constraint (Katz-Bassett et al. 2006). Within the narrowed region, it collects web servers as landmarks from online map services. SLG then uses traceroute to measure the relative delay between the target and each landmark as a new constraint. The relative delay is the sum of the delays of the two path segments starting from the last router of their common path (see the sketch after this list). With fine-grained landmarks and stronger constraints, SLG manages to reduce the average error from 100 km to 10 km. However, SLG pins the target to the "nearest" landmark (the one with the smallest relative delay to the target), which can limit the precision of location estimation, for two reasons:
1. In areas where the Internet is only moderately connected, the correlation between network latency and geographical distance does not fit the "shortest-closest" rule, which has been shown to depend on a large number of samples (Li et al. 2013). Collecting that many samples also introduces heavy network traffic.
2. The rise of cloud services and content delivery networks (CDNs) reduces the number of qualified landmarks and therefore hurts the precision of geolocation.
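As referenced above, the sketch below computes SLG's relative delay from two hypothetical traceroute paths by finding the last router they share and summing the delays of the two diverging segments. Here each hop carries a cumulative RTT, and differences of cumulative RTTs stand in for per-segment delays:

```python
# Hypothetical traceroute output from one vantage point: ordered
# (router_ip, cumulative_rtt_ms) hops to the landmark and to the target.
path_landmark = [("10.0.0.1", 2.0), ("10.0.1.1", 5.0), ("10.0.2.1", 9.0),
                 ("192.0.2.10", 12.0)]
path_target   = [("10.0.0.1", 2.1), ("10.0.1.1", 5.2), ("10.0.3.1", 8.0),
                 ("198.51.100.7", 11.0)]

def relative_delay(path_a, path_b):
    """Sum of the delays of the two path segments measured from the last
    router the paths have in common (SLG's relative delay)."""
    common = 0
    for (ip_a, _), (ip_b, _) in zip(path_a, path_b):
        if ip_a != ip_b:
            break
        common += 1
    if common == 0:
        return None  # no shared prefix: relative delay undefined
    branch_a = path_a[-1][1] - path_a[common - 1][1]
    branch_b = path_b[-1][1] - path_b[common - 1][1]
    return branch_a + branch_b

print(relative_delay(path_landmark, path_target))  # 7.0 + 5.8 = 12.8 ms
```

SLG evaluates this quantity for every landmark in the narrowed region and pins the target to the landmark with the smallest value.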
DRoP. Huffaker et al. (2014) propose a DNS-based method to discover and geolocate a large set of routers with hostnames. They assume that each autonomous domain uses geographical hints (geohints) consistently within that domain. They use data collected by their global measurement platform (Archipelago 2007) to generate geohints for nodes within the same domain. The authors are able to generate 1711 rules covering 1398 different domains. However, their method can only achieve city-level precision because of the limits of the geohints embedded in router hostnames.
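A minimal sketch of the geohint idea: within a hypothetical domain, router hostnames embed an IATA airport code at a consistent position, and a per-domain rule extracts and resolves it. DRoP learns such rules automatically from measurement data; the domain name, regex rule, and lookup table below are invented for illustration:

```python
import re

# Illustrative DRoP-style rule: for this (hypothetical) domain, hostnames
# look like "<role><n>-<iata><n>.example-isp.net".
RULES = {
    "example-isp.net": re.compile(r"^[a-z]+\d*-([a-z]{3})\d*\."),
}
IATA = {"jfk": "New York, US", "lax": "Los Angeles, US", "fra": "Frankfurt, DE"}

def geolocate_router(hostname):
    """Apply the domain's learned rule and resolve the extracted geohint."""
    domain = ".".join(hostname.split(".")[-2:])
    rule = RULES.get(domain)
    if not rule:
        return None
    m = rule.match(hostname)
    return IATA.get(m.group(1)) if m else None

print(geolocate_router("core1-jfk2.example-isp.net"))  # -> New York, US
print(geolocate_router("edge3-fra1.example-isp.net"))  # -> Frankfurt, DE
```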
Summary
In addition to the above, many researchers have proposed further ideas. Liu et al. (2014) mine check-in data from social networks and are able to locate IP addresses used by active users. Laki et al. (2011) propose a statistical model that associates network latencies with geographical distance ranges and use maximum likelihood to estimate the most probable location. Gharaibeh et al. (2017) test the precision of router geolocation in commercial databases against a ground-truth dataset based on DNS and latency measurements. The authors state that the databases are not accurate in geolocating routers at either country or city level, even though they agree considerably with one another. Weinberg et al. (2018) use active probing to geolocate proxy servers.
The state-of-the-art methods are mostly based on accurate and fine-grained landmarks (extracted by name comprehension, e.g. DNS records, websites, online maps). However, there are still some challenging problems:
1. Hosts with fine-grained results are mainly stable or active ones, such as university servers and PC users, whereas the geolocation errors of dynamic or inactive hosts are large. This is because most landmarks collected from the web are self-clustered and close to active hosts, while a portion of static but inactive hosts, e.g. edge routers and backbone switches, sit in areas of low landmark density. Since there are no existing techniques to extend landmark density, the geolocation results for these hosts still need improvement.
2. There is a trade-off between time overhead and geolocation precision. Introducing more landmarks for greater precision extends the time overhead, which makes real-time geolocation even more difficult given the need for numerous landmarks.
Source: https://cybersecurity.springeropen.com/articles/10.1186/