Scalable Country Localization
cloc is a self-contained Java library that resolves the country (or countries) covering a given geohash.
It stores discretized country borders in a trie data structure.
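To illustrate the idea of keeping discretized borders in a trie, here is a minimal sketch. It is not cloc's actual implementation: the class `GeohashTrie` and its methods `insert` and `locate` are hypothetical names chosen for this example. Each stored prefix stands for a geohash cell, and a lookup walks the geohash character by character and returns the countries attached to the longest stored prefix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a geohash trie; names are illustrative, not cloc's API.
final class GeohashTrie {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        final List<String> countries = new ArrayList<>();
    }

    private final Node root = new Node();

    // Marks the cell identified by the geohash prefix as covered by the country.
    void insert(String prefix, String country) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        if (!node.countries.contains(country)) {
            node.countries.add(country);
        }
    }

    // Returns the countries of the longest stored prefix of the geohash,
    // or an empty list when no stored prefix matches.
    List<String> locate(String geohash) {
        Node node = root;
        List<String> best = new ArrayList<>();
        for (char c : geohash.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                break;
            }
            if (!node.countries.isEmpty()) {
                best = node.countries;
            }
        }
        return new ArrayList<>(best);
    }
}
```

In such a layout, cells lying entirely inside one country could be stored as short prefixes, while cells crossing a border would be split into longer prefixes or carry more than one country. Whether cloc makes the same trade-off is an assumption.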
Gradle
implementation 'io.github.adrianulbona:cloc:0.3.2'
Maven
<dependency>
<groupId>io.github.adrianulbona</groupId>
<artifactId>cloc</artifactId>
<version>0.3.2</version>
</dependency>
Code sample - Java
final CountryLocator countryLocator = CountryLocator.create();
final List<String> countries = countryLocator.locate("u10hb1"); // ["United Kingdom"]
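The locator takes a geohash string, so raw coordinates have to be encoded first. The sketch below implements the standard geohash encoding (alternating longitude and latitude bisection, base-32 output). The helper class `Geohash` and its `encode` method are hypothetical names for this example and are not part of cloc; a dedicated geohash library would work just as well.

```java
// Hypothetical helper, not part of cloc: standard geohash encoding.
final class Geohash {

    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encodes a latitude/longitude pair into a geohash of the given length.
    static String encode(double lat, double lon, int precision) {
        double latMin = -90.0, latMax = 90.0;
        double lonMin = -180.0, lonMax = 180.0;
        StringBuilder hash = new StringBuilder(precision);
        boolean lonBit = true; // bits alternate, starting with longitude
        int bits = 0;
        int value = 0;
        while (hash.length() < precision) {
            if (lonBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) {
                    value = (value << 1) | 1;
                    lonMin = mid;
                } else {
                    value <<= 1;
                    lonMax = mid;
                }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) {
                    value = (value << 1) | 1;
                    latMin = mid;
                } else {
                    value <<= 1;
                    latMax = mid;
                }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }
}
```

With this helper, `Geohash.encode(51.47, 0.00, 6)` yields `u10hb1`, the geohash used in the Java sample above, so the result can be passed straight to `countryLocator.locate(...)`.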
Code sample - Scala - Spark
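import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.udf
import scala.collection.JavaConverters._
import spark.implicits._
// Point is assumed to be a case class matching the columns shown in the output.
case class Point(geohash: String, lat: Double, lon: Double)
// Broadcast a single locator so every executor reuses it inside the UDF.
val locator = spark.sparkContext.broadcast(CountryLocator.create())
val locate = udf { (geohash: String) => locator.value.locate(geohash).asScala }
val pointsDF: DataFrame = Seq(
  Point("u10hb1", 51.47, 0.00),
  Point("u33ff3", 52.52, 13.81)).toDF
pointsDF.withColumn("countries", locate($"geohash"))
  .show()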
// +-------+-----+-----+----------------+
// |geohash|  lat|  lon|       countries|
// +-------+-----+-----+----------------+
// | u10hb1|51.47|  0.0|[United Kingdom]|
// | u33ff3|52.52|13.81|       [Germany]|
// +-------+-----+-----+----------------+
Sources
Suggestions are welcome. The source code is available at github/adrianulbona/cloc.