Using OpenStreetMap data in PowerQuery

PowerBI | PowerQuery

This blog post is about using XML data from OpenStreetMap in PowerQuery for example to create a local geocoding database. PowerQuery is just perfect for this purpose as it allows us to interactively shape the input data to the desired format.

First, we need to download some data from OpenStreetMap. You may go to the OpenStreetMap webpage, zoom to the desired area and then hit the export button. Alternatively, download links are provided for full country or region files after hitting ‘Export’.


The downloaded OSM file is in XML format and starts like this:

<?xml version=”1.0″ encoding=”UTF-8″?>
<osm version=”0.6″ generator=”CGImap 0.3.3 (733” copyright=”OpenStreetMap and contributors” attribution=”” license=””>

Now, open Excel and create a new PowerQuery from an XML file:


Next, select the downloaded file and hit Enter. PowerQuery loads with the file contents:


The data we’re interested in is stored in the nodes-table. So we can now click on the ‘Table’ link in the nodes column:


We now have to expand the ‘tag’ information:


Since we’re not interested in all the data from the file, we limit the information to country, city, postcode, street and housenumber by using the filter function:


We can now delete all columns but the following:

  • tag.attribute.k
  • tag.attribute.v
  • attribute:id
  • attribute:lat
  • attribute:long



This is pretty much the data we’d like to obtain from the OSM file. However, we still need to pivot to information from the rows into columns to see city, postcode etc. side by side with the geo coordinates. In order to do so, we select column tag.Attribut:k and use the Pivot function from the Transform ribbon as shown below:


Finally, we can rename the columns and filter for rows where country is not null:


The list now contains addresses (city, postcode, street, house number) together with the corresponding geographical coordinates (latitude, longitude).

For your reference, this is the script we created so far:

    Source = Xml.Tables(File.Contents(“C:\Temp\download.osm”)),
    node = Source{0}[node],
    #”Changed Type” = Table.TransformColumnTypes(node,{{“Attribute:id”, Int64.Type}, {“Attribute:visible”, type logical}, {“Attribute:version”, Int64.Type}, {“Attribute:changeset”, Int64.Type}, {“Attribute:timestamp”, type datetime}, {“Attribute:user”, type text}, {“Attribute:uid”, Int64.Type}, {“Attribute:lat”, type number}, {“Attribute:lon”, type number}}),
    #”Expand tag” = Table.ExpandTableColumn(#”Changed Type”, “tag”, {“Attribute:k”, “Attribute:v”}, {“tag.Attribute:k”, “tag.Attribute:v”}),
    #”Filtered Rows” = Table.SelectRows(#”Expand tag”, each ([#”tag.Attribute:k”] = “addr:city” or [#”tag.Attribute:k”] = “addr:country” or [#”tag.Attribute:k”] = “addr:housenumber” or [#”tag.Attribute:k”] = “addr:postcode” or [#”tag.Attribute:k”] = “addr:street”)),
    #”Removed Columns” = Table.RemoveColumns(#”Filtered Rows”,{“Attribute:visible”, “Attribute:version”, “Attribute:changeset”, “Attribute:timestamp”, “Attribute:user”, “Attribute:uid”}),
    #”Pivoted Column” = Table.Pivot(#”Removed Columns”, List.Distinct(#”Removed Columns”[#”tag.Attribute:k”]), “tag.Attribute:k”, “tag.Attribute:v”),
    #”Renamed Columns” = Table.RenameColumns(#”Pivoted Column”,{{“Attribute:id”, “id”}, {“Attribute:lat”, “latitude”}, {“Attribute:lon”, “longitude”}, {“addr:city”, “city”}, {“addr:postcode”, “postcode”}, {“addr:street”, “street”}, {“addr:country”, “country”}, {“addr:housenumber”, “housenumber”}}),
    #”Filtered Rows1″ = Table.SelectRows(#”Renamed Columns”, each ([country] = “DE”))
    #”Filtered Rows1″

The result can then be loaded to a PowerPivot data model (click on File, then Load to…):


After loading the data into the model, you can for example use PowerMaps to visualize the addresses on a map:


Of course, you could have received the similar result using the geocoding functionality of PowerMaps. Therefore, you simply need to add another column for the full address like this:


The result is pretty much the same but it takes a significant amount of time to geo code many addresses on this level of detail compared to the direct approach from above, which doesn’t need to pass geo coding web service because we already provided latitude and longitude.


So, if you have to geo code a lot of addresses, the OpenStreetMap XML format may be a lot of help. Using PowerQuery makes it easy to load this data into a PowerPivot data model. For larger regions make sure you have enough memory available in the local machine (and use the 64bit version of Microsoft Excel).

This entry was posted in Allgemein. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s