Town duplicates and missing county info? #120

Closed
amnesia7 opened this issue Oct 21, 2014 · 21 comments

@amnesia7
Contributor

I've just tried doing a search for chorley using http://photon.komoot.de/ (the request was actually http://photon.komoot.de/api/?q=chorley&limit=5&lat=52.38796340538239&lon=13.058280944824217) which returned 2 entries both listed as Chorley, United Kingdom (administrative).
The response also included 2 entries listed as Chorley, United Kingdom (village) that are both different villages but the response gives no information to differentiate between them until you see where they are on the map.
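For anyone wanting to reproduce the request, a minimal sketch of building the query URL is below. The helper name `buildPhotonUrl` is hypothetical; the base URL and the `q`, `limit`, `lat` and `lon` parameters are simply the ones visible in the request quoted above.

```javascript
// Sketch only: buildPhotonUrl is a hypothetical helper, not part of photon.
// The base URL and parameter names are copied from the request in this comment.
function buildPhotonUrl(query, options) {
  var opts = options || {};
  var params = new URLSearchParams();
  params.set('q', query);
  if (opts.limit !== undefined) {
    params.set('limit', String(opts.limit));
  }
  // Only send a location bias when both coordinates are given.
  if (opts.lat !== undefined && opts.lon !== undefined) {
    params.set('lat', String(opts.lat));
    params.set('lon', String(opts.lon));
  }
  return 'http://photon.komoot.de/api/?' + params.toString();
}
```

Calling `buildPhotonUrl('chorley', { limit: 5 })` produces the same kind of URL as the one above, minus the map-centre coordinates.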

Are the 2 administrative entries duplicates?
Is there anything that can be done about duplicates like this from the api point of view?
I could filter the response before showing the suggestions list (e.g. request limit=10, remove duplicates, output the first 5), but I wonder whether the response should be including them at all.
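The client-side workaround mentioned above could be sketched roughly as follows. This is only an illustration: `dedupeSuggestions` is a hypothetical helper, and it assumes each feature carries `properties.name`, `properties.country` and `properties.osm_value`, which is what the displayed results suggest.

```javascript
// Sketch only: dedupeSuggestions is a hypothetical helper, not part of photon.
// Assumes properties.name, properties.country and properties.osm_value exist.
function dedupeSuggestions(features, max) {
  var seen = {};
  var out = [];
  for (var i = 0; i < features.length && out.length < max; i++) {
    var p = features[i].properties || {};
    // Two results count as duplicates when they would be displayed identically.
    var label = [p.name, p.country, p.osm_value].join('|');
    if (!Object.prototype.hasOwnProperty.call(seen, label)) {
      seen[label] = true;
      out.push(features[i]);
    }
  }
  return out;
}
```

Note that comparing by display label also hides places that really are different but look the same, such as the two Chorley villages, so it only tidies the list and does not help tell them apart.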

Is there some info missing with respect to the 2 village entries in the list, eg county name, to give the user more of a clue which is which?

@lonvia
Collaborator

lonvia commented Oct 21, 2014

There are two administrative entries for Chorley: http://www.openstreetmap.org/browse/relation/148744 and http://www.openstreetmap.org/browse/relation/1414821. You might want to adopt Nominatim's approach to describing boundaries: use the place designation if there is a linked place, or translate the admin_level into something a human can understand. Naturally, that would require exporting the linked place somehow.
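As a rough sketch of that idea (not Nominatim's actual code): prefer the linked place type when there is one, otherwise fall back to a label derived from the admin_level. The property names `linked_place` and `admin_level` and the label table are assumptions for illustration only; what each admin_level means differs from country to country.

```javascript
// Sketch only: property names and labels are illustrative assumptions.
// The meaning of each admin_level varies by country.
var ADMIN_LEVEL_LABELS = {
  2: 'country',
  4: 'state',
  6: 'county',
  8: 'municipality'
};

function describeBoundary(props) {
  // Prefer the designation of the linked place node, if one was exported.
  if (props.linked_place) {
    return props.linked_place;
  }
  // Otherwise translate the admin_level, falling back to the generic value.
  return ADMIN_LEVEL_LABELS[props.admin_level] || 'administrative';
}
```

With something like this, two boundaries that are both shown as "(administrative)" today could be labelled differently whenever their linked place or admin_level differs.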

For the rather short "name, country" description: it might be worth having a look at the human-readable address formats they are doing over at OpenCage Data: https://github.com/lokku/address-formatting. I could imagine that we import different address information (e.g. state or county or sometimes both) depending on which country the object is in.

@christophlingg
Member

I came across the address-formatting project yesterday and this is really great!

@amnesia7
Contributor Author

I've just been having a play with the demo on the OpenCage site (http://geocoder.opencagedata.com/demo.html) and it looks pretty good.

However, it doesn't seem to distinguish between the two Chorley villages very well, whereas the search results on this page (http://www.openstreetmap.org/search?query=Chorley) show that one is Chorley, Cheshire East, North West England, England, United Kingdom and the other is Chorley, Shropshire, West Midlands, England, United Kingdom.

@amnesia7
Contributor Author

The following URL returns 2 entries for the same information for the Phones 4u Arena in Manchester:
http://photon.komoot.de/api/?q=manchester%20phones%204%20u%20arena&limit=5
The only difference in them appears to be the osm_key. One says osm_key: "tourism", the other says osm_key: "leisure" so I think either one of these should be filtered out or the osm_key should return a comma-separated list of keys or something where there are more than one.

@christophlingg Any news on including a more complete address in the API response so that the examples in #120 (comment) are easier to distinguish?

@christophlingg
Member

Nominatim creates multiple entries for a place that has multiple meanings (tourism and leisure in your example). We could group those as a comma-separated list. Do you think it is urgent? Did you encounter this very often?

Regarding your duplicates: every item has country information, and state information is coming in the next version, v0.2. So I guess all the information needed to distinguish items with equal names will be available.

@amnesia7
Contributor Author

I don't think it's that urgent because I haven't really found any other examples although it will be easier to distinguish between duplicates when the extra address info arrives in v0.2.
I just thought I'd mention these duplicates before I forgot.
I've just found that it seems to happen for Sydney Opera House as well: http://photon.komoot.de/api/?q=sydney%20opera%20hous&limit=2

@amnesia7
Contributor Author

@christophlingg it's great that you've added state to the api response but for the above queries they are still indistinguishable because the state returned is state: "England".
Is there any sign of county being added to the response as well so that they are shown differently in the response?
Photon: http://photon.komoot.de/api?q=chorley&limit=3
Nominatim (without extra address details): http://open.mapquestapi.com/nominatim/v1/search.php?format=json&addressdetails=0&limit=3&q=chorley
Nominatim (with extra address details): http://open.mapquestapi.com/nominatim/v1/search.php?format=json&addressdetails=1&limit=3&q=chorley
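To illustrate why county matters here, a minimal sketch of building a display label from the address parts. `formatLabel` is a hypothetical helper, and it assumes the properties are called `name`, `county`, `state` and `country`; `county` is the field being requested in this comment.

```javascript
// Sketch only: formatLabel is a hypothetical helper, not part of photon.
// Assumes properties named name, county, state and country.
function formatLabel(props) {
  var parts = [props.name, props.county, props.state, props.country];
  var out = [];
  for (var i = 0; i < parts.length; i++) {
    // Skip missing fields and immediate repeats (e.g. name equal to county).
    if (parts[i] && parts[i] !== out[out.length - 1]) {
      out.push(parts[i]);
    }
  }
  return out.join(', ');
}
```

Without a county, both villages come out as "Chorley, England, United Kingdom"; with one, a label such as "Chorley, Shropshire, England, United Kingdom" becomes possible.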

@christophlingg
Member

Thanks for your feedback, I did not know about Nominatim's feature of extracting county information. It should not be too hard to include it, but I assume it is not as urgent as the state attribute was. It also seems that the mapping coverage of county is not that good: http://taginfo.openstreetmap.org/tags/?key=place&value=county#map

@lonvia
Collaborator

lonvia commented Feb 16, 2015

Counties are mostly tagged as administrative boundary with admin level 6.

@amnesia7
Contributor Author

@christophlingg county is very important for distinguishing between locations in the UK.
I didn't actually realise until it was implemented but the county field is probably more important than state for UK locations. I thought UK county names would've been in the state field until I found out there was a county field as well.
As I highlighted above, there are plenty of places that share a name but are located in different counties, so they are difficult to tell apart.
It looks to be pretty important for America as well:
http://photon.komoot.de/api?q=springfield+virginia
http://open.mapquestapi.com/nominatim/v1/search.php?format=json&addressdetails=1&q=springfield+virginia&limit=15

@amnesia7
Contributor Author

@christophlingg I was just wondering if there was any chance of county being added to the response (where it exists) with it being important (at least for UK and USA locations).
I've just had another look at OpenCage Geocoder (uses the address-formatting project) that was mentioned above and the demo includes the county in the address output.

christophlingg added this to the 0.3.0 milestone Mar 30, 2015

@christophlingg
Member

It is a good idea and could help users. We first need to work out how big the impact of a new field would be (search performance and index size).

@amnesia7
Contributor Author

Cool.
Let me know if you need me to try it out or anything once you've checked the impact of adding it.
I know it would make the UK addresses that are returned far more useful (at least), so I'm hoping it's OK for you to add it.

@amnesia7
Contributor Author

In the meantime, if anyone is looking to remove duplicates like Phones 4u Arena in Manchester it can be done using:

function mergeJson (json) {
  // Nothing to do for an invalid response object
  if (!json || !json.features) {
    return;
  }
  var features = json.features;
  var added = {}; // first properties object seen for each OSM object
  for (var i = 0; i < features.length; i++) {
    var o = features[i].properties;
    if (!o) {
      continue;
    }
    // osm_type plus osm_id identifies a single OSM object
    var ref = o.osm_type + '~' + o.osm_id;
    if (added.hasOwnProperty(ref)) {
      // Duplicate: append its osm_value to the first entry unless already listed
      if (added[ref].osm_value.indexOf(o.osm_value) === -1) {
        added[ref].osm_value = added[ref].osm_value + ', ' + o.osm_value;
      }
      // Remove the duplicate and step back so the next feature is not skipped
      features.splice(i, 1);
      i--;
    } else {
      added[ref] = o;
    }
  }
}

// Modifies json.features in place
mergeJson(json);

which merges the osm_value of duplicates into a comma-separated list, e.g. "stadium, attraction".
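A possible variant, sketched under the same assumptions about the response (`features[].properties` with `osm_type`, `osm_id`, `osm_key` and `osm_value`): it leaves the input untouched, merges osm_key as well as osm_value, and compares whole values rather than substrings. `mergeDuplicates` and `appendUnique` are hypothetical names, not part of photon.

```javascript
// Sketch only: mergeDuplicates is a hypothetical, non-mutating variant of mergeJson.
function appendUnique(list, item) {
  if (item !== undefined && list.indexOf(item) === -1) {
    list.push(item);
  }
}

function mergeDuplicates(features) {
  var groups = {};
  var order = [];
  features.forEach(function (feature) {
    var p = feature.properties;
    if (!p) {
      return; // features without properties are dropped in this variant
    }
    var ref = p.osm_type + '~' + p.osm_id;
    if (!Object.prototype.hasOwnProperty.call(groups, ref)) {
      groups[ref] = { first: feature, keys: [], values: [] };
      order.push(ref);
    }
    appendUnique(groups[ref].keys, p.osm_key);
    appendUnique(groups[ref].values, p.osm_value);
  });
  // Build new feature objects so the original response is not modified.
  return order.map(function (ref) {
    var g = groups[ref];
    var props = Object.assign({}, g.first.properties, {
      osm_key: g.keys.join(', '),
      osm_value: g.values.join(', ')
    });
    return Object.assign({}, g.first, { properties: props });
  });
}
```

It would be called as `var merged = mergeDuplicates(json.features);`, with the first occurrence of each OSM object deciding its position in the list.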

@amnesia7
Contributor Author

@christophlingg

> We first need to work out how big the impact of a new field would be (search performance and index size).

If I add the county name to my search query, photon does return a result set that is filtered down as expected, matching places in that county, so I assume county is already in the search index (and in use).
Does that mean county just needs adding to the results returned for the query, and so shouldn't actually affect the index size anyway?

@amnesia7
Contributor Author

@christophlingg just wondered if there was any news about a release date or progress update on milestone 0.3.0

@christophlingg
Member

sorry @amnesia7 I am quite busy with other stuff which leaves me with no time for photon at the moment

@amnesia7
Contributor Author

That's a shame, photon was making really good progress. Hurry back.

@amnesia7
Contributor Author

@christophlingg just wondered whether you are working on photon again yet and able to add county to the response. It already appears to be part of the index, because when I include the county in the search query the results are filtered down as expected.

@amnesia7
Contributor Author

@karussell would county field not be of use with GraphHopper when selecting start and end locations? Particularly when searching UK and USA locations.

@lonvia
Collaborator

lonvia commented Apr 6, 2021

API now returns county. Closing.
