- Source: Spatial embedding
Spatial embedding is one of the feature learning techniques used in spatial analysis, in which points, lines, polygons or other spatial data types representing geographic locations are mapped to vectors of real numbers. Conceptually, it involves a mathematical embedding from a space with many dimensions per geographic object to a continuous vector space with a much lower dimension.
Such embedding methods allow complex spatial data to be used in neural networks and have been shown to improve performance in spatial analysis tasks.
Embedded data types
Geographic data can take many forms: text, images, graphs, trajectories, polygons. Depending on the task, multimodal data from different sources may need to be combined. The following subsections describe examples of different types of data and their uses.
= Text
Geolocated posts on social media can be used to build a collection of documents bound to a given place, which can later be transformed into embedding vectors using word embedding techniques.
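As an illustration, the following is a minimal sketch of how posts bound to a place might be reduced to one vector per place. It assumes pre-trained word vectors are already available. The tiny `WORD_VECTORS` table, the place identifiers and the functions `embed_text` and `embed_places` are hypothetical, and simple averaging is only one of several possible ways to pool word vectors.

```python
from collections import defaultdict

# Hypothetical 3-dimensional word vectors; real ones would come from a trained model.
WORD_VECTORS = {
    "coffee": [0.9, 0.1, 0.0],
    "espresso": [0.8, 0.2, 0.0],
    "museum": [0.0, 0.8, 0.2],
    "art": [0.1, 0.9, 0.1],
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed_text(text):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    return mean_vector(vectors) if vectors else None


def embed_places(posts):
    """posts: iterable of (place_id, text). Returns {place_id: mean post vector}."""
    by_place = defaultdict(list)
    for place_id, text in posts:
        vec = embed_text(text)
        if vec is not None:
            by_place[place_id].append(vec)
    return {place: mean_vector(vecs) for place, vecs in by_place.items()}


posts = [
    ("cafe_1", "great espresso and coffee"),
    ("gallery_7", "modern art museum"),
    ("cafe_1", "coffee again"),
]
place_vectors = embed_places(posts)
```

Words missing from the table are skipped, so a post with no known words contributes nothing to its place.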
= Image
Satellites and aircraft collect remotely sensed images, which provide digital spatial data that can be used in machine learning. Such images are sometimes hard to analyse using basic image analysis methods, so convolutional neural networks can be used to obtain an embedding of images bound to a given geographical object or region.
= Point
A single point of interest (POI) can be assigned multiple features that can be used in machine learning. These could be demographic, transportation, meteorological, or economic data, for example. When embedding single points, it is common to consider the entire set of available points as nodes in a graph.
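One way to treat a set of points as nodes in a graph is to connect each point to its nearest neighbours. The sketch below assumes this k-nearest-neighbour construction, which is an illustrative choice rather than a method named by the text. The POI names, coordinates and the functions `haversine_km` and `knn_graph` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def knn_graph(pois, k=2):
    """pois: {name: (lat, lon)}. Returns {name: [k nearest other names]}."""
    graph = {}
    for name, coord in pois.items():
        others = sorted(
            (other for other in pois if other != name),
            key=lambda other: haversine_km(coord, pois[other]),
        )
        graph[name] = others[:k]
    return graph


# Hypothetical POIs with made-up coordinates.
pois = {
    "station": (52.2297, 21.0122),
    "museum": (52.2319, 21.0067),
    "park": (52.2150, 21.0350),
    "airport": (52.1657, 20.9671),
}
graph = knn_graph(pois, k=2)
```

The resulting adjacency lists are directed: a point may list a neighbour that does not list it back. Features assigned to each POI would be attached to the nodes separately.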
= Line / multiline
Among other things, lines (multilines) are used to represent motion trajectories. Individual trajectories are embedded taking into account travel time, distance and the features of points visited along the way. Embedding trajectories can improve performance on tasks such as clustering and categorization.
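The quantities mentioned above can be computed directly from a sequence of timestamped points. The sketch below builds a small hand-crafted feature vector. It is not a learned embedding, only the kind of input such a model might consume. It assumes coordinates in metres in a projected coordinate system and timestamps in seconds. The function `trajectory_features` and the sample trip are hypothetical.

```python
import math


def trajectory_features(points):
    """points: list of (x_m, y_m, t_s), ordered by time.

    Returns [total_distance_m, travel_time_s, mean_speed_m_per_s, n_points].
    """
    if len(points) < 2:
        raise ValueError("a trajectory needs at least two points")
    # Sum straight-line distances between consecutive points.
    distance = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1, _), (x2, y2, _) in zip(points, points[1:])
    )
    duration = points[-1][2] - points[0][2]
    speed = distance / duration if duration > 0 else 0.0
    return [distance, float(duration), speed, float(len(points))]


# Hypothetical trip: two segments of 500 m and 600 m over three minutes.
trip = [(0, 0, 0), (300, 400, 60), (300, 1000, 180)]
features = trajectory_features(trip)
```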
= Polygon
The geographic areas analyzed in machine learning are defined either by administrative boundaries or by a top-down division into grids of regular shapes, such as rectangles. Both types are represented as polygons and, like points, can be assigned demographic, transportation, or economic features. A polygon can also have features related to the size or shape of the area it represents.
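A regular grid can be sketched by snapping coordinates to cell indices and aggregating features per cell. The example below assumes square cells measured in degrees and uses POI counts per category as the cell features. The cell size, category names, sample data and the functions `cell_of` and `cell_features` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def cell_of(lat, lon, cell_size_deg=0.01):
    """Index (row, col) of the regular grid cell containing a point."""
    return (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))


def cell_features(pois, categories, cell_size_deg=0.01):
    """pois: iterable of (lat, lon, category).

    Returns {cell: [count of POIs per category, in the order of `categories`]}.
    """
    counts = defaultdict(Counter)
    for lat, lon, category in pois:
        counts[cell_of(lat, lon, cell_size_deg)][category] += 1
    return {cell: [c[cat] for cat in categories] for cell, c in counts.items()}


# Hypothetical POIs with made-up coordinates.
pois = [
    (52.2297, 21.0122, "transport"),
    (52.2251, 21.0188, "food"),
    (52.2219, 21.0105, "food"),
    (52.2150, 21.0350, "park"),
]
categories = ["food", "transport", "park"]
features = cell_features(pois, categories)
```

Cells defined in degrees are not equal in area across latitudes, so a projected coordinate system or a dedicated spatial index may be preferable in practice.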
= Graph
An example domain where a graph representation is used is the street layout of a city, where vertices can be intersections and edges can be roads. The vertices can also be destination points such as public transport stops or important points in the city, with edges representing the flow between them. Embedding graphs or single vertices can improve the accuracy of analysis methods in which the geographical domain can be represented as a network.
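Some graph embedding methods, DeepWalk for example, first sample random walks over the graph and then treat each walk like a sentence for a word embedding model. The sketch below covers only the walk sampling step on a toy street network and does not train any embedding. The network, the parameter values and the function `random_walks` are hypothetical.

```python
import random


def random_walks(adjacency, walk_length=5, walks_per_node=2, seed=0):
    """Sample uniform random walks starting from every vertex.

    adjacency: {vertex: [neighbouring vertices]}.
    Returns a list of walks, each a list of vertices.
    """
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    walks = []
    for _ in range(walks_per_node):
        for start in adjacency:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical street network: vertices are intersections, edges are roads.
streets = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}
walks = random_walks(streets)
```

Vertices that often appear close together in these walks would end up with similar vectors once a skip-gram style model is trained on them.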
Usage
POI recommendation - generating personalized point of interest recommendations based on user preferences.
Next/future location prediction - prediction of the next location a person will go to based on their historical trajectory.
Zone function classification - predicting the function of a given area in a city based on the mobility of people or the distribution of POIs.
Crime prediction - estimation of crime rate in different regions of a city.
Local event detection - studying spatio-temporal changes in embeddings can provide valuable information for detecting local events occurring in a specific location.
Regional mobility popularity prediction - analysis of mobility can show patterns in popularity of different regions in a city.
Shape matching - finding shapes similar to a given polygon, for example finding buildings with the same shape as an input building.
Travel time estimation - predicting travel time given current traffic conditions and special events.
Time estimation for on-demand food delivery - estimating the delivery time when an order is placed through a website.
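Several of the tasks listed above, such as POI recommendation or shape matching, can be framed as a search for the nearest neighbours of a query in the embedding space. The following is a minimal sketch under that assumption, with embeddings taken as already computed. The vectors, identifiers and the functions `cosine_similarity` and `most_similar` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query, embeddings, top_n=2):
    """Return the top_n (name, similarity) pairs closest to `query`, best first."""
    scored = [
        (name, cosine_similarity(embeddings[query], vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical pre-computed embeddings of geographic objects.
embeddings = {
    "cafe_1": [0.9, 0.1, 0.0],
    "cafe_2": [0.8, 0.2, 0.1],
    "gallery_7": [0.1, 0.9, 0.2],
    "park_3": [0.0, 0.2, 0.9],
}
ranking = most_similar("cafe_1", embeddings)
```

A linear scan like this is adequate for small collections. Larger ones typically rely on an approximate nearest-neighbour index.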
Temporal aspect
Some of the analyzed data has a timestamp associated with it. In some analyses this information is omitted, and in others it is used to divide the data set into groups. The most common divisions separate weekdays from weekends or split the data by hour of the day. This is particularly important in the analysis of mobility data, because mobility patterns during the week and at different times of the day differ considerably. Division by time, for example into individual months, can also be used in the analysis of tourism in a given region. To take such a split into account, embedding methods either handle the timestamp explicitly or use separate versions of the model for different subgroups of the analyzed set.
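The weekday/weekend and hour-of-day divisions described above can be sketched as a grouping step applied before any embedding model is trained. The record layout, the trip identifiers and the functions `time_group` and `split_by_time` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def time_group(timestamp):
    """Map a datetime to a (day type, hour) key; Saturday and Sunday are 'weekend'."""
    day_type = "weekend" if timestamp.weekday() >= 5 else "weekday"
    return (day_type, timestamp.hour)


def split_by_time(records):
    """records: iterable of (datetime, value). Returns {(day type, hour): [values]}."""
    groups = defaultdict(list)
    for timestamp, value in records:
        groups[time_group(timestamp)].append(value)
    return dict(groups)


records = [
    (datetime(2024, 1, 1, 8, 15), "trip_a"),  # a Monday morning
    (datetime(2024, 1, 1, 8, 40), "trip_b"),
    (datetime(2024, 1, 6, 14, 5), "trip_c"),  # a Saturday afternoon
]
groups = split_by_time(records)
```

Each group could then be used to train a separate model, or the group key could be passed to a single model as an additional input.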