## Human-Like Understanding
The process described above is similar to how humans perceive and imagine the world. As humans, we naturally recognize something we have seen before, even from a different angle. For example, it takes us relatively little effort to backtrack our way through the winding streets of a European old town: we identify all the right junctions even though we have seen them only once, and from the opposite direction. This takes a level of understanding of the physical world and of cultural spaces that is natural to us but extremely difficult to achieve with classical machine vision technology. It requires knowledge of some basic laws of nature: the world is composed of objects made of solid matter, which therefore have a front and a back, and appearance changes with the time of day and the season. It also requires a considerable amount of cultural knowledge: the shapes of many man-made objects follow specific rules of symmetry or other generic layouts, often depending on the geographic region.
While early computer vision research tried to decipher some of these rules in order to hard-code them into hand-crafted systems, there is now broad consensus that the degree of understanding we aspire to can realistically be achieved only via large-scale machine learning. This is what we aim for with our LGM. We saw a first glimpse of the impressive camera positioning capabilities emerging from our data in our recent research paper MicKey (2024). MicKey is a neural network able to position two camera views relative to each other, even under drastic viewpoint changes.
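To make the task concrete, here is a minimal sketch of the classical, hand-crafted counterpart to what MicKey learns: estimating the relative pose (a rotation and a translation direction) between two camera views from matched keypoints. This is not MicKey's method, just the traditional SIFT-plus-essential-matrix pipeline using OpenCV; the image file names and the intrinsic matrix `K` are illustrative assumptions.

```python
import cv2
import numpy as np

# Assumed pinhole intrinsics (fx, fy, cx, cy); real values come from calibration.
K = np.array([[525.0,   0.0, 320.0],
              [  0.0, 525.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Hypothetical input images of the same scene from two viewpoints.
img0 = cv2.imread("view_a.jpg", cv2.IMREAD_GRAYSCALE)
img1 = cv2.imread("view_b.jpg", cv2.IMREAD_GRAYSCALE)

# Detect and describe keypoints in each view.
sift = cv2.SIFT_create()
kp0, des0 = sift.detectAndCompute(img0, None)
kp1, des1 = sift.detectAndCompute(img1, None)

# Match descriptors and keep unambiguous matches (Lowe's ratio test).
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des0, des1, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

pts0 = np.float32([kp0[m.queryIdx].pt for m in good])
pts1 = np.float32([kp1[m.trainIdx].pt for m in good])

# Robustly estimate the essential matrix, then decompose it into the
# relative rotation R and a unit translation direction t. Unlike MicKey,
# which predicts metric keypoint depths, this recovers translation only
# up to an unknown scale.
E, inliers = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts0, pts1, K, mask=inliers)
print("Relative rotation:\n", R)
print("Translation direction:\n", t.ravel())
```

A hand-crafted pipeline like this typically breaks down under the drastic viewpoint changes described above, because local descriptors stop matching across wide baselines. That failure mode is exactly the regime MicKey's learned matching is designed to handle.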
MicKey can handle even opposing shots that would take a human some effort to figure out. MicKey was trained on a tiny fraction of our data: data that we released to the academic community to encourage this type of research. MicKey is limited to two-view inputs and was trained on comparatively little data, but it still represents a proof of concept for the potential of an LGM. Evidently, accomplishing geospatial intelligence as outlined in this text requires an immense influx of geospatial data, a kind of data few organizations have access to. Niantic is therefore in a unique position to lead the way in making a Large Geospatial Model a reality, supported by the more than a million user-contributed scans of real-world places we receive each week.