Implementation
To capture the geometric structure of a room, users perform an interaction similar to capturing a panoramic photograph, a task likely to be familiar to many potential operators.
Standing approximately at the centre of the space, the user points the device camera at key features located at ground level, namely floor-wall corners and points along the floor-wall intersection (see Figure 1). As the user rotates, denoting key features in turn, a 2D plan-view polygon is constructed.
For each point the user denotes, the device orientation (azimuth, pitch and roll) is computed from the accelerometer's gravity vector, the magnetometer and the gyroscope. The two-dimensional coordinates of the point relative to the device are then estimated using trigonometry, assuming a fixed, user-selected device height for all points.
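The trigonometric projection described above can be sketched as follows. This is an illustrative reconstruction, not the authors' exact implementation; the parameterisation (azimuth clockwise from north, pitch below the horizontal) is an assumption.

```python
import math

def floor_point_from_orientation(azimuth_deg, pitch_deg, device_height_m):
    """Project a sighted floor-level point into 2D plan coordinates.

    Hypothetical parameterisation:
    - azimuth_deg: compass heading of the camera axis, degrees clockwise from north
    - pitch_deg: angle of the camera axis below the horizontal, degrees (> 0)
    - device_height_m: fixed, user-selected height of the device above the floor
    """
    # Right-angle trigonometry: horizontal distance to the floor point.
    distance = device_height_m / math.tan(math.radians(pitch_deg))
    # Plan-view coordinates relative to the observer (x east, y north).
    x = distance * math.sin(math.radians(azimuth_deg))
    y = distance * math.cos(math.radians(azimuth_deg))
    return x, y
```

For example, a device held 1.5 m above the floor, pitched 45 degrees downward, sights a floor point 1.5 m away horizontally.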
During the capture process the user also collects at least one pair of corresponding ceiling-level and floor-level points, from which the room height can be estimated, again using trigonometry. Each point submission also triggers the capture of a photograph that is associated with the location.
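The height estimate from a corresponding point pair might look like the following sketch, assuming both sightings fall on the same vertical wall line (an assumption of this illustration, not a stated detail of the system):

```python
import math

def room_height(device_height_m, floor_pitch_deg, ceiling_pitch_deg):
    """Estimate room height from a corresponding floor/ceiling point pair.

    Assumes (hypothetically) that both points lie on the same vertical line:
    floor_pitch_deg is measured below the horizontal, ceiling_pitch_deg above.
    """
    # Horizontal distance to the wall, recovered from the floor-level sighting.
    distance = device_height_m / math.tan(math.radians(floor_pitch_deg))
    # Ceiling point's height above the device, plus the device's own height.
    return device_height_m + distance * math.tan(math.radians(ceiling_pitch_deg))
```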
Finally, the user geographically locates the model by translating, rotating and scaling it against the Google Maps imagery layer. The resulting points are extruded to the estimated room height to form a 2.5D polygon.
This simple modeling approach relies on several key assumptions, each a potential source of error: an accurate device-height measurement, an unobstructed view of key features, a single-level floor and ceiling, and a reasonably accurate orientation estimate.
These issues contribute to irregularities apparent in the resultant 2D polygon.
To compensate, we regularize the polygon to a Manhattan world, in which all walls are mutually orthogonal or parallel, using constrained least-squares.
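One way to realise such a regularization is sketched below; this is a minimal illustration of the idea rather than the authors' solver. It estimates the dominant wall orientation, rotates the polygon so walls are near-axis-aligned, classifies each edge as horizontal or vertical, and then solves the resulting least-squares problem. Under hard axis-alignment constraints this reduces to averaging the shared coordinate within each connected run of edges.

```python
import math

def regularize_manhattan(points):
    """Snap a noisy 2D polygon (list of (x, y) corners) to a Manhattan world."""
    n = len(points)
    # 1. Dominant orientation: circular mean of edge angles folded mod 90 deg.
    s4 = c4 = 0.0
    for i in range(n):
        (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
        a = 4.0 * math.atan2(y1 - y0, x1 - x0)  # factor 4 folds period to 90 deg
        s4 += math.sin(a)
        c4 += math.cos(a)
    theta = math.atan2(s4, c4) / 4.0
    # 2. Rotate by -theta so dominant walls align with the axes.
    c, s = math.cos(theta), math.sin(theta)
    pts = [(c * x + s * y, -s * x + c * y) for x, y in points]
    # 3. Classify each edge; endpoints of a vertical wall share x, of a
    #    horizontal wall share y. Track sharing with a small union-find.
    parent_x, parent_y = list(range(n)), list(range(n))
    def find(p, i):
        while p[i] != i:
            p[i] = p[p[i]]
            i = p[i]
        return i
    for i in range(n):
        j = (i + 1) % n
        dx = abs(pts[j][0] - pts[i][0])
        dy = abs(pts[j][1] - pts[i][1])
        if dx < dy:  # near-vertical wall
            parent_x[find(parent_x, j)] = find(parent_x, i)
        else:        # near-horizontal wall
            parent_y[find(parent_y, j)] = find(parent_y, i)
    # 4. Constrained least squares with equality constraints = group means.
    def fit(p, coords):
        groups = {}
        for i in range(n):
            groups.setdefault(find(p, i), []).append(coords[i])
        means = {root: sum(v) / len(v) for root, v in groups.items()}
        return [means[find(p, i)] for i in range(n)]
    xs = fit(parent_x, [p[0] for p in pts])
    ys = fit(parent_y, [p[1] for p in pts])
    # 5. Rotate the regularized corners back to the original frame.
    return [(c * x - s * y, s * x + c * y) for x, y in zip(xs, ys)]
```

After regularization, consecutive walls of a simple quadrilateral meet at right angles regardless of the noise in the captured corners.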