The objective of Augmented Reality (AR) is to add virtual objects
to real video sequences. For an AR system to be effective, the computer-generated
objects and the real scene must be combined seamlessly, so that the virtual
objects align well with the real ones. Realistic merging of virtual and
real objects also requires that the virtual objects behave in a physically
plausible manner in the environment: they can be occluded or shadowed by
objects in the scene.
Few AR systems address the occlusion problem. In theory, resolving occlusions amounts to comparing the depth of the virtual objects with the depth of the real scene.
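A minimal sketch of that per-pixel depth test, assuming the depth map of the real scene is available (function and variable names are illustrative, not part of the system described here):

```python
import numpy as np

def composite(real_rgb, real_depth, virt_rgb, virt_depth):
    """Per-pixel occlusion test: a virtual pixel is kept only where the
    virtual object is closer to the camera than the real scene.
    real_rgb, virt_rgb     : (H, W, 3) images
    real_depth, virt_depth : (H, W) depth maps (np.inf where the
                             virtual renderer drew nothing)."""
    in_front = virt_depth < real_depth   # virtual object is not occluded here
    out = real_rgb.copy()
    out[in_front] = virt_rgb[in_front]   # paste the visible virtual pixels
    return out
```

In practice the difficulty is that the depth of the real scene is unknown; the approach described here sidesteps dense depth recovery by working directly with the occluding boundaries.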
First, the user outlines
the occluding objects in a small set of selected frames (see Figure 1,
a and b). These key frames correspond to views where aspect changes occur,
such as the appearance of a new facet of an occluding object.
We then build the 3D occluding
boundary of the occluding object from two consecutive key frames (Fig. 1.c).
The projection of this
3D curve is used to predict the 2D occluding boundary in the frames between
the two key views. Since we take into account the uncertainty in the
viewpoints during the reconstruction and projection phases, we also
obtain an uncertainty region around each point of the predicted boundary,
which contains the actual point position (Fig. 1.d).
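The prediction step can be sketched as follows; the per-point uncertainty is reduced here to a single pixel radius `sigma_px`, a deliberate simplification of the propagated viewpoint uncertainty (all names are illustrative):

```python
import numpy as np

def predict_boundary(curve_3d, P, sigma_px):
    """Project the reconstructed 3D occluding boundary into an
    intermediate frame, and attach to each projected point the radius
    of a search region expected to contain the true boundary point.
    curve_3d : (N, 3) points sampled on the 3D occluding boundary
    P        : (3, 4) projection matrix of the intermediate frame
    sigma_px : search radius in pixels (stand-in for the propagated
               viewpoint uncertainty)."""
    homog = np.hstack([curve_3d, np.ones((len(curve_3d), 1))])
    proj = homog @ P.T                    # pinhole projection
    pts_2d = proj[:, :2] / proj[:, 2:3]   # perspective divide
    radii = np.full(len(curve_3d), float(sigma_px))
    return pts_2d, radii
```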
The predicted boundary
is then refined, using region-based tracking and an active contour model,
under the constraint that each recovered boundary point must lie within
its uncertainty region (Fig. 1.e).
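As a rough sketch of the constrained refinement, one can search along the curve normal, inside the predicted region only, for the strongest intensity edge; this is a crude stand-in for the combined region-based tracking and active-contour model used here, and all names are illustrative:

```python
import numpy as np

def refine_point(image, p, normal, radius):
    """Slide along the curve normal, within the uncertainty radius,
    and keep the position with the strongest intensity edge.
    image  : (H, W) grayscale frame
    p      : predicted boundary point, (x, y)
    normal : unit normal to the predicted curve at p
    radius : uncertainty-region radius in pixels"""
    h, w = image.shape
    best, best_grad = p.astype(int), -np.inf
    for t in range(-radius, radius + 1):
        a = np.round(p + (t - 1) * normal).astype(int)  # one step behind
        b = np.round(p + (t + 1) * normal).astype(int)  # one step ahead
        if not (0 <= a[0] < w and 0 <= b[0] < w and 0 <= a[1] < h and 0 <= b[1] < h):
            continue
        grad = abs(float(image[b[1], b[0]]) - float(image[a[1], a[0]]))
        if grad > best_grad:
            best_grad = grad
            best = np.round(p + t * normal).astype(int)
    return best
```

A full implementation would couple this edge evidence with the active contour's smoothness term rather than refine each point independently.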
Figure 1: Overview of the occlusion resolution system
![]() The recovered occluding boundaries
![]() Another representation
![]() A first augmented sequence
![]() Another augmented sequence
![]() Another one (just for fun)
The Cow Sequence
This sequence was used to test our algorithm with a relatively
complex occluding object and a rotating camera, so the appearance of the
occluding object (the cow) changes noticeably over the sequence. A calibration
table was used to recover the camera trajectory, so the viewpoints
are almost exact. The three key views were:
![]() Key view 1
![]() Key view 2
![]() Key view 3
![]() The augmented sequence
The Return of the Cow
This sequence differs from the previous one in several respects:
the viewpoints were recovered with our hybrid
method (which uses only image features); the camera trajectory is
more general; and the occluding object (yes, a cow again) is more complex.
Since the cow's legs appear and disappear, we had to define five key views
at the beginning of the sequence; key views 5 and 6, however, are far apart.
![]() Key view 1
![]() Key view 2
![]() Key view 3
![]() Key view 4
![]() Key view 5
![]() Key view 6
![]() The augmented sequence
The Loria Sequence
In this sequence, the dominant motion of the camera is a translation
along the optical axis. Such a motion is known to be difficult both for
motion recovery and for 3D reconstruction, but the refinement stage succeeds
in recovering the actual boundary in nearly all cases. However, some problems
arise at the end of the sequence, when the lamp post is about to leave
the image. We used only two key views:
![]() Key frame 1
![]() Key frame 2
![]() The augmented sequence (4 MB)
![]() The augmented sequence (frames 240 to 480 - 1.4 MB)