135
The comparison of two P-sequences can be performed purely by examining the addresses
they contain. Since segments represent contiguous portions of the address space, the com-
parison process need only examine the initial addresses of the relevant segments. The proc-
ess must examine every segment that appears in either of the versions.
The simplest method of comparison is first to generate a list of all the segments in one of
the documents, in the order that they occur. The path portions of the addresses of each of
these segments must be normalized for comparison, so that corresponding addresses will be
equal. Paths that consist only of insertions remain intact, because data that is shared be-
tween sequences should correspond. Paths with initial move operations need them removed
from the path, since data corresponds regardless of what move operations may have affected
it. Because copy operations should only count as a match when they are identically present
in both sequences , copy operations should not be removed from paths. Furthermore, move
operations in the source of copy operations should also be preserved; they correspond to
moves that were executed in the origin of the copy, and that already match in that context.
Any addresses that are equal after the removal of initial move operations represent le-
gitimate correspondences between the two versions. Any addresses present in one version
but not the other represent deleted data. While this basic set of correspondences is ade-
quate, an application might well prefer to sort the address ranges by position of occurrence
in one of the sequences and to merge pairs that are adjacent in both orders, so as to have
correspondences of maximum length.
6.8 Summary
This chapter has reviewed the simpler change-oriented system VTML 2.0, an extension of
joint work undertaken with Fabio Vitali (Vitali and Durand 1995). Creating a VTML imple-
mentation has several issues similar to those involved in a Palimpsest implementation. The
management of a fragmented sequence with a global ordering is the most significant simi-
The comparison of two P-sequences can be performed purely by examining the addresses
they contain. Since segments represent contiguous portions of the address space, the com-
parison process need only examine the initial addresses of the relevant segments. The proc-
ess must examine every segment that appears in either of the versions.
The simplest method of comparison is first to generate a list of all the segments in one of
the documents, in the order that they occur. The path portions of the addresses of each of
these segments must be normalized for comparison, so that corresponding addresses will be
equal. Paths that consist only of insertions remain intact, because data that is shared be-
tween sequences should correspond. Paths with initial move operations need them removed
from the path, since data corresponds regardless of what move operations may have affected
it. Because copy operations should only count as a match when they are identically present
in both sequences , copy operations should not be removed from paths. Furthermore, move
operations in the source of copy operations should also be preserved; they correspond to
moves that were executed in the origin of the copy, and that already match in that context.
Any addresses that are equal after the removal of initial move operations represent le-
gitimate correspondences between the two versions. Any addresses present in one version
but not the other represent deleted data. While this basic set of correspondences is ade-
quate, an application might well prefer to sort the address ranges by position of occurrence
in one of the sequences and to merge pairs that are adjacent in both orders, so as to have
correspondences of maximum length.
6.8 Summary
This chapter has reviewed the simpler change-oriented system VTML 2.0, an extension of
joint work undertaken with Fabio Vitali (Vitali and Durand 1995). Creating a VTML imple-
mentation has several issues similar to those involved in a Palimpsest implementation. The
management of a fragmented sequence with a global ordering is the most significant simi-