The Institute for the Learning Sciences
Northwestern University
Evanston, IL 60201
hinrichs, bareiss, slator@ils.nwu.edu
The system's task is therefore to construct a sufficient description of the user's client such that relevant stories can be retrieved. The system must elicit this description from a novice who presumably does not know the domain as an expert would. Consequently, the system must do everything it can to guide the user in describing his client. This guidance takes three main forms in ORCA:
The following dialog illustrates a typical session with ORCA:
ORCA: Orca (the Organizational Change Advisor) will assist you in analyzing a client's situation as you hear stories and answer questions. Is this a new client or an existing one?
User: New Client.
ORCA: What is the client's name?
User: RustBelt Mfg.
ORCA: Choose one or more change drivers, and then press continue to answer questions on behalf of the client.
User: (selects Competitive Threats and Restructuring/Reorganization from a menu.)
ORCA: Choose an industry type that best describes your client.
User: (selects Manufacturing from a menu.)
ORCA: Now it is time to answer some questions about your client. Is it time that the organization considered updating its processes?
User: Yes
ORCA: Is planning considered an important and worthwhile effort in aiding employees with their job requirements?
User: Yes
ORCA: Does the organization allot extensive training time to its employees?
User: No
ORCA: Does the organization spend sufficient time and effort researching its change options?
User: Yes
ORCA: Is a large project effort planned?
User: Yes
At this point ORCA tells a story about a previous consulting job in which an aerospace defense contractor reorganized its factory floor in order to be more efficient and competitive. An overview of the story is presented first, with follow-up questions leading to more specific stories about how the shop floor was organized, how jobs were redesigned, and how the manufacturing process flow was refined to eliminate unnecessary steps. After viewing the story, the user is asked to compare the client to the airframe manufacturer on a feature-by-feature basis in order to more accurately retrieve additional stories.
Each of these criteria contributes to the quality of story retrieval, and each criterion has implications for story indexing. In ORCA, our assumption has been that the activation level of a story would be a measure of its similarity to the current client, and that the similarity of a story would serve as a measure of its relevance. The above definition provides a way to evaluate this assumption.
The first criterion suggests that there may be qualitative differences between features; features that are abstract or thematic may be more important for retrieval than other types of features. In ORCA, for example, a story tends to be on-point or analogous if it is a good exemplar of a proverb that it shares with the client's problem description. (We treat proverbs as abstract categories of `business diseases'.) In ORCA, a story must bear both surface and thematic similarity to the client before that story will be told. This rule suppresses the telling of stories that are off-point, either due to vacuous surface similarity in the absence of thematic coherence, or because of opaque abstract similarity. If the telling of a story depended simply on the linear sum of the activation of its features, then there would be no way to distinguish between stories whose activation was due to a central primary proverb and those with no proverb at all.
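The difference between a linear sum and the two-part requirement described above can be sketched as follows. This is a minimal illustration, not ORCA's implementation: the `Story` class, the split of activation into a surface part and a thematic part, and the threshold values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Story:
    name: str
    surface_activation: float   # hypothetical: activation from surface features
    thematic_activation: float  # hypothetical: activation from shared proverbs


def linear_sum(story: Story) -> float:
    """Total activation, ignoring where the activation came from."""
    return story.surface_activation + story.thematic_activation


def is_tellable(story: Story,
                surface_threshold: float = 0.5,
                thematic_threshold: float = 0.5) -> bool:
    """Require both kinds of similarity, as the rule in the text does.
    The threshold values are illustrative assumptions."""
    return (story.surface_activation >= surface_threshold
            and story.thematic_activation >= thematic_threshold)


# Two stories with the same total activation but different sources.
surface_only = Story("surface only", surface_activation=1.0, thematic_activation=0.0)
balanced = Story("balanced", surface_activation=0.5, thematic_activation=0.5)
```

Under these assumptions the two stories are indistinguishable by `linear_sum`, while `is_tellable` rejects the story that has no thematic support.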
Another potential problem is ``populism'', in which a high level of activation may be due to a preponderance of weakly confirmed sources. Stories that are weakly related to everything may overshadow more specific stories that have fewer, but more strongly confirmed, sources. Such `promiscuous' stories are often very general, and consequently may be less interesting or useful when they follow on the heels of more specific stories. One solution to this problem is to impose extra discipline on the indexing process, such that stories are labeled more conservatively and weak indexing links are ruthlessly excised. Another solution is to implement an algorithm for `aging' the activation levels in memory, so that aggregations of weak remindings would lose reminding strength over time. A third possible solution is to avoid adding general stories to the system in the first place. However, it is not clear that general stories are useless, nor that the order of their presentation is significant. Consequently, ORCA requires that stories be thematically similar to the current problem, as well as strongly activated. Because activation levels do not distinguish between thematic and non-thematic features, we conclude that activation, by itself, is not always a good measure of similarity.
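The `aging' proposal mentioned above is not specified further, so the following is only one possible reading of it, not a description of ORCA. It assumes that each source's contribution to a story is tracked separately and that a fixed amount (the hypothetical `decay` parameter) is subtracted from each contribution per step, so that weak remindings fade to zero while strong ones survive.

```python
def age(contributions: list[float], decay: float = 0.125) -> list[float]:
    """Subtract a fixed decay from each source's contribution, flooring at zero.
    The subtractive scheme and the decay value are assumptions for illustration."""
    return [max(0.0, c - decay) for c in contributions]


# A general story supported by four weak sources, and a specific story
# supported by one strong source. Before aging, the general story wins.
general = [0.25, 0.25, 0.25, 0.25]
specific = [0.75]
before = (sum(general), sum(specific))

# Apply two aging steps to both stories.
for _ in range(2):
    general = age(general)
    specific = age(specific)
after = (sum(general), sum(specific))
```

With these illustrative numbers the general story starts with the higher total but loses all of its activation after two steps, while the specific story retains most of its strength.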
The second criterion for relevance is that a story should provide some kind of advice or moral. For example, a story about a company with an insufficiently trained workforce should either help to anticipate future problems or show how they solved their problems. What would not be as useful would be a story that merely says ``Here's another company that's just like yours.'' We have assumed that ORCA would retrieve stories that were similar to the description of the client. Because stories are indexed directly by their descriptions, this means that stories must be described in terms of the problems they address. If stories are encoded that do not explicate both a problem and its solution (as is the case, unfortunately, with some of the ORCA stories), the indexer must plausibly reconstruct the problem and describe the story in terms of that problem. If stories are encoded that do not distinguish between the problem and the solution, then the system may not be able to distinguish between stories that are relevant to the user's problem and those that are not. For this reason, similarity, by itself, is not always a good measure of relevance.
The problem/solution distinction is a special case of the more general issue of representational granularity and scope. Granularity denotes the distinctions that are drawn in a representation (such as problem vs. solution). Scope denotes the extent of the world that is represented (such as whether or not a problem is explicitly represented in a case). For example, a story might be about a company that does an excellent job in assimilating new technology. In ORCA, this story is most likely to appear when the client is also excellent in this regard, since ORCA operates on the premise that similar stories are relevant. This is an assumption that arises when the domain is complex, as in change management. In rich, weak-theory domains of this type, no two stories are ever identical. Stories are retrieved on the basis of two levels of similarity, but they are important because of their ability to bring new features into focus. The similarities between stories are what make them analogous; the differences between similar stories are what make them illustrative.
On the other hand, one could argue that such a story might best be told in response to a client whose problem is outdated or misused technology. That is, to retrieve on the basis of counter-example. This notion turns out to be difficult to capture with a homogeneous spreading activation network. A more complex predicate style representation would enable this functionality, as well as enabling the representation of mutually exclusive alternatives. A simpler, but more ad hoc solution would be to introduce into the representation a method for explicit counter-example linking between stories. Neither of these has been implemented in ORCA, and the efficacy of this trade-off is an empirical question.
One implication of this is that the truth of a propositional feature is not necessarily the same as its relevance. When associating ``Yes'' answers with positive activation there is danger of conflating truth with relevance. For example, when asking the question: ``Does the company provide sufficient training?'' a negative answer should be highly relevant, while a positive answer means that training is not a concern. This is a common problem with spreading activation and propositional features. At least two solutions are possible: The first solution is to increase the representational power (and complexity) of the system and explicitly represent the valence of a feature. Features for which truth and relevance are opposite would have an associated negative multiplier to invert the activation they pass. The ORCA solution is simply to carefully choose the feature vocabulary such that truth always corresponds to relevance. For example, a better question would be ``Does the company neglect training?'' In this case, a positive reply would emphasize stories about training. We are in the process of refining our representational vocabulary to ensure this.
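The two solutions above can be compared in a short sketch. The numeric answer strengths and the `valence` parameter are assumptions introduced for the example; the text only states that a negative multiplier would invert the activation a feature passes.

```python
# Hypothetical mapping from an answer to the raw activation it would pass.
ANSWER_STRENGTH = {"Yes": 1.0, "No": -1.0}


def activation(answer: str, valence: int = 1) -> float:
    """Activation passed by a feature. A valence of -1 marks a feature whose
    truth and relevance are opposite, so its activation is inverted."""
    return ANSWER_STRENGTH[answer] * valence


# Solution 1: keep the original question and attach a negative valence.
# "Does the company provide sufficient training?" answered "No".
with_valence = activation("No", valence=-1)

# Solution 2 (the approach the text attributes to ORCA): reword the question
# so that truth corresponds to relevance.
# "Does the company neglect training?" answered "Yes".
reworded = activation("Yes")
```

Both routes pass the same positive activation to stories about training; the second keeps the representation simpler at the cost of more care in choosing the vocabulary.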
A question may be difficult to answer for two reasons: 1) The question may be ambiguous or vague, or 2) The answers may be too similar to discriminate easily. In either case, the user will end up deliberating over possible answers. To avoid such deliberation, the indexing features should be at an appropriate level of abstraction and should embody a vocabulary that is familiar to the user. Proverbs, for instance, are notoriously difficult to interpret. Rather than ask business consultants to describe their clients in terms of proverbs, ORCA associates each proverb with a set of domain-specific surface level features called proverb-confirming features. These features are easier for the user to recognize than the proverb itself. For example, one proverb in the system is: ``The good foreman must be a good carpenter first.'' To confirm this proverb, ORCA asks the following questions: ``Do the organization's leaders take a `hands on' approach?'' and ``Do the managers understand their employees' jobs?''. If the answer to either question is yes, the proverb is confirmed.
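The confirmation rule in this example (a proverb is confirmed if any one of its confirming questions is answered yes) can be sketched as below. The dictionary layout and the function name are illustrative assumptions, not ORCA's data structures; the proverb and the two questions are taken from the text.

```python
PROVERB = "The good foreman must be a good carpenter first."
Q_HANDS_ON = "Do the organization's leaders take a 'hands on' approach?"
Q_UNDERSTAND = "Do the managers understand their employees' jobs?"

# Hypothetical table linking each proverb to its proverb-confirming features.
PROVERB_FEATURES = {
    PROVERB: [Q_HANDS_ON, Q_UNDERSTAND],
}


def confirmed_proverbs(answers: dict[str, str]) -> list[str]:
    """Return the proverbs for which at least one confirming question
    was answered "Yes". Unanswered questions do not confirm anything."""
    return [
        proverb
        for proverb, questions in PROVERB_FEATURES.items()
        if any(answers.get(question) == "Yes" for question in questions)
    ]
```

The user only ever sees the surface-level questions; the proverb itself stays internal to the index.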
The second reason for user deliberation is that the answers may be too similar to discriminate easily. The user should not have to make difficult decisions to distinguish between alternatives that are either very similar or don't count much. For example, a question such as ``Has productivity, efficiency, or quality emerged as a major concern?'' will be true to some extent of any organization. Rather than forcing a fine-grained distinction, it might be better to either re-word the question or to disable the ``Probably'' and ``Maybe'' buttons. In general, small differences in judgement should not lead to large differences in behavior. For this reason, the difference between ``Probably'' and ``Maybe'' in terms of activation is kept quite small in ORCA.
The overall efficiency of case retrieval is also determined by the number of questions that the user must answer. The user should not have to answer questions that appear self-evident. The system should, at the very least, draw rudimentary inferences. To combine inference and association, we define three additional types of features and their corresponding inferences:
In discussing efficiency, we have assumed that the objective is to make it easy for users to describe their problem. A case could be made, however, that deliberating over the description could be a learning experience equally valuable to hearing stories. Forcing users to think is usually a good idea in pedagogical systems. Nevertheless, we believe the user should invest his effort into interpreting his problem, rather than interpreting the meaning of features or second guessing the system, and this places a particular premium on posing questions at a level the user can easily understand and answer.
The entire ORCA story base was constructed manually, on a story by story basis. Although there was a pre-existing schema for characterizing cases in terms of surface features, there was no theory of how these features related to each other, nor any notion of the abstract categories exemplified by proverbs in ORCA, nor how these abstract categories related to the surface features. Thus, indexing each story was done in discrete steps. The first step involved reading the story and assigning it labels from the domain. In general, this could be done at a rate of 2-4 stories an hour. However, building the memory at the same time added considerably to the indexing effort, particularly at first. Later stories were added with less theory building, as the memory matured, but nonetheless, the effort of building this story base turned out to be quite high. This was partly because indexing stories and designing the representation were combined in the same process. Each step contributes to the difficulty of indexing. This was also because of the lack of good tools for manipulating memory.
It goes without saying that cases should be described with as small a vocabulary as possible. It is a much harder thing to say how small that is. An over-large vocabulary forces the indexer to wade through many similar but often subtly different features. However, in complex domains there are quite often important differences that can seem subtle but are not. The feature set in ORCA was quite large by some measures, containing hundreds of features, but it had been adapted from a pre-existing vocabulary of organizational change, and it was feared that pruning could only be accomplished at the risk of loss in generality. In general, the vocabulary of the domain is imposed by the domain itself, and can only be modified with expertise of that domain near at hand.
To the extent possible, the case-base should also be monotonic, or additive. The addition of new cases should not interfere with the retrieval of previously indexed cases. However, this property is difficult or impossible to preserve when the domain theory is being built incrementally. In ORCA, for example, most of the difficulty of indexing consisted of defining the reminding links between features and deciding how strong these links should be.
One approach to this problem is to declare that links have clear, precise semantics. This could reduce the amount of deliberation the indexer devotes to interpreting the meaning of features and links. There are several ways that links could be interpreted:
Any ``upgrading'' of the representational machinery, beyond statistical frequency data which could be automatically generated, would involve considerable overhead in terms of representational complexity and would come at the expense of the simple associative properties of the memory. As originally conceived, the ORCA representation was based on association and reminding, rather than any semantic properties. This minimalist experimental assumption was an attempt at managing the complexity of a domain that does not admit of concretized rules or simple solutions. No representational upgrade has been implemented in ORCA, and the efficacy of this trade-off is another empirical question.
We hope to determine how the issues of accuracy, efficiency, and difficulty manifest themselves for different methods of case retrieval. To do this, we are in the process of building case retrievers based on a more general spreading activation model that integrates inference and association, another retriever based on the PROTOS model [Bareiss, 1989], and a third retriever based on variations of discrimination nets, as implemented in CYRUS [Kolodner, 1984]. We expect that a better understanding of the interaction between a control strategy and an indexing strategy will help us to more easily build accurate and efficient case bases.
We have also been extending ORCA to support knowledge acquisition through failure-driven refinement of the case base. This process involves fine-tuning the indexing structure based on user feedback during problem solving. To enable the user to provide feedback, ORCA must provide a means by which the user can interrupt the system at any point and criticize its behavior. Currently, ORCA allows an expert to reject a story or question as ``irrelevant''. It then takes the expert into a knowledge acquisition dialog that explains the reasons why it was reminded of a story or question and proposes strategies to repair the case base. Much work remains to be done to refine this capability and to extend the coverage of indexing problems that can be repaired.