New Frontiers in Artificial Intelligence: JSAI 2008 by Hiromitsu Hattori, Takahiro Kawamura, Tsuyoshi Ide, Makoto Yokoo, Yohei Murakami

This book contains award papers from the 22nd Annual Conference of the Japanese Society for Artificial Intelligence, held in Asahikawa, Japan, in June 2008, and selected papers from three co-located international workshops.

The volume starts with eight award-winning papers from the JSAI 2008 main conference, selected from more than 400 presentations. They are accompanied by 18 revised full workshop papers, carefully reviewed and selected from 34 presentations at the following three co-located international workshops: Logic and Engineering of Natural Language Semantics (LENLS 2008), the 2nd International Workshop on Juris-Informatics (JURISIN 2008), and the 1st International Workshop on Laughter in Interaction and Body Movement (LIBM 2008).

However, if we assume the subtree is solved by a centralized MDP (in which the current state is fully observable), we cannot estimate the new synchronized belief state after communication. Thus, we assign default policies to agents whose policies have not yet been assigned, and estimate the new synchronized belief state after communication under the assumption that these agents follow the default policies. We can also use these default policies to evaluate the expected reward for the current k steps. In this case, the heuristic function is no longer admissible, but it can prune more nodes and reduce the run-time.

However, the number of joint (small) policies grows exponentially with the length of the time horizon. To overcome this problem, we introduce an idea that resembles the Point-based Value Iteration (PBVI) algorithm [9] for single-agent POMDPs. More specifically, we use a fixed number of representative belief points and compute the k-step optimal joint policy for each representative belief point. Because only a fixed number of representative belief points is used, the obtained policy can be suboptimal. However, as shown in [9], we can bound the difference between the obtained approximate policy and the optimal policy.
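The point-based idea can be illustrated with a minimal single-agent sketch. This is not the algorithm of the paper: it only shows how a k-step optimal policy value can be computed separately at each of a fixed set of representative belief points. The two-state model (`T`, `O`, `R`), the belief points, and the function names are all hypothetical and chosen purely for illustration.

```python
# Toy single-agent POMDP; every number here is a hypothetical illustration.
STATES = [0, 1]
ACTIONS = [0, 1]
OBSERVATIONS = [0, 1]

# T[a][s][s2]: probability of moving from s to s2 under action a.
T = {0: [[0.9, 0.1], [0.1, 0.9]], 1: [[0.5, 0.5], [0.5, 0.5]]}
# O[a][s2][o]: probability of observing o after reaching s2 under action a.
O = {0: [[0.8, 0.2], [0.2, 0.8]], 1: [[0.5, 0.5], [0.5, 0.5]]}
# R[a][s]: immediate reward for taking action a in state s.
R = {0: [0.0, 1.0], 1: [0.5, 0.5]}


def belief_update(belief, action, obs):
    """Return (next_belief, P(obs | belief, action)) using Bayes' rule."""
    unnormalized = [
        O[action][s2][obs] * sum(belief[s] * T[action][s][s2] for s in STATES)
        for s2 in STATES
    ]
    prob_obs = sum(unnormalized)
    if prob_obs == 0.0:
        return None, 0.0
    return [u / prob_obs for u in unnormalized], prob_obs


def k_step_value(belief, k):
    """Return (value, first_action) of the best k-step policy rooted at belief."""
    if k == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action in ACTIONS:
        value = sum(belief[s] * R[action][s] for s in STATES)
        for obs in OBSERVATIONS:
            next_belief, prob_obs = belief_update(belief, action, obs)
            if prob_obs > 0.0:
                value += prob_obs * k_step_value(next_belief, k - 1)[0]
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


def solve_at_belief_points(belief_points, k):
    """Compute the k-step optimal value and first action at each belief point."""
    return [k_step_value(b, k) for b in belief_points]


if __name__ == "__main__":
    points = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # fixed representative set
    for b, (v, a) in zip(points, solve_at_belief_points(points, k=2)):
        print(b, round(v, 4), a)
```

The work grows with the number of belief points rather than with the whole belief space; the price is that beliefs outside the representative set are only covered approximately.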

Based on the area it is scanning, each sensor receives observations that can have false positives and false negatives. Sensors’ observations and transitions are independent of each other’s actions. Each agent incurs a scanning cost whether the target is present or not, but no cost if it is turned off. There is a high reward for successfully tracking a target.

1 Existing Algorithms

LID-JESP

The locally optimal policy generation algorithm called LID-JESP (Locally interacting distributed joint search for policies) is based on DBA [10] and JESP [3].
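As an illustration of the sensor model described earlier in this passage (observations with false positives and false negatives, and a scanning cost paid only while the sensor is on), the following minimal sketch updates a single sensor's belief that a target is present after one scan. The function names, the error rates, and the cost value are assumptions made for this example and do not come from the source.

```python
def posterior_target_present(prior, detected, false_pos, false_neg):
    """Bayes update of P(target present) after one noisy scan.

    false_pos: P(detection | target absent)
    false_neg: P(no detection | target present)
    """
    like_present = (1.0 - false_neg) if detected else false_neg
    like_absent = false_pos if detected else (1.0 - false_pos)
    numerator = like_present * prior
    evidence = numerator + like_absent * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under this model")
    return numerator / evidence


def step_cost(sensor_on, scan_cost):
    """Scanning cost is paid whenever the sensor is on, target or not."""
    return scan_cost if sensor_on else 0.0


if __name__ == "__main__":
    # Hypothetical rates: 10% false positives, 20% false negatives.
    print(posterior_target_present(0.5, True, 0.1, 0.2))
    print(posterior_target_present(0.5, False, 0.1, 0.2))
    print(step_cost(True, 1.0), step_cost(False, 1.0))
```

A detection raises the belief that the target is present and a missed detection lowers it, but neither observation makes the belief certain as long as both error rates are nonzero.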

