Network Working Group                                         J. Ordille
Request for Comments: 2258                Bell Labs, Lucent Technologies
Category: Informational                                     January 1998
        

Internet Nomenclator Project

Status of this Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

Abstract

The goal of the Internet Nomenclator Project is to integrate the hundreds of publicly available CCSO servers from around the world. Each CCSO server has a database schema that is tailored to the needs of the organization that owns it. The project is integrating the different database schema into one query service. The Internet Nomenclator Project will provide fast cross-server searches for locating people on the Internet. It augments existing CCSO services by supplying schema integration, more extensive indexing, and two kinds of caching -- all this in a system that scales as the number of CCSO servers grows. One of the best things about the system is that administrators can incorporate their CCSO servers into Nomenclator without changing the servers. All Nomenclator needs is basic information about the server.

This document provides an overview of the Nomenclator system, describes how to register a CCSO server in the Internet Nomenclator Project, and explains how to use the Nomenclator search engine to find people on the Internet.

1. Introduction

Hundreds of organizations provide directory information through the CCSO name service protocol [3]. Although the organizations provide a wealth of information about people, finding any one person can be difficult because each organization's server is independent. The different servers have different database schemas (attribute names and data formats). The 300+ CCSO servers have more than 900 different attributes to describe information about people. Very few common attributes exist. Only name and email occur in more than 90% of the servers [4]. No special support exists for cross-server searches, so searching can be slow and expensive.

The goal of the Internet Nomenclator Project is to provide fast, integrated access to the information in the CCSO servers. The project is the first large-scale use of the Nomenclator system. Nomenclator is a more general system than a white pages directory service. It is a scalable, extensible information system for the Internet.

Nomenclator answers descriptive (i.e. relational) queries. Users can locate information about people, organizations, hosts, services, publications, and other objects by describing their attributes. Nomenclator achieves fast descriptive query processing through an active catalog, and extensive meta-data and data caching. The active catalog constrains the search space for a query by returning a list of data repositories where the answer to the query is likely to be found. Meta-data and data caching keep frequently used query processing resources close to the user, thus reducing communication and processing costs.

Through the Internet Nomenclator Project, users can query any CCSO server, regardless of its attribute names or data formats, by specifying the query to Nomenclator (see Figure 1). Nomenclator provides a world view of the data in the different servers. Users express their queries in this world view. Nomenclator returns the answer immediately if it has been cached by a previous query. If not, Nomenclator uses its active catalog to constrain the query to the subset of relevant CCSO servers. The speed of the query is increased, because only relevant servers are contacted. Nomenclator translates the global query into local queries for each relevant CCSO server. It then translates the responses into the format of the world view.
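
The query flow described above can be sketched as follows. This is a minimal illustration in Python, not the Nomenclator implementation: the Server class, its fields, and the resolve function are hypothetical names, each server is reduced to an in-memory list of entries, and the active catalog is reduced to a per-server table of permitted attribute values.

```python
from typing import Dict, List, Set, Tuple

Query = Dict[str, str]  # world-view attribute -> value
CacheKey = Tuple[Tuple[str, str], ...]


class Server:
    """Hypothetical stand-in for one registered CCSO server."""

    def __init__(self, name_map: Dict[str, str],
                 constraints: Dict[str, Set[str]],
                 rows: List[Dict[str, str]]) -> None:
        self.name_map = name_map        # world attribute -> local attribute
        self.constraints = constraints  # world attribute -> values held here
        self.rows = rows                # entries, in the local schema
        self.contacted = 0              # number of local queries received

    def relevant(self, query: Query) -> bool:
        # A server is skipped if it cannot express the query or if a
        # registered constraint rules out the requested value.
        if any(attr not in self.name_map for attr in query):
            return False
        return all(query[attr] in values
                   for attr, values in self.constraints.items()
                   if attr in query)

    def to_local(self, query: Query) -> Dict[str, str]:
        return {self.name_map[attr]: value for attr, value in query.items()}

    def search(self, local_query: Dict[str, str]) -> List[Dict[str, str]]:
        self.contacted += 1
        return [row for row in self.rows
                if all(row.get(a) == v for a, v in local_query.items())]

    def to_world(self, row: Dict[str, str]) -> Dict[str, str]:
        inverse = {local: world for world, local in self.name_map.items()}
        return {inverse[a]: v for a, v in row.items() if a in inverse}


def resolve(query: Query, servers: List[Server],
            cache: Dict[CacheKey, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    key = tuple(sorted(query.items()))
    if key in cache:                    # answered by a previous query
        return cache[key]
    answers = []
    for server in servers:
        if not server.relevant(query):  # screened out, never contacted
            continue
        for row in server.search(server.to_local(query)):
            answers.append(server.to_world(row))
    cache[key] = answers
    return answers
```

In this sketch a repeated query is answered from the cache, and a server whose constraints exclude the query is never contacted, which mirrors the two kinds of screening described above.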

   --------------------------------------------------------------------
        
                     +-------------+             +-------------+
                     |             |             |             |
         World View  |             | Local View  |             |
         Query       |             | Query       |  Relevant   |
         ----------->|             |------------>|             |
                     | Nomenclator |             |  CCSO       |
                     |             |             |             |
         <-----------|             |<------------|  Server     |
          World View |             |  Local View |             |
          Response   |             |  Response   |             |
                     +-------------+             +-------------+
        

Figure 1: A Nomenclator Query

Nomenclator translates queries to and from the language of the relevant CCSO servers.

   --------------------------------------------------------------------
        

The Internet Nomenclator Project makes it easier for users to find a particular CCSO server, but it does not send all queries to that server. When Nomenclator constrains the search for a query answer, it screens out irrelevant queries from ever reaching the server. When Nomenclator finds an answer in its cache, it screens out redundant queries from reaching the server. The server becomes easier to find and use without experiencing the high loads caused by exhaustive and redundant searches.

The Internet Nomenclator Project creates the foundation for a much broader heterogeneous directory service for the Internet. The current version of Nomenclator provides integrated access to CCSO and relational database services. The Nomenclator System Architecture supports fast, integrated searches of any collection of heterogeneous directories. The Internet Nomenclator Project can be enhanced to support additional name services, or provide integrated query services for other application domains. The project is starting with CCSO services, because the CCSO services are widely available and successful.

Section 2 describes the Nomenclator system in more detail. Section 3 explains how to register a CCSO server as part of the project. Section 4 briefly describes how to use Nomenclator. Section 5 provides a summary.

2. Nomenclator System

Nomenclator is a scalable, extensible information system for the Internet. It supports descriptive (i.e. relational) queries. Users locate information about people, organizations, hosts, services, publications, and other objects by describing their attributes. Nomenclator achieves fast descriptive query processing through an active catalog, and extensive meta-data and data caching.

The active catalog constrains the search space for a query by returning a list of data repositories where the answer to the query is likely to be found. Components of the catalog are distributed indices that isolate queries to parts of the network, and smart algorithms for limiting the search space by using semantic, syntactic, or structural constraints. Meta-data caching improves performance by keeping frequently used characterizations of the search space close to the user, thus reducing active catalog communication and processing costs. When searching for query responses, these techniques improve query performance by contacting only the data repositories likely to have actual responses, resulting in acceptable search times.

Administrators make their data available in Nomenclator by supplying information about the location, format, contents, and protocols of their data repositories. Experience with Nomenclator shows that gathering a small amount of information from data owners can have a substantial positive impact on the ability of users to retrieve information. For example, each CCSO administrator provides a mapping from the local view of data (i.e. the local schema) at the CCSO server to Nomenclator's world view. The administrator also supplies possible values for any attributes with small domains at the data repository (such as the "city" or "state_or_province" attributes). With this information, Nomenclator can isolate queries to a small percentage of the CCSO data repositories, and provide an integrated view of their data. Nomenclator provides tools that minimize the effort that administrators expend in characterizing their data repositories. Nomenclator does not require administrators to change the format of their data or the access protocol for their database.

2.1 Components of a Nomenclator System

A Nomenclator system is comprised of a distributed catalog service and a query resolver (see Figure 2). The distributed catalog service gathers meta-data about data repositories and makes it available to the query resolver. Meta-data includes constraints on attribute values at a data repository, known patterns of data distribution across several data repositories, search and navigation techniques, schema and protocol translation techniques, and the differing schema at data repositories.

   --------------------------------------------------------------------
        
                     +-------------+             +-------------+
                     |             |             |             |
         World View  |             |  Meta Data  |             |
         Query       |             |  Request    | Distributed |
         ----------->|   Query     | ----------->|             |
                     |   Resolver  |             |  Catalog    |
                     |             |             |             |
         <-----------|   (caches)  | <-----------|  Service    |
          World View |             |  Meta Data  |             |
          Response   |             |  Response   |             |
                     +-------------+             +-------------+
        

Figure 2: Components of a Nomenclator System

   --------------------------------------------------------------------
        

Query resolvers at the user sites retrieve, use, cache, and re-use this meta-data in answering user queries. The catalog is "active" in two ways. First, some meta-data moves from the distributed catalog service to each query resolver during query processing. Second, the query resolver uses the initial meta-data, in particular the search and navigation techniques, to generate additional meta-data that guides query processing. Typically, one resolver process serves a few hundred users in an organization, so users can benefit from larger resolver caches.

Query resolvers cache techniques for constraining the search space and the results of previously constrained searches (meta-data), and past query answers (data) to speed future query processing. Meta-data and data caching tailor the query resolver to the specific needs of the users at the query site. They also increase the scale of a Nomenclator system by reducing the load from repeated searches or queries on the distributed catalog service, data repositories, and communications network.

The distributed catalog service is logically one network service, but it can be divided into pieces that are distributed and/or replicated. Query resolvers access this distributed, replicated service using the same techniques that work for multiple data repositories.

A Nomenclator system naturally includes many query resolvers. Resolvers are independent, but renewable, query agents that can be as powerful as the resources available at the user site. Caching decreases the dependence of the resolver on the distributed catalog service for frequently used meta-data, and on data repositories for frequently used data. Caching thus improves the number of users that can be supported and the local availability of the query service.

2.2 Meta-Data Techniques

The active catalog structures the information space into a collection of relations about people, hosts, organizations, services and other objects. It collects meta-data for each relation and structures it into "access functions" for locating and retrieving data. Access functions respond to the question: "Where is data to answer this query?" There are two types of responses corresponding to the two types of access functions. The first type of response is: "Look over there." "Catalog functions" return this response; they constrain the query search by limiting the data repositories contacted to those having data relevant to the query. Catalog functions return a referral to data access functions that will answer the query or to additional catalog functions to contact for more detailed information. The second response to "Where?" is: "Here it is!" "Data access functions" return this response; they understand how to obtain query answers from specific data repositories. They return tuples that answer the query. Nomenclator supplies access functions for common name services, such as the CCSO service, and organizations can write and supply access functions for data in their repositories.

Access functions are implemented as remote or local services. Remote access functions are services that are available through a standard remote procedure call interface. Local access functions are functions that are supplied with the query resolver. Local access functions can be applied to a variety of indexing and data retrieval tasks by loading them with meta-data stored in distributed catalog service. Remote access functions are preferred over local ones when the resources of the query resolver are inadequate to support the access function. The owners of data may also choose to supply remote access functions for privacy reasons if their access functions use proprietary information or algorithms. Local functions are preferred whenever possible, because they are highly replicated in resolver caches. They can reduce system and network load by bringing the resources of the active catalog directly to the users.

Remote access functions are simple to add to Nomenclator and local access functions are simple to apply to new data repositories, because the active catalog provides "referrals" that describe the conditions for using access functions. For simplicity, this document describes referral techniques for exact matching of query strings. Extensions to these techniques in Nomenclator support matching query strings that contain wildcards or word-based matching of query strings in the style of the CCSO services.

Each referral contains a template and a list of references to access functions. The template is a conjunctive selection predicate that describes the scope of the access functions. Conjunctive queries that are within the scope of the template can be answered with the referral. When a template contains a wildcard value ("*") for an attribute, the attribute must be present in any queries that are processed by the referral. The system applies the following rule:

Query Coverage Rule:

If the set of tuples satisfying the selection predicate in a query is covered by (is a subset of) the set of tuples satisfying the template, then the query can be answered by the access functions in the reference list of the referral.

For example, the query below:

     select * from People where country = "US" and surname = "Ordille";
        

is covered by the following templates in Lines (1) through (3), but not by the templates in Lines (4) and (5):

      (1) country = "US" and surname = "*"

      (2) country = "US" and surname = "Ordille"

      (3) country = "US"

      (4) organization = "*"

      (5) country = "US" and surname = "Elliott"

Referrals form a generalization/specialization graph for a relation called a "referral graph." Referral graphs are a conceptual tool that guides the integration of different catalog functions into our system and that supplies a basis for catalog function construction and query processing. A "referral graph" is a partial ordering of the referrals for a relation. It is constructed using the subset/superset relationship: "S is a subset of G." A referral S is a subset of referral G if the set of queries covered by the template of S is a subset of the set of queries covered by the template of G. S is considered a more specific referral than G; G is considered a more general referral than S. For example, the subset relationship exists between the pairs of referrals with the templates listed below:

      (1) country = "US" and surname = "Ordille"
          is a subset of
          country = "US"

      (2) country = "US" and surname = "Ordille"
          is a subset of
          country = "US" and surname = "*"

      (3) country = "US" and surname = "*"
          is a subset of
          country = "US"

      (4) country = "US"
          is a subset of
          "empty template"

but it does not exist between the pairs of referrals with the following templates:

      (5) country = "US"
          is not a subset of
          department = "CS"

      (6) country = "US" and name = "Ordille"
          is not a subset of
          country = "US" and name = "Elliott"

In Lines (1) and (2), the more general referral covers more queries, because it covers queries that list different values for surname. In Line (3), the more general referral covers more queries, because it covers queries that do not constrain surname to a value. In Line (4), the specific referral covers only those queries that constrain the country to "US" while the empty template covers all queries.
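
The Query Coverage Rule and the subset relationship between referral templates can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Nomenclator's code: templates and queries are represented as plain dictionaries, only exact matching is handled, and the function names covers and is_subset are hypothetical.

```python
from typing import Dict

# attribute -> required value; "*" means "any value, but the attribute
# must be present in the query"
Template = Dict[str, str]


def covers(template: Template, query: Dict[str, str]) -> bool:
    """True if the query falls within the scope of the template."""
    return all(
        attr in query and (value == "*" or query[attr] == value)
        for attr, value in template.items()
    )


def is_subset(s: Template, g: Template) -> bool:
    """True if every query covered by template s is also covered by g."""
    return all(
        attr in s and (value == "*" or s[attr] == value)
        for attr, value in g.items()
    )


query = {"country": "US", "surname": "Ordille"}
print(covers({"country": "US", "surname": "*"}, query))
print(covers({"organization": "*"}, query))
print(is_subset({"country": "US"}, {}))
```

The empty dictionary plays the role of the "empty template": it places no condition on a query, so it covers every query and every template is a subset of it.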

During query processing, wildcards in a template are replaced with the value of the corresponding attribute in the query. For any query covered by two referrals S and G such that S is a subset of G, the set of tuples satisfying the template in S is covered by the set of tuples satisfying the template in G. S is used to process the query, because it provides the more constrained (and faster) search space. The referral S has a more constrained logical search space than G, because the set of tuples in the scope of S is no larger, and often smaller, than the set in the scope of G. Moreover, S has a more constrained physical search space than G, because the data repositories that must be contacted for answers to S must also be contacted for answers to G, but additional data repositories may need to be contacted to answer G.

In constraining a query, a catalog function always produces a referral that is more specific than the referral containing the catalog function. Wildcards ("*") in a template indicate which attribute values are used by the associated catalog function to generate a more specific referral. In other words, catalog functions always follow the rule:

Catalog Function Constrained Search Rule:

Given a referral R with a template t and a catalog function cf, and a query q covered by t, the result of using cf to process q, cf(q), is a referral R' with template t' such that q is covered by t' and R' is more specific than R.
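
As an illustration of this rule, the sketch below shows one possible catalog function in Python. Everything in it is an assumption made for the example: the Referral class, the surname_catalog function, the in-memory surname index, and the repository name are hypothetical. The function replaces the wildcard in the template with the value from the query and returns a referral that lists only the repositories indexed under that surname.

```python
from dataclasses import dataclass
from typing import Dict, List

Template = Dict[str, str]  # attribute -> value, "*" is a wildcard


@dataclass
class Referral:
    template: Template
    references: List[str]  # names of access functions or repositories


def covers(template: Template, query: Dict[str, str]) -> bool:
    """Query Coverage Rule for exact matching."""
    return all(
        attr in query and (value == "*" or query[attr] == value)
        for attr, value in template.items()
    )


def surname_catalog(referral: Referral, query: Dict[str, str],
                    index: Dict[str, List[str]]) -> Referral:
    """Hypothetical catalog function: narrows a referral using a surname index."""
    if not covers(referral.template, query):
        raise ValueError("query is outside the scope of this referral")
    # Replace each wildcard with the corresponding value from the query.
    template = {
        attr: (query[attr] if value == "*" else value)
        for attr, value in referral.template.items()
    }
    return Referral(template, index.get(query["surname"], []))


general = Referral({"country": "US", "surname": "*"}, ["surname_catalog"])
index = {"Ordille": ["ccso.example.edu"]}  # hypothetical index contents
query = {"country": "US", "surname": "Ordille"}
specific = surname_catalog(general, query, index)
print(specific.template, specific.references)
```

The returned template still covers the query, and it is more specific than the original because the wildcard has been fixed to a single value.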

Catalog functions make it possible to import a portion of the indices for the information space into the query resolver. Since they generate referrals, the resolver can cache the most useful referrals for a relation and call the catalog function as needed to generate new referrals.

The resolver query processing algorithm obtains an initial set of referrals from the distributed catalog service. It then navigates the referral graph, calling catalog functions as necessary to obtain additional referrals that narrow the search space. Sometimes, two referrals that cover the query have the relationship of general to specific to each other. The resolver eliminates unnecessary access function processing by using only the most specific referral along each path of the referral graph.

The search space for the query is initially set to all the data repositories in the relation. As the resolver obtains referrals to sets of relevant data repositories (and their associated data access functions) it forms the intersection of the referrals to constrain the search space further. The intersection of the referrals includes only those data repositories listed in all the referrals. Intersection combines independent paths through the referral graph to derive benefit from indices on different attributes.
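
The two steps just described, keeping only the most specific referral along each path and intersecting the repositories named by the remaining referrals, might look like the following Python sketch. It assumes that each referral is a (template, repositories) pair whose catalog functions have already been resolved to repository names; the function names and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Template = Dict[str, str]
Referral = Tuple[Template, Set[str]]  # (template, repositories to contact)


def covers(template: Template, query: Dict[str, str]) -> bool:
    return all(
        attr in query and (value == "*" or query[attr] == value)
        for attr, value in template.items()
    )


def is_subset(s: Template, g: Template) -> bool:
    """True if every query covered by s is also covered by g."""
    return all(
        attr in s and (value == "*" or s[attr] == value)
        for attr, value in g.items()
    )


def constrain(referrals: List[Referral], query: Dict[str, str],
              all_repositories: Iterable[str]) -> Set[str]:
    covering = [r for r in referrals if covers(r[0], query)]
    # Drop a referral when another covering referral is strictly more specific.
    specific = [
        s for s in covering
        if not any(
            t is not s and is_subset(t[0], s[0]) and not is_subset(s[0], t[0])
            for t in covering
        )
    ]
    search_space = set(all_repositories)  # start with every repository
    for _, repositories in specific:
        search_space &= repositories      # keep those listed in all referrals
    return search_space


referrals = [
    ({"country": "US"}, {"a", "b", "c"}),
    ({"country": "US", "surname": "Ordille"}, {"a", "b"}),
    ({"organization": "Bell Labs"}, {"b", "d"}),
    ({"department": "CS"}, {"c"}),
]
query = {"country": "US", "surname": "Ordille", "organization": "Bell Labs"}
print(constrain(referrals, query, {"a", "b", "c", "d"}))
```

Here the referral on country alone is discarded in favor of the more specific one on country and surname, and the independent referral on organization narrows the result further through intersection.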

2.3 Meta-Data and Data Caching

A Nomenclator query resolver caches the meta-data that result from calling catalog functions. It also caches the responses for queries. If the predicate of a new query is covered by the predicate of a previous query, Nomenclator calculates the response for the new query from the cached response of the old query. Nomenclator timestamps its cache entries to provide measures of the currentness of query responses and selective cache refresh. The timestamps are used to calculate a t-bound on query responses [5][1]. A t-bound is the time after which changes may have occurred to the data that are not reflected in the query response. It is the time of the oldest cache entry used to calculate the response. Nomenclator returns a t-bound with each query response. Users can request more current data by asking for responses that are more recent than this t-bound. Making such a request flushes older items from the cache if more recent items are available. Query resolvers calculate a minimum t-bound that is some refresh interval earlier than the current time. Resolvers keep themselves current by replacing items in the cache that are earlier than the minimum t-bound.
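
The t-bound bookkeeping can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the TimestampedCache class, the answer function, and the injected fetch and clock callables are hypothetical, and a response is assumed to be computed from a fixed, non-empty list of cache keys.

```python
from typing import Callable, Dict, List, Optional, Tuple


class TimestampedCache:
    """Hypothetical cache that records when each entry was obtained."""

    def __init__(self, fetch: Callable[[str], str],
                 clock: Callable[[], float]) -> None:
        self.fetch = fetch   # obtains a fresh value from its source
        self.clock = clock   # returns the current time
        self.entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str,
            min_t_bound: Optional[float] = None) -> Tuple[str, float]:
        entry = self.entries.get(key)
        too_old = (entry is not None and min_t_bound is not None
                   and entry[1] < min_t_bound)
        if entry is None or too_old:
            # Replace a missing or stale entry with a fresh one.
            entry = (self.fetch(key), self.clock())
            self.entries[key] = entry
        return entry


def answer(cache: TimestampedCache, keys: List[str],
           min_t_bound: Optional[float] = None) -> Tuple[List[str], float]:
    """Return the values for keys together with the t-bound of the response."""
    used = [cache.get(key, min_t_bound) for key in keys]
    values = [value for value, _ in used]
    # The t-bound is the time of the oldest cache entry used.
    t_bound = min(timestamp for _, timestamp in used)
    return values, t_bound
```

A caller that wants more current data passes a min_t_bound later than the t-bound it last received, which forces the older entries to be refreshed before the response is computed.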

2.4 Scale and Performance

Three performance studies of active catalog and meta-data caching techniques are available [5]. The first study shows that the active catalog and meta-data caching can constrain the search effectively in a real environment, the X.500 name space. The second study examined the performance of an active catalog and meta-data caching for single users on a local area network. The experiments showed that the techniques to eliminate data repositories from the search space can dramatically improve response time. Response times improve, because latency is reduced. The reduction of latency in communications and processing is critical to large-scale descriptive query optimization. The experiments also showed that an active catalog is the most significant contributor to better response time in a system with low load, and that meta-data caching functions to reduce the load on the system. The third study used an analytical model to evaluate the performance and scaling of these techniques for a large Internet environment. It showed that meta-data caching plays an essential role in scaling the distributed catalog service to millions of users. It also showed that constraining the search space with an active catalog contributes significantly to scaling data repositories to millions of users. Replication and data caching also contribute to the scale of the system in a large Internet environment.

3. Registering a CCSO Server

The Internet Nomenclator Project supports the following home page:

      http://cm.bell-labs.com/cs/what/nomenclator
        

The home page provides a variety of information and services.

Administrators can register their CCSO servers through services on this home page. The registration service collects CCSO server location information, contact information for the administrator of the CCSO server, implicit and explicit constraints on entries in the server's database, and a mapping from the local schema of the CCSO server to the schema of the world view.

The implicit and explicit constraints on the server's database are the fuel for Nomenclator's catalog functions. The registration center currently collects constraints on organization name, department, city, state or province name, country, phone number, postal code, and email address. These constraints are automatically incorporated into Nomenclator's distributed catalog service. They are used by catalog functions in query resolvers to constrain searches to relevant CCSO servers. For example, suppose a database contains information only about the computer science and electrical engineering departments at a French university. Its department, organization and country attributes are then constrained. Nomenclator uses these constraints to prevent queries about other departments, organizations or countries from being sent to this CCSO server.

The mapping from the local schema of the CCSO server to the schema of the world view allows Nomenclator to translate queries and responses for the CCSO server. The registration center currently collects this mapping by requesting an example of how to translate a typical entry in the CCSO server into the world view schema and, optionally, an example of how to translate a canonical entry in the world view schema into the local schema of the CCSO server [4]. These examples are then used to generate a mapping program that is stored in the distributed catalog service. The CCSO data access function in the query resolver interprets these programs to translate queries and responses communicated with that CCSO server. We plan to release the mapping language to CCSO server administrators, so administrators can write and maintain the mapping for their servers. We have experimented with more than 20 mapping programs. They are seldom more than 50 lines, and are often shorter. It typically takes one or two lines to map an attribute.
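
The mapping language itself is not described in this document, so the following Python sketch only illustrates the idea of translating in both directions between a local schema and the world view. The field names, the world-view attribute names, and the table-driven approach are assumptions made for illustration; real mapping programs are generated from the examples that administrators supply.

```python
from typing import Dict

# Hypothetical mapping for one server: local CCSO field -> world-view attribute.
LOCAL_TO_WORLD: Dict[str, str] = {
    "name": "name",
    "email": "email_address",
    "office_phone": "phone_number",
    "curriculum": "department",
}
# Inverse table, used to translate queries into the local schema.
WORLD_TO_LOCAL: Dict[str, str] = {w: l for l, w in LOCAL_TO_WORLD.items()}


def query_to_local(world_query: Dict[str, str]) -> Dict[str, str]:
    """Translate a world-view query into the server's local schema.

    Raises ValueError for an attribute the server cannot express, so that a
    condition of the query is never dropped silently (a choice of this sketch).
    """
    local_query = {}
    for attr, value in world_query.items():
        if attr not in WORLD_TO_LOCAL:
            raise ValueError(f"no local field for world-view attribute {attr!r}")
        local_query[WORLD_TO_LOCAL[attr]] = value
    return local_query


def entry_to_world(local_entry: Dict[str, str]) -> Dict[str, str]:
    """Translate a response entry into the world view, skipping unmapped fields."""
    return {LOCAL_TO_WORLD[f]: v for f, v in local_entry.items()
            if f in LOCAL_TO_WORLD}


print(query_to_local({"name": "Ada Lovelace", "department": "Mathematics"}))
print(entry_to_world({"name": "Ada Lovelace", "curriculum": "Mathematics",
                      "nickname": "ada"}))
```

Each attribute takes one line in the table, which is in the same spirit as the observation above that mapping an attribute typically takes one or two lines.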

4. Using Nomenclator

The Internet Nomenclator Project currently provides a centralized query service on the Internet. The project runs a Nomenclator query resolver that is accessible through its Web page (see the URL in Section 3) and the Simple Nomenclator Query Protocol (SNQP) [2].

The service answers queries that are a conjunction of string values for attributes. A variety of matching techniques are supported, including exact string matching, matching with wildcards, and word-based matching in the style of the CCSO service. Our web interface uses the Simple Nomenclator Query Protocol (SNQP) [2]. Programmers can create their own interfaces by using this protocol to communicate with the Nomenclator query resolver. They will need the host name and port number of the query resolver, which they can obtain from the Nomenclator home page. SNQP, and hence the web interface, is defined for US-ASCII. Support for other character sets will require further work.
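
As a rough illustration of a conjunctive query and of the three matching styles named above, consider the following Python sketch. It does not use SNQP and is not the resolver's matching code: the function names, the case-insensitive comparison, the shell-style wildcard syntax provided by Python's fnmatch module, and the approximation of CCSO word-based matching (every query word must match some word of the value) are all assumptions.

```python
import fnmatch
from typing import Callable, Dict


def exact_match(pattern: str, value: str) -> bool:
    """Whole-string comparison, ignoring case (an assumption of this sketch)."""
    return pattern.lower() == value.lower()


def wildcard_match(pattern: str, value: str) -> bool:
    """Whole-string match where the pattern may use shell-style wildcards."""
    return fnmatch.fnmatchcase(value.lower(), pattern.lower())


def word_match(pattern: str, value: str) -> bool:
    """Approximate word-based matching: each query word must match a word."""
    words = value.lower().split()
    return all(any(fnmatch.fnmatchcase(w, p) for w in words)
               for p in pattern.lower().split())


def matches(entry: Dict[str, str], query: Dict[str, str],
            matcher: Callable[[str, str], bool] = word_match) -> bool:
    """A query is a conjunction: every attribute in it must match the entry."""
    return all(attr in entry and matcher(pattern, entry[attr])
               for attr, pattern in query.items())


# Hypothetical directory entry.
entry = {"name": "Ada Lovelace", "organization": "Analytical Engines"}

print(matches(entry, {"name": "lovelace ada"}))                      # word-based
print(matches(entry, {"name": "ada*"}, wildcard_match))              # wildcard
print(matches(entry, {"name": "lovelace", "organization": "Other"}))
```

The last query fails because one term of the conjunction does not match, even though the name term does.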

Subsequent phases of the project will provide enhanced services such as providing advice about the cost of queries and ways to constrain queries further to produce faster response times, and allowing users to request more current data. We also plan to distribute query resolvers, so users can benefit from running query resolvers locally. Local query resolvers reduce latency for the user, and distribute query processing load throughout the network.

5. Summary

The Internet Nomenclator Project augments existing CCSO services by supplying schema integration and fast cross-server searches. The keys to speed in descriptive query processing are an active catalog and extensive meta-data and data caching. The Nomenclator system is the result of research in distributed systems [4][5][6][7]. It can be extended to incorporate other name servers, besides the CCSO servers, and to address distributed search and retrieval challenges in other application domains. In addition to providing a white pages service, the Internet Nomenclator Project will evaluate how an active catalog, meta-data caching, and data caching perform in very large global information systems. The ultimate goal of the project is to refine these techniques to provide the best possible global information systems.

6. Security Considerations

In the Internet Nomenclator Project, the participants' data are openly available and read-only. Since the risk of tampering with queries and responses is considered low, this version of Nomenclator does not define procedures for protecting the information in its queries and responses.

7. References

   [1]   H. Garcia-Molina, G. Wiederhold. "Read-Only Transactions in a
         Distributed Database," ACM Transactions on Database Systems
         7(2), pp. 209-234. June 1982.

   [2]   J. Elliott, J. Ordille. "The Simple Nomenclator Query
         Protocol (SNQP)," RFC 2259, January 1998.

   [3]   S. Dorner, P. Pomes. "The CCSO Nameserver: A Description,"
         Computer and Communications Services Office Technical Report,
         University of Illinois, Urbana, USA. 1992. Available in the
         current "qi" distribution from
         <URL:ftp://uiarchive.cso.uiuc.edu/local/packages/ph>
        
   [4]   A. Levy, J. Ordille. "An Experiment in Integrating Internet
         Information Sources," AAAI Fall Symposium on AI Applications in
         Knowledge Navigation and Retrieval, November 1995.
         <URL:http://cm.bell-labs.com/cm/cs/doc/95/11-01.ps.gz>
        
   [5]   J. Ordille. "Descriptive Name Services for Large Internets,"
         Ph. D. Dissertation. University of Wisconsin. 1993.
         <URL:http://cm.bell-labs.com/cm/cs/doc/93/12-01.ps.gz>
        
   [6]   J. Ordille, B. Miller. "Distributed Active Catalogs and
         Meta-Data Caching in Descriptive Name Services," Thirteenth
         International IEEE Conference on Distributed Computing Systems,
         pp. 120-129.  May 1993.
         <URL:http://cm.bell-labs.com/cm/cs/doc/93/5-01.ps.gz>
        
   [7]   J. Ordille, B. Miller. "Nomenclator Descriptive Query
         Optimization in Large X.500 Environments," ACM SIGCOMM
         Symposium on Communications Architectures and Protocols, pp.
         185-196, September 1991.
         <URL:http://cm.bell-labs.com/cm/cs/doc/91/9-01.ps.gz>
        
8. Author's Address

   Joann J. Ordille
   Bell Labs, Lucent Technologies
   Computing Sciences Research Center
   700 Mountain Avenue, Rm 2C-301
   Murray Hill, NJ 07974 USA

   EMail: joann@bell-labs.com
        
9. Full Copyright Statement

Copyright (C) The Internet Society (1998). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
