1. The architecture of H.323

H.323 was developed to enable multimedia conferencing on packet-switched networks that do not guarantee quality of service (QoS). The first version was adopted by ITU Study Group 15 (SG 15) in 1996, and the second version followed in 1998. H.323 defines a multimedia communication system standard for packet-based networks (PBN) without QoS guarantees. Such networks dominate today's desktop networking and include TCP/IP and IPX over Ethernet, Fast Ethernet, Token Ring, and FDDI. The H.323 standard therefore provides a technical foundation for multimedia communication applications on LANs, WANs, intranets, and the Internet.

H.323 is part of the ITU H.32x series of multimedia communication standards, which makes video conferencing possible on existing communication networks. Within the series, H.320 covers multimedia communication on N-ISDN, H.321 covers B-ISDN, H.322 covers LANs with guaranteed quality of service, and H.324 covers GSTN and wireless networks. H.323 covers existing packet-based networks such as IP networks. Combined with other IP technologies, such as the IETF Resource Reservation Protocol (RSVP), it enables multimedia communication over IP networks.

IP-based networks are becoming steadily more capable: technologies such as IP over SDH/SONET and IP over ATM are developing rapidly, and LAN bandwidth keeps increasing. Because H.323 specifies interoperability between equipment, between applications, and between vendors, all H.323-compliant devices can interoperate.
Faster processors, increasingly capable graphics hardware, and powerful multimedia acceleration chips are making the PC an ever more powerful multimedia platform. H.323 provides the interworking standard for multimedia communication between packet-based networks and other networks. Many computer and network communication companies, such as Intel, Microsoft, and Netscape, support the H.323 standard.

The H.323 standard specifies the technical requirements for multimedia communication on packet networks without QoS guarantees. These networks include LANs, WANs, intranets, the Internet, and dial-up or point-to-point connections over GSTN or ISDN that use packet protocols such as PPP.

Taken as a whole, H.323 is a framework. It covers terminal equipment, video, audio, and data transmission, communication control, and network interfaces, as well as the equipment that makes up a multipoint conference: the multipoint control unit (MCU), multipoint controller (MC), multipoint processor (MP), gateway, and gatekeeper.

The basic unit of an H.323 system is the zone (sometimes translated as "domain"). A zone is the collection of all gateways, MCUs, MCs, MPs, and terminals managed by a single gatekeeper. A zone contains at least one terminal and exactly one gatekeeper.

The logical components of an H.323 system are called H.323 entities: terminals, gateways, gatekeepers, MCUs, MCs, and MPs. Of these, terminals, gateways, and MCUs are endpoints, which are logical units in the network. Endpoints can place and receive calls, whereas some entities, such as gatekeepers, cannot be called. H.323 also covers end-to-end connections between H.323 terminals and other terminals across different networks.
2. The composition of H.323 terminals

H.323 defines four main components for a network-based communication system: terminal, gateway, gatekeeper, and multipoint control unit (MCU). A terminal is a node device that provides real-time, two-way communication on a packet network; it is the end-user device that communicates with gateways and MCUs. All terminals must support voice communication; video and data communication are optional. H.323 specifies the operating modes that allow different audio, video, and data terminals to work together, and it is expected to be the main standard for next-generation Internet telephony, audio conferencing terminals, and video conferencing.

Figure 6-2 shows the block diagram of an H.323 terminal. At the sending end, the video and audio signals from the input devices are compressed by the encoders, packetized in the specified format, and sent over the network. At the receiving end, packets arriving from the network are first unpacked, and the recovered compressed video and audio data are decoded and sent to the output devices. User data and control data are handled in a corresponding way. The functional units and the standards or protocols they use are:

Video codec (H.263 / H.261): performs redundancy-reducing compression coding of the video stream.

Audio codec (G.723.1, etc.): encodes and decodes the voice signal, and optionally adds a buffering delay at the receiving end to keep the voice continuous. The standard used here is ITU-T G.723.1, which provides two bit rates, 5.3 kbit/s and 6.3 kbit/s. It uses linear-predictive analysis-by-synthesis coding, with algebraic code-excited linear prediction (ACELP) at the lower rate and multi-pulse maximum-likelihood quantization (MP-MLQ) at the higher rate, to balance coding complexity against quality.
Data applications: electronic whiteboards, still-image transfer, file exchange, shared database access, data conferencing, remote device control, and so on. Applicable standards include T.120, T.84, and T.434.

Control unit (H.245): provides end-to-end signaling to ensure that H.323 terminals communicate correctly. The protocol used is H.245 (the multimedia communication control protocol), which defines four types of message: request, response, command, and indication. Through these messages, terminals negotiate capabilities, open and close logical channels, and send commands or indications to control the communication.

H.225.0 layer: formats video, audio, control, and other data for transmission and receives data from the network. It is also responsible for logical framing, sequence numbering, and error detection.

3. The H.323 protocol suite

H.323 is a protocol stack standardized by the International Telecommunication Union (ITU). The stack forms an integrated whole and can be divided by function: it contains detailed specifications for the overall system framework (H.323), video coding (H.263), audio coding (G.723.1), system control (H.245), and the packetization and multiplexing of data streams (H.225.0). This provides a good basis for the further development of VoIP and video conferencing systems and for compatibility between systems.

The system control protocols comprise H.323, H.245, and H.225.0; Q.931 and RTP/RTCP are the main components of H.225.0. System control is the core of an H.323 terminal. The whole system is controlled through the H.245 control channel, the H.225.0 call signaling channel, and the RAS (registration, admission, status) channel.

The audio codec protocols include G.711 (required), G.722, G.723.1, G.728, G.729, and others.
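To put the G.723.1 bit rates quoted earlier into perspective, the following is a rough worked calculation of the bandwidth one voice stream occupies on an IP network. It is only a sketch under stated assumptions that are not part of the article: 30 ms codec frames packed into 24 bytes (6.3 kbit/s mode) or 20 bytes (5.3 kbit/s mode), one frame per packet, and 40 bytes of IPv4, UDP, and RTP headers per packet. Because the frames are padded to whole bytes, the computed codec rates come out slightly above the nominal figures.

```python
# Rough bandwidth estimate for one G.723.1 voice stream over RTP/UDP/IPv4.
# Assumptions (not from the article): 30 ms frames, 24-byte and 20-byte
# frame sizes, and 20 + 8 + 12 bytes of IPv4/UDP/RTP headers per packet.
FRAME_MS = 30
HEADER_BYTES = 20 + 8 + 12


def rates_kbps(payload_bytes, frames_per_packet=1):
    """Return (codec_rate, on_wire_rate) in kbit/s for one direction."""
    packet_ms = FRAME_MS * frames_per_packet
    payload_bits = 8 * payload_bytes * frames_per_packet
    wire_bits = payload_bits + 8 * HEADER_BYTES
    # bits per millisecond is numerically equal to kbit/s
    return payload_bits / packet_ms, wire_bits / packet_ms


HIGH_RATE = rates_kbps(24)  # nominal 6.3 kbit/s mode
LOW_RATE = rates_kbps(20)   # nominal 5.3 kbit/s mode

if __name__ == "__main__":
    for name, (codec, wire) in (("6.3k mode", HIGH_RATE), ("5.3k mode", LOW_RATE)):
        print(f"{name}: codec {codec:.2f} kbit/s, on the wire {wire:.2f} kbit/s")
```

Under these assumptions the packet headers, not the codec, account for most of the bandwidth, which is why implementations sometimes place more than one frame in a packet (the hypothetical `frames_per_packet` parameter) at the cost of extra delay.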
The audio algorithm used by the encoder is negotiated through H.245. An H.323 terminal should be able to operate its audio codec asymmetrically, for example sending G.711 while receiving G.729. The video codec protocols are mainly H.261 (required for terminals that support video) and H.263; video itself is optional in an H.323 system. Data conferencing is also optional, and its standard is the multimedia conferencing data protocol T.120.

3.1 H.323 components

The H.323 terminal is the most basic component defined by H.323. Every H.323 terminal must support H.245, which is used to negotiate channel usage and capabilities. Optional components of a terminal are the video codec, the T.120 data conferencing protocol, and MCU functionality.

The gateway is an optional component of an H.323 conferencing system. It provides many services, chiefly translation between H.323 conferencing endpoints and terminals that follow other ITU standards. This includes translating transmission formats (such as H.225.0 to H.221) and communication procedures (such as H.245 to H.242). Between the packet network side and the circuit-switched network side, the gateway also transcodes audio and video and performs call setup and teardown. Terminals communicate with the gateway using H.245 and H.225.0. With appropriate transcoders, an H.323 gateway can support H.310, H.321, H.322, and V.70 terminals.

The gatekeeper is an optional component of an H.323 system that provides call control services to H.323 endpoints. When a gatekeeper is present, it must provide four services: address translation, admission control, bandwidth control, and zone management.
Bandwidth management, call authorization, call control signaling, and call management are optional gatekeeper functions. Although the gatekeeper is logically separate from the H.323 endpoints, a manufacturer may integrate the gatekeeper function into a physical device such as an H.323 terminal, gateway, or MCU. The collection of all terminals, gateways, and MCUs managed by a single gatekeeper is called an H.323 zone (sometimes translated as "domain").

The multipoint control unit supports conferences among three or more endpoints. In an H.323 system, an MCU consists of one multipoint controller (MC) and zero or more multipoint processors (MP). The MC handles the H.245 control information exchanged between endpoints in order to determine their common capabilities for processing video and audio. Where necessary, the MC can also control conference resources by deciding which video and audio streams are to be multicast. The MC does not process any media stream directly; it leaves that to the MP, which mixes, switches, and processes audio, video, or data. The MC and MP may reside in a dedicated device or be part of other H.323 components.

The audio codec encodes the audio input from the microphone for transmission and decodes received audio for output to the speaker. The audio signal contains digitized, compressed voice, and the compression algorithms supported by H.323 all conform to ITU standards. Every H.323 terminal must support the G.711 voice standard and be able to transmit and receive both A-law and µ-law. Other audio codecs, such as G.722, G.723.1, G.729 Annex A, and MPEG-1 audio, are optional. The audio algorithm used by the encoder is determined through H.245. An H.323 terminal should be able to operate its audio codec asymmetrically, for example sending G.711 while receiving G.728.
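The A-law and µ-law variants of G.711 mentioned above are both logarithmic companding schemes: quiet samples receive finer resolution than loud ones. As a minimal sketch, the code below implements the continuous µ-law curve with µ = 255. It is an illustration only and not a G.711 codec, since G.711 itself uses a segmented, piecewise-linear approximation of this curve that produces 8-bit codes; the function names are hypothetical.

```python
import math

MU = 255.0  # companding constant of the mu-law curve


def mu_law_compress(x):
    """Map a linear sample in [-1, 1] onto the companded range [-1, 1]."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


if __name__ == "__main__":
    for sample in (0.001, 0.01, 0.1, 1.0):
        print(sample, round(mu_law_compress(sample), 4))
```

Small inputs are stretched over a large part of the output range, which is what lets 8 bits per sample carry telephone-quality speech.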
The video codec encodes video at the source for transmission and decodes it for display at the receiving end. Although video is optional, any H.323 terminal with video capability must support H.261 in QCIF format; other H.261 formats and the H.263 standard are optional. On packet networks, the H.261 and H.263 codecs do not need BCH error correction or error-correction framing.

T.120 data conferencing is an optional feature. When it is supported, it enables collaborative work such as whiteboards, application sharing, file transfer, still-image transfer, database access, and audiographic conferencing. Other data applications and protocols can also be used, subject to negotiation through H.245.

3.2 H.225.0, H.245 and other protocols

Communication in an H.323 system can be seen as a mixture of video, audio, and control information. The system control function is the core of the H.323 terminal and provides the signaling needed for the terminal to operate correctly. Its functions include call control (setup and teardown), capability exchange, signaling of commands and indications, and the messages used to open logical channels and describe their contents. Control of the whole system is provided by the H.245 control channel, the H.225.0 call signaling channel, and the RAS channel.

The H.225.0 standard describes a mechanism for packetizing and synchronizing media streams on a LAN without QoS guarantees. H.225.0 formats the outgoing control streams into messages for the network interface and retrieves incoming control streams from the messages received at the network interface. It also performs logical framing, sequence numbering, and error detection and correction.

In an H.323 multimedia communication system, control signaling and data are carried by a reliable, connection-oriented transport mechanism.
In the IP protocol stack, IP and TCP together provide connection-oriented transport. Reliable transport guarantees flow control, ordering, and correctness of the delivered packets, but it can also add transmission delay and consume network bandwidth. H.323 uses reliable TCP for the H.245 control channel, the T.120 data channels, and the call signaling channel.

Video and audio use an unreliable, connectionless transport, namely the User Datagram Protocol (UDP). UDP provides no QoS guarantees and carries only minimal control information, so its transmission delay is lower than that of TCP. In a multimedia system with multiple video and audio streams, unreliable multicast transmission relies on IP multicast and the IETF Real-time Transport Protocol (RTP) to carry video and audio. IP multicast provides unreliable multicast delivery over UDP. RTP runs on top of it and handles video and audio streams on the IP network: each UDP packet is given a header containing a timestamp and a sequence number. With suitable buffering, the receiver can use the timestamp and sequence number to reconstruct the stream, reorder out-of-sequence packets, synchronize voice, video, and data, and smooth continuous playback.

The Real-time Transport Control Protocol (RTCP) is the control protocol for RTP. RTCP monitors the quality of service and the information carried over the network, and periodically distributes control packets containing quality-of-service information to all participating nodes.

On large packet networks such as the Internet, reserving enough bandwidth for a multimedia call is both important and difficult.
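The role of the RTP timestamp and sequence number described above can be sketched in a few lines. The example packs and parses the 12-byte fixed RTP header (version 2, no padding, extension, or CSRC entries) and reorders a small batch of buffered packets by sequence number. It is a simplified illustration, not a full RTP implementation: the function names are hypothetical, payload type 0 (G.711 µ-law) is chosen only as an example, and 16-bit sequence-number wrap-around and jitter handling are deliberately ignored.

```python
import struct

# Fixed RTP header: V/P/X/CC byte, M/PT byte, sequence number,
# timestamp, SSRC. All fields are in network byte order.
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp(payload_type, seq, timestamp, ssrc, payload):
    """Build an RTP packet with version 2 and no optional header parts."""
    first = 2 << 6                 # version 2, P=0, X=0, CC=0
    second = payload_type & 0x7F   # marker bit left clear
    header = RTP_HEADER.pack(first, second, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp(packet):
    """Split a packet into its fixed header fields and payload."""
    first, second, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return {"pt": second & 0x7F, "seq": seq, "ts": timestamp,
            "ssrc": ssrc, "payload": packet[RTP_HEADER.size:]}


def reorder(packets):
    """Sort buffered packets by sequence number (wrap-around ignored)."""
    return sorted((parse_rtp(p) for p in packets), key=lambda h: h["seq"])


if __name__ == "__main__":
    arrived = [pack_rtp(0, s, 160 * s, 0x1234, b"audio") for s in (3, 1, 2)]
    print([h["seq"] for h in reorder(arrived)])
```

A real receiver would hold packets in a jitter buffer for a short time before playing them out, using the timestamp rather than the arrival time to schedule playback.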
Another IETF protocol, the Resource Reservation Protocol (RSVP), allows the receiver to request a certain amount of bandwidth for a particular data stream and to receive a reply confirming whether the request has been granted. Although RSVP is not formally part of the H.323 standard, most H.323 products need to support it, because bandwidth reservation is critical to the success of multimedia communication on IP networks. RSVP has to be supported by the terminals, the gateways, MCUs with multipoint processors, and the intermediate routers or switches.

H.225.0 applies to different types of network, including Ethernet and Token Ring, and is defined above transport layers such as TCP/IP and SPX/IPX. The scope of H.225.0 communication is between H.323 terminals and gateways that are on the same network and use the same transport protocol. If H.323 is used across the entire Internet, communication performance will degrade. H.323 attempts to extend H.320 to local area networks without quality guarantees. With robust admission control and conference control, a given conference can range from a few participants to several thousand.

H.225.0 establishes a call model in which RTP transport addresses are not used for call setup and capability negotiation; the RTP/RTCP connections are established after call setup. Before a call is set up, a terminal can register with a gatekeeper. To do so, it must know the gatekeeper's vintage, that is, its version. For this reason, both the discovery and the registration structures contain an H.245-style object identifier that gives the vintage in terms of the H.323 version implemented. These structures also contain optional non-standard messages, which allow terminals to establish non-standard relationships. By the end of these exchanges, both the version number and the non-standard status are known.
The version number is mandatory, while the non-standard information is optional. Non-standard information lets two terminals inform each other of their vintage and non-standard status. Just as every Q.931 message carries optional non-standard information in its user-to-user information, every RAS message also carries optional non-standard information. In addition, a non-standard RAS message can be sent at any time.

The unreliable channel used for registration, admission, and status communication is called the RAS channel. To start a call, a terminal first sends an admission request message, followed by the initial Setup message. The process ends when the Connect message is received. Once the reliable H.245 control channel has been established, the audio, video, and data channels can be set up, and the settings of the multimedia conference can also be negotiated there. While messages are carried on the reliable H.245 control channel, H.323 terminals can send audio and video data over unreliable channels. Error concealment and similar techniques are used to cope with packet loss. Audio and video packets are normally not retransmitted, because retransmission would add delay. It is assumed that the lower layers have already detected bit errors and that corrupted packets are not passed up to H.225.0.

Audio data, video data, and call signaling are not carried on the same channel and do not use the same message structure. H.225.0 can use different transport addresses to send and receive audio and video in separate RTP instances, which keeps the sequence numbering of each medium separate and allows each medium its own quality of service. The ITU is studying how to mix audio and video packets in the same frame on the same transport address.
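The call setup order described above (admission request on the RAS channel, then Setup and Connect on the call signaling channel, then H.245 control and the media channels) can be written down as a small ordered table. This is a deliberately simplified sketch: the message names follow common H.323 usage, but the confirmation message, the relative order of the H.245 steps, and the helper functions are assumptions made for illustration, and messages such as Call Proceeding or Alerting are omitted.

```python
# Simplified call setup order: (channel, message). The strict ordering of
# the H.245 steps is an assumption of this sketch, not a rule of the standard.
CALL_SETUP_STEPS = [
    ("RAS", "AdmissionRequest"),
    ("RAS", "AdmissionConfirm"),
    ("H.225.0 call signalling", "Setup"),
    ("H.225.0 call signalling", "Connect"),
    ("H.245 control", "TerminalCapabilitySet"),
    ("H.245 control", "MasterSlaveDetermination"),
    ("H.245 control", "OpenLogicalChannel"),
]


def channel_for(message):
    """Return the channel on which a setup message travels."""
    for channel, name in CALL_SETUP_STEPS:
        if name == message:
            return channel
    raise KeyError(message)


def is_valid_prefix(messages):
    """True if messages follow the setup order from the start, with no gaps."""
    expected = [name for _, name in CALL_SETUP_STEPS]
    return list(messages) == expected[:len(messages)]


if __name__ == "__main__":
    print(is_valid_prefix(["AdmissionRequest", "AdmissionConfirm", "Setup"]))
    print(is_valid_prefix(["Setup", "Connect"]))  # skips admission
```

Only after the last step would the terminals start sending RTP audio and video on the unreliable media channels.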
Although audio and video can share the same network address with different transport layer service access point (TSAP) identifiers, manufacturers may still choose to use different network addresses to carry audio and video separately. In gateways, MCUs, and gatekeepers, dynamic TSAP identifiers can be used instead of fixed ones.

A reliable transport address is used for call setup between terminals, and it can also be used with gatekeepers. A reliable call signaling connection must follow these rules. In terminal-to-terminal call signaling, either terminal can open or close the reliable call signaling channel. When call signaling is routed through the gatekeeper, the terminal must keep the reliable port open for the whole call. The gatekeeper can choose whether to close the signaling channel, but it must keep open any call signaling channel that a gateway is using, so that Q.931 information such as display information can be passed end to end. If the reliable connection is dropped for some reason at the transport layer, the connection must be re-established; the call is not considered to have failed unless the H.245 channel is also closed. Call state and call reference values are not affected by closing the reliable connection.

Several H.245 channels can be open at the same time, so one terminal can take part in several conferences simultaneously. Within a conference, a terminal can even open several channels of the same type at once, for example two audio channels to obtain a stereo effect. However, only one H.245 control channel can be open in a point-to-point call.

The H.245 protocol defines the master-slave determination function.
A conflict arises when the two terminals in a call initiate the same event at the same time, for example when a resource can serve only one of the events. To resolve this, the terminals must establish which of them is the master and which is the slave. The master-slave determination procedure makes this decision. Once the status of a terminal has been determined, it does not change for the rest of the call.

The capability exchange procedure ensures that the transmitted media can be received, that is, that the receiver is able to decode the data it is sent. This requires each terminal to know the receive and decode capabilities of the other. A terminal does not need to understand every capability and can ignore those it does not understand. A terminal makes its receive and decode capabilities known by sending its capability set. Receive capabilities describe the terminal's ability to receive and process information streams, and the sender must ensure that it really can do what its capability set states. Transmit capabilities offer the receiver a choice of operating modes from which it can select. If the transmit capability set is absent, the sender has not offered the receiver a choice, but this does not mean that the sender will not send data to the receiver.

Capability sets allow a terminal to state that it can process several media streams at once. For example, a terminal may be able to receive two different H.262 video signals and two different G.722 audio signals simultaneously. The capability message describes not only the inherent capabilities of the terminal but also which modes it can operate at the same time. It may also express a trade-off between transmit and receive capabilities.
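The capability exchange described above comes down to one rule: a sender may only use a format that the other side has declared it can receive, and the two directions are chosen independently. The following minimal sketch illustrates that rule with plain codec names. The data layout and function names are hypothetical and are not H.245 structures; real capability sets also describe simultaneous capabilities and parameters such as frame sizes.

```python
def choose_send_codec(send_preferences, remote_receive_caps):
    """Pick the first codec we prefer to send that the peer can decode."""
    remote = set(remote_receive_caps)
    for codec in send_preferences:
        if codec in remote:
            return codec
    return None  # no common format for this direction


def negotiate(a, b):
    """Choose a codec for each direction independently (asymmetric use)."""
    return {
        "a_to_b": choose_send_codec(a["send"], b["receive"]),
        "b_to_a": choose_send_codec(b["send"], a["receive"]),
    }


# Hypothetical terminals: A can decode G.711 and G.728 but only sends G.711.
TERMINAL_A = {"send": ["G.711"], "receive": ["G.711", "G.728"]}
TERMINAL_B = {"send": ["G.728", "G.711"], "receive": ["G.711"]}

if __name__ == "__main__":
    print(negotiate(TERMINAL_A, TERMINAL_B))
```

In this example terminal A ends up sending G.711 while receiving G.728, which is the kind of asymmetric operation the text describes.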
A terminal may use non-standard parameter structures to send non-standard capabilities and control messages. Non-standard messages are defined by manufacturers or other organizations to indicate special capabilities of their terminals.

The logical channel signaling procedure ensures that, when a logical channel is opened, the terminal is able to receive and decode the data. The Open Logical Channel message contains a description of the data to be transferred. A logical channel may be opened only when the terminal is able to receive data on all open channels simultaneously. A logical channel is opened by the transmitter. The receiver can ask the transmitter to close it, and the transmitter can accept or reject the request.

When the capability exchange ends, both terminals know each other's capabilities from the capability descriptors they have exchanged. A terminal does not need to understand every capability in a descriptor, only those it uses.

It is useful for a terminal to know the round-trip delay between itself and the other terminal. The round-trip delay determination procedure measures this delay and can also be used to test whether the remote terminal is still present.

Commands and indications can be used to transfer certain special data. They do not elicit a response message from the remote terminal. A command forces the remote terminal to perform an action, while an indication only provides information.

The H.323 protocol stipulates that audio and video packets must be encapsulated in the Real-time Transport Protocol (RTP) and carried over a pair of UDP sockets at the sending and receiving ends. The Real-time Transport Control Protocol (RTCP) is used to assess the quality of the session and connection and to provide feedback between the communicating parties. The associated data and its supporting packets can be carried over TCP or UDP.
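The division between reliable and unreliable transport described in this section can be summarized as a lookup table, together with the usual pairing of an RTP port with its RTCP port. This is a sketch under assumptions: the table reflects the TCP and UDP usage stated in the text for an IP network, RAS is shown on UDP because the text calls it an unreliable channel, and the even/odd port pairing is the common RTP convention rather than something the article specifies. The names are hypothetical.

```python
# Channel-to-transport mapping on an IP network, as described in the text.
TRANSPORT = {
    "H.245 control": "TCP",
    "H.225.0 call signalling": "TCP",
    "T.120 data": "TCP",
    "RAS": "UDP",          # unreliable channel
    "RTP audio": "UDP",
    "RTP video": "UDP",
    "RTCP": "UDP",
}


def rtp_rtcp_ports(base_port):
    """Return (rtp, rtcp): an even RTP port and the next higher odd port.

    The even/odd pairing is an assumed convention for this sketch.
    """
    rtp = base_port if base_port % 2 == 0 else base_port + 1
    if not 0 < rtp < 65535:
        raise ValueError("no valid RTP/RTCP port pair for this base port")
    return rtp, rtp + 1


if __name__ == "__main__":
    print(rtp_rtcp_ports(5004))
    print(sorted(name for name, proto in TRANSPORT.items() if proto == "TCP"))
```

Keeping audio and video on separate port pairs matches the separate RTP instances per medium mentioned earlier.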
The H.323 protocol also stipulates that all H.323 terminals must carry a voice encoder. The minimum requirement is that they must support the G.711 recommendation.