These are the research projects selected by R+D+BIT Audiovisual
The professional audiovisual technology show presents the university and business research and development projects selected by the R+D+BIT initiative, all of them developing high-added-value technologies with growth potential for the sector.
With the aim of publicizing these projects, the next edition of BIT Audiovisual, the Professional Audiovisual Technology Exhibition organized by IFEMA from May 8 to 10, hosts the second edition of R+D+BIT Audiovisual.
A selection committee chaired by Pere Vila, director of Technology, Innovation and Systems at Corporación RTVE, assessed the research projects submitted by universities, institutions and companies according to the timeliness of the research carried out; its capacity to influence the future development of the audiovisual industry; its application potential; the originality of its approach, method or object; and its ability to bring together and generate collaboration among the different stakeholders.
The Miguel de Cervantes European University has presented Voxel3d, a project aimed at laying the foundations for an interactive narrative based on the generation of moving three-dimensional images. Building on the experience of industry giants such as Intel and Microsoft, it seeks to develop systems that can recompose any kind of situation from recordings and elements, filming the production ingredients independently and later integrating the scene into a virtual environment.
The project would find application in audiovisual productions and digital content, especially for cinema, television, the Internet and video games.
On the other hand, three Spanish companies (Brainstorm, MR Factory and SDI) have joined forces to investigate hyperrealistic real-time previsualization in broadcast and cinema environments.
We were able to see the first applications of this project at the Brainstorm stand at NAB 2018 in Las Vegas, and now, on the occasion of BIT Audiovisual, SDI will be demonstrating it at its own stand.
The project is based on the ability to generate hyperrealistic scenes in real time, as a preview in 4K and 8K for cinema and as a final result in HDTV and 4K for broadcast. Although chroma key technology and virtual studios have been around for a long time, the latter have sometimes been criticized for their relative lack of realism compared to non-real-time applications such as compositing and VFX. However, virtual studios require real-time technology, so the demands on computing power grow significantly as the realism of the scenes and the complexity of their rendering increase.
At NAB 2017 Brainstorm presented its Combined Render Engine proposal, a new technology that combines eStudio, its benchmark rendering engine for real-time 3D broadcast graphics and virtual studios, with Epic Games' Unreal Engine, an advanced game rendering engine capable of excellent hyper-realistic image quality.
MR Factory has developed a workflow for previewing VFX shots for cinema, which uses Brainstorm's Combined Render Engine to guarantee the quality of the shots before they enter post-production. Using InfinitySet as a preview hub in HDTV or even 4K, thanks to the Combined Render Engine, yields substantial savings in filming and post-production costs by guaranteeing the adjustment of the different shots (chroma keying, camera movements, tracking against the background...). Once everything is adjusted in preview, InfinitySet can export camera data, tracking and set movements to enter post-production in 8K resolution with full quality guarantees.
Nokia Spain attends this R+D+BIT with a project that aims to optimize video coding for distribution over the Internet, especially to mobile devices. Its research seeks to reduce the bandwidth used in video distribution, improving the profitability of this business. The studies are supported by an analysis, carried out in collaboration with the Polytechnic University of Madrid, of the impact of resolution and video quality on mobile devices.
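The savings at stake can be pictured with a rough calculation: if perceived quality on a small mobile screen saturates below full resolution, capping the delivered rendition avoids wasted bits. A minimal sketch of that idea follows; the bitrate ladder values are illustrative assumptions, not figures from the Nokia/UPM study:

```python
# Illustrative bitrate ladder (Mbps per rendition height); the values are
# assumptions for the example, not results from the Nokia/UPM analysis.
LADDER = {2160: 16.0, 1080: 5.0, 720: 3.0, 480: 1.5}

def pick_rendition(screen_height_px: int) -> tuple[int, float]:
    """Return the smallest rendition whose height covers the device screen."""
    for height in sorted(LADDER):
        if height >= screen_height_px:
            return height, LADDER[height]
    top = max(LADDER)
    return top, LADDER[top]

# A 720p phone served 720p instead of 1080p saves ~40% of the bitrate here.
res, mbps = pick_rendition(720)
print(f"Serve {res}p at {mbps} Mbps")
```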
The Polytechnic University of Catalonia (UPC) and the School of Telecommunications and Aerospace Engineering of Castelldefels (EETAC) are working on an interesting project to develop a prototype SDN (Software Defined Networking) controller for TSN (Time Sensitive Networking) networks as open-source software, handing it over to the developer community to improve and extend in the future. This will allow TV operators (and other industries) to take advantage of the code, and will encourage innovation and the development of free software for the audiovisual industry, in line with the EBU's Open Source Community initiative.
The SDI technology used in television production (an uncompressed video signal carried over a digital circuit, coaxial cable or fibre) will soon be replaced by Ethernet/IP packet-switching technology, and specifically by the AVB/TSN protocols (Audio/Video Bridging, Time Sensitive Networking) that the IEEE is standardizing in Working Group 802.1. Essentially this is a synchronous Ethernet based on time slots and a PTP/IEEE 1588 clock. It will bring notable savings in the cost of installing and operating TV production equipment, as well as greater operational flexibility. Other groups, such as the Joint Team on Networked Media (JT-NM) driven by the EBU, SMPTE and VSF, are developing all-Ethernet/IP architectures along the same lines, and even foresee the complete virtualization of TV production and the introduction of cloud computing. Beyond the TV production use case, TSN also has a promising future in areas such as Industry 4.0 and communication buses in vehicles, trains and aircraft.
Meanwhile, the world of IP networks is undergoing a major technological shift in device management and control, moving away from the traditional distributed model (in which routers and switches had a certain level of intelligence and cooperated with each other) towards the Software-Defined Networking (SDN) model, in which the intelligence is concentrated in a controller and the devices become "dumb slaves" that execute whatever the controller asks of them. This gives controllers a global view of the network status (devices, distribution of transported traffic, problems, etc.) and lets them easily perform optimizations (load balancing, route optimization, mirroring and flow protection) that are difficult to achieve in a distributed architecture.
The introduction of SDN for the management of AVB/TSN networks will represent a notable improvement, which is why the IEEE is already defining it in the 802.1Qcc standard, currently under development. 802.1Qcc defines two levels of controller, the CUC (Centralized User Configuration) and the CNC (Centralized Network Configuration), together with the interfaces between them and the network equipment, based on RESTCONF messages over HTTP and YANG data models.
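As a rough illustration of that control plane, a CUC might reserve a TSN stream by posting a YANG-modeled request to the CNC over RESTCONF. The endpoint path and payload fields below are hypothetical placeholders, loosely inspired by 802.1Qcc concepts; they are not the actual standard YANG module nor the UPC/EETAC controller's interface:

```python
import json
import urllib.request

# Hypothetical CNC endpoint; the path and field names are illustrative only.
CNC_URL = "http://cnc.example.net/restconf/data/tsn-streams"

stream_request = {
    "stream": {
        "stream-id": "camera-01-to-mixer",
        "talker": {"mac": "00:1B:44:11:3A:B7", "interface": "eth0"},
        "listener": {"mac": "00:1B:44:11:3A:C2", "interface": "eth1"},
        "traffic-spec": {"interval-us": 125, "max-frame-bytes": 1500},
    }
}

req = urllib.request.Request(
    CNC_URL,
    data=json.dumps(stream_request).encode(),
    headers={"Content-Type": "application/yang-data+json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```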
Hipermedia Laboratory and the Carlos III University of Madrid have presented to this edition of R+D+BIT the Azor project, which aims to implement a comprehensive solution for capturing and editing multi-camera video over wireless technology. Although technological advances existed for transmitting a video signal wirelessly, there was no solution that captured several video signals at the same time over a wireless connection and allowed all of them to be edited simultaneously.
Azor is conceived as a novel portable television studio with advanced functionalities based on computer vision and machine learning. It consists of a synchronized multi-camera capture application (up to 8-10 cameras connected at the same time) built around a computer program that controls several cameras remotely.
One of the main research opportunities is the implementation of the VAR (video assistant referee) system in different leagues (Italy, Germany...) and in global tournaments (the 2016 and 2017 Club World Cups, the U-20 World Cup). The use of this system at the upcoming World Cup in Russia is a sign that it has become solidly established in soccer refereeing.
Furthermore, the arrival of this system creates new training needs in refereeing, including familiarization with the analysis of match situations from different points of view across different levels and generations of referees.
The Azor system is a tool for analyzing game situations in training using four cameras (although it is scalable and can be adapted to more cameras distributed around the training field). It also offers a series of functions that facilitate the analysis of plays, such as the tagging system, which allows delayed analysis of a specific fragment of the recorded game situation. Tagging by situation speeds up the analysis and provides information for statistics.
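That tagging workflow can be pictured as time-stamped labels laid over the synchronized multi-camera timeline, so an analyst can jump back to the same fragment on every camera at once. A minimal sketch of the idea follows; the data model is our illustration, not Azor's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Tag:
    label: str        # e.g. "offside", "penalty-area contact"
    start_s: float    # seconds from the start of the synchronized recording
    end_s: float

@dataclass
class Session:
    cameras: list[str]
    tags: list[Tag] = field(default_factory=list)

    def fragments(self, label: str) -> list[tuple[str, float, float]]:
        """Every camera's copy of each fragment tagged with `label`."""
        return [(cam, t.start_s, t.end_s)
                for t in self.tags if t.label == label
                for cam in self.cameras]

session = Session(cameras=["cam1", "cam2", "cam3", "cam4"])
session.tags.append(Tag("offside", 612.0, 624.5))
print(session.fragments("offside"))  # same interval on all four cameras
```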
Finally, the system allows video capture from wireless devices, which in turn makes it simple and quick to install, without wiring that could interfere with the referees' movements.
Additionally, the system can be used to record events that require multi-camera production (conferences, seminars...) without having to allocate large technical and human resources to the task.
The Carlos III University has also presented, this time on its own, the GoAll-PervasiveSUB project, conceived to provide accessibility for deafblind people. Deafblindness is considered one of the most severe disabilities in the world: communication is far more complicated and often results in isolation, which is why software of this kind is considered so appropriate. Most deafblind people do not go out alone because of the high risk involved, and so require an interpreter to manage their daily lives. It must also be borne in mind that, since they can neither see nor hear, they communicate through touch, a channel through which society is not prepared to interact with them effectively and affectively, which makes this a particularly relevant approach.
It is a pioneering project worldwide, with a methodology focused on technological research and development. The software extracts subtitles from television channels and sends them to a central server, from where they are forwarded to smartphones or tablets. The only thing the deafblind person has to do is connect to the central server with the GoAll application and choose the channel they want to access; the application then sends the subtitles on to the braille line.
Considering that this population does not read at the same speed as a person without disabilities, the system can be configured to deliver the subtitles more slowly. Braille lines also differ from one another, so the solution was to split the subtitles into chunks matched to the characters available on each line. This technology has allowed people with this disability to carry out activities they previously could not do, or needed help with, such as following the daily news.
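The chunking and pacing logic can be sketched in a few lines; the 40-cell line width and the per-chunk delay below are illustrative parameters, since braille displays and reading speeds vary, and they are not GoAll's actual defaults:

```python
import textwrap
import time

def send_to_braille(subtitle: str, cells: int = 40, seconds_per_chunk: float = 4.0):
    """Split a subtitle into chunks that fit the braille line, paced slowly.

    `cells` is the number of characters the user's braille display offers;
    `seconds_per_chunk` slows delivery for readers who need more time.
    """
    for chunk in textwrap.wrap(subtitle, width=cells):
        print(chunk)              # stand-in for writing to the braille device
        time.sleep(seconds_per_chunk)

send_to_braille("The only thing the user has to do is pick a channel in GoAll.")
```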
It is estimated that the number of deafblind people in Spain ranges between 7,000 and 100,000, a very heterogeneous group with different types and degrees of visual and hearing loss who will undoubtedly benefit from initiatives like this one.
Ugiat Technologies participates in R+D+BIT with AutoFace (Politics), a project that monitors various television channels and discovers, fully automatically, the names of the different people who appear, generating statistics on their screen time and the emotions they express.
The system is implemented through multimodal analysis that integrates audio, video and images. Key frames are extracted from the video and analyzed to detect faces and any graphic text appearing on screen; in parallel, a speech-to-text conversion system is applied to the audio.
Deep-learning-based facial recognition algorithms are then applied to the detected faces, grouping the different individuals into clusters. Another convolutional neural network recognizes each person's mood, classifying it into seven emotions (neutral, happiness, anger, sadness, surprise, fear, disgust). Finally, artificial intelligence techniques associate the names appearing in the on-screen graphics or in the dialogue with the clusters generated by the facial recognition system.
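In outline, such a pipeline maps each detected face to an embedding vector, groups similar vectors into clusters, and classifies emotion per face from a network's output. The greedy cosine-threshold clustering below is a deliberate simplification of what a system like this might do; Ugiat's actual algorithms are its own and are not published here:

```python
import numpy as np

EMOTIONS = ["neutral", "happiness", "anger", "sadness", "surprise", "fear", "disgust"]

def cluster_faces(embeddings: np.ndarray, threshold: float = 0.6) -> list[int]:
    """Greedy clustering: assign each face to the first cluster whose centroid
    is within `threshold` cosine similarity, otherwise start a new cluster."""
    centroids, labels = [], []
    for e in embeddings:
        e = e / np.linalg.norm(e)
        sims = [float(c @ e) for c in centroids]
        if sims and max(sims) >= threshold:
            labels.append(int(np.argmax(sims)))
        else:
            centroids.append(e)
            labels.append(len(centroids) - 1)
    return labels

# Toy demo with random "embeddings"; a real system would use a CNN's output.
rng = np.random.default_rng(0)
faces = rng.normal(size=(5, 128))
print(cluster_faces(faces))

logits = rng.normal(size=len(EMOTIONS))   # stand-in for the emotion CNN's output
print(EMOTIONS[int(np.argmax(logits))])   # per-face emotion label
```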
To do this, Ugiat has built a preliminary system that captures the news programmes of the main television networks in Spain (RTVE, Antena 3, Telecinco, Cuatro, La Sexta, TV3), analyzing one image per second and recognizing the main politicians to obtain statistics on their appearance times. The system analyzes about 2 million images a month.
All the deep learning algorithms used for face detection and recognition, emotion detection, graphics detection and the artificial intelligence that links people with names have been developed at Ugiat Technologies, a spin-off of the Polytechnic University of Catalonia. Speech-to-text conversion is done with third-party software.
Automatic cataloguing of content and automatic metadata extraction are becoming increasingly important across the audiovisual industry. In content production, tagging raw material greatly helps more efficient editing. In recommendation systems, the extracted audiovisual metadata can improve the system's predictions, which are currently based exclusively on textual descriptions of the content. With automatic metadata analysis, not only actors but also products, trademarks, colours, movements and types of scene can be detected, providing valuable big data for building user profiles. The metadata can also be used to enrich the user experience, interacting with the customer for more efficient navigation (repeating phrases or scenes) or richer descriptive content (actors, songs, products...).
The i2CAT Foundation participates in this R+D+BIT with three projects. The first is ImmersiaTV, which pursues the creation of new formats for the production, distribution and consumption of TV content to enable immersive, customizable experiences. The objective is not only to offer efficient support for multi-screen scenarios, but to achieve transparent integration between traditional content and omnidirectional content (such as 360º video and spatial audio), opening the door to fascinating new scenarios. The project encompasses research into technological, creative and user-experience aspects.
The project brings benefits across the entire end-to-end audiovisual chain. It offers new solutions and tools for capturing, producing and editing omnidirectional content (for example, as an Adobe Premiere Pro plugin) and for scenes that combine 360º video with conventional graphics and video (through portals or overlays), with appropriate transitions and effects. It also provides new solutions for signaling immersive services, linking them with traditional broadcast content (such as HbbTV), distributing them adaptively over broadband (taking into account the heterogeneity of consumer devices and the regions of interest of 360º videos) and playing them back in sync. Finally, it opens the door to new consumption platforms for omnidirectional content, based on both Unity3D and web components, with support for multi-screen scenarios.
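Viewport-dependent delivery of 360º video is commonly implemented by splitting the sphere into tiles and requesting high quality only where the user is looking. A minimal sketch of that selection logic follows; the tiling scheme and quality labels are our assumptions for illustration, not ImmersiaTV's actual mechanism:

```python
def tile_qualities(yaw_deg: float, tiles: int = 8, fov_deg: float = 110.0) -> list[str]:
    """Return a quality per equal-width horizontal tile: 'high' inside the
    viewer's field of view, 'low' elsewhere."""
    width = 360.0 / tiles
    qualities = []
    for i in range(tiles):
        center = i * width + width / 2
        # Angular distance between the tile center and the viewing direction.
        dist = abs((center - yaw_deg + 180) % 360 - 180)
        qualities.append("high" if dist <= fov_deg / 2 else "low")
    return qualities

print(tile_qualities(yaw_deg=90))  # high-quality tiles around 90 degrees
```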
Another of the i2CAT Foundation's projects is ImAc (Immersive Accessibility), which seeks to guarantee accessibility in immersive multimedia services, including 360º video, spatial audio and Virtual Reality (VR) content. This will make it possible to provide an adequate narrative and better access to information and usability, regardless of users' sensory and cognitive capacities, age, language, or other difficulties or impairments. The accessibility services to be provided include subtitles, audio subtitles, audio description and videos with sign-language interpreters, as well as appropriate user interfaces and assistive technologies. The idea is for the immersive and accessibility features to adapt to users' needs and/or preferences, while remaining compatible with the technologies and formats commonly used in the audiovisual sector.
The i2CAT Foundation has also presented to this R+D+BIT the VR-Together project, whose main objective is to enable Virtual Reality (VR) experiences that allow natural social interaction between remote users immersed in shared virtual environments, from domestic settings, affordably and with photo-realistic quality.
This pioneering project encompasses the assembly of an end-to-end platform using state-of-the-art software technologies and low-cost commercial hardware components. It also pursues innovative solutions and optimizations for the essential technological and creative aspects of each stage and process of the platform, including capture, encoding, processing, distribution and consumption. It builds at all times on existing technologies and infrastructures, proposing improvements and/or extensions that remain backward-compatible and standards-compliant. Furthermore, the project follows a user-centric methodology in which users, whether end users, professionals or other stakeholders, are protagonists in every process, in order to capture the necessary requirements accurately and validate the results obtained.
In several countries, the switch-off of analogue television or the transition from first-generation digital television (DVB-T) to the second generation (DVB-T2) is underway or imminent, so up to three types of television signal may coexist in the same bands of the radio spectrum. It is sometimes useful to be able to distinguish the type of signal, or some of its basic parameters, when adapting television reception installations or monitoring the roll-out of broadcast networks. Many measurement devices incorporate complete receiver chips simply to obtain the information they need about the type of television signal and its basic parameters, which makes them more expensive and prevents updates for possible future variants or new broadcast standards. Gradiant (the Galician Telecommunications Technology Centre) has developed a project that addresses exactly this problem.
The proposed solution, based on computationally simple algorithms implemented in software or reconfigurable logic, uses logic resources already available in the measurement equipment at a very low incremental cost, and can be adapted to future variations in television broadcasting standards.
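One classic "computationally simple" way to recognize an OFDM broadcast signal is to exploit its cyclic prefix: correlating the baseband samples with themselves at a lag equal to the useful symbol length produces a strong peak only when that symbol length is actually present. The sketch below demonstrates the idea on synthetic data; it is our simplification of the general technique, not Gradiant's algorithm:

```python
import numpy as np

def cp_correlation(samples: np.ndarray, useful_len: int, guard_len: int) -> float:
    """Mean correlation magnitude between each guard interval and the symbol
    tail it copies; a high value suggests this OFDM symbol length is in use."""
    acc, n = 0.0, 0
    sym = useful_len + guard_len
    for start in range(0, len(samples) - sym, sym):
        g = samples[start:start + guard_len]
        tail = samples[start + useful_len:start + useful_len + guard_len]
        acc += abs(np.vdot(g, tail)) / (np.linalg.norm(g) * np.linalg.norm(tail) + 1e-12)
        n += 1
    return acc / max(n, 1)

# Synthetic OFDM-like signal: useful length 2048 with a 1/8 guard interval.
rng = np.random.default_rng(1)
useful, guard = 2048, 256
syms = []
for _ in range(20):
    body = rng.normal(size=useful) + 1j * rng.normal(size=useful)
    syms.append(np.concatenate([body[-guard:], body]))  # prepend cyclic prefix
signal = np.concatenate(syms)

print(cp_correlation(signal, 2048, 256))   # near 1: matches this mode
print(cp_correlation(signal, 8192, 1024))  # low: wrong symbol length
```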
One of the projects presented to R+D+BIT with the largest number of participating companies is EasyTV (Easing the access of Europeans with disabilities to converging media and content). The consortium comprises the Polytechnic University of Madrid (UPM, Spain), Engineering Ingegneria Informatica SPA (ENG, Italy), the Centre for Research and Technology Hellas (CERTH, Greece), Mediavoice SRL (MV, Italy), the Universitat Autònoma de Barcelona (UAB, Spain), the Corporació Catalana de Mitjans Audiovisuals SA (CCMA, Spain), ARX.NET (ARX, Greece), the Sezione Provinciale di Roma dell'Unione Italiana dei ciechi e degli ipovedenti (UICI, Italy), and the National Deaf Confederation Foundation Spain for the removal of communication barriers (FCNSE, Spain).
The objective of the EasyTV project is to make it easier for people with sensory (hearing or visual) disabilities to access leading information-society and telecommunications products and services, so that they can enjoy audiovisual content on the same level as the rest of the population, avoiding the marginalization and problems created by existing inequality in access to information. To that end, the project proposes the design and implementation of a platform gathering technological solutions that improve accessibility through advanced subtitling, automatic audio narration, clean audio, configurable sign-language videos, audio subtitles and image magnification, among others, helping to break down the existing language barrier. All of this operates in a personalization environment in which the system adapts to each user's specific needs and preferences.
From a technological point of view, the project builds on an analysis of the state of the art in accessible technologies and on prior knowledge acquired in other European projects such as HBB4ALL, DTV4ALL, Cloud4ALL and Prosperity4All. It will offer innovative services for improved access to audiovisual content along three main lines: image adaptation, offering content-based magnification and enhancement using various innovative algorithms; enhanced content description, with automated narration and improved audio intelligibility; and innovative technologies to break down sign-language barriers, with a crowdsourcing platform and an interlingual translator.
All of this sits in an environment of hyper-personalization of content, where users receive recommendations about newly available services and can adapt the different interfaces to their needs. The use of the new HbbTV connected-television standard to distribute applications linked to television channels allows interconnection between devices to offer improved access to information.
Lastly, the company Visiona, in collaboration with the rest of the partners in the 5G-CROSSHAUL project, has presented an initiative that seeks to improve the evaluation of technologies for video transmission over IP. The demand for new, more efficient algorithms to check the network status as experienced by the user, and the need to guarantee basic quality requirements when transmitting high-impact content, become vitally important, especially given that content features are increasingly demanding, including higher resolutions (4K, 8K) and rich formats such as HDR (High Dynamic Range) and HFR (High Frame Rate).
For this reason, one of the developments carried out within the 5G-Crosshaul project is a video-quality analysis probe that can connect, in a virtualized way, to any node of the transmission network to check its status and prevent users from receiving artifacts and unwanted effects in the multimedia content: transmitting an alert allows the immediate reconfiguration of the network nodes so that users receive the content in optimal conditions, thereby increasing the quality of experience.
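In spirit, such a probe decodes frames at a network node, computes a no-reference quality score, and raises an alert when the score crosses a threshold so the network can reconfigure itself. The toy sketch below uses a blockiness-style metric; the metric and threshold are our illustrative assumptions, not the 5G-Crosshaul probe's actual design:

```python
import numpy as np

def blockiness(frame: np.ndarray, block: int = 8) -> float:
    """Average luminance jump across 8x8 block boundaries, relative to jumps
    elsewhere; compression artifacts tend to raise this ratio."""
    edges = np.abs(np.diff(frame.astype(float), axis=1))
    at_boundary = edges[:, block - 1::block].mean()
    elsewhere = edges.mean()
    return at_boundary / (elsewhere + 1e-9)

def probe(frame: np.ndarray, threshold: float = 1.5) -> None:
    score = blockiness(frame)
    if score > threshold:
        print(f"ALERT: blockiness {score:.2f} above {threshold}, "
              "requesting node reconfiguration")  # stand-in for the real alert

# Toy frame: a smooth gradient, so the probe stays quiet.
frame = np.tile(np.arange(64, dtype=np.uint8), (64, 1))
probe(frame)
```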
Among all these projects, five have been selected for presentation within the framework of the BIT Audiovisual Forum (Pavilion 7 Auditorium) over the three days of the fair: the SDN controller for TSN networks in TV production; Azor; GoAll-PervasiveSUB (television for the deafblind); EasyTV; and Visiona's initiative with the partners of the 5G-CROSSHAUL project.