Latest AI Research from China Presents “OMMO”: A Large-Scale Outdoor Multimodal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction

Photorealistic novel view synthesis and high-fidelity surface reconstruction have been made possible by recent advances in implicit neural representations. Unfortunately, most current techniques center on a single object or indoor scene, and their performance degrades when applied to outdoor settings. Existing outdoor scene datasets are built at a modest geographic scale by rendering virtual scenes or by aggregating simple scenes with few objects. The lack of standard benchmarks and large-scale outdoor scene datasets makes it impossible to evaluate several fairly recent methods that are well designed for large scenes and attempt to address this problem.

The BlendedMVS and UrbanScene3D collections contain images rendered from reconstructed or virtual scenes, which differ from the original scenes in texture and appearance. Collecting images from the Internet can produce incredibly powerful datasets such as ImageNet and COCO; however, such images are unsuitable for NeRF-based evaluation because scene objects and lighting conditions change constantly between captures. Tanks and Temples, for example, provides a benchmark of realistic scenes captured with a high-resolution industrial laser scanner, but its scenes are still very small (463 m² on average) and focus on only a single object or building.

Source: https://arxiv.org/pdf/2301.06782.pdf

Illustration of a city scene from the dataset, captured along a circular camera trajectory in low light. The figure shows the camera path, textual descriptions of the scene, and multi-view calibrated images. The dataset can render realistic, high-resolution texture detail; some regions are enlarged in the colored boxes to show this.

Their data collection approach is comparable to NeRF works that rely heavily on drones to record vast real-world scenes. However, Mega-NeRF releases only two such scenes, which prevents it from serving as a generally accepted benchmark. Large-scale NeRF research on outdoor environments therefore lags behind work on individual objects or indoor scenes since, to their knowledge, no standard, well-recognized large-scale scene dataset has yet been developed for measuring NeRF performance. To address this scarcity of large-scale real-world outdoor scene datasets, the authors offer a carefully curated multimodal fly-view dataset. As shown in the figure above, it consists of 33 scenes with prompt annotations, tags, and 14K calibrated images. Unlike the existing efforts mentioned above, their scenes come from various sources, including videos collected from the Internet and footage they captured themselves.

To be comprehensive and representative, the dataset covers a range of scene types, scene sizes, camera trajectories, lighting conditions, and multimodal data missing from previous datasets. The authors also provide comprehensive dataset-based benchmarks for novel view synthesis, implicit scene reconstruction, and multimodal synthesis to evaluate how well standard NeRF approaches perform on the dataset. More importantly, they provide a generic pipeline for producing real NeRF-ready data from online drone videos, making it easy for the community to expand the dataset. To assess each approach precisely, they also define several sub-benchmarks for each of the above tasks according to different scene types, scene sizes, camera trajectories, and lighting conditions.
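Novel view synthesis benchmarks of this kind are conventionally scored with image-quality metrics such as PSNR (the paper does not spell out its exact metric code here, so the snippet below is a generic sketch of how a rendered view would be compared against a held-out ground-truth image, not the authors' evaluation script):

```python
import numpy as np

def psnr(rendered: np.ndarray, ground_truth: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between a rendered view and its ground-truth image.

    Both arrays are HxWx3 floats in [0, max_val]; higher PSNR means a closer render.
    """
    mse = np.mean((rendered - ground_truth) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy check on synthetic images: a mildly noisy render should outscore a badly shifted one.
gt = np.random.default_rng(0).random((64, 64, 3))
noisy = np.clip(gt + 0.05 * np.random.default_rng(1).standard_normal(gt.shape), 0, 1)
shifted = np.clip(gt + 0.2, 0, 1)
print(psnr(noisy, gt) > psnr(shifted, gt))
```

Per-scene sub-benchmarks (by scene type, size, trajectory, and lighting) would simply aggregate such scores over the corresponding subsets of test views.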

To sum up, their major contributions are as follows:

• To advance large-scale NeRF research, they introduce an outdoor scene dataset with multimodal data that is more abundant and diverse than any comparable outdoor dataset currently available.

• They benchmark several mainstream outdoor NeRF methods to establish a unified evaluation standard. Extensive experiments show that their dataset can support typical NeRF-based tasks and provides rich annotations for future research.

• To make the dataset easily scalable, they provide a low-cost pipeline that converts videos freely downloaded from the Internet into NeRF-ready training data.
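The article does not detail the authors' pipeline, but the common recipe for turning an Internet video into NeRF training data is to sample frames and then recover camera poses with a structure-from-motion tool such as COLMAP. The sketch below only builds the standard COLMAP command sequence (it assumes frames have already been extracted, e.g. with ffmpeg, and that the `colmap` binary is installed); it is an illustrative assumption, not the paper's tooling:

```python
from pathlib import Path

def colmap_commands(frames_dir: str, workspace: str) -> list[list[str]]:
    """Build the COLMAP CLI calls that recover per-frame camera poses:
    feature extraction -> pairwise matching -> sparse mapping.
    The resulting poses plus the frames form NeRF-ready training data.
    """
    db = str(Path(workspace) / "database.db")
    sparse = str(Path(workspace) / "sparse")
    return [
        ["colmap", "feature_extractor", "--database_path", db, "--image_path", frames_dir],
        ["colmap", "exhaustive_matcher", "--database_path", db],
        ["colmap", "mapper", "--database_path", db, "--image_path", frames_dir,
         "--output_path", sparse],
    ]

# Inspect the three-stage plan without running it (subprocess.run would execute each).
cmds = colmap_commands("frames", "work")
print(len(cmds))
print(cmds[0][1])
```

In a real pipeline each command would be run in order (e.g. via `subprocess.run`), and the `sparse/` output converted to the pose format the chosen NeRF implementation expects.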


Check out the Paper and Project Page. All credit for this research goes to the researchers on this project. Also, don’t forget to join our Reddit page, Discord channel, and email newsletter, where we share the latest AI research news, cool AI projects, and more.


Anish Teeku is a Consulting Intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.

