S-102 Data Volume

During the project we considered how the resolution of different datasets affects data volume and, consequently, how much data must be distributed.

High-resolution datasets were expected to be too large for practical distribution. We therefore aimed to examine whether smaller datasets could be produced, tailored to the specific needs of the end users, while retaining sufficient information and without compromising intended use or quality.

For example, an end user may want only specific depth curves or elevation tiles to meet their requirements, with other, less relevant data removed from the dataset before download.

Unfortunately, we were not able to test this adequately: the reference group did not require it in the operational testing, and the Demonstrator did not include the functionality needed to visualise this type of data extraction. It was therefore not deemed a priority at this stage.

What we did observe through the upload testing, however, was that the various countries based their S-102 test datasets on different criteria with regard to size and distribution of data. As a result, data volume varied greatly: single datasets ranged in size from a few MB to over 1 GB.

Some producers decided to include larger geographical coverage in one data set (with greater data output as a result), whilst others chose to divide data into different degrees of resolution in combination with less area coverage per data set.

Lower resolution = Smaller data set.

Smaller geographical areas also produced smaller datasets. Overall, however, high-resolution data covering the same geographical area will amount to the same total data volume regardless of how the area is divided into data cells. The topography to be rendered is, of course, also crucial to the size of a dataset.
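The point that dividing an area differently does not change the total volume can be illustrated with a small sketch. It assumes an uncompressed regular grid and ignores per-file overhead such as metadata; the function name `tile_cell_counts` and the grid dimensions are hypothetical and chosen only for illustration.

```python
def tile_cell_counts(cols, rows, tiles_x, tiles_y):
    """Split a cols x rows grid into tiles_x x tiles_y tiles.

    Returns the number of grid cells in each tile. Tile edges are
    placed with integer arithmetic so that every cell belongs to
    exactly one tile.
    """
    counts = []
    for ty in range(tiles_y):
        row_start = rows * ty // tiles_y
        row_end = rows * (ty + 1) // tiles_y
        for tx in range(tiles_x):
            col_start = cols * tx // tiles_x
            col_end = cols * (tx + 1) // tiles_x
            counts.append((row_end - row_start) * (col_end - col_start))
    return counts


# Hypothetical grid: 5000 x 3000 cells at some fixed resolution.
GRID_COLS, GRID_ROWS = 5000, 3000

for tiles_x, tiles_y in [(1, 1), (2, 2), (7, 3)]:
    counts = tile_cell_counts(GRID_COLS, GRID_ROWS, tiles_x, tiles_y)
    print(f"{tiles_x} x {tiles_y} tiles: {len(counts)} files, "
          f"{sum(counts)} cells in total")
```

Each division yields a different number of files, but the total number of cells stays the same. In practice every file adds some fixed overhead, so many small files will be slightly larger in total than one large file.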

A specific Norwegian dataset was used to examine the variation in volume: models were generated with 1 m, 2 m, 5 m and 10 m resolution. The sizes of the four *.bag files, all based on the same selected area, are detailed below:

Table 1: Resolution and output file size on S-102 models.

The difference between the 1 m and 10 m resolution models shows that the 1 m model is nearly 100 times larger. Each 10 m cell (pixel) covers the area of a hundred 1 m cells; the general assumption is that the “no-data” areas are smaller in the 10 m model, which would account for the factor being slightly below 100 (no compression was applied during the file export).
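The relationship between resolution and size can be sketched as simple arithmetic. The example below is an illustration under stated assumptions, not a description of how the test files were produced: the 5 km x 5 km area, the function names and the figure of 8 bytes per cell (for instance two 32-bit values per cell) are all hypothetical, and no-data areas, compression and file overhead are ignored.

```python
import math


def cell_count(width_m, height_m, resolution_m):
    """Number of grid cells needed to cover a rectangle at a given cell size."""
    cols = math.ceil(width_m / resolution_m)
    rows = math.ceil(height_m / resolution_m)
    return cols * rows


def size_ratios(width_m, height_m, resolutions_m):
    """Cell count of each model relative to the coarsest model in the list."""
    base = cell_count(width_m, height_m, max(resolutions_m))
    return {r: cell_count(width_m, height_m, r) / base for r in resolutions_m}


# Hypothetical values, chosen only for illustration.
AREA_WIDTH_M = 5_000
AREA_HEIGHT_M = 5_000
BYTES_PER_CELL = 8  # assumption: e.g. two 32-bit values per cell
RESOLUTIONS_M = [1, 2, 5, 10]

for res, ratio in size_ratios(AREA_WIDTH_M, AREA_HEIGHT_M, RESOLUTIONS_M).items():
    megabytes = cell_count(AREA_WIDTH_M, AREA_HEIGHT_M, res) * BYTES_PER_CELL / 1e6
    print(f"{res:>2} m: {ratio:5.1f} x the 10 m model, "
          f"about {megabytes:.0f} MB uncompressed")
```

Because the cell count grows with the square of the refinement factor, halving the cell size quadruples the volume, and going from 10 m to 1 m multiplies it by 100 under these assumptions.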

If the topography is “advanced” then a higher resolution data set would be required, as opposed to when the topography is “easier”. The data size factors are thus compounded, meaning we will probably see different solutions for the division of data sets from the different data producers (Hydrographic Offices).

When producing the Norwegian test datasets, the focus was on selecting the areas of interest based on the stated needs, which in this case were those of the planned operational tests.

During the operational tests, only data that met the required resolution and quality was used to produce the S-102 datasets. The data provided was the highest-resolution data of sufficient quality available. There was no deliberate plan to harmonise the data from different areas with one another, for example by using national tiling schemes.

Based on our experience throughout the project, we see that other producers take a more proactive approach and plan to produce data according to predefined grids (extents) and resolutions.

Lessons Learned:

Resolution and geographical coverage are the most important factors that determine the size of the dataset.

It appears that producers will deliver datasets that vary widely in size and extent; the draft version of S-102 2.0.0 is quite cautious in detailing how this should be done.