Basics: loading CDMs

import kessler

Load CDMs from .kvn

In this tutorial, we show how to load CDMs from .kvn format.

First, place the CDMs in .kvn format inside the folder referred to by path_to_cdms_folder, so that the data can be loaded correctly. The code also expects the file names in that folder to identify both the event each CDM belongs to and the position of the CDM within that event's sequence.

For instance, to load two events with 3 and 2 CDMs respectively, the file names might follow this format:

  • event_1_01.cdm.kvn.txt

  • event_1_02.cdm.kvn.txt

  • event_1_03.cdm.kvn.txt

  • event_2_01.cdm.kvn.txt

  • event_2_02.cdm.kvn.txt
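To make the naming convention concrete, here is a minimal sketch of how such file names could be grouped by event and ordered by sequence number. It is not kessler's own implementation: the regular expression and the helper name group_file_names are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical pattern for names such as "event_1_02.cdm.kvn.txt":
# the first number is the event, the second is the CDM sequence index.
FILE_NAME_PATTERN = re.compile(r"^event_(\d+)_(\d+)\.cdm\.kvn\.txt$")


def group_file_names(file_names):
    """Group CDM file names by event id, ordered by sequence number."""
    grouped = defaultdict(list)
    for name in file_names:
        match = FILE_NAME_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the convention
        event_id = int(match.group(1))
        sequence = int(match.group(2))
        grouped[event_id].append((sequence, name))
    # Sort events by id and the CDMs of each event by sequence number.
    return {
        event_id: [name for _, name in sorted(items)]
        for event_id, items in sorted(grouped.items())
    }


file_names = [
    "event_2_02.cdm.kvn.txt",
    "event_1_03.cdm.kvn.txt",
    "event_1_01.cdm.kvn.txt",
    "event_2_01.cdm.kvn.txt",
    "event_1_02.cdm.kvn.txt",
]
groups = group_file_names(file_names)
for event_id, names in groups.items():
    print(event_id, names)
```

Running a check like this on a folder listing before loading is one way to confirm that every file falls into the expected event.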

from kessler import EventDataset

We can then proceed to create the EventDataset object:

path_to_cdms_folder='/Users/giacomoacciarini/cdm_data/cdms_kvn/'

events=EventDataset(path_to_cdms_folder)
#A message appears confirming that the loading has happened, with the number of CDMs and events.
Loading CDMS (with extension .cdm.kvn.txt) from directory: /Users/giacomoacciarini/cdm_data/cdms_kvn/
Loaded 0 CDMs grouped into 0 events
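A count of 0 CDMs, as in the output above, suggests that the folder contained no files with the expected extension. The following sketch, which uses only the standard library, lists the files a folder holds with the .cdm.kvn.txt extension reported in the loading message. The helper name list_cdm_files is hypothetical, and the temporary folder stands in for a real CDM folder.

```python
import tempfile
from pathlib import Path


def list_cdm_files(folder, extension=".cdm.kvn.txt"):
    """Return the sorted names of files in `folder` ending with `extension`."""
    folder = Path(folder)
    if not folder.is_dir():
        return []  # a missing folder yields no files
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.name.endswith(extension)
    )


# Demonstration on a temporary folder instead of a real CDM folder.
with tempfile.TemporaryDirectory() as tmp:
    empty_before = list_cdm_files(tmp)
    (Path(tmp) / "event_1_01.cdm.kvn.txt").write_text("")
    (Path(tmp) / "notes.txt").write_text("")
    found = list_cdm_files(tmp)

print(empty_before)
print(found)
```

If the returned list is empty for the folder passed to EventDataset, the path or the file extension is the first thing to check.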

Loading CDMs from pandas DataFrame object

In this tutorial, we show how to load CDMs from a pandas DataFrame object.

First we perform the relevant imports:

import kessler
import pandas as pd
from kessler import EventDataset

Then, we read the CSV file into a pandas DataFrame and create the EventDataset object from it:

file_name='/Users/giacomoacciarini/cdm_data/cdms_csv/sample.csv'
df=pd.read_csv(file_name)
events=EventDataset.from_pandas(df)
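Since pd.read_csv raises a FileNotFoundError when the path does not exist, it can help to check the path before building the DataFrame. The sketch below is only an illustration: the helper name read_cdm_csv is hypothetical, and the two columns in the temporary CSV are placeholders, not the columns that EventDataset.from_pandas expects.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_cdm_csv(file_name):
    """Read a CSV file into a DataFrame, failing early with a clear message."""
    path = Path(file_name)
    if not path.is_file():
        raise FileNotFoundError(f"CSV file not found: {path}")
    return pd.read_csv(path)


# Demonstration with a temporary CSV file (placeholder columns only).
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sample.csv"
    csv_path.write_text("event_id,value\n0,1.5\n0,2.5\n1,3.5\n")
    df = read_cdm_csv(csv_path)
    try:
        read_cdm_csv(Path(tmp) / "missing.csv")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

print(df.shape)
print(missing_raised)
```

The resulting DataFrame would then be passed to EventDataset.from_pandas, as in the cell above.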

Loading CDMs from Kelvins Challenge dataset

In this tutorial, we show the case in which the data to be loaded comes from the Kelvins competition: a collision avoidance challenge organized by ESA in 2019.

For this purpose, we built a specific converter that takes care of the conversion from the Kelvins format to standard CDM format. First, we perform the relevant imports:

import kessler
from kessler.data import kelvins_to_event_dataset

Then, we convert the Kelvins dataset into an EventDataset object. In the following example, we use two optional arguments: drop_features, to exclude certain features when importing, and num_events, to import only a limited number of events (in this case 1000).

file_name='/Users/giacomoacciarini/cdm_data/kelvins_data/test_data.csv'
events=kelvins_to_event_dataset(file_name, drop_features=['c_rcs_estimate', 't_rcs_estimate'], num_events=1000)
#The output will show the number of CDMs and events loaded, as they progress.
Loading Kelvins dataset from file name: /Users/giacomoacciarini/cdm_data/kelvins_data/test_data.csv
24484 entries
Dropping features: ['c_rcs_estimate', 't_rcs_estimate']
Dropping rows with NaNs
21932 entries
Removing outliers
19531 entries
Shuffling
Grouped rows into 1726 events
Taking TCA as current time: 2022-02-17 23:50:10.189235
Converting Kelvins challenge data to EventDataset
Time spent  | Time remain.| Progress             | Events    | Events/sec
0d:00:00:07 | 0d:00:00:00 | #################### | 1000/1000 | 128.69
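Some of the steps reported in the log above (dropping features, dropping rows with NaNs, and limiting the number of events) can be sketched with plain pandas on a toy table. This is not the converter's actual code: the function name preprocess is hypothetical, the event_id grouping column is an assumption, and outlier removal and shuffling are left out.

```python
import pandas as pd


def preprocess(df, drop_features=(), num_events=None):
    """Drop columns, drop rows with NaNs, and keep the first `num_events` events."""
    # Remove the unwanted features first, so NaNs in them do not discard rows.
    df = df.drop(columns=list(drop_features), errors="ignore")
    df = df.dropna()
    if num_events is not None:
        # Assumes rows of the same event share a value in the event_id column.
        kept_ids = df["event_id"].drop_duplicates().head(num_events)
        df = df[df["event_id"].isin(kept_ids)]
    return df.reset_index(drop=True)


# Toy data: values are made up for illustration.
toy = pd.DataFrame(
    {
        "event_id": [0, 0, 1, 1, 2],
        "risk": [-7.0, -6.5, None, -8.0, -9.0],
        "c_rcs_estimate": [0.1, None, 0.2, 0.2, 0.3],
        "t_rcs_estimate": [1.0, 1.0, 1.1, 1.1, 1.2],
    }
)
result = preprocess(
    toy,
    drop_features=["c_rcs_estimate", "t_rcs_estimate"],
    num_events=2,
)
print(result)
```

Dropping the features before removing rows with NaNs matters here: the second row has a missing c_rcs_estimate, but it is kept because that column is removed first.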