Basics: loading CDMs

import kessler

Load CDMs from .kvn

In this tutorial, we show how to load CDMs stored in .kvn format.

First, the CDMs in .kvn format must be placed in a single folder, whose path we refer to as path_to_cdms_folder. Furthermore, the code expects the file names in that folder to identify both the event each CDM belongs to and the position of that CDM in the event's sequence.

For instance, to load two events with 3 and 2 CDMs respectively, the file names might have the following format:

event_1_01.cdm.kvn.txt
event_1_02.cdm.kvn.txt
event_1_03.cdm.kvn.txt
event_2_01.cdm.kvn.txt
event_2_02.cdm.kvn.txt
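
To make the naming convention concrete, the following minimal sketch groups the example file names by event and orders them by sequence number. The regular expression and the helper `group_by_event` are assumptions made for illustration only; they mirror the example names above and are not Kessler's own file-name parsing.

```python
import re
from collections import defaultdict

# Hypothetical pattern that matches the example names, e.g. "event_1_02.cdm.kvn.txt".
# Illustration only: Kessler may parse file names differently.
NAME_PATTERN = re.compile(r"^event_(\d+)_(\d+)\.cdm\.kvn\.txt$")


def group_by_event(file_names):
    """Map each event id to its file names, ordered by CDM sequence number."""
    groups = defaultdict(list)
    for name in file_names:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed convention
        event_id, sequence = int(match.group(1)), int(match.group(2))
        groups[event_id].append((sequence, name))
    # Sort events by id, and the CDMs of each event by sequence number.
    return {
        event_id: [name for _, name in sorted(items)]
        for event_id, items in sorted(groups.items())
    }


file_names = [
    "event_2_02.cdm.kvn.txt",
    "event_1_01.cdm.kvn.txt",
    "event_1_03.cdm.kvn.txt",
    "event_2_01.cdm.kvn.txt",
    "event_1_02.cdm.kvn.txt",
]
grouped = group_by_event(file_names)
for event_id, names in grouped.items():
    print(f"event {event_id}: {len(names)} CDMs")
```

Running a check like this on a directory listing before loading can reveal files that do not fit the expected pattern.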

from kessler import EventDataset

We can then proceed to create the EventDataset object:

path_to_cdms_folder = '/Users/giacomoacciarini/cdm_data/cdms_kvn/'

events = EventDataset(path_to_cdms_folder)
# A message confirms that loading has completed and reports the number of CDMs and events.

Loading CDMS (with extension .cdm.kvn.txt) from directory: /Users/giacomoacciarini/cdm_data/cdms_kvn/
Loaded 0 CDMs grouped into 0 events
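
A count of zero, as in the output above, most likely means that no files with the expected extension were found in the folder. The sketch below counts candidate files before the dataset is created. The helper `count_cdm_files` is hypothetical, and it assumes that only files directly inside the folder (not in subfolders) are considered, which may differ from what Kessler does.

```python
from pathlib import Path


def count_cdm_files(folder, extension=".cdm.kvn.txt"):
    """Count files directly inside `folder` whose names end with `extension`."""
    folder = Path(folder).expanduser()
    if not folder.is_dir():
        return 0  # a missing folder cannot contain any CDMs
    return sum(
        1 for path in folder.iterdir()
        if path.is_file() and path.name.endswith(extension)
    )


# Hypothetical usage before creating the dataset:
# if count_cdm_files(path_to_cdms_folder) == 0:
#     raise SystemExit("No .cdm.kvn.txt files found; check path_to_cdms_folder")
```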

Loading CDMs from a pandas `DataFrame` object

In this tutorial, we show how to load CDMs from a pandas DataFrame object.

First we perform the relevant imports:

import kessler
import pandas as pd
from kessler import EventDataset

Then, we read the CSV file into a pandas DataFrame and create the EventDataset object from it:

file_name='/Users/giacomoacciarini/cdm_data/cdms_csv/sample.csv'
df=pd.read_csv(file_name)
events=EventDataset.from_pandas(df)
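
Note that pd.read_csv raises a FileNotFoundError when the given path does not exist. As a minimal sketch, the path can be checked first so that the failure comes with a short, explicit message. The helper name `read_cdm_csv` is an assumption made for illustration and is not part of Kessler or pandas.

```python
from pathlib import Path


def read_cdm_csv(file_name):
    """Read a CSV of CDMs into a DataFrame, failing early if the file is missing."""
    path = Path(file_name).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"CDM CSV file not found: {path}")
    import pandas as pd  # imported here so the path check runs even without pandas
    return pd.read_csv(path)


# Hypothetical usage (requires pandas, kessler, and an existing file):
# df = read_cdm_csv(file_name)
# events = EventDataset.from_pandas(df)
```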

Loading CDMs from Kelvins Challenge dataset

In this tutorial, we show how to load data from the Kelvins competition, a collision avoidance challenge organized by ESA in 2019.

For this purpose, we built a dedicated converter that translates the Kelvins format into the standard CDM format. First, we perform the relevant imports:

import kessler
from kessler.data import kelvins_to_event_dataset

Then, we convert the Kelvins dataset into an EventDataset object. In the following example, we use two optional arguments (drop_features and num_events) to exclude certain features when importing and to import only a limited number of events (in this case 1000).

file_name='/Users/giacomoacciarini/cdm_data/kelvins_data/test_data.csv'
events=kelvins_to_event_dataset(file_name, drop_features=['c_rcs_estimate', 't_rcs_estimate'], num_events=1000)
# The output shows the number of entries and events as loading progresses.

Loading Kelvins dataset from file name: /Users/giacomoacciarini/cdm_data/kelvins_data/test_data.csv
24484 entries
Dropping features: ['c_rcs_estimate', 't_rcs_estimate']
Dropping rows with NaNs
21932 entries
Removing outliers
19531 entries
Shuffling
Grouped rows into 1726 events
Taking TCA as current time: 2022-02-17 23:50:10.189235
Converting Kelvins challenge data to EventDataset
Time spent  | Time remain.| Progress             | Events    | Events/sec
0d:00:00:07 | 0d:00:00:00 | #################### | 1000/1000 | 128.69       
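
The entry counts in the log above can be used to check how much data each preprocessing step removes. The short calculation below only restates numbers copied from that output; the variable names are chosen here for illustration.

```python
# Entry counts copied from the log output above.
initial_entries = 24484
after_nan_drop = 21932
after_outlier_removal = 19531
grouped_events = 1726
requested_events = 1000  # the num_events argument used in the example

dropped_for_nans = initial_entries - after_nan_drop           # 2552 rows
dropped_as_outliers = after_nan_drop - after_outlier_removal  # 2401 rows
retained_fraction = after_outlier_removal / initial_entries
mean_rows_per_event = after_outlier_removal / grouped_events

print(f"Rows dropped for NaNs:     {dropped_for_nans}")
print(f"Rows dropped as outliers:  {dropped_as_outliers}")
print(f"Fraction of rows retained: {retained_fraction:.3f}")
print(f"Mean rows per event:       {mean_rows_per_event:.2f}")
print(f"Events kept:               {requested_events} of {grouped_events}")
```

In this run, roughly four fifths of the original rows survive the cleaning steps, and only 1000 of the 1726 grouped events are converted because of num_events.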
