Basics: loading CDMs
import kessler
Load CDMs from .kvn
In this tutorial, we show how to load CDMs from .kvn format.
First, place the CDMs in .kvn format inside a single folder, referred to below as path_to_cdms_folder. The code expects the file names in that folder to identify both the event each CDM belongs to and the position of the CDM within that event's sequence.
For instance, to load two events containing 3 and 2 CDMs respectively, the file names might look like this:
event_1_01.cdm.kvn.txt
event_1_02.cdm.kvn.txt
event_1_03.cdm.kvn.txt
event_2_01.cdm.kvn.txt
event_2_02.cdm.kvn.txt
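To make the naming convention concrete, the following is a minimal sketch of how such file names could be grouped by event and ordered by sequence number. It is not kessler's own parsing logic: the regular expression and the helper `group_cdm_file_names` are hypothetical, and they assume the `event_<id>_<sequence>.cdm.kvn.txt` pattern shown in the example.

```python
import re
from collections import defaultdict

# Assumed pattern: event_<event id>_<sequence number>.cdm.kvn.txt
FILE_NAME_PATTERN = re.compile(r"^event_(\d+)_(\d+)\.cdm\.kvn\.txt$")


def group_cdm_file_names(file_names):
    """Group file names by event id, ordering each group by sequence number."""
    groups = defaultdict(list)
    for name in file_names:
        match = FILE_NAME_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        event_id = int(match.group(1))
        sequence = int(match.group(2))
        groups[event_id].append((sequence, name))
    # Sort events by id and the CDMs of each event by sequence number.
    return {
        event_id: [name for _, name in sorted(entries)]
        for event_id, entries in sorted(groups.items())
    }


file_names = [
    "event_2_02.cdm.kvn.txt",
    "event_1_03.cdm.kvn.txt",
    "event_1_01.cdm.kvn.txt",
    "event_2_01.cdm.kvn.txt",
    "event_1_02.cdm.kvn.txt",
]
grouped = group_cdm_file_names(file_names)
for event_id, names in grouped.items():
    print(event_id, names)
```

Running a check like this on a folder listing before loading can reveal misnamed files that would otherwise end up in the wrong event or be ignored.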
from kessler import EventDataset
We can then create the EventDataset object:
path_to_cdms_folder='/Users/giacomoacciarini/cdm_data/cdms_kvn/'
events=EventDataset(path_to_cdms_folder)
# A message confirms that loading has finished and reports the number of CDMs and events.
Loading CDMS (with extension .cdm.kvn.txt) from directory: /Users/giacomoacciarini/cdm_data/cdms_kvn/
Loaded 0 CDMs grouped into 0 events
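A count of 0 CDMs, as in the message above, suggests that no files with the `.cdm.kvn.txt` extension were found in the given folder. A quick pre-flight check can confirm this before the EventDataset is created. The helper `find_cdm_files` below is a hypothetical sketch using only the standard library; it is not part of kessler, and the temporary folder merely stands in for path_to_cdms_folder.

```python
import tempfile
from pathlib import Path

CDM_EXTENSION = ".cdm.kvn.txt"  # extension named in the loading message


def find_cdm_files(folder):
    """Return the sorted CDM files in folder, or an empty list if it is missing."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(
        path
        for path in folder.iterdir()
        if path.is_file() and path.name.endswith(CDM_EXTENSION)
    )


# Demonstration on a temporary folder standing in for path_to_cdms_folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("event_1_01.cdm.kvn.txt", "event_1_02.cdm.kvn.txt", "notes.txt"):
        (Path(tmp) / name).touch()
    found = find_cdm_files(tmp)
    print(f"Found {len(found)} CDM files")
```

If the check returns an empty list, the path or the file extension is the first thing to verify.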
Loading CDMs from a pandas DataFrame object
In this tutorial, we show how to load CDMs from a pandas DataFrame object.
First we perform the relevant imports:
import kessler
import pandas as pd
from kessler import EventDataset
Then, we read the CSV file into a pandas DataFrame and create the EventDataset object from it:
file_name='/Users/giacomoacciarini/cdm_data/cdms_csv/sample.csv'
df=pd.read_csv(file_name)
events=EventDataset.from_pandas(df)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/giacomoacciarini/cdm_data/cdms_csv/sample.csv'
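The FileNotFoundError above means that the CSV path did not exist on the machine where the code ran; file_name has to point to a CSV file available locally. As a small sketch, the path can be checked before it is handed to `pd.read_csv`, so that the failure comes with a short, direct message. The helper `require_file` is hypothetical and not part of kessler or pandas.

```python
import tempfile
from pathlib import Path


def require_file(path):
    """Return path as a Path object, or raise a clear error if it is not a file."""
    path = Path(path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(
            f"CSV file not found: {path}. "
            "Point file_name at a CSV file that exists on this machine."
        )
    return path


# An existing file passes the check and can then be read, for example with
# df = pd.read_csv(checked)
with tempfile.NamedTemporaryFile(suffix=".csv") as handle:
    checked = require_file(handle.name)

# A missing file fails early with a readable message.
try:
    require_file("/path/that/does/not/exist/sample.csv")
except FileNotFoundError as error:
    message = str(error)
    print(message)
```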
Loading CDMs from Kelvins Challenge dataset
In this tutorial, we show the case in which the data to be loaded comes from the Kelvins competition: a collision avoidance challenge organized by ESA in 2019.
For this purpose, we built a specific converter that takes care of the conversion from the Kelvins format to standard CDM format. First, we perform the relevant imports:
import kessler
from kessler.data import kelvins_to_event_dataset
Then, we convert the Kelvins dataset into an EventDataset object. In the following example, we use two optional arguments, drop_features and num_events, to exclude certain features when importing and to import only a limited number of events (in this case 1000).
file_name='/Users/giacomoacciarini/cdm_data/kelvins_data/test_data.csv'
events=kelvins_to_event_dataset(file_name, drop_features=['c_rcs_estimate', 't_rcs_estimate'], num_events=1000)
# The output reports the number of entries and events at each step, followed by a progress bar for the conversion.
Loading Kelvins dataset from file name: /Users/giacomoacciarini/cdm_data/kelvins_data/test_data.csv
24484 entries
Dropping features: ['c_rcs_estimate', 't_rcs_estimate']
Dropping rows with NaNs
21932 entries
Removing outliers
19531 entries
Shuffling
Grouped rows into 1726 events
Taking TCA as current time: 2022-02-17 23:50:10.189235
Converting Kelvins challenge data to EventDataset
Time spent | Time remain.| Progress | Events | Events/sec
0d:00:00:07 | 0d:00:00:00 | #################### | 1000/1000 | 128.69