Quickstart

This is the quickstart guide for creating a simulation workflow package based on libpyvinyl. Please first install libpyvinyl following the instructions in the Installation section.

Introduction

This section is intended to help a developer understand how a new simulation package can use libpyvinyl as a foundation. It is important to understand that libpyvinyl provides base classes from which a developer derives more specialised classes, so that the final class contains both the new functionality and the basic capabilities. To make a new package, a developer would have to inherit from these base classes:

  • BaseCalculator

  • BaseData

  • BaseDataFormat

Calculator

The specialised calculator that inherits from BaseCalculator is capable of performing a calculation of some sort. The calculation can depend on some data and on input values for parameters specified when the calculator is built. The calculator can also return output data. The scope of a calculator is somewhat arbitrary, but the power of libpyvinyl comes from the ability to break a big calculation down into smaller parts, each handled by an individual calculator. A calculator with a small number of parameters is easier for the user to understand, as there is less risk of ambiguity. A rich Parameter class is provided by libpyvinyl to create the necessary parameters in each calculator. When creating a parameter, it is possible to set allowed intervals to avoid undefined behaviour.
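The idea of an allowed interval can be illustrated with a minimal stand-alone sketch. BoundedParameter is a hypothetical class written only for this illustration; it is not libpyvinyl's Parameter class and makes no claim about its API.

```python
class BoundedParameter:
    """Hypothetical stand-in for a parameter with an allowed interval."""

    def __init__(self, name: str, minimum: float, maximum: float, comment: str = ""):
        self.name = name
        self.minimum = minimum
        self.maximum = maximum
        self.comment = comment
        self._value = None

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value: float) -> None:
        # Reject values outside the allowed interval before they reach a calculation.
        if not (self.minimum <= new_value <= self.maximum):
            raise ValueError(
                f"{self.name}={new_value} is outside [{self.minimum}, {self.maximum}]"
            )
        self._value = new_value


energy = BoundedParameter("energy", minimum=0.0, maximum=100.0, comment="Particle energy")
energy.value = 25.0  # accepted: inside the allowed interval

try:
    energy.value = -5.0  # rejected: outside the allowed interval
except ValueError as err:
    print(err)
```

Checking the value when it is set, rather than when the calculation runs, means an illegal value is reported at the point where the user made the mistake.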

Data

To create a description of some data that can be either given to or returned from a calculator, one starts with the BaseData class. This data could be, for example, a number of particle states.

DataFormat

Each Data class will have a number of supported DataFormat classes, which are necessary in order to save the data to disk. Our particle data from before could be saved as JSON, YAML or some compressed format, and each would need a DataFormat class that contains methods to read and write such data and make it available to the corresponding Data class.
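The separation between a Data class and its DataFormat classes can be sketched with the standard library alone. ParticleData and JSONParticleFormat are hypothetical names used for illustration; they do not inherit from BaseData or BaseFormat and only mimic the division of responsibilities described above.

```python
import json
import os
import tempfile


class JSONParticleFormat:
    """Hypothetical format class: knows how to read and write one file format."""

    @staticmethod
    def write(data_dict: dict, filename: str) -> None:
        with open(filename, "w") as fh:
            json.dump(data_dict, fh)

    @staticmethod
    def read(filename: str) -> dict:
        with open(filename) as fh:
            return json.load(fh)


class ParticleData:
    """Hypothetical data class: holds the data and delegates file I/O to a format class."""

    def __init__(self, data_dict: dict):
        self._data = data_dict

    def get_data(self) -> dict:
        return self._data

    def write(self, filename: str, format_class) -> None:
        format_class.write(self._data, filename)

    @classmethod
    def from_file(cls, filename: str, format_class) -> "ParticleData":
        return cls(format_class.read(filename))


particles = ParticleData({"energy": [1.0, 2.5], "position": [[0.0, 0.0], [0.1, 0.2]]})

# Round trip through a temporary file: write with one format class, read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "particles.json")
    particles.write(path, JSONParticleFormat)
    restored = ParticleData.from_file(path, JSONParticleFormat)

print(restored.get_data() == particles.get_data())
```

Supporting another format in this sketch would only require a second format class with the same read and write methods; the data class itself would stay unchanged.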

First steps as a developer

To build a simulation package in this framework, think about what calculations need to be performed and what parameters are needed to describe them. Then divide this big calculation into calculators with a limited number of parameters and clear input and output data. For example, a particle source would need parameters describing the properties of the emitted particles and would return a Data object with a large number of particle states. A calculator describing a piece of optics might then have parameters describing its geometry, and it could have particle states as both input and output. With these kinds of considerations, it becomes clear what Calculators and Data classes should be written.
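As a rough sketch of this decomposition, the source and the piece of optics can be written as two plain functions that exchange particle states. The functions source and slit, their parameters, and the dictionary layout of a particle state are all assumptions made for this illustration; a real package would implement them as calculators and Data classes.

```python
import random


def source(n_particles: int, energy: float, divergence: float, seed: int = 0) -> list:
    """Hypothetical source: no input data, returns particle states."""
    rng = random.Random(seed)
    return [
        {"energy": energy, "angle": rng.uniform(-divergence, divergence)}
        for _ in range(n_particles)
    ]


def slit(particles: list, max_angle: float) -> list:
    """Hypothetical optics: particle states in, particle states out."""
    # Keep only the particles that fall within the angular acceptance.
    return [p for p in particles if abs(p["angle"]) <= max_angle]


emitted = source(n_particles=1000, energy=5.0, divergence=1.0)
transmitted = slit(emitted, max_angle=0.5)
print(len(emitted), len(transmitted))
```

Each function has a few parameters and a clearly defined input and output, which is the property that lets the steps be exchanged or reordered later.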

Benefit of libpyvinyl

When a package uses libpyvinyl as a foundation, libpyvinyl can be used to write a simulation from a series of these calculators using the Instrument class. Here is an example of a series of calculators that form a simple instrument.

Calculator     Description        Parameters                      Input Data       Output Data
Source         Emits particles    size, divergence, energy        None             particle states
Monochromator  Crystal            position, d_spacing, mosaicity  particle states  particle states
Monochromator  Crystal            position, d_spacing, mosaicity  particle states  particle states
Sample         Crystal sample     position, d_spacing, mosaicity  particle states  particle states
Detector       Particle detector  position, size, sensitivity     particle states  counts in bins

This setup uses two monochromators, each with its own parameters. The user can set up a master parameter that controls both, for example to ensure they have the same d_spacing. Running the instrument then corresponds to running each calculator in turn and providing the output of one to the next.
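The two ideas in this paragraph, a master parameter and running the calculators in turn, can be sketched without libpyvinyl. The Monochromator class and the helpers set_master and run_in_turn are hypothetical stand-ins and do not reproduce the Instrument class or its methods.

```python
class Monochromator:
    """Hypothetical calculator with a single parameter, d_spacing."""

    def __init__(self, name: str, d_spacing: float):
        self.name = name
        self.d_spacing = d_spacing

    def run(self, particles: list) -> list:
        # Placeholder physics: record which crystal each particle state passed.
        return [
            dict(p, history=p.get("history", []) + [(self.name, self.d_spacing)])
            for p in particles
        ]


def set_master(calculators: list, parameter_name: str, value) -> None:
    """Set the same parameter value on every linked calculator."""
    for calc in calculators:
        setattr(calc, parameter_name, value)


def run_in_turn(calculators: list, particles: list) -> list:
    """Run each calculator in order, feeding the output of one to the next."""
    data = particles
    for calc in calculators:
        data = calc.run(data)
    return data


mono_1 = Monochromator("mono_1", d_spacing=3.1)
mono_2 = Monochromator("mono_2", d_spacing=3.4)

# One master value keeps both monochromators consistent.
set_master([mono_1, mono_2], "d_spacing", 3.355)

result = run_in_turn([mono_1, mono_2], [{"energy": 5.0}])
print(result[0]["history"])
```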

Design a minimal instrument

As a minimal start, we will create an instrument with a calculator that can get the sum of two numbers.

Three specialized classes need to be defined for the package:

  • CalculatorClass: a class based on BaseCalculator to perform the calculation.

  • DataClass: to represent the input and output data of the CalculatorClass.

  • FormatClass: the interface for exchanging data between memory and a file on disk in a specific format.

Define a simple python object mapping DataClass

Let’s first define a NumberData class that maps Python objects in memory. This is done by creating a mapping dictionary that connects the data (e.g. an array or a single value) in the Python object to a reference variable.

[1]:
from libpyvinyl.BaseData import BaseData

class NumberData(BaseData):
    def __init__(self,key,data_dict=None,filename=None,
                 file_format_class=None,file_format_kwargs=None):

        expected_data = {}

        ### DataClass developer's job start
        expected_data["number"] = None
        ### DataClass developer's job end

        super().__init__(key,expected_data,data_dict,
                         filename,file_format_class,file_format_kwargs)

    @classmethod
    def supported_formats(cls):
        ### DataClass developer's job start
        format_dict = {}
        ### DataClass developer's job end
        return format_dict

# Test if the definition works
data = NumberData(key="test")

The above example shows a minimal definition of a DataClass. Simulation package developers only need to consider two sections:

  • expected_data: A dictionary whose keys are the expected keys of the dictionary returned by get_data(). Here we simply want to get a “number” from a NumberData.

  • format_dict: A dictionary of the supported formats for files on disk. For now we only need a Python object mapper, so we assign an empty dict to it.

Define a DataClass also supporting file mapping

For software that writes its data to a file instead of a Python object, it is necessary to have an interface between the file and the DataClass. We create a FormatClass as the interface:

[2]:
import numpy as np
from libpyvinyl.BaseFormat import BaseFormat

class TXTFormat(BaseFormat):
    def __init__(self) -> None:
        super().__init__()

    @classmethod
    def format_register(cls):
        key = "TXT"
        description = "TXT format for NumberData"
        file_extension = ".txt"
        read_kwargs = [""]
        write_kwargs = [""]
        return cls._create_format_register(
            key, description, file_extension, read_kwargs, write_kwargs
        )

    @staticmethod
    def direct_convert_formats():
        return []

    @classmethod
    def convert(
        cls, obj: BaseData, output: str, output_format_class: str, key, **kwargs):
        raise NotImplementedError


    @classmethod
    def read(cls, filename: str) -> dict:
        """Read the data from the file with the `filename` to
        a dictionary. The dictionary will be used by its corresponding data class."""
        number = float(np.loadtxt(filename))
        data_dict = {"number": number}
        return data_dict

    @classmethod
    def write(cls, object, filename: str, key: str = None):
        """Save the data with the `filename`."""
        data_dict = object.get_data()
        arr = np.array([data_dict["number"]])
        np.savetxt(filename, arr, fmt="%.3f")
        if key is None:
            original_key = object.key
            key = original_key + "_to_TXTFormat"
        return object.from_file(filename, cls, key)

# Test if the definition works
data = TXTFormat()

In the above example, we create a TXTFormat class based on the BaseFormat abstract class. We need to provide:

  • The information in the format_register method, which is used to register the format in the NumberData.supported_formats() method. This will be explained later.

  • The read function, which reads the data from the file into the data_dict that the NumberData class exposes through NumberData.get_data(). The dictionary keys match those in the expected_data of NumberData.

  • The write function, which writes the NumberData object to a file in TXTFormat.

The other methods above can simply be copied as they are; they do not need to be changed at this stage.
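One consequence of the fmt="%.3f" argument in the write method is that a number saved to a TXT file comes back rounded to three decimals. The following sketch shows this round trip with plain file I/O instead of numpy, so it runs without any dependency; write_number and read_number are hypothetical helpers that only imitate the read and write methods above.

```python
import os
import tempfile


def write_number(filename: str, number: float) -> None:
    """Write one number with three decimals, like fmt="%.3f" in the write method."""
    with open(filename, "w") as fh:
        fh.write("%.3f\n" % number)


def read_number(filename: str) -> dict:
    """Read the number back into a dictionary keyed like expected_data."""
    with open(filename) as fh:
        return {"number": float(fh.read())}


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "number.txt")
    write_number(path, 3.14159)
    restored = read_number(path)

print(restored)  # the stored value is rounded to three decimals
```

If more precision is needed in the file, the format string is the place to change it.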

Then we just need to add the TXTFormat to the NumberData created in the last section.

[3]:
class NumberData(BaseData):
    def __init__(self,key,data_dict=None,filename=None,
                 file_format_class=None,file_format_kwargs=None):

        expected_data = {}

        ### DataClass developer's job start
        expected_data["number"] = None
        ### DataClass developer's job end

        super().__init__(key,expected_data,data_dict,
                         filename,file_format_class,file_format_kwargs)

    @classmethod
    def supported_formats(cls):
        ### DataClass developer's job start
        format_dict = {}
        cls._add_ioformat(format_dict, TXTFormat)
        ### DataClass developer's job end
        return format_dict

You can list the formats it supports with:

[4]:
NumberData.list_formats()
Format class: <class '__main__.TXTFormat'>
Key: TXT
Description: TXT format for NumberData
File extension: .txt


Define a Calculator with native python object output

Assuming we have a simulation code whose output is a native python object (e.g. a list or dict), we can create a CalculatorClass for the simulation code:

[5]:
from typing import Union
from pathlib import Path
from libpyvinyl.BaseData import DataCollection
from libpyvinyl.BaseCalculator import BaseCalculator, CalculatorParameters

class PlusCalculator(BaseCalculator):
    def __init__(self, name: str, input: Union[DataCollection, list, NumberData],
                 output_keys: Union[list, str] = ["plus_result"],
                 output_data_types=[NumberData], output_filenames: Union[list, str] = [],
                 instrument_base_dir="./", calculator_base_dir="PlusCalculator",
                 parameters=None):
        """A python object calculator example"""
        super().__init__(name, input, output_keys, output_data_types=output_data_types,
            output_filenames=output_filenames, instrument_base_dir=instrument_base_dir,
            calculator_base_dir=calculator_base_dir, parameters=parameters)

    def init_parameters(self):
        parameters = CalculatorParameters()
        times = parameters.new_parameter(
            "plus_times", comment="How many times to do the plus"
        )
        times.value = 1
        self.parameters = parameters

    def backengine(self):
        Path(self.base_dir).mkdir(parents=True, exist_ok=True)
        input_num0 = self.input.to_list()[0].get_data()["number"]
        input_num1 = self.input.to_list()[1].get_data()["number"]
        output_num = float(input_num0) + float(input_num1)
        if self.parameters["plus_times"].value > 1:
            for i in range(self.parameters["plus_times"].value - 1):
                output_num += input_num1
        data_dict = {"number": output_num}
        key = self.output_keys[0]
        output_data = self.output[key]
        output_data.set_dict(data_dict)
        return self.output

In the above example, we define a PlusCalculator based on the BaseCalculator. The following needs to be provided:

  • Some default output-related values to initialize empty output Data containers (see here):

    • output_keys: the key of each Data object in the output DataCollection

    • output_data_types: the Data type of each Data object.

    • output_filenames: the filenames of the output files (if any)

  • init_parameters to define the default values of the parameters needed by the calculator. Range restrictions and units of values can also be set here. Details can be found in the parameter use guide.

  • backengine to define how to conduct the calculation. It should return a reference to the output DataCollection.

PlusCalculator.backengine takes the two numbers enclosed in an input DataCollection and adds the second number to the first PlusCalculator.parameters["plus_times"].value times. The reference dictionary of Python objects, data_dict, is passed to the corresponding NumberData in the auto-initialized self.output: DataCollection by

output_data.set_dict(data_dict)

Let’s create an instance from the class:

[6]:
input1 = NumberData.from_dict({"number": 1}, "input1")
input2 = NumberData.from_dict({"number": 1}, "input2")
calculator_plus = PlusCalculator(name="test",input=[input1,input2])

Check available parameters of it:

[7]:
print(calculator_plus.parameters)
 - Parameters object -
plus_times                          1                               How many times to do the plus

Run the calculator with default parameters:

[8]:
result = calculator_plus.backengine()
print(result.get_data())
{'number': 2.0}

Modify the parameter and see the difference:

[9]:
calculator_plus.parameters["plus_times"] = 5
print(calculator_plus.backengine().get_data())
{'number': 6.0}
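The two results above follow directly from the loop in backengine. The function below is a sketch that mirrors that arithmetic in plain Python, without libpyvinyl; the name plus is introduced only for this illustration.

```python
def plus(first: float, second: float, plus_times: int = 1) -> float:
    """Mirror the arithmetic of PlusCalculator.backengine."""
    # The first addition always happens.
    result = float(first) + float(second)
    # Each additional "time" adds the second number once more.
    for _ in range(plus_times - 1):
        result += second
    return result


print(plus(1, 1))                 # default plus_times of 1: 1 + 1
print(plus(1, 1, plus_times=5))   # 1 + 1 + 1 + 1 + 1 + 1
```

In other words, the result equals the first number plus plus_times times the second number.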

Define a Calculator with native file output

[10]:
from typing import Union
from pathlib import Path
import numpy as np
from libpyvinyl.BaseData import DataCollection
from libpyvinyl.BaseCalculator import BaseCalculator, CalculatorParameters


class MinusCalculator(BaseCalculator):
    def __init__(
        self,
        name: str,
        input: Union[DataCollection, list, NumberData],
        output_keys: Union[list, str] = ["minus_result"],
        output_data_types=[NumberData],
        output_filenames: Union[list, str] = ["minus_result.txt"],
        instrument_base_dir="./",
        calculator_base_dir="MinusCalculator",
        parameters=None,
    ):
        """A file output calculator example"""
        super().__init__(
            name,
            input,
            output_keys,
            output_data_types=output_data_types,
            output_filenames=output_filenames,
            instrument_base_dir=instrument_base_dir,
            calculator_base_dir=calculator_base_dir,
            parameters=parameters,
        )

    def init_parameters(self):
        parameters = CalculatorParameters()
        times = parameters.new_parameter(
            "minus_times", comment="How many times to do the minus"
        )
        times.value = 1
        self.parameters = parameters

    def backengine(self):
        Path(self.base_dir).mkdir(parents=True, exist_ok=True)
        input_num0 = self.input.to_list()[0].get_data()["number"]
        input_num1 = self.input.to_list()[1].get_data()["number"]
        output_num = float(input_num0) - float(input_num1)
        if self.parameters["minus_times"].value > 1:
            for i in range(self.parameters["minus_times"].value - 1):
                output_num -= input_num1
        arr = np.array([output_num])
        file_path = self.output_file_paths[0]
        np.savetxt(file_path, arr, fmt="%.3f")
        key = self.output_keys[0]
        output_data = self.output[key]
        output_data.set_file(file_path, TXTFormat)
        return self.output

MinusCalculator is similar to PlusCalculator, except that its output_data is a NumberData mapping to a file in TXTFormat instead of a Python object.

The simulation results can be obtained in the same way as for PlusCalculator:

[11]:
input1 = NumberData.from_dict({"number": 5}, "input1")
input2 = NumberData.from_dict({"number": 1}, "input2")
calculator_minus = MinusCalculator(name="test",input=[input1,input2])
output = calculator_minus.backengine()
print(output.get_data())
{'number': 4.0}

We can see that output now maps to a file:

[12]:
print(output)
Data collection:
key - mapping

minus_result - <class '__main__.TXTFormat'>: MinusCalculator/minus_result.txt

If we read the file, we should get the same result.

[13]:
print(output["minus_result"].filename)
with open(output["minus_result"].filename,'r') as fh:
    print(fh.read())
MinusCalculator/minus_result.txt
4.000

Define an instrument

We can assemble a single PlusMinus instrument from the two Calculators to sum input1 and input2 and then subtract input2 from the result:

[16]:
from libpyvinyl import Instrument

# Create an Instrument with the name PlusMinus
calculation_instrument = Instrument("PlusMinus")

# Create python object data as input
input1 = NumberData.from_dict({"number": 1}, "input1")
input2 = NumberData.from_dict({"number": 2}, "input2")
calculator_plus = PlusCalculator(name="Plus",input=[input1,input2])
# Use the output of calculator_plus as the input of calculator_minus
calculator_minus = MinusCalculator(name="Minus",input=[calculator_plus.output["plus_result"],input2])

# Assemble the instrument
calculation_instrument.add_calculator(calculator_plus)
calculation_instrument.add_calculator(calculator_minus)

# Set the base output path of the instrument
instrument_path = "PlusMinus"
calculation_instrument.set_instrument_base_dir(str(instrument_path))

Run the instrument

[17]:
# 1+2-2 = 1
calculation_instrument.run()
calculation_instrument.calculators['Minus'].output.get_data()
[17]:
{'number': 1.0}