RISC-V in Space

RISC-V IN SPACE
14.12.2022
pablo.ghiglino@klepsydra.com
www.klepsydra.com
Klepsydra Technologies

COMPARE AND SWAP
• Compare-and-swap (CAS) is an instruction used in multithreading to achieve synchronisation. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation.
• Compare-and-swap has been an integral part of the IBM 370 architectures since 1970.
• Maurice Herlihy (1991) proved that CAS can implement more lock-free and wait-free algorithms than atomic read, write, or fetch-and-add.
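
The CAS retry loop described above can be sketched in standard C++ with `std::atomic`. This is a minimal illustration, not Klepsydra code: the function names `cas_increment` and `run_counter_demo` are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Lock-free increment built on compare-and-swap: read the current value,
// compute the new one, and publish it only if nobody changed it meanwhile.
void cas_increment(std::atomic<int>& counter) {
    int expected = counter.load();
    // On failure, compare_exchange_weak stores the current value into
    // `expected`, so the loop simply retries with fresh data.
    while (!counter.compare_exchange_weak(expected, expected + 1)) {
    }
}

// Hypothetical demo helper: several threads increment one shared counter
// without any mutex and the final value is returned.
int run_counter_demo(int threads, int increments_per_thread) {
    assert(threads > 0 && increments_per_thread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                cas_increment(counter);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter.load();
}
```

No increment is lost even under contention, because a thread whose CAS fails retries with the value another thread has just written.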

TWO MAIN DATA PROCESSING APPROACHES
• Event Loop
• Sensor Multiplexer
[Diagram: producers (Producer 1 to Producer 3) connected to consumers (Consumer, Consumer 1, Consumer 2) in each approach]

THE PRODUCT
Lightweight, modular and compatible with the most widely used operating systems
[Diagram: a worldwide application built on Klepsydra SDK, Klepsydra GPU Streaming, Klepsydra AI and the Klepsydra ROS2 executor plugin]
• SDK (Software Development Kit): boosts data processing at the edge for general applications and processor-intensive algorithms
• AI (Artificial Intelligence): high-performance deep neural network (DNN) engine to deploy any AI or machine learning module at the edge
• ROS2 executor plugin: executor for ROS2 able to process up to 10 times more data with up to 50% reduction in CPU consumption
• GPU (Graphics Processing Unit) Streaming: high parallelisation of the GPU to increase the processing data rate and GPU utilisation

LOCK-FREE AS ALTERNATIVE TO PARALLELISATION
[Diagram: Parallelisation compared with Pipeline]

2-DIM THREADING MODEL
First dimension: pipelining
[Diagram: Deep Neural Network Structure. Input Data passes through a chain of layers to Output Data; consecutive groups of layers are assigned to Thread 1 (Core 1) and Thread 2 (Core 2).]

Second dimension: matrix multiplication parallelisation
[Diagram: Input Data passes through a single Layer to Output Data; the work of that layer is split across Thread 1 (Core 1), Thread 2 (Core 2) and Thread 3 (Core 3).]
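
The second dimension can be illustrated with a row-partitioned matrix-vector product for one dense layer. This is a sketch under stated assumptions: it uses plain `std::thread` and `std::vector`, and the names `Matrix` and `parallel_matvec` are hypothetical; it does not show how Klepsydra AI implements its back-ends.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Computes y = W * x for one dense layer, giving each thread a contiguous
// block of output rows. Threads write disjoint elements of y, so no locking
// is needed.
std::vector<float> parallel_matvec(const Matrix& w, const std::vector<float>& x,
                                   std::size_t num_threads) {
    assert(num_threads > 0);
    const std::size_t rows = w.size();
    std::vector<float> y(rows, 0.0f);
    // Rows per thread, rounded up so every row is covered.
    const std::size_t chunk = (rows + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(rows, t * chunk);
        const std::size_t end = std::min(rows, begin + chunk);
        if (begin == end) {
            break;  // more threads requested than rows available
        }
        workers.emplace_back([&w, &x, &y, begin, end] {
            for (std::size_t r = begin; r < end; ++r) {
                assert(w[r].size() == x.size());
                float acc = 0.0f;
                for (std::size_t c = 0; c < x.size(); ++c) {
                    acc += w[r][c] * x[c];
                }
                y[r] = acc;
            }
        });
    }
    for (auto& th : workers) {
        th.join();
    }
    return y;
}
```

Combining this with the first dimension means choosing, per layer group, how many threads pipeline the layers and how many threads split each layer's arithmetic.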

THREADING MODEL CONFIGURATION
[Diagram: three example distributions of the network layers over Core 1 to Core 4]
Three configurations are compared:
• Low CPU, high throughput CPU, high latency
• Mid CPU, mid throughput CPU, mid latency
• High CPU, mid throughput CPU, low latency

ONNX API
#include <memory>
#include <string>

class KPSR_API OnnxDNNImporter
{
public:
    /**
     * @brief Imports an ONNX file and uses a default event loop factory for all processor cores.
     * @param onnxFileName
     * @param testDNN
     * @return a shared pointer to a DeepNeuralNetworkFactory object
     *
     * When the log level is debug, dumps the YAML configuration of the default factory.
     * It makes use of all processor cores.
     */
    static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & onnxFileName,
                                                                                bool testDNN = false);
    /**
     * @brief Imports an ONNX file for testing and uses a default synchronous factory.
     * @param onnxFileName
     * @param envFileName Klepsydra AI configuration environment file.
     * @return a shared pointer to a DeepNeuralNetworkFactory object
     *
     * This method is intended to be used for testing purposes only.
     */
    static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & onnxFileName,
                                                                                const std::string & envFileName);
};

Core API
#include <functional>
#include <memory>

class DeepNeuralNetwork {
public:
    /**
     * @brief setCallback
     * @param callback Callback function for the prediction result.
     */
    virtual void setCallback(std::function<void(const unsigned long &, const kpsr::ai::F32AlignedVector &)> callback) = 0;
    /**
     * @brief predict. Loads the input matrix as input to the network.
     * @param inputVector An F32AlignedVector of floats containing the network input.
     *
     * @return Unique id corresponding to the input vector
     */
    virtual unsigned long predict(const kpsr::ai::F32AlignedVector& inputVector) = 0;
    /**
     * @brief predict. Copy-less version of predict.
     * @param inputVector An F32AlignedVector of floats containing the network input.
     *
     * @return Unique id corresponding to the input vector
     */
    virtual unsigned long predict(const std::shared_ptr<kpsr::ai::F32AlignedVector> & inputVector) = 0;
};
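
The call pattern of the interface above (register a callback, call `predict`, match the returned id with the id passed to the callback) can be sketched with a self-contained stand-in. Everything here is hypothetical: `ToyNetwork`, `FloatVector` and `ResultCallback` are illustrative names, `std::vector<float>` replaces `kpsr::ai::F32AlignedVector`, and the "inference" is just a sum. It is not the Klepsydra AI implementation.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in types: the real kpsr::ai::F32AlignedVector and
// DeepNeuralNetwork come from the Klepsydra AI library and are not used here.
using FloatVector = std::vector<float>;
using ResultCallback = std::function<void(const unsigned long&, const FloatVector&)>;

// Toy "network" with the same call shape as the interface above: predict()
// returns an id, and the result for that id arrives through the callback.
class ToyNetwork {
public:
    void setCallback(ResultCallback callback) { callback_ = std::move(callback); }

    unsigned long predict(const FloatVector& input) {
        const unsigned long id = nextId_++;
        // Placeholder "inference": the output is the sum of the inputs.
        // This sketch invokes the callback synchronously; a streaming engine
        // would typically deliver it later, possibly from another thread.
        const float sum = std::accumulate(input.begin(), input.end(), 0.0f);
        if (callback_) {
            callback_(id, FloatVector{sum});
        }
        return id;
    }

private:
    ResultCallback callback_;
    unsigned long nextId_ = 0;
};
```

Returning an id from `predict` lets the caller submit many inputs without blocking and still pair each result with the input that produced it.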

KLEPSYDRA SDO PROCESS
• Klepsydra Streaming Distribution Optimiser (SDO):
  • Runs on a separate computer
  • Executes several dry runs on the OBC
  • Collects statistics
  • Runs a genetic algorithm to find the optimal solution for latency, power or throughput
• The main variable to optimise is the distribution of layers across the two dimensions of the threading model
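
As a rough illustration of the genetic-algorithm step, the sketch below searches for an assignment of layers to threads that minimises the load of the busiest thread. All of it is assumed for the example: the cost model (`bottleneck`), the population size, the selection, crossover and mutation rules, and the names `Assignment` and `optimise`. The actual SDO works from statistics collected on the OBC and its algorithm is not described in these slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Assignment = std::vector<int>;  // Assignment[i] = thread that runs layer i

// Hypothetical cost model: the busiest thread is the pipeline bottleneck.
// A real optimiser would use statistics measured on the target instead.
double bottleneck(const Assignment& a, const std::vector<double>& layer_cost, int threads) {
    std::vector<double> load(threads, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        load[a[i]] += layer_cost[i];
    }
    return *std::max_element(load.begin(), load.end());
}

Assignment optimise(const std::vector<double>& layer_cost, int threads,
                    int generations, unsigned seed) {
    assert(threads > 0 && !layer_cost.empty());
    const std::size_t population_size = 20;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick_thread(0, threads - 1);
    std::uniform_int_distribution<std::size_t> pick_layer(0, layer_cost.size() - 1);
    std::uniform_int_distribution<std::size_t> pick_parent(0, population_size / 2 - 1);

    // Initial population: individual 0 puts every layer on thread 0,
    // all others are random assignments.
    std::vector<Assignment> population(population_size, Assignment(layer_cost.size(), 0));
    for (std::size_t p = 1; p < population_size; ++p) {
        for (int& gene : population[p]) {
            gene = pick_thread(rng);
        }
    }

    auto fitter = [&](const Assignment& x, const Assignment& y) {
        return bottleneck(x, layer_cost, threads) < bottleneck(y, layer_cost, threads);
    };

    for (int g = 0; g < generations; ++g) {
        std::sort(population.begin(), population.end(), fitter);
        // Elitism: keep the better half, rebuild the worse half from it.
        for (std::size_t p = population_size / 2; p < population_size; ++p) {
            const Assignment& mum = population[pick_parent(rng)];
            const Assignment& dad = population[pick_parent(rng)];
            const std::size_t cut = pick_layer(rng);
            Assignment child(mum.begin(), mum.begin() + cut);  // one-point crossover
            child.insert(child.end(), dad.begin() + cut, dad.end());
            child[pick_layer(rng)] = pick_thread(rng);          // point mutation
            population[p] = child;
        }
    }
    return *std::min_element(population.begin(), population.end(), fitter);
}
```

Because the better half of the population is carried over unchanged each generation, the best cost found never gets worse; swapping `bottleneck` for a latency or power estimate would change the optimisation target without touching the search loop.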

KLEPSYDRA STREAMING DISTRIBUTION OPTIMISER (SDO)

QORIQ® LAYERSCAPE LS1046A MULTICORE PROCESSOR
[Diagram: Klepsydra AI Container on the QorIQ® Layerscape LS1046A]

STATUS
• Successful installation of the following setup:
  • LS1046 running Yocto Jethro
  • Docker installed on LS1046
  • Container with the following:
    • Ubuntu 20.04
    • Klepsydra AI software fully supported (quantised and non-quantised)

XILINX ZEDBOARD
[Diagram: ZedBoard running PetaLinux]

STATUS
• Successful installation of the following setup:
  • ZedBoard running PetaLinux 2019.2
  • Docker installed on ZedBoard
  • Container with the following:
    • Ubuntu 20.04
    • Klepsydra AI software with quantised support only

PERFORMANCE RESULTS: CME ON LS1046
[Charts comparing TFLite + NEON with Klepsydra. Panels: CPU / Hz, Latency (ms), Throughput (Hz).]

PERFORMANCE RESULTS: CME-Q ON LS1046
[Charts. Panels: CPU / Hz, Latency (ms), Throughput (Hz).]

PERFORMANCE RESULTS: CME-Q ON ZEDBOARD
[Charts. Panels: CPU / Hz, Latency (ms), Throughput (Hz).]

PERFORMANCE RESULTS: BSC ON LS1046
[Charts. Panels: CPU / Hz, Latency (ms), Throughput (Hz).]

THE PATTERN PROJECT
PATTERN: Klepsydra AI ported to the GR740 and RISC-V
• Target processors: GR740, GR765 (LEON5 and NOEL-V)
• Target OS: RTEMS5
• Development on commercial FPGA board
• Validation on space-qualified hardware

THE MULTI-THREADING API
[Diagram: two software stacks]
• First stack: Klepsydra AI; Klepsydra SDK (multi-threading framework); PTHREAD; POSIX operating system
• Second stack: Klepsydra AI; Klepsydra SDK; Threading Abstraction Layer; PTHREAD (POSIX) and RTEMS (RTEMS5)

THE PARALLELISATION FRAMEWORK
[Diagram: two software stacks]
• First stack: Klepsydra AI; back-ends: Full-backend (Float32, Int8) and Quantized-backend (Int8); Parallelisation Framework; PTHREAD
• Second stack: Klepsydra AI; back-ends; Parallelisation Framework; Threading Abstraction Layer; PTHREAD (POSIX) and RTEMS (RTEMS5)

THE MATHEMATICAL BACKEND
[Diagram: Klepsydra AI back-ends. In the first stack each back-end targets ARM and x86; the second stack adds "RISC-V?" as a further target.]
RISC-V Extensions
• Current version of Klepsydra AI supports RV32GV and RV64GC
• Preparation for NOEL-V in three modes:
  • ‘Vanilla’
  • P-Extension and V-Extension
  • And more…

THE PLAN
Phase 1: Klepsydra AI for RTEMS5
Phase 2: Klepsydra AI for GR765/LEON5
Phase 3: Klepsydra AI for GR765/NOEL-V
Phase 4: Validation of Klepsydra AI on GR740 and GR765

THE SCHEDULE
Milestones: KOM, MTR1, MTR2, FR
[Gantt chart over months 0 to 11]

Work Package   Start Month   End Month   Duration in Months
WP0            0             17          18
WP1.1          0             4           5
WP1.2          5             8           4
WP2.1          0             0           1
WP2.2          1             4           4
WP2.3          9             10          2
WP3.1          5             5           1
WP3.2          5             8           4
WP3.3          1             10          10
WP4.1          2             11          10
WP4.2          11            11          1

CONCLUSIONS
• Enables real AI for future missions on the GR765/NOEL-V
• Very easy to use, via a simple API and a web-based optimisation tool
• Highly optimised for the GR765/NOEL-V processors
• Lightweight software (current version is 4Mb)
• Deterministic, with full control of the dedicated resources

NEXT STEPS
• In-orbit demonstration:
  • OPS-SAT OBC: using the onboard Altera FPGA and a NOEL-V softcore
  • Other?
• Health monitoring (core operation failures, etc.)

CONTACT INFORMATION
Dr Pablo Ghiglino
pablo.ghiglino@klepsydra.com
+41786931544
www.klepsydra.com
linkedin.com/company/klepsydra-technologies
