//! Provides a simple and unified API to run fast and highly parallel computations on different
//! devices such as CPUs and GPUs, across different computation languages such as OpenCL and
//! CUDA, and allows you to swap your backend at run-time.
//!
//! Collenchyma was started at [Autumn][autumn] to create an easy and performant abstraction over
//! different backends for the Machine Intelligence Framework [Leaf][leaf], with no hard
//! dependency on any driver or library, so that it can easily be used without the need for a
//! long and painful build process.
//!
//! ## Abstract
//!
//! Code is often executed on the native CPU, but could be executed on other devices such as GPUs
//! and Accelerators as well. These devices are accessible through frameworks like OpenCL and CUDA,
//! but have far more complicated interfaces than your every-day native CPU, which makes the use
//! of these devices a painful experience. Some of the pain points, when writing such device code,
//! are:
//!
//! * non-portable: frameworks have different interfaces, devices support different versions and
//! machines might have different hardware - all this leads to code that will be executable only on
//! a very specific set of machines and platforms.
//! * steep learning curve: executing code on a device through a framework is quite different to
//! running code on the native CPU and comes with a lot of hurdles. OpenCL's 1.2 specification, for
//! example, has close to 400 pages.
//! * custom code: integrating support for devices into your project requires writing a lot of
//! custom code, e.g. kernels, memory management, general business logic.
//!
//! But writing code for devices would often be a good choice, as these devices can execute many
//! operations a lot faster than native CPUs. GPUs, for example, can execute operations roughly
//! one to two orders of magnitude faster, thanks to better support for parallelising operations.
//! OpenCL and CUDA make parallelising operations super easy.
//!
//! With Collenchyma we eliminate the pain points of writing device code, so you can run your code
//! like any other Rust code: you don't need to learn about kernels, events, or memory
//! synchronization, and you can deploy your code with ease to servers, desktops or mobiles, where
//! it will make full use of the underlying hardware.
//!
//! ## Architecture
//!
//! The single entry point of Collenchyma is a [Backend][backend]. A Backend is agnostic over the
//! [Device][device] it runs [Operations][operation] on. In order to be agnostic over the Device,
//! such as the native host CPU, GPUs, Accelerators or other types of [Hardware][hardware], the
//! Backend needs to be agnostic over the [Framework][framework] as well. A Framework is a
//! computation language such as OpenCL, CUDA or the native programming language. The Framework is
//! important, as it provides the interface to turn Hardware into Devices and therefore, among
//! other things, to execute Operations on the created Devices. With a Framework, we get access to
//! Hardware as long as the Hardware supports the Framework. As different vendors of Hardware use
//! different Frameworks, it becomes important that the Backend is agnostic over the Framework.
//! This allows us to run computations on any machine, such as servers, desktops and mobiles,
//! without the need to worry about what Hardware is available on the machine. That gives us the
//! freedom to write code once and deploy it on different machines, where it will execute on the
//! most potent Hardware by default.
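//!
//! As a minimal sketch of that idea, the same application code can address different Frameworks
//! simply by swapping the type parameter of the Backend. The constructors below mirror the ones
//! used in the example further down; treat this as illustrative rather than exhaustive:
//!
//! ```ignore
//! extern crate collenchyma as co;
//! use co::prelude::*;
//!
//! fn main() {
//!     // The native CPU Framework is always available.
//!     let cpu_backend = Backend::<Native>::default().unwrap();
//!     // With the `cuda` feature enabled and a CUDA device present, the same call with a
//!     // different type parameter yields a GPU-backed Backend exposing the same interface.
//!     let gpu_backend = Backend::<Cuda>::default().unwrap();
//! }
//! ```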
//!
//! Operations get introduced by a [Plugin][plugin]. A Plugin extends your Backend with
//! ready-to-execute Operations. All you need to do is provide these Collenchyma Plugin crates
//! alongside the Collenchyma crate in your Cargo file. Your Backend will then be extended with
//! the operations provided by the Plugin. The interface is just common Rust; e.g. to execute the
//! dot product operation of the [Collenchyma-BLAS][collenchyma-blas] Plugin, we can simply call
//! `backend.dot(...)`. Whether the dot Operation is executed on, e.g., one or many GPUs or CPUs
//! depends solely on how you configured the Backend - or, if you did not further specify which
//! Framework and Hardware to use, on the machine you execute the dot Operation on. In the field
//! of Operations there is one more component - the [Binary][binary]. Unlike code executed on the
//! native CPU, devices need to compile and build the Operation at run-time, which makes up a
//! significant part of a Framework, so we need an initializable instance for holding the state
//! and the compiled Operations - which is what the Binary is for.
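//!
//! A hedged sketch of how such a Plugin call reads - the three-argument form of `dot` shown here
//! (two input tensors, one result tensor) is an assumption for illustration; the authoritative
//! signature lives in the [Collenchyma-BLAS][collenchyma-blas] crate itself:
//!
//! ```ignore
//! extern crate collenchyma as co;
//! extern crate collenchyma_blas as blas;
//! use co::prelude::*;
//! use blas::*;
//!
//! fn main() {
//!     let backend = Backend::<Native>::default().unwrap();
//!     // Two input tensors and one result tensor, all living on the Backend's device.
//!     let mut x = SharedTensor::<f32>::new(backend.device(), &(1, 1, 3)).unwrap();
//!     let mut y = SharedTensor::<f32>::new(backend.device(), &(1, 1, 3)).unwrap();
//!     let mut result = SharedTensor::<f32>::new(backend.device(), &(1, 1, 1)).unwrap();
//!     // ... fill `x` and `y` as shown in the full example below ...
//!     // The Plugin extends `Backend` with `dot`; which Device runs it depends entirely on
//!     // how the Backend was configured above.
//!     backend.dot(&mut x, &mut y, &mut result).unwrap();
//! }
//! ```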
//!
//! The last piece of Collenchyma is the [Memory][memory]. An Operation happens over data, but
//! this data needs to be accessible to the device on which the Operation is executed. The process
//! is therefore often that memory space needs to be allocated on the device and then, in a later
//! step, synced from the host to the device or from the device back to the host. Thanks to the
//! [Tensor][tensor] we do not have to care about memory management between devices for the
//! execution of Operations. The Tensor tracks and automatically manages data and its memory
//! across devices, which is often the host and the Device. But it can also be passed around
//! between different Backends. Operations take Tensors as arguments and handle the
//! synchronization and allocation for you.
//!
//! ## Examples
//!
//! This example requires the Collenchyma NN Plugin, for Neural Network related operations, to
//! work.
//!
//! ```ignore
//! extern crate collenchyma as co;
//! extern crate collenchyma_nn as nn;
//! use co::prelude::*;
//! use nn::*;
//!
//! // Write a slice of data into the native host memory behind a `MemoryType`.
//! fn write_to_memory<T: Copy>(mem: &mut MemoryType, data: &[T]) {
//!     if let &mut MemoryType::Native(ref mut mem) = mem {
//!         let mem_buffer = mem.as_mut_slice::<T>();
//!         for (index, datum) in data.iter().enumerate() {
//!             mem_buffer[index] = *datum;
//!         }
//!     }
//! }
//!
//! fn main() {
//!     // Initialize a CUDA Backend.
//!     let backend = Backend::<Cuda>::default().unwrap();
//!     // Initialize two SharedTensors.
//!     let mut x = SharedTensor::<f32>::new(backend.device(), &(1, 1, 3)).unwrap();
//!     let mut result = SharedTensor::<f32>::new(backend.device(), &(1, 1, 3)).unwrap();
//!     // Fill `x` with some data.
//!     let payload: &[f32] = &::std::iter::repeat(1f32).take(x.capacity()).collect::<Vec<f32>>();
//!     let native = Backend::<Native>::default().unwrap();
//!     x.add_device(native.device()).unwrap(); // Add native host memory
//!     x.sync(native.device()).unwrap(); // Sync to native host memory
//!     write_to_memory(x.get_mut(native.device()).unwrap(), payload); // Write to native host memory.
//!     x.sync(backend.device()).unwrap(); // Sync the data to the CUDA device.
//!     // Run the sigmoid operation, provided by the NN Plugin, on your CUDA enabled GPU.
//!     backend.sigmoid(&mut x, &mut result).unwrap();
//!     // See the result.
//!     result.add_device(native.device()).unwrap(); // Add native host memory
//!     result.sync(native.device()).unwrap(); // Sync the result to host memory.
//!     println!("{:?}", result.get(native.device()).unwrap().as_native().unwrap().as_slice::<f32>());
//! }
//! ```
//!
//! ## Development
//!
//! At the moment Collenchyma itself provides the Rust APIs for the important frameworks - OpenCL
//! and CUDA. One step we are looking at is separating OpenCL and CUDA into their own crates,
//! similar to [Glium][glium].
//!
//! Every operation exposed via a Plugin and implemented on the backend should take as its last
//! argument an `Option<OperationConfig>` to specify custom parallelisation behaviour and to track
//! the operation via events.
//!
//! When initializing a new Backend from a BackendConfig you might not want to specify the
//! Framework, which is currently mandatory. Left blank, the Backend would try to use the most
//! potent Framework given the underlying hardware, probably in the order Cuda -> OpenCL -> Native.
//! The setup might take longer, as every framework needs to be checked and devices loaded in
//! order to identify the best setup. But it would allow you to deploy a Collenchyma-backed
//! application to almost any hardware - servers, desktops, mobiles.
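//!
//! A hedged, hand-written sketch of that fallback with today's API - prefer CUDA when the `cuda`
//! feature is compiled in and a device can be initialized, otherwise fall back to the native CPU
//! Backend. The automatic selection described above would hide this from the user:
//!
//! ```ignore
//! extern crate collenchyma as co;
//! use co::prelude::*;
//!
//! fn main() {
//!     // Try the most potent Framework first ...
//!     if let Ok(cuda_backend) = Backend::<Cuda>::default() {
//!         // ... and run Operations on the CUDA Backend ...
//!     } else {
//!         // ... otherwise fall back to the native CPU Backend.
//!         let native_backend = Backend::<Native>::default().unwrap();
//!     }
//! }
//! ```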
//!
//! [autumn]: http://autumnai.com
//! [leaf]: https://github.com/autumnai/leaf
//! [glium]: https://github.com/tomaka/glium
//! [backend]: ./backend/index.html
//! [device]: ./device/index.html
//! [binary]: ./binary/index.html
//! [operation]: ./operation/index.html
//! [hardware]: ./hardware/index.html
//! [framework]: ./framework/index.html
//! [plugin]: ./plugin/index.html
//! [collenchyma-blas]: https://github.com/autumnai/collenchyma-blas
//! [memory]: ./memory/index.html
//! [tensor]: ./tensor/index.html
#![cfg_attr(lint, feature(plugin))]
#![cfg_attr(lint, plugin(clippy))]
#![allow(dead_code)]
#![deny(missing_docs,
        missing_debug_implementations, missing_copy_implementations,
        trivial_casts, trivial_numeric_casts,
        unused_import_braces, unused_qualifications)]
#![cfg_attr(feature = "unstable_alloc", feature(alloc))]

#[cfg(feature = "unstable_alloc")]
extern crate alloc;
extern crate libc;
#[macro_use]
extern crate bitflags;
#[macro_use]
extern crate enum_primitive;
#[macro_use]
extern crate lazy_static;
extern crate num;
extern crate byteorder;
extern crate linear_map;

pub mod backend;
pub mod device;
pub mod hardware;
pub mod framework;
pub mod frameworks;
pub mod memory;
pub mod tensor;
pub mod operation;
pub mod binary;
pub mod error;
pub mod plugin;

// These will be exported with the prelude.
pub use backend::*;
pub use device::{IDevice, DeviceType};
pub use hardware::{IHardware, HardwareType};
pub use framework::IFramework;
pub use memory::{IMemory, MemoryType};
pub use tensor::{SharedTensor, TensorDesc, ITensorDesc, IntoTensorDesc};
#[cfg(feature = "native")]
pub use frameworks::Native;
#[cfg(feature = "cuda")]
pub use frameworks::Cuda;
#[cfg(feature = "opencl")]
pub use frameworks::OpenCL;

// These should only be imported with caution, since they are likely
// to create a namespace collision.
pub use error::Error;

/// A module meant to be glob imported when using Collenchyma.
///
/// For instance:
///
/// ```
/// use collenchyma::prelude::*;
/// ```
///
/// This module contains several important traits that provide many
/// of the convenience methods in Collenchyma, as well as the most important types.
/// Another type that is often needed but is likely to cause a name collision
/// when imported is `collenchyma::Error`.
pub mod prelude {
    pub use backend::*;
    pub use device::{IDevice, DeviceType};
    pub use hardware::{IHardware, HardwareType};
    pub use framework::IFramework;
    pub use memory::{IMemory, MemoryType};
    pub use tensor::{SharedTensor, TensorDesc, ITensorDesc, IntoTensorDesc};
    #[cfg(feature = "native")]
    pub use frameworks::Native;
    #[cfg(feature = "cuda")]
    pub use frameworks::Cuda;
    #[cfg(feature = "opencl")]
    pub use frameworks::OpenCL;
}