Convolutional Neural Fabrics (CNFs) are a construction algorithm for CNN architectures.
Instead of aiming to select a single optimal architecture, the authors propose a “fabric” that embeds an exponentially large number of architectures. The fabric is a 3D trellis that connects response maps at different layers, scales, and channels with a sparse, homogeneous, local connectivity pattern.

- Pooling: CNFs don't use pooling. Pooling might not be necessary, however, because they downsample with strided convolutions.
- Filter size: All convolutions use kernel size 3.
- Output layer: scale $1 \times 1$, channels = number of classes
- Activation function: Rectified linear units (ReLUs) are used at all nodes.
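
To see how strided convolutions alone can take an input down to the $1 \times 1$ output scale, here is a minimal sketch of the size arithmetic. The stride of 2 and padding of 1 are assumptions for illustration (the notes above only state kernel size 3), and the function names are hypothetical.

```python
from typing import List


def conv_output_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def scales_down_to_one(size: int) -> List[int]:
    """Resolutions obtained by repeatedly applying a stride-2 convolution
    with kernel size 3 (assumed padding 1) until the map is 1 pixel wide."""
    sizes = [size]
    while sizes[-1] > 1:
        sizes.append(conv_output_size(sizes[-1]))
    return sizes


# A 32x32 input (the CIFAR10 image size) reaches 1x1 after five halvings.
print(scales_down_to_one(32))  # [32, 16, 8, 4, 2, 1]
```

Under these assumptions each strided convolution halves the resolution, which is the role pooling would otherwise play.
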
Evaluation
- Part Labels dataset (face images from the LFW dataset): a super-pixel accuracy of 95.6%
- MNIST: 0.33% error (SotA: 0.21%)
- CIFAR10: 7.43% error (SotA: 2.72%)
What I didn't understand
- "Activations are thus a linear function over multi-dimensional neighborhoods, i.e. a four dimensional 3×3×3×3 neighborhood when processing 2D images"
- "within the first layer, channel c at scale s receives input from channels c + {−1, 0, 1} from scale s − 1": Why does the scale change? Why doesn't the first layer receive input from the same scale?
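
One possible reading of the first quote, offered as a guess and not as the authors' definition: the four dimensions are scale, channel, and the two spatial axes, each with offsets {−1, 0, 1}. The sketch below only counts the resulting inputs. The indexing scheme and the `neighborhood` helper are hypothetical, and it ignores border handling and the resampling needed between scales.

```python
from itertools import product

OFFSETS = (-1, 0, 1)


def neighborhood(scale, channel, row, col):
    """Nominal coordinates in the previous layer that feed one activation,
    assuming offsets of -1, 0, +1 along scale, channel, row and column."""
    return [
        (scale + ds, channel + dc, row + dr, col + dx)
        for ds, dc, dr, dx in product(OFFSETS, repeat=4)
    ]


inputs = neighborhood(scale=2, channel=5, row=10, col=10)
print(len(inputs))  # 81, i.e. 3 * 3 * 3 * 3
```

Under this reading, a pre-activation is a weighted sum over 81 values. This does not answer the second question about the first layer.
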