0x00 Rambling - What Is Convolution

• Why it is hard to understand: the scaffolding was torn down for the sake of mathematical elegance. Textbooks usually follow a "definition, then theorem" structure: first the mathematical definition, then a list of properties, deriving formula from formula step by step. Some textbooks give a geometric interpretation via "flipping, shifting, multiplying, and integrating" a signal, but that is explaining math with math, and it leaves the questioner unsatisfied. It is not the "need, conjecture, discovery, proof, application" path the masters who invented convolution actually walked. Once the masters had finished building the edifice of "convolution", they tore down the scaffolding for the sake of mathematical elegance; what people see today is the finished steel, with no trace of how the steel was tempered. This creates a stumbling block for some students outside of mathematics. (Tang Changjie)

1. A Plain-Language Explanation of Convolution: A Bloody Illustration

Because the material on convolution is too bloody, I have moved it to the appendix. Press Space or PageDown a few times to get there.

0x02 A Brief History of Convolutional Neural Networks

2. LeNet

The basic idea of LeNet is to convolve the input image layer by layer, and finally attach a softmax that classifies the result into 10 classes. The paper is really long; interested readers can take a look.

4. VGG

VGG had bad luck that year: it ran into GoogLeNet, otherwise it would have taken first place in the competition.

7. DenseNet

Although ResNet works remarkably well, it is indeed somewhat slow. ResNet's idea is to carry one layer's output across several layers to a later one; DenseNet's approach is even more blunt: it connects all the layers together (the difference at the junction is that ResNet sums while DenseNet concatenates), and the experimental results are excellent.
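The difference between the two merge operations can be sketched with plain arrays (a minimal NumPy illustration, not actual ResNet/DenseNet code; the feature-map shapes here are made up):

```python
import numpy as np

# Two hypothetical feature maps, shape (channels, height, width)
x = np.ones((64, 8, 8))        # input entering the block
fx = np.full((64, 8, 8), 0.5)  # output of the block's conv layers

# ResNet-style skip connection: elementwise sum, channel count unchanged
resnet_out = x + fx
print(resnet_out.shape)  # (64, 8, 8)

# DenseNet-style connection: concatenate along the channel axis,
# so the channel count grows with every layer
densenet_out = np.concatenate([x, fx], axis=0)
print(densenet_out.shape)  # (128, 8, 8)
```

This is why DenseNet layers can be narrow individually yet still slow overall: the concatenated input keeps widening as depth increases.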

8. Summary

• AlexNet: The model that started a revolution! The original model was crazy with the split GPU thing so this is the model from some follow-up work.

• Darknet Reference Model: This model is designed to be small but powerful. It attains the same top-1 and top-5 performance as AlexNet but with 1/10th the parameters. It uses mostly convolutional layers without the large fully connected layers at the end. It is about twice as fast as AlexNet on CPU making it more suitable for some vision applications.

• VGG-16: The Visual Geometry Group at Oxford developed the VGG-16 model for the ILSVRC-2014 competition. It is highly accurate and widely used for classification and detection. I adapted this version from the Caffe pre-trained model. It was trained for an additional 6 epochs to adjust to Darknet-specific image preprocessing (instead of mean subtraction Darknet adjusts images to fall between -1 and 1).

• Extraction: I developed this model as an offshoot of the GoogleNet model. It doesn’t use the “inception” modules, only 1x1 and 3x3 convolutional layers.

• Darknet19: I modified the Extraction network to be faster and more accurate. This network was sort of a merging of ideas from the Darknet Reference network and Extraction as well as numerous publications like Network In Network, Inception, and Batch Normalization.

• Darknet19 448x448: I trained Darknet19 for 10 more epochs with a larger input image size, 448x448. This model performs significantly better but is slower since the whole image is larger.

• Resnet 50: For some reason people love these networks even though they are so sloooooow. Whatever.

• Resnet 152: For some reason people love these networks even though they are so sloooooow. Whatever.

• Densenet 201: I love DenseNets! They are just so deep and so crazy and work so well. Like Resnet, still slow since they are sooooo many layers but at least they work really well!

0xFF Appendix: Rambling - What Is Convolution

Continuing the discussion of convolution from section 0x00.

1. A Plain-Language Explanation of Convolution: A Bloody Illustration

• Time-invariant system: the system's parameters do not change over time, so no matter when the input signal is applied, the shape of the output response is the same. In other words, no matter when the boss slaps you, the swelling develops the same way.
• Linear time-invariant (LTI) system: a time-invariant system that also satisfies the superposition principle. That is, if the boss slaps you N times at time t, the swelling increases N-fold.
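Both properties can be checked numerically for discrete convolution (a small NumPy sketch; the impulse response `h` and the input `x` are made-up values):

```python
import numpy as np

h = np.array([2.0, 3.0, 1.0])   # hypothetical unit impulse response
x = np.array([1.0, 0.0, 3.0])   # hypothetical input signal

y = np.convolve(x, h)           # system output

# Linearity: scaling the input by N scales the output by N
assert np.allclose(np.convolve(3 * x, h), 3 * y)

# Time invariance: delaying the input by one step
# delays the output by one step, with the same shape
x_delayed = np.concatenate([[0.0], x])
y_delayed = np.convolve(x_delayed, h)
assert np.allclose(y_delayed, np.concatenate([[0.0], y]))
```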

2. Physical Meaning

$h_1(t) = f(1) * g(t) = 1 * [2, 3, 1] = [2, 3, 1]$, then shift right by 1 unit

$h_2(t) = f(2) * g(t) = 3 * [2, 3, 1] = [6, 9, 3]$, then shift right by 2 units

• In plain terms, convolution is just shifting and summing. (From this we can see the key physical meaning of convolution: the weighted superposition of one function (e.g., the unit impulse response) over another function (e.g., the input signal).)
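The "shift and sum" view above can be written out directly (a minimal Python sketch of full discrete convolution, using the sequences from this example):

```python
import numpy as np

f = [1, 3]       # f(1) = 1, f(2) = 3, as in the example above
g = [2, 3, 1]    # unit response g(t)

# Shift-and-add: each input sample scales a copy of g
# shifted to that sample's position, and the copies are summed
out = [0] * (len(f) + len(g) - 1)
for i, fi in enumerate(f):
    for j, gj in enumerate(g):
        out[i + j] += fi * gj

print(out)  # [2, 9, 10, 3]
# Matches the library routine:
assert out == list(np.convolve(f, g))
```

The result `[2, 9, 10, 3]` is exactly $h_1 = [2, 3, 1]$ plus $h_2 = [6, 9, 3]$ shifted one position further right.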

3. Now It Gets More Interesting

$(3x^2+x+2)*(x^2+3x+2)$
$=(3x^4+9x^3+6x^2)+(x^3+3x^2+2x)+(2x^2+6x+4)$
$=3x^4+10x^3+11x^2+8x+4$

1. Reverse the first polynomial so that one is in descending order of powers and the other in ascending order. The expression becomes $(2+x+3x^2)*(x^2+3x+2)$
2. Shift: move the second polynomial one term to the right each time
3. Multiply: multiply the vertically aligned terms
4. Sum: add up the products
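The steps above show that polynomial multiplication is exactly convolution of the coefficient sequences, which is easy to verify (a quick NumPy check using the polynomials above):

```python
import numpy as np

# Coefficients in descending order of power
p = [3, 1, 2]   # 3x^2 + x + 2
q = [1, 3, 2]   # x^2 + 3x + 2

product = np.convolve(p, q)
print(product)  # [ 3 10 11  8  4]  ->  3x^4 + 10x^3 + 11x^2 + 8x + 4
```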