SUMMARY

We implemented the High Order Convolution Module in PyTorch and optimized its performance using PyTorch's C++ and CUDA extensions. We evaluated our implementations on an NVIDIA Tesla K80 GPU. Compared to the baseline Python implementation on the GPU, our C++ extension is about 2x faster for the forward pass and more than 10x faster for the backward pass, and our CUDA extension is more than 20x faster for the forward pass and more than 100x faster for the backward pass.
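
As a rough illustration of how such GPU timings can be collected, the sketch below averages forward and backward wall time for a module, calling torch.cuda.synchronize() so kernels have finished before the clock is read. The function name and arguments are placeholders, not the project's actual benchmark script.

    import time
    import torch

    def time_forward_backward(module, x, iters=100):
        """Average per-iteration forward/backward wall time on the GPU."""
        module = module.cuda()
        x = x.cuda().requires_grad_()
        fwd = bwd = 0.0
        for _ in range(iters):
            torch.cuda.synchronize()
            start = time.time()
            out = module(x)
            torch.cuda.synchronize()
            fwd += time.time() - start

            start = time.time()
            out.sum().backward()
            torch.cuda.synchronize()
            bwd += time.time() - start
        return fwd / iters, bwd / iters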

PROPOSAL

Proposal

CHECKPOINT

Checkpoint

FINAL REPORT

Final

CODE

The python/ folder contains the original high order convolution model, implemented purely in Python as a PyTorch module (nn.Module).
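
For orientation, here is a minimal sketch of what a second-order convolution layer can look like as a pure-Python nn.Module; the class name and the particular quadratic term are assumptions for illustration, and the exact formulation in python/ may differ.

    import torch.nn as nn

    class HighOrderConv2d(nn.Module):
        """Standard convolution plus a convolution over the squared input
        (one common way to add a second-order term)."""

        def __init__(self, in_channels, out_channels, kernel_size, padding=0):
            super().__init__()
            self.linear = nn.Conv2d(in_channels, out_channels, kernel_size,
                                    padding=padding)
            self.quadratic = nn.Conv2d(in_channels, out_channels, kernel_size,
                                       padding=padding, bias=False)

        def forward(self, x):
            return self.linear(x) + self.quadratic(x * x)

    # Example: layer = HighOrderConv2d(3, 16, kernel_size=3, padding=1)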

The cpp/ folder contains a Python wrapper and implements the performance-critical part of the high order convolution model in C++.
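
A typical way to hook such a C++ extension into autograd is sketched below: the sources are JIT-compiled with torch.utils.cpp_extension.load, and the exported forward/backward entry points are wrapped in a torch.autograd.Function. The file name hoconv.cpp and the exported function names are placeholders rather than the actual names used in cpp/.

    import torch
    from torch.utils.cpp_extension import load

    # JIT-compile the C++ sources into an importable extension module.
    hoconv_cpp = load(name="hoconv_cpp", sources=["cpp/hoconv.cpp"])

    class HighOrderConvFunction(torch.autograd.Function):
        """Routes the forward and backward passes through the extension."""

        @staticmethod
        def forward(ctx, input, weight):
            ctx.save_for_backward(input, weight)
            return hoconv_cpp.forward(input, weight)

        @staticmethod
        def backward(ctx, grad_output):
            input, weight = ctx.saved_tensors
            grad_input, grad_weight = hoconv_cpp.backward(grad_output, input, weight)
            return grad_input, grad_weight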

The cuda/ folder contains a Python wrapper and a C++ wrapper and implements the performance-critical part of the high order convolution model in CUDA.
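
The CUDA build follows the same loading pattern, except that the source list contains both the thin C++ wrapper and the .cu file holding the kernels; the file names below are placeholders.

    from torch.utils.cpp_extension import load

    # Compiles the C++ wrapper and the CUDA kernels into one module; it can
    # expose the same forward/backward entry points as the C++-only version,
    # so the autograd.Function wrapper above can be reused.
    hoconv_cuda = load(
        name="hoconv_cuda",
        sources=["cuda/hoconv_cuda.cpp", "cuda/hoconv_cuda_kernel.cu"],
    )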