
Category: Data Analysis

Notes to myself: RTX 2070, CUDA, cuDNN, Caffe, and face_swap

Install NVIDIA driver for RTX 2070: https://www.geforce.com/drivers

sudo bash ~/Downloads/NVIDIA-Linux-x86_64-415.27.run

Install CUDA 10.0: https://developer.nvidia.com/cuda-downloads

sudo bash ~/Downloads/cuda_10.0.130_410.48_linux.run

DO NOT re-install the drivers suggested by the CUDA installer:
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: n

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.27       Driver Version: 415.27       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:1F:00.0 Off |                  N/A |
| 36%   31C    P0    N/A /  N/A |      0MiB /  7949MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
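
With the driver and toolkit installed, make sure nvcc and the CUDA libraries are on your paths; a minimal sketch for ~/.bashrc, assuming the default install prefix /usr/local/cuda-10.0:

export PATH=/usr/local/cuda-10.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH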

CuDNN:

cuDNN Runtime Library for Ubuntu18.04 (Deb)

cuDNN Developer Library for Ubuntu18.04 (Deb)
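
Both are .deb packages and install with dpkg; the filenames below match the cuDNN 7.4.2 downloads for CUDA 10.0 and may differ for other versions:

sudo dpkg -i libcudnn7_7.4.2.24-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.4.2.24-1+cuda10.0_amd64.deb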

Caffe:

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_cublas_device_LIBRARY (ADVANCED)
linked by target "caffe" in directory /home/dli/Projects/caffe/src/caffe

This error shows up because CUDA 10.0 dropped the standalone cublas device library, while FindCUDA in older CMake releases still looks for it; CMake 3.12.2 or newer handles this. Upgrade cmake:
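
I used the prebuilt Linux tarball instead of building CMake from source; the URL follows cmake.org's naming scheme for 3.13.3 (verify it against the download page):

wget https://cmake.org/files/v3.13/cmake-3.13.3-Linux-x86_64.tar.gz
tar xzf cmake-3.13.3-Linux-x86_64.tar.gz -C ~/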

$ ~/cmake-3.13.3-Linux-x86_64/bin/cmake -DCMAKE_INSTALL_PREFIX:PATH=/usr ..
-- Boost version: 1.65.1
-- Found the following Boost libraries:
--   system
--   thread
--   filesystem
--   chrono
--   date_time
--   atomic
-- Found gflags  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libgflags.so)
-- Found glog    (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libglog.so)
-- Found PROTOBUF Compiler: /usr/bin/protoc
-- HDF5: Using hdf5 compiler wrapper to determine C configuration
-- HDF5: Using hdf5 compiler wrapper to determine CXX configuration
-- Found lmdb    (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/liblmdb.so)
-- Found LevelDB (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libleveldb.so)
-- Found Snappy  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libsnappy.so)
-- CUDA detected: 10.0
-- Found cuDNN: ver. 7.4.2 found (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
-- Added CUDA NVCC flags for: sm_75
-- OpenCV found (/usr/share/OpenCV)
-- Found Atlas (include: /usr/include/x86_64-linux-gnu library: /usr/lib/x86_64-linux-gnu/libatlas.so lapack: /usr/lib/x86_64-linux-gnu/liblapack.so
-- NumPy ver. 1.16.0 found (include: /usr/local/lib/python2.7/dist-packages/numpy/core/include)
-- Boost version: 1.65.1
-- Found the following Boost libraries:
--   python
-- Detected Doxygen OUTPUT_DIRECTORY: ./doxygen/
--
-- ******************* Caffe Configuration Summary *******************
-- General:
--   Version           :   1.0.0
--   Git               :   1.0-132-g99bd9979
--   System            :   Linux
--   C++ compiler      :   /usr/bin/c++
--   Release CXX flags :   -O3 -DNDEBUG -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
--   Debug CXX flags   :   -g -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
--   Build type        :   Release
--
--   BUILD_SHARED_LIBS :   ON
--   BUILD_python      :   ON
--   BUILD_matlab      :   OFF
--   BUILD_docs        :   ON
--   CPU_ONLY          :   OFF
--   USE_OPENCV        :   ON
--   USE_LEVELDB       :   ON
--   USE_LMDB          :   ON
--   USE_NCCL          :   OFF
--   ALLOW_LMDB_NOLOCK :   OFF
--   USE_HDF5          :   ON
--
-- Dependencies:
--   BLAS              :   Yes (Atlas)
--   Boost             :   Yes (ver. 1.65)
--   glog              :   Yes
--   gflags            :   Yes
--   protobuf          :   Yes (ver. 3.0.0)
--   lmdb              :   Yes (ver. 0.9.21)
--   LevelDB           :   Yes (ver. 1.20)
--   Snappy            :   Yes (ver. ..)
--   OpenCV            :   Yes (ver. 3.2.0)
--   CUDA              :   Yes (ver. 10.0)
--
-- NVIDIA CUDA:
--   Target GPU(s)     :   Auto
--   GPU arch(s)       :   sm_75
--   cuDNN             :   Yes (ver. 7.4.2)
--
-- Python:
--   Interpreter       :   /usr/bin/python2.7 (ver. 2.7.15)
--   Libraries         :   /usr/lib/x86_64-linux-gnu/libpython2.7.so (ver 2.7.15rc1)
--   NumPy             :   /usr/local/lib/python2.7/dist-packages/numpy/core/include (ver 1.16.0)
--
-- Documentaion:
--   Doxygen           :   /usr/bin/doxygen (1.8.13)
--   config_file       :   /home/dli/Projects/caffe/.Doxyfile
--
-- Install:
--   Install path      :   /usr
--
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dli/Projects/caffe/build
$ make -j16 all
$ make pycaffe
$ sudo make install
[  2%] Built target caffeproto
[ 87%] Built target caffe
[ 89%] Built target upgrade_solver_proto_text
[ 90%] Built target compute_image_mean
[ 90%] Built target caffe.bin
[ 91%] Built target upgrade_net_proto_binary
[ 93%] Built target convert_imageset
[ 94%] Built target extract_features
[ 95%] Built target upgrade_net_proto_text
[ 95%] Built target classification
[ 95%] Built target convert_mnist_data
[ 97%] Built target convert_cifar_data
[ 98%] Built target convert_mnist_siamese_data
[100%] Built target pycaffe
Install the project...
-- Install configuration: "Release"
-- Installing: /usr/share/Caffe/CaffeConfig.cmake
-- Installing: /usr/share/Caffe/CaffeTargets.cmake
-- Installing: /usr/share/Caffe/CaffeTargets-release.cmake
-- Installing: /usr/include/caffe
-- Installing: /usr/include/caffe/test
-- Installing: /usr/include/caffe/test/test_gradient_check_util.hpp
-- Installing: /usr/include/caffe/test/test_caffe_main.hpp
-- Installing: /usr/include/caffe/layers
-- Installing: /usr/include/caffe/layers/cudnn_tanh_layer.hpp
-- Installing: /usr/include/caffe/layers/absval_layer.hpp
-- Installing: /usr/include/caffe/layers/multinomial_logistic_loss_layer.hpp
-- Installing: /usr/include/caffe/layers/dummy_data_layer.hpp
-- Installing: /usr/include/caffe/layers/recurrent_layer.hpp
-- Installing: /usr/include/caffe/layers/scale_layer.hpp
-- Installing: /usr/include/caffe/layers/hdf5_data_layer.hpp
-- Installing: /usr/include/caffe/layers/cudnn_sigmoid_layer.hpp
-- Installing: /usr/include/caffe/layers/clip_layer.hpp
-- Installing: /usr/include/caffe/layers/hinge_loss_layer.hpp
-- Installing: /usr/include/caffe/layers/mvn_layer.hpp
-- Installing: /usr/include/caffe/layers/relu_layer.hpp
-- Installing: /usr/include/caffe/layers/hdf5_output_layer.hpp
-- Installing: /usr/include/caffe/layers/contrastive_loss_layer.hpp
-- Installing: /usr/include/caffe/layers/lrn_layer.hpp
-- Installing: /usr/include/caffe/layers/accuracy_layer.hpp
-- Installing: /usr/include/caffe/layers/conv_layer.hpp
-- Installing: /usr/include/caffe/layers/infogain_loss_layer.hpp
-- Installing: /usr/include/caffe/layers/im2col_layer.hpp
-- Installing: /usr/include/caffe/layers/base_conv_layer.hpp
-- Installing: /usr/include/caffe/layers/euclidean_loss_layer.hpp
-- Installing: /usr/include/caffe/layers/crop_layer.hpp
-- Installing: /usr/include/caffe/layers/window_data_layer.hpp
-- Installing: /usr/include/caffe/layers/bnll_layer.hpp
-- Installing: /usr/include/caffe/layers/eltwise_layer.hpp
-- Installing: /usr/include/caffe/layers/prelu_layer.hpp
-- Installing: /usr/include/caffe/layers/filter_layer.hpp
-- Installing: /usr/include/caffe/layers/cudnn_lcn_layer.hpp
-- Installing: /usr/include/caffe/layers/reduction_layer.hpp
-- Installing: /usr/include/caffe/layers/sigmoid_cross_entropy_loss_layer.hpp
-- Installing: /usr/include/caffe/layers/swish_layer.hpp
-- Installing: /usr/include/caffe/layers/slice_layer.hpp
-- Installing: /usr/include/caffe/layers/cudnn_softmax_layer.hpp
-- Installing: /usr/include/caffe/layers/reshape_layer.hpp
-- Installing: /usr/include/caffe/layers/silence_layer.hpp
-- Installing: /usr/include/caffe/layers/sigmoid_layer.hpp
-- Installing: /usr/include/caffe/layers/power_layer.hpp
-- Installing: /usr/include/caffe/layers/spp_layer.hpp
-- Installing: /usr/include/caffe/layers/exp_layer.hpp
-- Installing: /usr/include/caffe/layers/pooling_layer.hpp
-- Installing: /usr/include/caffe/layers/input_layer.hpp
-- Installing: /usr/include/caffe/layers/data_layer.hpp
-- Installing: /usr/include/caffe/layers/lstm_layer.hpp
-- Installing: /usr/include/caffe/layers/neuron_layer.hpp                                                                                                                                     
-- Installing: /usr/include/caffe/layers/split_layer.hpp
-- Installing: /usr/include/caffe/layers/threshold_layer.hpp
-- Installing: /usr/include/caffe/layers/base_data_layer.hpp
-- Installing: /usr/include/caffe/layers/log_layer.hpp
-- Installing: /usr/include/caffe/layers/loss_layer.hpp
-- Installing: /usr/include/caffe/layers/rnn_layer.hpp
-- Installing: /usr/include/caffe/layers/elu_layer.hpp
-- Installing: /usr/include/caffe/layers/memory_data_layer.hpp
-- Installing: /usr/include/caffe/layers/cudnn_relu_layer.hpp
-- Installing: /usr/include/caffe/layers/tanh_layer.hpp
-- Installing: /usr/include/caffe/layers/flatten_layer.hpp
-- Installing: /usr/include/caffe/layers/dropout_layer.hpp
-- Installing: /usr/include/caffe/layers/bias_layer.hpp
-- Installing: /usr/include/caffe/layers/softmax_loss_layer.hpp
-- Installing: /usr/include/caffe/layers/deconv_layer.hpp
-- Installing: /usr/include/caffe/layers/inner_product_layer.hpp
-- Installing: /usr/include/caffe/layers/batch_norm_layer.hpp
-- Installing: /usr/include/caffe/layers/cudnn_conv_layer.hpp
-- Installing: /usr/include/caffe/layers/parameter_layer.hpp
-- Installing: /usr/include/caffe/layers/tile_layer.hpp
-- Installing: /usr/include/caffe/layers/cudnn_deconv_layer.hpp
-- Installing: /usr/include/caffe/layers/cudnn_pooling_layer.hpp
-- Installing: /usr/include/caffe/layers/python_layer.hpp
-- Installing: /usr/include/caffe/layers/embed_layer.hpp
-- Installing: /usr/include/caffe/layers/image_data_layer.hpp
-- Installing: /usr/include/caffe/layers/batch_reindex_layer.hpp
-- Installing: /usr/include/caffe/layers/softmax_layer.hpp
-- Installing: /usr/include/caffe/layers/concat_layer.hpp
-- Installing: /usr/include/caffe/layers/cudnn_lrn_layer.hpp
-- Installing: /usr/include/caffe/layers/argmax_layer.hpp
-- Installing: /usr/include/caffe/blob.hpp
-- Installing: /usr/include/caffe/util
-- Installing: /usr/include/caffe/util/upgrade_proto.hpp
-- Installing: /usr/include/caffe/util/blocking_queue.hpp
-- Installing: /usr/include/caffe/util/db.hpp
-- Installing: /usr/include/caffe/util/db_lmdb.hpp
-- Installing: /usr/include/caffe/util/insert_splits.hpp
-- Installing: /usr/include/caffe/util/format.hpp
-- Installing: /usr/include/caffe/util/im2col.hpp
-- Installing: /usr/include/caffe/util/gpu_util.cuh
-- Installing: /usr/include/caffe/util/signal_handler.h
-- Installing: /usr/include/caffe/util/cudnn.hpp
-- Installing: /usr/include/caffe/util/hdf5.hpp
-- Installing: /usr/include/caffe/util/io.hpp
-- Installing: /usr/include/caffe/util/nccl.hpp
-- Installing: /usr/include/caffe/util/rng.hpp
-- Installing: /usr/include/caffe/util/db_leveldb.hpp
-- Installing: /usr/include/caffe/util/mkl_alternate.hpp
-- Installing: /usr/include/caffe/util/benchmark.hpp
-- Installing: /usr/include/caffe/util/device_alternate.hpp
-- Installing: /usr/include/caffe/util/math_functions.hpp
-- Installing: /usr/include/caffe/sgd_solvers.hpp
-- Installing: /usr/include/caffe/layer.hpp
-- Installing: /usr/include/caffe/net.hpp
-- Installing: /usr/include/caffe/syncedmem.hpp
-- Installing: /usr/include/caffe/solver.hpp
-- Installing: /usr/include/caffe/filler.hpp
-- Installing: /usr/include/caffe/layer_factory.hpp
-- Installing: /usr/include/caffe/data_transformer.hpp
-- Installing: /usr/include/caffe/solver_factory.hpp
-- Installing: /usr/include/caffe/parallel.hpp
-- Installing: /usr/include/caffe/common.hpp
-- Installing: /usr/include/caffe/internal_thread.hpp
-- Installing: /usr/include/caffe/caffe.hpp
-- Installing: /usr/include/caffe/proto/caffe.pb.h
-- Installing: /usr/lib/x86_64-linux-gnu/libcaffe.so.1.0.0
-- Set runtime path of "/usr/lib/x86_64-linux-gnu/libcaffe.so.1.0.0" to "/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/hdf5/serial:/usr/local/cuda-10.0/lib64"
-- Installing: /usr/lib/x86_64-linux-gnu/libcaffe.so
-- Installing: /usr/lib/x86_64-linux-gnu/libcaffeproto.a
-- Installing: /usr/python/caffe/proto/caffe_pb2.py
-- Installing: /usr/python/caffe/proto/__init__.py
-- Installing: /usr/bin/caffe
-- Set runtime path of "/usr/bin/caffe" to "/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/hdf5/serial:/usr/local/cuda-10.0/lib64"
-- Installing: /usr/bin/compute_image_mean
-- Set runtime path of "/usr/bin/compute_image_mean" to "/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/hdf5/serial:/usr/local/cuda-10.0/lib64"
-- Installing: /usr/bin/convert_imageset
-- Set runtime path of "/usr/bin/convert_imageset" to "/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/hdf5/serial:/usr/local/cuda-10.0/lib64"
-- Installing: /usr/bin/extract_features
-- Set runtime path of "/usr/bin/extract_features" to "/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/hdf5/serial:/usr/local/cuda-10.0/lib64"
-- Installing: /usr/bin/upgrade_net_proto_binary
-- Set runtime path of "/usr/bin/upgrade_net_proto_binary" to "/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/hdf5/serial:/usr/local/cuda-10.0/lib64"
-- Installing: /usr/bin/upgrade_net_proto_text
-- Set runtime path of "/usr/bin/upgrade_net_proto_text" to "/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/hdf5/serial:/usr/local/cuda-10.0/lib64"
-- Installing: /usr/bin/upgrade_solver_proto_text
-- Set runtime path of "/usr/bin/upgrade_solver_proto_text" to "/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/hdf5/serial:/usr/local/cuda-10.0/lib64"
-- Installing: /usr/bin/convert_cifar_data
-- Set runtime path of "/usr/bin/convert_cifar_data" to "/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/hdf5/serial:/usr/local/cuda-10.0/lib64"
-- Installing: /usr/bin/classification
-- Set runtime path of "/usr/bin/classification" to "/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/hdf5/serial:/usr/local/cuda-10.0/lib64"
-- Installing: /usr/bin/convert_mnist_data
-- Set runtime path of "/usr/bin/convert_mnist_data" to "/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/hdf5/serial:/usr/local/cuda-10.0/lib64"
-- Installing: /usr/bin/convert_mnist_siamese_data
-- Set runtime path of "/usr/bin/convert_mnist_siamese_data" to "/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/hdf5/serial:/usr/local/cuda-10.0/lib64"
-- Installing: /usr/python/classify.py
-- Installing: /usr/python/detect.py
-- Installing: /usr/python/draw_net.py
-- Installing: /usr/python/requirements.txt
-- Installing: /usr/python/train.py
-- Up-to-date: /usr/python/caffe
-- Installing: /usr/python/caffe/io.py
-- Installing: /usr/python/caffe/pycaffe.py
-- Up-to-date: /usr/python/caffe/proto
-- Installing: /usr/python/caffe/proto/caffe_pb2.py
-- Installing: /usr/python/caffe/proto/__init__.py
-- Installing: /usr/python/caffe/coord_map.py
-- Installing: /usr/python/caffe/classifier.py
-- Installing: /usr/python/caffe/__init__.py
-- Installing: /usr/python/caffe/imagenet
-- Installing: /usr/python/caffe/imagenet/ilsvrc_2012_mean.npy
-- Installing: /usr/python/caffe/net_spec.py
-- Installing: /usr/python/caffe/detector.py
-- Installing: /usr/python/caffe/draw.py
-- Installing: /usr/python/caffe/_caffe.so
-- Set runtime path of "/usr/python/caffe/_caffe.so" to "/usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/hdf5/serial:/usr/local/cuda-10.0/lib64"

face_swap:

$ git diff
diff --git a/CMakeLists.txt b/CMakeLists.txt
index cb34534..762363f 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -69,21 +69,30 @@ find_package(OpenCV REQUIRED highgui imgproc imgcodecs calib3d photo)
 # dlib
 find_package(dlib REQUIRED)

+if(DEFINED CAFFE_DIR)
+    list(APPEND CMAKE_PREFIX_PATH ${CAFFE_DIR})
+endif()
+
 # Caffe
 find_package(Caffe REQUIRED)

+if(DEFINED HDF5_DIR)
+    list(APPEND CMAKE_MODULE_PATH ${HDF5_DIR})
+endif()
+
 # HDF5
 if(MSVC)
   # Find HDF5 using it's hdf5-config.cmake file with MSVC
   if(DEFINED HDF5_DIR)
     list(APPEND CMAKE_MODULE_PATH ${HDF5_DIR})
   endif()
-  find_package(HDF5 COMPONENTS C HL CXX REQUIRED)
   set(HDF5_LIBRARIES hdf5-shared hdf5_cpp-shared)
   set(HDF5_HL_LIBRARIES hdf5_hl-shared)
 else()
   #find_package(HDF5 COMPONENTS HL REQUIRED)
+  # find_package(HDF5 COMPONENTS C CXX HL REQUIRED)
   find_package(HDF5 COMPONENTS C CXX HL REQUIRED)
+  # find_package(HDF5 COMPONENTS C HL CXX REQUIRED PATHS /usr/lib/x86_64-linux-gnu/hdf5/serial)
 endif()
 #find_package(HDF5 REQUIRED CXX)

@@ -132,23 +141,23 @@ endif()
 set(FACE_SWAP_TARGETS face_swap iris_sfs)
 export(TARGETS ${FACE_SWAP_TARGETS}
   FILE "${PROJECT_BINARY_DIR}/face_swap-targets.cmake")
-
+
 # Export the package for use from the build-tree
 # (this registers the build-tree with a global CMake-registry)
 export(PACKAGE face_swap)
-
+
 # Create config files
 configure_file(cmake/face_swap-config.cmake.in
   "${PROJECT_BINARY_DIR}/face_swap-config.cmake" @ONLY)
 configure_file(cmake/face_swap-config-version.cmake.in
   "${PROJECT_BINARY_DIR}/face_swap-config-version.cmake" @ONLY)
-
+
 # Install config files
 install(FILES
   "${PROJECT_BINARY_DIR}/face_swap-config.cmake"
   "${PROJECT_BINARY_DIR}/face_swap-config-version.cmake"
   DESTINATION "cmake" COMPONENT dev)
-
+
 # Install the export set for use with the install-tree
 install(EXPORT face_swap-targets DESTINATION cmake COMPONENT dev)

diff --git a/face_swap/CMakeLists.txt b/face_swap/CMakeLists.txt
index 1fcb187..ca56ecd 100644
--- a/face_swap/CMakeLists.txt
+++ b/face_swap/CMakeLists.txt
@@ -44,6 +44,7 @@ target_include_directories(face_swap PUBLIC
        ${HDF5_INCLUDE_DIRS}
 )
 target_link_libraries(face_swap PUBLIC
+       uuid
        iris_sfs
        ${OpenCV_LIBS}
        dlib::dlib
$ ~/cmake-3.13.3-Linux-x86_64/bin/cmake -DWITH_BOOST_STATIC=OFF -DBUILD_INTERFACE_PYTHON=ON -DBUILD_SHARED_LIBS=OFF -DBUILD_APPS=ON -DBUILD_TESTS=OFF -DCMAKE_INSTALL_PREFIX=/usr/local/face_swap -DCMAKE_BUILD_TYPE=Release ..
-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is GNU 7.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Boost version: 1.65.1
-- Found the following Boost libraries:
--   filesystem
--   program_options
--   regex
--   timer
--   thread
--   chrono
--   system
--   date_time
--   atomic
-- Found OpenCV: /usr (found version "3.2.0") found components:  highgui imgproc imgcodecs calib3d photo
-- HDF5: Using hdf5 compiler wrapper to determine C configuration
-- HDF5: Using hdf5 compiler wrapper to determine CXX configuration
-- Found HDF5: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5_cpp.so;/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/libpthread.so;/usr/lib/x86_64-linux-gnu/libsz.so;/usr/lib/x86_64-linux-gnu/libz.so;/usr/lib/x86_64-linux-gnu/libdl.so;/usr/lib/x86_64-linux-gnu/libm.so (found version "1.10.0.1") found components:  C CXX HL
-- Found Eigen3: /usr/include/eigen3 (Required is at least version "2.91.0")
-- Found Protobuf: /usr/lib/x86_64-linux-gnu/libprotobuf.so;-lpthread (found version "3.0.0")
-- Found Doxygen: /usr/bin/doxygen (found version "1.8.13") found components:  doxygen missing components:  dot
-- Found SWIG: /usr/bin/swig3.0 (found version "3.0.12")
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.6m.so (found version "3.6.7")
-- Boost version: 1.65.1
-- Found the following Boost libraries:
--   python
--   numpy
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dli/Projects/face_swap/build
$ make -j16
$ sudo make install
[ 20%] Built target iris_sfs
[ 66%] Built target face_swap
[ 73%] Built target face_swap_image
[ 80%] Built target face_swap_batch
[ 86%] Built target face_swap_single2many
[ 93%] Built target face_swap_image2video
[100%] Built target face_swap_py
Install the project...
-- Install configuration: "Release"
-- Installing: /usr/local/face_swap/cmake/face_swap-config.cmake
-- Installing: /usr/local/face_swap/cmake/face_swap-config-version.cmake
-- Installing: /usr/local/face_swap/cmake/face_swap-targets.cmake
-- Installing: /usr/local/face_swap/cmake/face_swap-targets-release.cmake
-- Installing: /usr/local/face_swap/data/images/brad_pitt_01.jpg
-- Installing: /usr/local/face_swap/data/images/bruce_willis_01.jpg
-- Installing: /usr/local/face_swap/lib/libiris_sfs.a
-- Installing: /usr/local/face_swap/include/face_swap/face_swap_export.h
-- Installing: /usr/local/face_swap/lib/libface_swap.a
-- Installing: /usr/local/face_swap/include/face_swap/basel_3dmm.h
-- Installing: /usr/local/face_swap/include/face_swap/cnn_3dmm.h
-- Installing: /usr/local/face_swap/include/face_swap/cnn_3dmm_expr.h
-- Installing: /usr/local/face_swap/include/face_swap/face_seg.h
-- Installing: /usr/local/face_swap/include/face_swap/face_swap_engine.h
-- Installing: /usr/local/face_swap/include/face_swap/face_swap_engine_impl.h
-- Installing: /usr/local/face_swap/include/face_swap/face_swap_c_interface.h
-- Installing: /usr/local/face_swap/include/face_swap/render_utilities.h
-- Installing: /usr/local/face_swap/include/face_swap/utilities.h
-- Installing: /usr/local/face_swap/include/face_swap/face_detection_landmarks.h
-- Installing: /usr/local/face_swap/include/face_swap/landmarks_utilities.h
-- Installing: /usr/local/face_swap/include/face_swap/segmentation_utilities.h
-- Installing: /usr/local/face_swap/bin/face_swap_image
-- Set runtime path of "/usr/local/face_swap/bin/face_swap_image" to ""
-- Installing: /usr/local/face_swap/bin/face_swap_image.cfg
-- Installing: /usr/local/face_swap/bin/face_swap_batch
-- Set runtime path of "/usr/local/face_swap/bin/face_swap_batch" to ""
-- Installing: /usr/local/face_swap/bin/face_swap_batch.cfg
-- Installing: /usr/local/face_swap/bin/face_swap_single2many
-- Set runtime path of "/usr/local/face_swap/bin/face_swap_single2many" to ""
-- Installing: /usr/local/face_swap/bin/face_swap_single2many.cfg
-- Installing: /usr/local/face_swap/bin/face_swap_image2video
-- Set runtime path of "/usr/local/face_swap/bin/face_swap_image2video" to ""
-- Installing: /usr/local/face_swap/bin/face_swap_image2video.cfg
-- Installing: /usr/local/face_swap/interfaces/python/face_swap_py.so
-- Set runtime path of "/usr/local/face_swap/interfaces/python/face_swap_py.so" to ""
$ ls /face_swap/data/
3dmm_cnn_resnet_101.caffemodel          BaselFace.dat                           face_seg_fcn8s_300_no_aug.zip           images/
3dmm_cnn_resnet_101_deploy.prototxt     BaselFaceModel_mod_wForehead_noEars.h5  face_seg_fcn8s.caffemodel               shape_predictor_68_face_landmarks.dat
3dmm_cnn_resnet_101_mean.binaryproto    face_seg_fcn8s_300.caffemodel           face_seg_fcn8s_deploy.prototxt
3dmm_cnn_resnet_101.zip                 face_seg_fcn8s_300_deploy.prototxt      face_seg_fcn8s.zip
$ sudo cp /face_swap/data/* /usr/local/face_swap/data/ -r
face_swap_image --cfg /usr/local/face_swap/bin/face_swap_image.cfg --input dao.jpg --input min1.jpg --output o2.jpg

Be careful with market orders

I was testing my algorithmic trading program just now, and ran into a very important issue with market orders.

TL;DR DO NOT USE MARKET ORDERS UNLESS ABSOLUTELY CONFIDENT!!!

Market orders ensure immediate execution, without any guarantee of the execution price. As a result, your order may be filled at a much higher price than you expected, especially when trading volume is low and the spread is large. In my case, I ended up paying over 5% more than the price I was willing to pay…

Lessons should be learned.


Exporting and Importing Elasticsearch Indices

In my project I need to run some local tests with data from a production elasticsearch cluster, so I exported data from the production server and imported it into my local cluster. The same procedure works for backing up and restoring data. Here are the instructions.

Before you start, check out the official documentation: Snapshot and Restore.

Backing up/exporting data:

  1. Modify your elasticsearch configuration file (normally elasticsearch.yml) and add a path.repo line, for example:
    path.repo: /usr/local/var/backups/
  2. Make sure this path has the correct permissions so that elasticsearch can read and write.
  3. Create snapshot:
    curl -XPUT http://localhost:9200/_snapshot/my_backup -d '{"type": "fs", "settings": {"compress": "true", "location": "/usr/local/var/backups/"}}'
    curl -XPUT http://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true
  4. Copy the files in the configured location to your local machine.
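
Before copying, you can confirm the snapshot finished; its state should be SUCCESS:

curl -XGET http://localhost:9200/_snapshot/my_backup/snapshot_1?pretty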

Restoring/importing data:

  1. Modify your local elasticsearch configuration as in step 1 of the backup.
  2. Place the snapshot files in the repo path.
  3. Close your indices:
    curl -XPOST http://localhost:9200/knx-bus/_close
  4. Import data:
    curl -XPOST http://localhost:9200/_snapshot/my_backup/snapshot_1/_restore?pretty
  5. Reopen your indices:
    curl -XPOST http://localhost:9200/knx-bus/_open
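
To verify the restore, you can watch per-shard progress with the recovery API (same index name as above):

curl -XGET http://localhost:9200/knx-bus/_recovery?pretty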

It is important that the elasticsearch version on the importing side is compatible with the exporting one, i.e., in this case your local machine has to run the same version or newer. If not, you need to upgrade elasticsearch first. The official documentation says:

The information stored in a snapshot is not tied to a particular cluster or a cluster name. Therefore it’s possible to restore a snapshot made from one cluster into another cluster. All that is required is registering the repository containing the snapshot in the new cluster and starting the restore process. The new cluster doesn’t have to have the same size or topology. However, the version of the new cluster should be the same or newer than the cluster that was used to create the snapshot.


Installing Theano and CUDA on Mac OS X

I started trying Theano today and wanted to use the GPU (NVIDIA GeForce GT 750M, 2048 MB) on my Mac. Here are brief instructions on how to use the GPU on a Mac, largely following http://deeplearning.net/software/theano/install.html#mac-os.

Install Theano:

$ pip install Theano

Download and install CUDA: https://developer.nvidia.com/cuda-downloads

Put the following lines into your ~/.bash_profile:

# Theano and CUDA
PATH="/Developer/NVIDIA/CUDA-7.5/bin/:$PATH"
export LD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-7.5/lib/
export CUDA_ROOT=/Developer/NVIDIA/CUDA-7.5/
export THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32'
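
Reload the profile so the variables take effect in the current shell:

$ source ~/.bash_profile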

Note that the PATH line is necessary. Otherwise you may see the following message:

ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.

Configure Theano:

$ cat .theanorc 
[gcc]
cxxflags = -L/usr/local/lib -L/Developer/NVIDIA/CUDA-7.5/lib/
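
To double-check which device Theano will pick up before benchmarking (it reads the THEANO_FLAGS set above and should print gpu):

$ python -c "import theano; print(theano.config.device)"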

Test if GPU is used:

$ cat check.py 
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

$ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 time python check.py 
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 1.743682 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
  1.62323284]
Used the cpu
        2.47 real         2.19 user         0.27 sys
$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 time python check.py 
Using gpu device 0: GeForce GT 750M
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 1.186971 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
  1.62323296]
Used the gpu
        2.09 real         1.59 user         0.41 sys

A more realistic example:

$ cat lr.py 
import numpy
import theano
import theano.tensor as T
rng = numpy.random

N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
     rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
training_steps = 10000

# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]

# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w)-b)) # Probability of having a one
prediction = p_1 > 0.5 # The prediction that is done: 0 or 1
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy
cost = xent.mean() + 0.01*(w**2).sum() # The cost to optimize
gw,gb = T.grad(cost, [w,b])

# Compile expressions to functions
train = theano.function(
            inputs=[x,y],
            outputs=[prediction, xent],
            updates=[(w, w-0.01*gw), (b, b-0.01*gb)],
            name = "train")
predict = theano.function(inputs=[x], outputs=prediction,
            name = "predict")

if any([x.op.__class__.__name__ in ['Gemv', 'CGemv', 'Gemm', 'CGemm'] for x in
        train.maker.fgraph.toposort()]):
    print('Used the cpu')
elif any([x.op.__class__.__name__ in ['GpuGemm', 'GpuGemv'] for x in
          train.maker.fgraph.toposort()]):
    print('Used the gpu')
else:
    print('ERROR, not able to tell if theano used the cpu or the gpu')
    print(train.maker.fgraph.toposort())

for i in range(training_steps):
    pred, err = train(D[0], D[1])

print("target values for D")
print(D[1])

print("prediction on D")
print(predict(D[0]))
$ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 time python lr.py 
Used the cpu
target values for D
[ 1.  1.  0.  1.  0.  0.  0.  0.  0.  1.  1.  0.  0.  0.  0.  0.  0.  1.
  1.  0.  0.  1.  0.  0.  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  0.  1.
  0.  0.  0.  0.  0.  1.  0.  0.  0.  1.  1.  0.  1.  1.  1.  0.  1.  0.
  0.  0.  0.  0.  0.  1.  0.  1.  0.  0.  0.  1.  1.  1.  0.  0.  1.  1.
  1.  1.  0.  0.  0.  1.  0.  0.  1.  1.  0.  0.  1.  1.  1.  1.  0.  1.
  0.  0.  0.  0.  1.  0.  0.  1.  1.  1.  0.  0.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  0.  1.  1.  0.  0.  1.  0.  0.  0.  1.  0.  1.  1.  1.
  1.  0.  0.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  0.
  1.  0.  1.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.  0.
  1.  0.  1.  0.  0.  1.  0.  0.  1.  1.  1.  1.  0.  1.  0.  0.  1.  0.
  0.  0.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  0.  1.  0.  1.  0.
  0.  1.  1.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  1.  0.  1.  0.  1.
  1.  0.  1.  1.  1.  0.  0.  1.  1.  1.  1.  0.  0.  0.  1.  1.  0.  0.
  1.  0.  0.  0.  0.  1.  1.  1.  0.  1.  1.  1.  0.  1.  0.  0.  0.  0.
  0.  1.  1.  1.  1.  1.  1.  0.  0.  1.  1.  1.  0.  1.  0.  1.  0.  1.
  1.  0.  0.  0.  1.  1.  0.  0.  1.  0.  0.  0.  0.  1.  0.  0.  0.  1.
  0.  1.  0.  1.  1.  0.  1.  1.  0.  0.  0.  0.  1.  0.  0.  0.  0.  1.
  0.  1.  0.  0.  1.  1.  0.  0.  1.  1.  0.  1.  0.  1.  0.  0.  1.  1.
  0.  1.  1.  0.  0.  1.  1.  0.  0.  1.  0.  1.  1.  0.  0.  0.  1.  0.
  0.  0.  1.  0.  0.  0.  0.  1.  1.  0.  1.  1.  1.  0.  1.  1.  1.  1.
  1.  0.  0.  1.  0.  0.  0.  0.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.
  0.  1.  1.  1.  0.  0.  0.  0.  1.  1.  1.  0.  0.  0.  0.  1.  0.  0.
  1.  1.  0.  1.]
prediction on D
[1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 1 0 0 1 1 0 1 1 1 1 0 1 1 0 1 0
 0 0 0 0 1 0 0 0 1 1 0 1 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 1 1 0 0 1 1 1 1
 0 0 0 1 0 0 1 1 0 0 1 1 1 1 0 1 0 0 0 0 1 0 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1
 1 0 1 1 0 0 1 0 0 0 1 0 1 1 1 1 0 0 1 0 1 1 1 1 1 1 1 1 1 0 1 1 0 1 0 1 1
 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 1 0 1 0 0 1 0 0 0 1 1 1
 1 1 1 1 0 1 1 1 0 1 0 1 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 0 1 0 1 1 0 1 1 1 0
 0 1 1 1 1 0 0 0 1 1 0 0 1 0 0 0 0 1 1 1 0 1 1 1 0 1 0 0 0 0 0 1 1 1 1 1 1
 0 0 1 1 1 0 1 0 1 0 1 1 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 1 1 0 1 1
 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 0 0
 1 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 1 1 1 0 1 1 1 1 1 0 0 1 0 0 0 0 1 1
 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0 0 1 1 1 0 0 0 0 1 0 0 1 1 0 1]
        8.92 real         8.24 user         1.14 sys
$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 time python lr.py 
Using gpu device 0: GeForce GT 750M
Used the gpu
target values for D
[ 1.  0.  0.  0.  0.  1.  0.  0.  1.  1.  0.  0.  1.  1.  0.  0.  1.  1.
  0.  0.  0.  1.  1.  0.  1.  1.  1.  0.  0.  1.  1.  1.  1.  1.  1.  0.
  0.  1.  0.  0.  1.  1.  0.  0.  1.  1.  0.  1.  0.  1.  1.  0.  1.  1.
  1.  0.  1.  1.  0.  0.  0.  1.  1.  1.  1.  1.  0.  0.  1.  1.  0.  1.
  1.  1.  1.  0.  1.  1.  0.  1.  1.  1.  0.  0.  0.  1.  1.  0.  0.  0.
  1.  0.  1.  0.  0.  0.  0.  1.  1.  1.  1.  0.  0.  1.  0.  1.  0.  1.
  1.  0.  1.  1.  0.  0.  0.  0.  1.  0.  0.  1.  0.  0.  0.  1.  0.  1.
  1.  1.  0.  0.  0.  1.  0.  1.  0.  1.  0.  1.  1.  1.  1.  1.  0.  1.
  1.  0.  1.  1.  0.  0.  1.  0.  1.  0.  0.  1.  0.  0.  1.  0.  0.  0.
  1.  0.  0.  1.  1.  1.  1.  0.  0.  0.  1.  1.  1.  0.  1.  0.  0.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  0.  0.  0.  0.  0.  1.  1.  1.  0.  1.
  0.  1.  0.  1.  1.  1.  1.  0.  0.  0.  1.  1.  1.  1.  0.  0.  0.  1.
  0.  1.  1.  1.  0.  1.  1.  1.  0.  0.  0.  0.  1.  0.  1.  0.  0.  1.
  0.  0.  1.  1.  0.  1.  0.  1.  1.  1.  0.  0.  1.  1.  0.  0.  0.  0.
  1.  0.  0.  1.  0.  0.  0.  0.  1.  0.  0.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  0.  0.  0.  1.  0.  1.  1.  0.  0.  0.  0.  0.  0.  1.  0.
  1.  1.  1.  0.  0.  1.  0.  1.  0.  0.  1.  0.  1.  0.  0.  1.  0.  0.
  1.  1.  0.  1.  1.  1.  0.  0.  0.  0.  0.  1.  0.  1.  0.  0.  0.  1.
  0.  0.  1.  1.  0.  1.  1.  0.  1.  1.  1.  0.  1.  1.  0.  0.  0.  0.
  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  0.  1.  0.  1.
  1.  1.  0.  1.  1.  0.  1.  1.  1.  0.  0.  1.  1.  0.  0.  0.  0.  0.
  1.  0.  0.  1.  1.  1.  0.  1.  0.  0.  1.  1.  0.  1.  1.  0.  1.  1.
  0.  0.  1.  0.]
prediction on D
[1 0 0 0 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 1 1 0 1 1 1 0 0 1 1 1 1 1 1 0 0
 1 0 0 1 1 0 0 1 1 0 1 0 1 1 0 1 1 1 0 1 1 0 0 0 1 1 1 1 1 0 0 1 1 0 1 1 1
 1 0 1 1 0 1 1 1 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 1 1 1 1 0 0 1 0 1 0 1 1 0 1
 1 0 0 0 0 1 0 0 1 0 0 0 1 0 1 1 1 0 0 0 1 0 1 0 1 0 1 1 1 1 1 0 1 1 0 1 1
 0 0 1 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 1 1 1 0 0 0 1 1 1 0 1 0 0 1 1 1 1 1 1
 1 1 1 0 0 0 0 0 1 1 1 0 1 0 1 0 1 1 1 1 0 0 0 1 1 1 1 0 0 0 1 0 1 1 1 0 1
 1 1 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0 1 0 1 1 1 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0
 0 1 0 0 1 1 1 1 1 1 1 0 1 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 1 1 1 0 0 1 0 1
 0 0 1 0 1 0 0 1 0 0 1 1 0 1 1 1 0 0 0 0 0 1 0 1 0 0 0 1 0 0 1 1 0 1 1 0 1
 1 1 0 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1 0 1 1 0 1 1 1 0
 0 1 1 0 0 0 0 0 1 0 0 1 1 1 0 1 0 0 1 1 0 1 1 0 1 1 0 0 1 0]
       19.78 real        17.61 user         1.24 sys

So it seems this GPU does not outperform the CPU. Well, the GT 750M may not be the best GPU you can get… Someone else here had a similar experience.


Statistics of insurance sold on Taobao.com on Valentine’s Day

On Feb. 14th Taobao launched a campaign selling insurance products that promise a 7% annual interest rate. The sales data is public, so I wrote a script to crawl it and did a brief study. Here are the results.

On that day (the products actually sold out in less than two hours in total) more than 40,000 people participated, resulting in total sales of almost one billion CNY (the exact number: 980,270,000 CNY). Two companies participated in this sales campaign: Zhujiang and Tian’an. The sales statistics are:

                       Zhujiang     Tian’an       Total
# of Customers            13831       29092       42923
Sales mean (k CNY)    24.922059   21.847003   22.837872
Sales min (k CNY)             1           1           1
Sales 25% (k CNY)             1           2           2
Sales 50% (k CNY)            10          10          10
Sales 75% (k CNY)            20          25          22
Sales max (k CNY)          1000         900        1000
Sales total (k CNY)      344697      635573      980270
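
As a quick consistency check on the table, each company's mean times its customer count should recover its total (all in k CNY); a small Python sketch:

# mean (k CNY) * customers should reproduce the totals (k CNY)
for name, mean, n, total in [("Zhujiang", 24.922059, 13831, 344697),
                             ("Tian'an", 21.847003, 29092, 635573)]:
    print('%s %.0f %d' % (name, mean * n, total))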

The histograms below show how many people paid each amount.

[Histograms: payments ≤ 100k CNY and > 100k CNY]

Zhujiang was extremely popular: in 2 minutes and 56 seconds it reached sales of 200,212,000 CNY, which is more than 1 million CNY in sales PER SECOND (200,212,000 CNY / 176 s ≈ 1.14 million CNY/s)! Indeed Chinese are crazy about online shopping. 😀


MapReduce in MongoDB

http://docs.mongodb.org/manual/core/map-reduce/

http://docs.mongodb.org/manual/reference/command/mapReduce/

> db.lattern_money_record.mapReduce( function() { emit(this.quantity, 1) }, function(key, values) { return Array.sum(values) }, {   query: {'quantity': {$gt: 500}}, out: {inline: 1} } )
{
	"results" : [
		{
			"_id" : 550,
			"value" : 3
		},
		{
			"_id" : 570,
			"value" : 1
		},
		{
			"_id" : 580,
			"value" : 1
		},
		{
			"_id" : 583,
			"value" : 1
		},
		{
			"_id" : 587,
			"value" : 1
		},
		{
			"_id" : 600,
			"value" : 2
		},
		{
			"_id" : 660,
			"value" : 1
		},
		{
			"_id" : 700,
			"value" : 2
		},
		{
			"_id" : 800,
			"value" : 5
		},
		{
			"_id" : 900,
			"value" : 2
		},
		{
			"_id" : 924,
			"value" : 1
		},
		{
			"_id" : 949,
			"value" : 1
		},
		{
			"_id" : 980,
			"value" : 1
		},
		{
			"_id" : 990,
			"value" : 1
		},
		{
			"_id" : 1000,
			"value" : 12
		}
	],
	"timeMillis" : 36,
	"counts" : {
		"input" : 35,
		"emit" : 35,
		"reduce" : 6,
		"output" : 15
	},
	"ok" : 1,
}

The MapReduce code I used to analyze the 20 million hotel reservation records:

# Requires pymongo; bson.code.Code wraps JavaScript source for server-side map/reduce
from bson.code import Code

def get_aggregation(collection):
    '''
    1. Get unique set of people
    2. Get most frequent users
    3. Get aggregation by location of birth, age, month and day of birth
    '''
    # Emit multiple times in mapper function:
    # http://docs.mongodb.org/manual/reference/command/mapReduce/
    mapper = Code('''
                  function() {
                    function validate_rid(id) {
                        // From: https://gist.github.com/foxwoods/1817822
                        // 18-digit resident ID number
                        // National standard GB 11643-1999
                        function rid18(id) {
                            if(! /\d{17}[\dxX]/.test(id)) {
                                return false;
                            }
                            var modcmpl = function(m, i, n) { return (i + n - m % i) % i; },
                                f = function(v, i) { return v * (Math.pow(2, i-1) % 11); },
                                s = 0;
                            for(var i=0; i<17; i++) {
                                s += f(+id.charAt(i), 18-i);
                            }
                            var c0 = id.charAt(17),
                                c1 = modcmpl(s, 11, 1);
                            return c0-c1===0 || (c0.toLowerCase()==='x' && c1===10);
                        }

                        // 15-digit resident ID number
                        // Phased out as of January 1, 2013
                        // http://www.gov.cn/flfg/2011-10/29/content_1981408.htm
                        function rid15(id) {
                            var pattern = /[1-9]\d{5}(\d{2})(\d{2})(\d{2})\d{3}/,
                                matches, y, m, d, date;
                            matches = id.match(pattern);
                            y = +('19' + matches[1]);
                            m = +matches[2];
                            d = +matches[3];
                            date = new Date(y, m-1, d);
                            return (date.getFullYear()===y && date.getMonth()===m-1 && date.getDate()===d);
                        }

                        // return rid18(id) || rid15(id);
                        try {
                            ret = rid18(id) || rid15(id);
                            return ret;
                        } catch (err) {
                            return false;
                        }
                    }

                    function validateEmail(email) {
                        // http://stackoverflow.com/questions/46155/validate-email-address-in-javascript
                        var re = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
                        return re.test(email);
                    }

                    var str = this.CtfId;
                    if (str && validate_rid(str)) {
                        var prov = parseInt(str.slice(0, 2));
                        var year, month, day, sex;
                        if (str.length == 15) {
                            year = parseInt('19' + str.slice(6, 8));
                            month = parseInt(str.slice(8, 10));
                            day = parseInt(str.slice(10, 12));
                            sex = parseInt(str.slice(14, 15)) % 2 ? 'M' : 'F';
                        } else {
                            year = parseInt(str.slice(6, 10));
                            month = parseInt(str.slice(10, 12));
                            day = parseInt(str.slice(12, 14));
                            sex = parseInt(str.slice(16, 17)) % 2 ? 'M' : 'F';
                        }
                        var age = 2013 - year;
                        var valid_provs = [11, 12, 13, 14, 15,
                            21, 22, 23, 31, 32, 33, 34, 35, 36, 37,
                            41, 42, 43, 44, 45, 46,
                            50, 51, 52, 53, 54,
                            61, 62, 63, 64, 65,
                            71, 81, 82, 91];
                        if (age <= 0 || age > 100 ||
                            month <=0 || month > 12 ||
                            day <= 0 || day > 31 ||
                            valid_provs.indexOf(prov) == -1) {
                            emit('Corrupted', 1);
                        } else {
                            // emit('Province ' + prov, 1);
                            // emit('Age ' + age, 1);
                            // emit('Month ' + month, 1);
                            // emit('Day ' + day, 1);
                            // emit('Sex ' + sex, 1);
                            // emit('Prov ' + prov + ' Sex ' + sex, 1);
                            // if (this.Address && this.Address.length > 3) {
                            //     var cur_prov = this.Address.slice(0, 3);
                            //     emit('From ' + prov + ' to ' + cur_prov, 1);
                            // }

                            // var email = this.EMail;
                            // if (email && validateEmail(email)) {
                            //     var idx = email.lastIndexOf('@');
                            //     var domain = email.slice(idx + 1);
                            //     emit(domain.toLowerCase(), 1);
                            // }

                            if (prov == 32 && sex == 'M') {
                                emit(str, 1);
                            }
                            // if (prov == 32 && sex == 'F') {
                            //     emit(str, 1);
                            // }
                        }
                    } else {
                        emit('Corrupted', 1);
                    }
                  }''')
    reducer = Code('''
                   function(key, values) {
                    return Array.sum(values);
                   }''')
    result = collection.map_reduce(
        mapper, reducer, 'aggregation', query={'CtfTp': 'ID'}
    )
    return result
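
Typical invocation, assuming a pymongo connection; the database and collection names here are hypothetical:

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
records = client['hotel']['reservations']  # hypothetical db/collection names
result = get_aggregation(records)  # pymongo returns the 'aggregation' output collection
for doc in result.find().sort('value', -1).limit(10):
    print(doc)  # top 10 keys by count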
