Category: Data Analysis

Notes to myself: RTX 2070, CUDA, cuDNN, Caffe, and faceswap

Install NVIDIA driver for RTX 2070: https://www.geforce.com/drivers

Install CUDA 10.0: https://developer.nvidia.com/cuda-downloads

DO NOT re-install the drivers suggested by the CUDA installer:
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: n

cuDNN:

cuDNN Runtime Library for Ubuntu18.04 (Deb)

cuDNN Developer Library for Ubuntu18.04 (Deb)

Caffe:

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_cublas_device_LIBRARY (ADVANCED)
linked by target "caffe" in directory /home/dli/Projects/caffe/src/caffe

Fix: upgrade CMake to a newer release.
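Before rebuilding Caffe it can help to confirm which CMake version is actually being picked up. The following is a minimal sketch, not part of the original notes. It assumes that 3.12.2 is the first CMake release that copes with CUDA 10 no longer shipping cublas_device; treat that threshold (`MIN_CMAKE`) and the helper names as assumptions and adjust them for your setup.

```python
import re
import shutil
import subprocess

# Assumption: first CMake release whose CUDA detection works with CUDA 10.
MIN_CMAKE = (3, 12, 2)


def parse_cmake_version(text):
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("could not find a CMake version in: %r" % text)
    return tuple(int(part) for part in match.groups())


def cmake_is_new_enough(version, minimum=MIN_CMAKE):
    """Tuples compare element by element, which matches version ordering."""
    return version >= minimum


def installed_cmake_version():
    """Return the version of the cmake found on PATH, or None if there is none."""
    if shutil.which("cmake") is None:
        return None
    result = subprocess.run(
        ["cmake", "--version"], capture_output=True, text=True, check=True
    )
    return parse_cmake_version(result.stdout)
```

Calling `cmake_is_new_enough(installed_cmake_version())` after checking for `None` tells you whether the upgrade is still needed.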

face_swap

Be careful with market orders

I was testing my algorithmic trading program just now and ran into a very important issue with market orders.

TL;DR DO NOT USE MARKET ORDERS UNLESS ABSOLUTELY CONFIDENT!!!

Market orders ensure immediate execution but do not guarantee the execution price. As a result, your order may be filled at a much higher price than you expected, especially when the trading volume is low and the spread is large. In my case, I ended up paying 5% more than the price I was willing to pay.
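To make the mechanism concrete, here is a small sketch that walks a made-up order book. The book, the quantities, and the function names are all hypothetical; it only illustrates why a market buy in a thin book can fill well above the best ask, and how a limit price caps what you pay at the cost of a partial fill.

```python
def market_buy_fill(asks, quantity):
    """Walk the ask side of the book and return the average fill price.

    asks: list of (price, size) tuples sorted from the lowest price up.
    """
    remaining = quantity
    cost = 0.0
    for price, size in asks:
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("not enough liquidity to fill the order")
    return cost / quantity


def limit_buy_fill(asks, quantity, limit_price):
    """Fill only against asks at or below limit_price; return (filled, avg_price)."""
    filled = 0
    cost = 0.0
    for price, size in asks:
        if price > limit_price or filled == quantity:
            break
        take = min(quantity - filled, size)
        cost += take * price
        filled += take
    average = cost / filled if filled else None
    return filled, average


# Hypothetical thin order book: (ask price, shares available at that price)
thin_book = [(100.0, 10), (103.0, 20), (108.0, 70)]

avg_market = market_buy_fill(thin_book, 100)
slippage = avg_market / thin_book[0][0] - 1
filled, avg_limit = limit_buy_fill(thin_book, 100, limit_price=103.0)
print(f"market order: avg {avg_market:.2f}, {slippage:.1%} above best ask")
print(f"limit order at 103.00: filled {filled} of 100 at avg {avg_limit:.2f}")
```

With this made-up book the market order averages 106.20, about 6.2% above the best ask, while the limit order fills only 30 shares at an average of 102.00 and leaves the rest unfilled.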

Lesson learned.

Exporting and Importing Elasticsearch Indices

In my project I needed to run some local tests with data from a production Elasticsearch cluster, so I exported the data from the production server and imported it into my local cluster. The same procedure can be used for backing up and restoring data. Here are the instructions.

Before you start, check out the official documentation: Snapshot and Restore.

Backing up/exporting data:

  1. Modify your Elasticsearch configuration file (normally elasticsearch.yml) and add a path.repo line, for example:
  2. Make sure this path has the correct permissions so that elasticsearch can read and write.
  3. Create snapshot:
  4. Copy the files in the configured location to your local machine.
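The original snippets for these steps are not shown here. As a rough sketch of what steps 1 and 3 involve, the code below builds the snapshot API calls for a shared-filesystem repository. The repository name, snapshot name, and paths are hypothetical placeholders; the `location` must sit under a directory listed in path.repo (for instance `path.repo: ["/mount/backups"]`, again a made-up path).

```python
import json
import urllib.request

# Hypothetical names; replace them with your own.
REPO = "my_backup"
SNAPSHOT = "snapshot_1"
REPO_PATH = "/mount/backups/my_backup"  # must be under a path.repo directory


def export_requests(repo=REPO, snapshot=SNAPSHOT, location=REPO_PATH):
    """Return the (method, path, body) calls for registering a repo and snapshotting."""
    return [
        # Register a shared-filesystem repository at the configured location.
        ("PUT", f"/_snapshot/{repo}",
         {"type": "fs", "settings": {"location": location}}),
        # Snapshot all indices and wait until the snapshot has finished.
        ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", None),
    ]


def send(base_url, method, path, body=None):
    """Send one request to the cluster and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        base_url + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


for method, path, body in export_requests():
    print(method, path, json.dumps(body) if body else "")
    # Against a real cluster: send("http://localhost:9200", method, path, body)
```

The loop only prints the requests; uncomment the `send` call (and adjust the URL, which is assumed here) to run them against a cluster.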

Restoring/importing data:

  1. Modify your local Elasticsearch configuration as in step 1 of the backup.
  2. Place the snapshot files in the repo path.
  3. Close your indices:
  4. Import data:
  5. Reopen your indices:
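The restore side can be sketched the same way. The original commands are not shown here, so the following only lists the REST calls that correspond to steps 3 to 5, plus registering the repository on the local cluster first. Index, repository, and snapshot names are hypothetical.

```python
def restore_requests(indices, location, repo="my_backup", snapshot="snapshot_1"):
    """Return the (method, path, body) calls for restoring indices from a snapshot.

    indices: list of index names to restore (hypothetical names in the demo below).
    location: directory holding the copied snapshot files, under path.repo.
    """
    names = ",".join(indices)
    return [
        # The local cluster has to know about the repository as well.
        ("PUT", f"/_snapshot/{repo}",
         {"type": "fs", "settings": {"location": location}}),
        # An existing index must be closed before it can be restored over.
        ("POST", f"/{names}/_close", None),
        # Restore only the listed indices from the snapshot.
        ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", {"indices": names}),
        # Reopen the indices once the restore has finished.
        ("POST", f"/{names}/_open", None),
    ]


for method, path, body in restore_requests(["hotels"], "/mount/backups/my_backup"):
    print(method, path, body if body else "")
```

If an index does not exist locally yet, the close call will fail with a not-found error and can simply be skipped.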

It is important that the Elasticsearch version on the importing side is compatible with the one that exported the data, i.e., in this case your local machine has to run the same version or a newer one. If not, you need to upgrade Elasticsearch first. The official documentation says:

The information stored in a snapshot is not tied to a particular cluster or a cluster name. Therefore it’s possible to restore a snapshot made from one cluster into another cluster. All that is required is registering the repository containing the snapshot in the new cluster and starting the restore process. The new cluster doesn’t have to have the same size or topology. However, the version of the new cluster should be the same or newer than the cluster that was used to create the snapshot.
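The "same or newer" rule quoted above can be expressed as a simple version comparison. This is a simplified sketch: it assumes plain numeric version strings such as "6.2.0" (the kind reported under version.number by a GET on the cluster root), and it does not model the additional limits the official documentation places on restoring across major versions. The function names are made up for illustration.

```python
def parse_version(text):
    """Turn '6.2.0' into (6, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def can_restore(snapshot_version, target_version):
    """Apply only the quoted rule: the target must be the same version or newer."""
    return parse_version(target_version) >= parse_version(snapshot_version)


print(can_restore("5.6.3", "6.2.0"))   # True: local cluster is newer
print(can_restore("6.2.0", "5.6.3"))   # False: upgrade the local cluster first
print(can_restore("6.10.0", "6.9.0"))  # False: 6.10 is newer than 6.9
```

Comparing tuples of integers rather than raw strings matters for the last case, where a string comparison would wrongly rank "6.9.0" above "6.10.0".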

Installing Theano and CUDA on Mac OS X

I started trying Theano today and wanted to use the GPU (NVIDIA GeForce GT 750M, 2048 MB) on my Mac. Here are brief instructions on how to use the GPU on a Mac, largely following the instructions from http://deeplearning.net/software/theano/install.html#mac-os.

Install Theano:

Download and install CUDA: https://developer.nvidia.com/cuda-downloads

Put the following lines into your ~/.bash_profile:

Note that the PATH line is necessary. Otherwise you may see the following message:

ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.
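A quick way to diagnose this error is to check whether nvcc is reachable through PATH at all. The sketch below uses only the Python standard library; the `/usr/local/cuda/bin` directory is an assumption about where CUDA was installed, and `find_nvcc` is a made-up helper name.

```python
import os
import shutil

# Assumption: the usual CUDA install location; adjust if yours differs.
CUDA_BIN = "/usr/local/cuda/bin"


def find_nvcc(extra_dir=CUDA_BIN):
    """Return (path, on_path): where nvcc is and whether PATH alone finds it."""
    found = shutil.which("nvcc")
    if found is not None:
        return found, True
    candidate = os.path.join(extra_dir, "nvcc")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate, False
    return None, False


path, on_path = find_nvcc()
if on_path:
    print(f"nvcc found on PATH: {path}")
elif path is not None:
    print(f"nvcc exists at {path}, but its directory is missing from PATH")
else:
    print("nvcc not found; is CUDA installed?")
```

The middle case is the one that produces the Theano error above: CUDA is installed, but the shell that launches Python does not have its bin directory on PATH.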

Configure Theano:

Test if GPU is used:

A more realistic example:

So it seems this GPU does not outperform the CPU. Well, the GT 750M may not be the best GPU you can get… Someone else had a similar experience.

Statistics of insurance sold on Taobao.com on Valentine’s Day

On Feb. 14th Taobao launched a campaign to sell insurance products that promised a 7% annual interest rate. The sales data is public, so I wrote a script to scrape it and did a brief study of the data. Here are the results.

On that day more than 40,000 people participated (the products actually sold out in less than two hours), resulting in total sales of almost one billion CNY (the exact number: 980,270,000 CNY). Two companies participated in this sales campaign: Zhujiang and Tian’an. The sales statistics are:

Statistic               Zhujiang     Tian’an       Total
# of customers             13831       29092       42923
Sales mean (k CNY)     24.922059   21.847003   22.837872
Sales min (k CNY)              1           1           1
Sales 25% (k CNY)              1           2           2
Sales 50% (k CNY)             10          10          10
Sales 75% (k CNY)             20          25          22
Sales max (k CNY)           1000         900        1000
Sales total (k CNY)       344697      635573      980270
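As a sanity check on the table, the mean multiplied by the number of customers should reproduce the total for each column, and the two companies should add up to the combined column. A short sketch using only the figures from the table:

```python
# Figures copied from the table: (customers, mean in k CNY, total in k CNY)
stats = {
    "Zhujiang": (13831, 24.922059, 344697),
    "Tian'an": (29092, 21.847003, 635573),
    "Total": (42923, 22.837872, 980270),
}

for name, (customers, mean, total) in stats.items():
    # mean * count should match the total up to rounding of the mean
    implied = customers * mean
    assert abs(implied - total) < 1, name
    print(f"{name}: {customers} x {mean} = {implied:.0f} (reported {total})")

# The two companies should sum to the combined column.
assert stats["Zhujiang"][0] + stats["Tian'an"][0] == stats["Total"][0]
assert stats["Zhujiang"][2] + stats["Tian'an"][2] == stats["Total"][2]
```

All assertions pass, so the table is internally consistent.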

The histograms show how many people paid each amount.

[Figures: histograms of purchase amounts, for purchases of at most 100k CNY and for purchases above 100k CNY]

Zhujiang was extremely popular: in 2 minutes and 56 seconds it reached sales of 200,212,000 CNY. That’s more than 1 million CNY in sales PER SECOND! Chinese shoppers really are crazy about online shopping. 😀
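The per-second figure follows directly from the two numbers above:

```python
elapsed_seconds = 2 * 60 + 56   # 2 minutes 56 seconds = 176 s
sales_cny = 200_212_000         # Zhujiang sales reached in that time

rate = sales_cny / elapsed_seconds
print(f"{rate:,.0f} CNY per second")  # prints "1,137,568 CNY per second"
```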

MapReduce in MongoDB

http://docs.mongodb.org/manual/core/map-reduce/

http://docs.mongodb.org/manual/reference/command/mapReduce/

The MapReduce code I used to analyze the 20 million hotel reservation records:
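The original code is not reproduced here. As a rough illustration of the map, emit, and reduce flow that MongoDB's mapReduce follows, here is a pure-Python sketch. It does not talk to MongoDB, and the record fields (`city`, `nights`) are hypothetical, since the real schema of the reservation data is not shown.

```python
from collections import defaultdict

# Hypothetical records standing in for the reservation documents.
reservations = [
    {"city": "Beijing", "nights": 2},
    {"city": "Shanghai", "nights": 1},
    {"city": "Beijing", "nights": 3},
]


def map_fn(doc):
    """Emit (key, value) pairs for one document, like emit() in a map function."""
    yield doc["city"], {"count": 1, "nights": doc["nights"]}


def reduce_fn(key, values):
    """Combine all values for one key; the result has the same shape as the inputs."""
    return {
        "count": sum(value["count"] for value in values),
        "nights": sum(value["nights"] for value in values),
    }


def map_reduce(docs, map_fn, reduce_fn):
    """Group emitted values by key, then reduce each group to a single value."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


result = map_reduce(reservations, map_fn, reduce_fn)
print(result)
```

Keeping the reduce output in the same shape as the emitted values matters in MongoDB too, because reduce may be called again on its own earlier results for the same key.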

 
