Before we start, I'd like to tell you why I use Python for financial computing. It took me several years to get a grasp of all the options out there, and I will try to convince you that Python is really the best tool for most of the tasks involved in trading.
When I started programming as a kid somewhere in the early nineties, choosing a programming language was easy, as there were simply not many to choose from. I started with Pascal and since then have programmed in Delphi, C, C++, C#, Java, VB, PHP, Matlab, Python, SPIN and even ASM. I did not learn all these languages for fun, as I have better things to do (like actual work), but because I had no "swiss army knife" language for all my needs. I needed C and Delphi for making stand-alone applications, PHP to build a website and Matlab for scientific calculations. As the saying goes, "a jack of all trades is a master of none", so by switching from one language to another I never acquired expert knowledge in any of them.
Ideally, I would like to learn only one language that is suited for all kinds of work: number crunching, application building, web development, interfacing with APIs etc. This language would be easy to learn, the code would be compact and clear, and it would run on any platform. It would let me work interactively, so the code can evolve as I write it, and it would be at least free as in speech. And most importantly, I care much more about my own time than the CPU time of my PC, so number crunching performance is less important to me than my own productivity.
Today the two most popular languages for technical and scientific computing are Matlab and Python. Both of them satisfy many of the wishes described above, but they have some important differences. Matlab is the most popular when it comes to technical computing. This is what I used day-to-day for solving engineering problems. For numeric simulations and working with "clean" data, it is probably the best tool there is: good IDE, fantastic plotting functions, great documentation. It is less well suited for application development or as a general purpose language. Expect to pay ~$2k for a basic commercial license, plus extra for specific toolboxes.
Doing financial research in Matlab has proven to be quite a challenge for me, mainly because there is no easy way of handling "dirty" data (data that is not nicely aligned in a table, but comes from multiple sources with different dates and missing entries). Another challenge that I faced was keeping my code from becoming a mess. It is possible to write neat libraries with Matlab, but it is far from trivial and the language design actually encourages messy coding. While using Matlab for trading strategy development I was able to deal with the shortcomings of this platform. However, when I decided to build an automatic trading system, I hit a dead end. While I was able to connect to the Interactive Brokers API, it turned out that there was no way to create a reliable application. While good for research, Matlab sucks for deployment. This was when I decided to look at other options.
Python is very similar to Matlab and solves most of its shortcomings. And it is free! With the IPython notebook, interactive work in Python is just as easy as in Matlab, but what you get is a programming language that can complete almost any task, from data mining to web development and production quality applications with great GUIs. If I had to start all over again, I would choose Python, as it would save me the trouble of learning another language for GUI and web development.
After using Python for three years, I am still as enthusiastic as the moment I fell in love with it. I feel that many other traders can greatly benefit from learning Python from the start, and for this reason I have set up a Trading With Python course.
Python, like most open source software, has one specific characteristic: it can be challenging for a beginner to find their way around thousands of libraries and tools. This guide will help you get everything you need in your quant toolbox, hopefully without any problems.
Fortunately there are several distributions, containing most of the required packages, making installation a breeze.
The Anaconda distribution includes:
- Python 3 : the Python interpreter on top of which everything else runs
- IPython : interactive shell & notebook
- Spyder IDE
- numpy & scipy : scientific computing tools, similar to Matlab
- pandas : data structures library
- ... many more scientific and utility packages, see package list
So, please, go ahead and install Anaconda.
Extra tools and libraries
Next to the goodies that come included with the Anaconda installer, you'll need at least a decent text editor and a browser:
- Notepad++ is a versatile and lightweight text editor
- Google Chrome or Firefox browser is needed for Jupyter notebook (Internet Explorer won't work)
Other handy libraries, including tools for XML reading, documentation etc., will be covered later on.
Most of the code of this course is run in an interactive document called a "notebook".
Note: The interactive programming environment that we use is called Jupyter notebook. Previously it was called "IPython notebook", but it has been renamed to "Jupyter". This was done to show that multiple languages are supported (JUlia, PYThon, R... and more). This course was written before this naming transition, so occasionally you'll encounter references to IPython notebook, which is the same as Jupyter notebook.
Launching Jupyter notebook
At this moment (May 2016) it is not possible to change the working directory after starting the notebook. You need to start it in the directory containing your notebooks to get access to your notebooks.
There are however several options to quickly open your notebooks:
Starting Jupyter notebook with a shortcut
If you keep your notebooks in a fixed directory, the easiest way to open them is with a modified shortcut:
1. Find the shortcut to the notebook in the Start menu by clicking 'Start' and typing 'Jupyter' in the search window.
2. Copy the shortcut to the clipboard by right-clicking it and selecting 'Copy', then paste it onto your desktop.
3. Right-click the desktop shortcut and choose 'Properties'.
4. Change the 'Start in' field to the directory where your notebooks are located.
You can create multiple shortcuts, one for each separate notebooks directory.
A more extensive tutorial on using the notebook can be found here.
NumPy is a fundamental package designed for scientific calculations. In its functionality it is very similar to Matlab, providing methods for working with multidimensional matrices and arrays. The NumPy website provides all the documentation you need along with a tutorial, but reading Chapter 4 of the Python for Data Analysis book is even better for getting an overview of what this tool can do. You shouldn't worry too much about understanding all the bells and whistles of NumPy. For now it is enough to understand the general concepts of working with ndarray and indexing.
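To make "working with ndarray and indexing" a bit more concrete, here is a minimal sketch. The price table and the variable names are made up for illustration; only the NumPy calls themselves are real.

```python
import numpy as np

# A small, made-up table of closing prices: rows are days, columns are symbols
prices = np.array([
    [100.0, 50.0],
    [101.0, 49.5],
    [102.0, 50.5],
    [99.0, 51.0],
])

print(prices.shape)  # (4, 2): four rows (days), two columns (symbols)

# Basic indexing and slicing
first_day = prices[0]            # first row
second_column = prices[:, 1]     # all rows, second column
last_two_days = prices[-2:, :]   # last two rows, all columns

# Arithmetic works on whole arrays at once, no loops needed:
# simple daily returns for both columns
returns = prices[1:] / prices[:-1] - 1

# Boolean indexing: select the returns of the first symbol on its "up" days
up_days = returns[:, 0] > 0
print(returns[up_days, 0])
```

If you are coming from Matlab, the main things to get used to are that indices start at 0 and that a slice such as `prices[1:]` excludes nothing at the end but starts at the second row.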
To get an idea of the almost endless capabilities of the matplotlib plotting library, just take a look at the matplotlib gallery! We will normally only need the plot() and hist() functions. Another great tutorial on plotting functions is given in this notebook.
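As a rough sketch of what using plot() and hist() looks like, the snippet below draws a line chart and a histogram from randomly generated data. The data, titles and labels are placeholders of my own choosing, not taken from the course notebooks.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data: 250 random daily returns and the price path they imply
rng = np.random.default_rng(0)
returns = rng.normal(loc=0.0005, scale=0.01, size=250)
prices = 100 * np.cumprod(1 + returns)

# Line chart of the simulated price
plt.figure()
plt.plot(prices)
plt.title("Simulated price")
plt.xlabel("day")
plt.ylabel("price")

# Histogram of the daily returns
plt.figure()
counts, bin_edges, _ = plt.hist(returns, bins=30)
plt.title("Distribution of daily returns")
plt.xlabel("daily return")
plt.ylabel("number of days")

# In a notebook the figures appear inline; in a plain script call plt.show()
```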
Writing, running and debugging code
Until now we have focused on writing code inside the IPython notebook. This is a good way of quick prototyping, but when you need to reuse the same functionality in different notebooks, copy-pasting code is a very bad habit. A good habit would be using modules to reuse functionality. A module is essentially a .py file or a directory with .py files containing functions and classes. These functions and classes can be made accessible with the import statement. A good explanation of modules can be found in the Python docs. We will be looking at writing our own modules in Part 2; for now it is enough to know how to reuse functionality from existing modules.
A typical code development workflow consists of two stages:
Prototyping stage: This is where you take the quick-n-dirty approach. Develop interactively using IPython, IPython notebook or Spyder. Here you can reuse functions from existing libraries and create new functionality. The notebook is ideal for interactive work, but less suitable for advanced debugging. Spyder is excellent for debugging and IPython is somewhere in between. My own experience is that an advanced debugger is seldom required: normally I can solve 70% of the errors just by looking at the error message, and another 25% by adding a print statement. There is also a way to start a debugger from the notebook. Just type %qtconsole in the notebook and a new console will open, connected to the same IPython behind the scenes. The console has access to all the variables and can also run %debug, which will start a debugging session.
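As a small illustration of solving an error just by looking at the error message, consider the sketch below. The function average_return is a hypothetical example with a deliberate flaw; the traceback module is part of the standard library.

```python
import traceback


def average_return(prices):
    """Return the mean of the simple returns of a list of prices."""
    returns = [prices[i + 1] / prices[i] - 1 for i in range(len(prices) - 1)]
    return sum(returns) / len(returns)


try:
    # A single price gives an empty list of returns
    average_return([100.0])
except ZeroDivisionError:
    # Capture the same text that Python would normally print on an error
    message = traceback.format_exc()
    print(message)
```

The last line of the message names the error (ZeroDivisionError) and the lines above it point to the statement that failed, which is usually enough to see that the list of returns was empty. If that is not enough, a print statement showing len(returns) just before the division would be the next step.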
Module stage: Once you are happy with the functionality developed in a prototyping stage, you can integrate it in a module. At this stage it is a good practice to add some documentation to the code you have written. Code documentation in Python is very easy with docstrings. Docstrings are text strings included in the code which are used for documenting functionality. For a couple of examples take a look here. For optimal productivity in the module stage you need a good source code editor. There are many choices out there. My favorite (free) ones are (in order of increasing complexity and features):
- Notepad++ : like Notepad, but much better (syntax highlighting etc.). Ideal for quick code changes, when you don't want to fire up a more extensive editor.
- Spyder : a lightweight editor which closes the gap between IPython and a full-featured IDE (Integrated Development Environment). Specifically targeted at interactive scientific work.
- Pyscripter - Easy to use IDE with a nicely integrated debugger. Windows only.
- Pydev - professional quality IDE.
It may take some time to find a way of developing code that suits you best. For me the ideal workflow is: prototype with notebook -> add to a module with PyDev or PyScripter -> use module in a new notebook.
More reading material: chapter 3 of the PDA book.
Ok, enough theory, let's get to working with modules. If you haven't already downloaded the workbooks for this part, please get them from the example notebooks section and take a look at the twp_03_Working_with_modules notebook.
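To give a feel for the module stage before opening the notebook, here is a minimal, self-contained sketch. The module name twp_demo_tools and the function simple_return are hypothetical names invented for this example. Normally you would type the module source into a .py file with your editor; here it is written to a temporary directory only so that the snippet runs on its own.

```python
import sys
import tempfile
from pathlib import Path

# Source code of a tiny module, including docstrings.
# In practice this would live in its own file, e.g. twp_demo_tools.py
MODULE_SOURCE = '''
"""Toy helper functions for return calculations (illustrative only)."""


def simple_return(start_price, end_price):
    """Return the fractional change from start_price to end_price."""
    return end_price / start_price - 1.0
'''

# Write the module to a temporary directory and make that directory importable
module_dir = Path(tempfile.mkdtemp())
(module_dir / "twp_demo_tools.py").write_text(MODULE_SOURCE)
sys.path.insert(0, str(module_dir))

# From here on, this is what you would do in any notebook
import twp_demo_tools

result = twp_demo_tools.simple_return(100.0, 110.0)
print(result)

# The docstring is available as documentation, also through help()
print(twp_demo_tools.simple_return.__doc__)
```

The same function can now be imported in every notebook that needs it, instead of being copy-pasted around.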
Now is the time to make use of the concepts you have learned in this part. We will jump right into working with numpy matrices and plotting functions. Regarding plotting, you now only need two functions, plot() and hist(), together with a couple of commands to set the titles and axes labels. There are three example notebooks for this part of the course:
- twp_01_IPython_Notebook - shows you the way around IPython notebook (view online)
- twp_02_Leveraged_etfs - simulate leveraged ETFs to prove that there is no such thing as leveraged ETF decay (view online)
- twp_03_Working_with_modules - learn to work with modules (view online)
Get the notebooks
Just get the zip file and extract it to your notebooks folder, then start Jupyter notebook to see them appear in the dashboard.