Tasks

The following tasks implement a complete workflow: fetching the tournament data automatically, training a model, and submitting its predictions. A Luigi task is designed to be used in conjunction with other tasks, so dependencies between tasks are modeled explicitly. Two methods serve this purpose: requires, which declares a task's incoming dependencies, and output, which declares the targets a task produces. The run method is called to produce the outputs when they do not yet exist. If the outputs already exist, the task is not run.
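The interplay of requires, output and run can be illustrated without Luigi itself. The following is a minimal standard-library sketch that only mimics this contract; FileTarget, SketchTask, build, Fetch and Train are hypothetical names chosen for illustration and are neither Luigi's implementation nor the tasks documented below.

```python
import os
import tempfile


class FileTarget:
    """Hypothetical target: it 'exists' once the file is on disk."""

    def __init__(self, path):
        self.path = path

    def exists(self):
        return os.path.exists(self.path)


class SketchTask:
    """Hypothetical base class mirroring the requires/output/run contract."""

    def requires(self):
        return []

    def output(self):
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self):
        return self.output().exists()


def build(task, log):
    """Run dependencies first, then the task; skip anything already complete."""
    if task.complete():
        return
    for dependency in task.requires():
        build(dependency, log)
    task.run()
    log.append(type(task).__name__)


class Fetch(SketchTask):
    def __init__(self, workdir):
        self.workdir = workdir

    def output(self):
        return FileTarget(os.path.join(self.workdir, "data.csv"))

    def run(self):
        with open(self.output().path, "w") as handle:
            handle.write("id,feature\n1,0.5\n")


class Train(SketchTask):
    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        return [Fetch(self.workdir)]

    def output(self):
        return FileTarget(os.path.join(self.workdir, "predictions.csv"))

    def run(self):
        # Read the dependency's output, then write this task's own output.
        with open(self.requires()[0].output().path) as handle:
            n_rows = len(handle.read().splitlines()) - 1
        with open(self.output().path, "w") as handle:
            handle.write("rows_seen\n%d\n" % n_rows)


workdir = tempfile.mkdtemp()
log = []
build(Train(workdir), log)  # runs Fetch, then Train
build(Train(workdir), log)  # nothing runs: the outputs already exist
print(log)  # ['Fetch', 'Train']
```

The second call leaves the log unchanged, which is the skipping behaviour described above.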

class tasks.numerai_fetch_training_data.FetchAndExtractData(*args, **kwargs)[source]

Fetches the most recent dataset and extracts the contents to the given path if not yet done (default path is ./data).

Param:output_path: (relative) path where the data should be written to. Defaults to ./data. Default signature is FetchAndExtractData(output_path='./data').
data
├── numerai_dataset_95
│   ├── example_model.py
│   ├── example_model.r
│   ├── example_predictions.csv
│   ├── numerai_tournament_data.csv
│   └── numerai_training_data.csv
└── numerai_dataset_95.zip
output()[source]

Declares the files to be written and checks whether they exist. All files listed below are checked; if any of them does not exist, run() is invoked.

Returns:A dict with the following keys:
  • zipfile: original file as downloaded (numerai_dataset_xxx.zip)
  • training_data.csv: the training data (numerai_training_data.csv)
  • tournament_data.csv: the tournament data (numerai_tournament_data.csv)
  • example_predictions.csv: example predictions (example_predictions.csv)

Note that example_model.py and example_model.r are not referenced, as they are of no use here.
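The existence check described for output() can be sketched with the standard library alone. The dictionary keys and file names below are taken from the listing above; the helper names dataset_outputs and needs_run, and the way the dataset name is passed in, are assumptions made for illustration and need not match the actual implementation.

```python
import tempfile
from pathlib import Path


def dataset_outputs(output_path, dataset_name):
    """Map the documented keys to the files expected on disk (hypothetical helper)."""
    root = Path(output_path)
    folder = root / dataset_name
    return {
        "zipfile": root / (dataset_name + ".zip"),
        "training_data.csv": folder / "numerai_training_data.csv",
        "tournament_data.csv": folder / "numerai_tournament_data.csv",
        "example_predictions.csv": folder / "example_predictions.csv",
    }


def needs_run(outputs):
    """True if at least one of the expected files is missing."""
    return not all(path.exists() for path in outputs.values())


root = Path(tempfile.mkdtemp())
outputs = dataset_outputs(root, "numerai_dataset_95")
before = needs_run(outputs)  # True: nothing has been downloaded yet

# Simulate a finished download and extraction by creating empty files.
for path in outputs.values():
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()

after = needs_run(outputs)  # False: every expected file now exists
```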

run()[source]

The task run method, to be overridden in a subclass.

See Task.run

class tasks.numerai_train_and_predict.TrainAndPredict(*args, **kwargs)[source]

Trains a naïve Bayes classifier that assumes Bernoulli-distributed features, then predicts the targets on the tournament data. The default signature of this task is TrainAndPredict(output_path='./data').

Param:output_path (str): path to the directory where the predictions shall be saved to, defaults to ./data.
output()[source]

Declares the output of this task: a csv file containing the predictions made for the given data.

requires()[source]

Dependencies to be fulfilled prior to execution. This task requires the tasks.numerai_fetch_training_data.FetchAndExtractData task, which provides the training and tournament data.

run()[source]

Trains a model and makes predictions given the data. These are then saved to a csv file.
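To make the modelling step concrete, here is a small standard-library sketch of a Bernoulli naïve Bayes classifier. It is not the code used by this task, which presumably relies on a library implementation; the function names, the Laplace smoothing parameter alpha, and the assumption that features are already binary (0 or 1) are all illustrative choices.

```python
import math


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate log priors and per-feature probabilities P(x_j = 1 | class)."""
    n_features = len(X[0])
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, target in zip(X, y) if target == label]
        log_prior = math.log(len(rows) / len(X))
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = [
            (sum(row[j] for row in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[label] = (log_prior, probs)
    return model


def predict_proba(model, x, positive=1):
    """Return the posterior probability of the `positive` class for one row."""
    log_scores = {}
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for value, p in zip(x, probs):
            score += math.log(p if value else 1.0 - p)
        log_scores[label] = score
    # Normalise in log space to avoid underflow with many features.
    highest = max(log_scores.values())
    total = sum(math.exp(s - highest) for s in log_scores.values())
    return math.exp(log_scores[positive] - highest) / total


# Toy data: the first feature separates the classes, the second is noise.
X_train = [[1, 0], [1, 1], [0, 0], [0, 1]]
y_train = [1, 1, 0, 0]
model = fit_bernoulli_nb(X_train, y_train)
p_high = predict_proba(model, [1, 0])
p_low = predict_proba(model, [0, 0])
```

On this toy data the first row is assigned a probability above 0.5 and the second one below it. In the real task, the resulting probabilities would then be written to the csv file declared by output().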

class tasks.numerai_upload_predictions.UploadPredictions(*args, **kwargs)[source]

This task uploads a prediction file if it wasn’t uploaded before. The file name is configured via the filepath parameter.

Param:secret (str): API secret as generated for the given public_id by the numer.ai website
Param:public_id (str): chosen API identifier as given by the numer.ai website
Param:filepath (str): path to the file which is to be uploaded
output()[source]

Produces a targets.numerai_submission.SubmissionTarget for the current round.

run()[source]

Submits the predictions.
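The "upload only if it wasn't uploaded before" behaviour follows from the target's existence check. The sketch below illustrates that idea with the standard library only. FakeSubmissionTarget and upload_once are hypothetical stand-ins: they do not reproduce targets.numerai_submission.SubmissionTarget or any numer.ai API call, and the round number is borrowed from the example dataset name purely for illustration.

```python
class FakeSubmissionTarget:
    """Hypothetical target that 'exists' once a round has a recorded submission."""

    def __init__(self, submitted_rounds, round_number):
        self.submitted_rounds = submitted_rounds
        self.round_number = round_number

    def exists(self):
        return self.round_number in self.submitted_rounds


def upload_once(target, upload):
    """Call `upload` only if the target does not exist yet; report whether it ran."""
    if target.exists():
        return False
    upload()
    return True


submitted_rounds = set()  # stands in for the remote record of submissions
upload_calls = []


def fake_upload():
    # A real task would send the prediction file here.
    upload_calls.append("predictions.csv")
    submitted_rounds.add(95)


target = FakeSubmissionTarget(submitted_rounds, 95)
first = upload_once(target, fake_upload)   # uploads
second = upload_once(target, fake_upload)  # skipped: already submitted
```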