Tasks
The following tasks allow for a complete workflow to be implemented from fetching the
tournament data automatically, to training a model and then submitting it.
A Luigi task is meant to be used in conjunction with other tasks, so the dependencies
between tasks are modeled explicitly. Two methods are used for this: requires(), which
declares a task's incoming dependencies, and output(), which declares the products
(targets) of a task. The run() method is called to produce the outputs when they do not
yet exist. If the outputs already exist, the task is not run.
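The interplay of requires(), output() and run() can be illustrated with a minimal sketch. It deliberately does not use Luigi itself: Task, FileTarget, build, Fetch and Train are hypothetical stand-ins written in plain Python, and the real Luigi scheduler is considerably more elaborate. The sketch only shows the idea that dependencies run first and that a task whose output already exists is skipped.

```python
import os
import tempfile


class FileTarget:
    """A product of a task; it counts as satisfied once the file exists."""

    def __init__(self, path):
        self.path = path

    def exists(self):
        return os.path.exists(self.path)


class Task:
    """Hypothetical base class with the three hooks described above."""

    def requires(self):
        return []  # incoming dependencies

    def output(self):
        raise NotImplementedError  # product of the task

    def run(self):
        raise NotImplementedError  # produces the output


def build(task, log):
    """Run dependencies first; skip any task whose output already exists."""
    if task.output().exists():
        return
    for dependency in task.requires():
        build(dependency, log)
    task.run()
    log.append(type(task).__name__)


class Fetch(Task):
    def __init__(self, directory):
        self.directory = directory

    def output(self):
        return FileTarget(os.path.join(self.directory, "raw.csv"))

    def run(self):
        with open(self.output().path, "w") as handle:
            handle.write("id,feature\n1,0.5\n")


class Train(Task):
    def __init__(self, directory):
        self.directory = directory

    def requires(self):
        return [Fetch(self.directory)]

    def output(self):
        return FileTarget(os.path.join(self.directory, "predictions.csv"))

    def run(self):
        with open(self.output().path, "w") as handle:
            handle.write("id,probability\n1,0.5\n")


workdir = tempfile.mkdtemp()
first_run, second_run = [], []
build(Train(workdir), first_run)   # runs Fetch, then Train
build(Train(workdir), second_run)  # nothing to do: the output already exists
print(first_run, second_run)
```

The second call to build records nothing, because the target of Train is already present.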
class tasks.numerai_fetch_training_data.FetchAndExtractData(*args, **kwargs)
Fetches the most recent dataset and extracts the contents to the given path if not yet done (default path is ./data).
Param: output_path: (relative) path where the data should be written to. Defaults to ./data. Default signature is FetchAndExtractData(output_path='./data').
data
├── numerai_dataset_95
│   ├── example_model.py
│   ├── example_model.r
│   ├── example_predictions.csv
│   ├── numerai_tournament_data.csv
│   └── numerai_training_data.csv
└── numerai_dataset_95.zip
output()
Manages the files to be written and determines whether they exist by checking every file listed below. If any of them does not exist, run() is invoked.
Returns: A dict with the following keys:
* zipfile: the original file as downloaded (numerai_dataset_xxx.zip)
* training_data.csv: the training data (numerai_training_data.csv)
* tournament_data.csv: the tournament data (numerai_tournament_data.csv)
* example_predictions.csv: example predictions (example_predictions.csv)
Note that example_model.py and example_model.r are not referenced, as they are of no use to us.
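As a rough sketch of the check described for output(), the listed keys can be mapped to paths and tested for existence. The helper names expected_outputs and all_present are hypothetical, and the directory layout (the zip file next to a folder of the same name) is an assumption read off the example tree above; the actual task may build its targets differently.

```python
from pathlib import Path
import tempfile


def expected_outputs(output_path, dataset_name):
    """Map the documented keys to the paths they are assumed to point to."""
    root = Path(output_path)
    extracted = root / dataset_name
    return {
        "zipfile": root / f"{dataset_name}.zip",
        "training_data.csv": extracted / "numerai_training_data.csv",
        "tournament_data.csv": extracted / "numerai_tournament_data.csv",
        "example_predictions.csv": extracted / "example_predictions.csv",
    }


def all_present(outputs):
    """True only if every listed file exists."""
    return all(path.exists() for path in outputs.values())


outputs = expected_outputs(tempfile.mkdtemp(), "numerai_dataset_95")
before = all_present(outputs)  # nothing has been fetched yet
for path in outputs.values():
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()               # stand-in for downloading and extracting
after = all_present(outputs)
print(before, after)
```

A single missing file makes all_present return False, which corresponds to the case in which run() would be invoked.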
class tasks.numerai_train_and_predict.TrainAndPredict(*args, **kwargs)
Trains a naive Bayes classifier that assumes Bernoulli-distributed features, then predicts the targets on the tournament data. The default signature of this task is TrainAndPredict(output_path='./data').
Param: output_path (str): path to the directory where the predictions are saved. Defaults to ./data.
output()
Saves the output of this task, which is a CSV file of the predictions made for the given data.
requires()
Dependencies to be fulfilled prior to execution. This task needs the tasks.numerai_fetch_training_data.FetchAndExtractData task, which provides the training and tournament data.
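To make the modeling assumption concrete, here is a small, dependency-free sketch of a Bernoulli naive Bayes classifier. It is not the code used by TrainAndPredict, whose implementation is not shown here; fit_bernoulli_nb and predict_proba are hypothetical names, the features are assumed to be already binary (0 or 1), and Laplace smoothing with alpha=1.0 is an assumption of this sketch.

```python
import math


def fit_bernoulli_nb(rows, labels, alpha=1.0):
    """Estimate log priors and smoothed P(feature = 1 | class) per class."""
    n_features = len(rows[0])
    model = {}
    for cls in sorted(set(labels)):
        members = [row for row, label in zip(rows, labels) if label == cls]
        log_prior = math.log(len(members) / len(rows))
        probs = [
            (sum(row[j] for row in members) + alpha) / (len(members) + 2 * alpha)
            for j in range(n_features)
        ]
        model[cls] = (log_prior, probs)
    return model


def predict_proba(model, row):
    """Return P(class | row) for every class, computed in log space."""
    scores = {}
    for cls, (log_prior, probs) in model.items():
        score = log_prior
        for value, p in zip(row, probs):
            score += math.log(p if value else 1.0 - p)
        scores[cls] = score
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {cls: math.exp(score - top) for cls, score in scores.items()}
    total = sum(weights.values())
    return {cls: weight / total for cls, weight in weights.items()}


# Toy data: the first feature separates the classes, the second is noise.
rows = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = [1, 1, 0, 0]
model = fit_bernoulli_nb(rows, labels)
probabilities = predict_proba(model, [1, 0])
print(probabilities)
```

Each feature contributes independently to the class score, which is the "naive" part of the model; the Bernoulli part is that every feature is treated as a yes/no event.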
class tasks.numerai_upload_predictions.UploadPredictions(*args, **kwargs)
This task uploads a prediction file if it wasn’t uploaded before. The file name is configured via the filepath parameter.
Param: secret (str): API secret as generated for the given public_id by the numer.ai website
Param: public_id (str): chosen API identifier as given by the numer.ai website
Param: filepath (str): path to the file which is to be uploaded
output()
Produces a targets.numerai_submission.SubmissionTarget for the current round.
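The "upload only once" behaviour can be pictured with a target whose existence check asks whether a submission has already been made. The sketch below is purely illustrative: UploadedTarget is a hypothetical class, not the actual targets.numerai_submission.SubmissionTarget, and the in-memory set stands in for whatever lookup the real target performs.

```python
class UploadedTarget:
    """Hypothetical target: satisfied once a round is recorded as submitted."""

    def __init__(self, round_number, already_submitted):
        self.round_number = round_number
        # already_submitted: callable taking a round number, returning bool.
        self._already_submitted = already_submitted

    def exists(self):
        return self._already_submitted(self.round_number)


submitted_rounds = set()  # stand-in for a lookup against the remote service
target = UploadedTarget(95, lambda number: number in submitted_rounds)

uploads = 0
for _ in range(2):
    if not target.exists():
        uploads += 1              # the real task would upload the file here
        submitted_rounds.add(95)
print(uploads)
```

Because the second pass finds the target satisfied, the upload step runs only once.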