Feature engineering is a fundamental but poorly documented component of learning-to-rank (LTR) search applications.
As a result, there are still few open-access software packages that allow researchers and practitioners to easily simulate a feature extraction pipeline and conduct experiments in a lab setting.
This talk introduces Fxt, an open-source framework for efficient and scalable feature extraction. Fxt can be integrated into complex, high-performance software applications to help solve a wide variety of text-based machine learning problems.
The talk details how we built and documented a reproducible feature extraction pipeline, together with LTR experiments on the ClueWeb09B collection.
The resulting LTR dataset is publicly available.
We’ll also discuss some of the benefits that open-access tooling in this area offers researchers and practitioners alike, including feature extraction efficiency and model interpretation.