solr/example/films/README.md
@@ -0,0 +1,18 @@
We have a movie data set in JSON, Solr XML, and CSV formats. All three formats contain the same data, so you can index documents into Solr using any one of them.
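As a rough sketch of what "any one format" means in practice, the Python snippet below builds (without sending) an update request for each of the three formats. The host `localhost:8983`, the collection name `films`, and the helper name `build_update_request` are assumptions for illustration and are not taken from this README.

```python
from urllib.request import Request

# Content types that Solr's /update handler accepts for each format.
CONTENT_TYPES = {
    "json": "application/json",
    "xml": "application/xml",
    "csv": "application/csv",
}


def build_update_request(fmt, body,
                         base_url="http://localhost:8983/solr",  # assumed host
                         collection="films"):                    # assumed collection
    """Build, but do not send, a POST request that indexes `body` into Solr."""
    if fmt not in CONTENT_TYPES:
        raise ValueError(f"unsupported format: {fmt!r}")
    url = f"{base_url}/{collection}/update?commit=true"
    headers = {"Content-Type": CONTENT_TYPES[fmt]}
    return Request(url, data=body, headers=headers, method="POST")


# An empty JSON array stands in for the real file contents.
req = build_update_request("json", b"[]")
print(req.get_method(), req.full_url)
```

Sending the request, for example with `urllib.request.urlopen(req)`, is left out so the sketch runs without a live Solr instance.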
This example uses the `_default` configset that ships with Solr, plus some custom fields added via the Schema API. It demonstrates the use of ParamSets in conjunction with the [Request Parameters API](https://solr.apache.org/guide/solr/latest/configuration-guide/request-parameters-api.html).
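A minimal sketch of how a ParamSet might be defined and then used, assuming a collection named `films` on `localhost:8983`. The ParamSet name `algo_a`, its parameters, and both helper names are hypothetical. The `set` command is posted to the collection's `/config/params` endpoint, and a query selects the ParamSet with `useParams`.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr"  # assumed host
COLLECTION = "films"                     # assumed collection name


def paramset_command(name, params):
    """JSON body for POST <collection>/config/params that creates a ParamSet."""
    return json.dumps({"set": {name: params}})


def query_url(q, paramset):
    """Select URL that applies the named ParamSet through `useParams`."""
    query = urlencode({"q": q, "useParams": paramset})
    return f"{BASE_URL}/{COLLECTION}/select?{query}"


# Hypothetical ParamSet: search the `name` field with the dismax parser.
body = paramset_command("algo_a", {"defType": "dismax", "qf": "name"})
print(body)
print(query_url("batman", "algo_a"))
```

Keeping the parameters in a named set means the query URL stays short, and the search behaviour can be changed without touching the client.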
The original data was fetched from Freebase, and the data license is in the films-LICENSE.txt file. Google shut down Freebase in 2016.
This data consists of the following fields:
* `id` - Unique identifier for the movie
* `name` - Name of the movie
* `directed_by` - The person(s) who directed the film
* `initial_release_date` - The earliest official initial film screening date in any country
* `genre` - The genre(s) that the movie belongs to
* `film_vector` - The 10-dimensional vector representing the film, according to a toy example embedding model
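To make the shape of a document concrete, here is a small sketch that checks a film record against the field list above. It assumes `directed_by` and `genre` are lists of strings and `film_vector` is a list of 10 numbers. The function name `validate_film` and every value in the sample record are hypothetical placeholders, not entries from the data set.

```python
def validate_film(doc):
    """Return a list of problems found in a film document (empty if none)."""
    problems = []
    for field in ("id", "name"):
        if not isinstance(doc.get(field), str) or not doc[field]:
            problems.append(f"{field} must be a non-empty string")
    # Multi-valued fields are assumed to be lists of strings.
    for field in ("directed_by", "genre"):
        value = doc.get(field, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{field} must be a list of strings")
    vector = doc.get("film_vector")
    if not isinstance(vector, list) or len(vector) != 10:
        problems.append("film_vector must be a list of 10 numbers")
    return problems


# Placeholder record; none of these values come from the real data.
example = {
    "id": "example-id",
    "name": "Example Film",
    "directed_by": ["Example Director"],
    "initial_release_date": "2000-01-01",
    "genre": ["Example Genre"],
    "film_vector": [0.0] * 10,
}
print(validate_film(example))
```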
The `name` and `initial_release_date` fields are created via the Schema API, and the `genre` and `directed_by` fields
are created by an Update Request Processor Chain called `add-unknown-fields-to-the-schema`.
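The Schema API step could look roughly like the sketch below, which builds the JSON body for an `add-field` command. The field types (`text_general`, `pdate`) and the `multiValued`/`stored` settings are assumptions, since this README does not state them. The helper name `add_fields_command` is hypothetical, and the body would be posted to the collection's `/schema` endpoint.

```python
import json


def add_fields_command(fields):
    """JSON body for POST <collection>/schema that adds the given fields."""
    return json.dumps({"add-field": fields})


# Assumed definitions for the two fields created explicitly.
explicit_fields = [
    {"name": "name", "type": "text_general", "multiValued": False, "stored": True},
    {"name": "initial_release_date", "type": "pdate", "stored": True},
]

print(add_fields_command(explicit_fields))
```

Fields that are not declared this way, such as `genre` and `directed_by`, are left for the update processor chain to add when the first document containing them is indexed.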
The `film_vector` field is an embedding vector that represents the movie in 10 dimensions. The vector is created from a pre-trained BERT model, followed by a dimension reduction technique that reduces the embeddings from 768 to 10 dimensions. Similar movies are expected to be close to each other, but this model is just a "toy example", so it is not guaranteed to be a good representation of the movies. The Python scripts used to create the model and calculate the film vectors are in the [vectors directory](./vectors).
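One way such a vector could be used is a nearest-neighbour search. The sketch below builds a query string for Solr's `knn` query parser, assuming `film_vector` is indexed as a dense vector field. The helper name `knn_query` and the input vector are hypothetical.

```python
def knn_query(vector, field="film_vector", top_k=10, dims=10):
    """Build a Solr `{!knn}` query string for a vector of `dims` numbers."""
    if len(vector) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(vector)}")
    values = ", ".join(repr(float(v)) for v in vector)
    return f"{{!knn f={field} topK={top_k}}}[{values}]"


# Placeholder vector; in practice this would be the vector of a known film.
print(knn_query([0.5] * 10, top_k=3))
```

The resulting string would be passed as the `q` parameter of a search request. Because the embedding is a toy model, the neighbours it returns are not guaranteed to be meaningful.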