Generative Manim Datasets & Data Collection Pipeline
Some techniques for building better prompt-to-code Manim models require training data. To supply it, we need to compile a dataset of prompts and their corresponding code.
Sources
Manim (Community)
Manim
Datasets
Custom Dataset
The dataset should have the following columns:
- prompt: Prompt to generate the code.
- code: Corresponding code.
- type: Type of media (video, image).
Although we are focused on video generation, we should also include images as a type of media, so that the model is trained on a wide range of examples that can be used in different scenarios.
- Extract code examples from the Manim community.
- Tag each code example with the corresponding type of media (if it uses self.add, it is an image; if it uses self.play, it is a video).
- Write a prompt for each code example.
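The tagging rule above can be sketched as a small helper. This is only a minimal sketch, not the repository's script: the names tag_media_type and make_record are hypothetical, and two choices are assumptions: a scene that calls both self.add and self.play is treated as a video, and code with neither call gets None.

```python
import re

# Assumed rule: any self.play(...) call means the scene animates, so it is a
# video; otherwise a self.add(...) call means a static image.
PLAY_CALL = re.compile(r"\bself\.play\s*\(")
ADD_CALL = re.compile(r"\bself\.add\s*\(")


def tag_media_type(code):
    """Return "video", "image", or None when neither call is present."""
    if PLAY_CALL.search(code):
        return "video"
    if ADD_CALL.search(code):
        return "image"
    return None


def make_record(prompt, code):
    """Build one dataset row with the prompt, code, and type columns."""
    return {"prompt": prompt, "code": code, "type": tag_media_type(code)}
```

Rows tagged None are worth reviewing by hand, since the rule cannot tell what they render.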
Dataset Generation Pipeline
💡 Use the code as the input for generating the prompt text. In other words, let GPT summarize the Manim code; the resulting prompts are of better quality.
Instead of relying on humans to write the prompt, we can also generate the prompt from the code itself via GPT models. This way we can have a more consistent dataset.
- Create a Python script to generate the prompt from the code available in the scripts of /code.
- Create a JSONL file with the dataset generated.
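A minimal sketch of these two steps follows. The GPT call itself is left out: summarize is a hypothetical callable that takes Manim source and returns a prompt string, so any model client can be plugged in. The function name build_dataset, the *.py glob, and the output field names are assumptions, not the repository's actual script.

```python
import json
from pathlib import Path


def build_dataset(code_dir, output_path, summarize):
    """Write one JSONL row per Python script found in code_dir.

    summarize is a hypothetical callable (source code -> prompt text);
    in practice it would wrap a request to a GPT model.
    """
    rows_written = 0
    with open(output_path, "w", encoding="utf-8") as out:
        # Sort the scripts so the output order is reproducible between runs.
        for script in sorted(Path(code_dir).glob("*.py")):
            code = script.read_text(encoding="utf-8")
            row = {"prompt": summarize(code), "code": code}
            out.write(json.dumps(row, ensure_ascii=False) + "\n")
            rows_written += 1
    return rows_written
```

Passing the summarizer in as an argument keeps the file handling testable without network access; a stub such as `lambda code: "TODO"` is enough for a dry run.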
Dataset from Edoh
We can also use the dataset from Edoh to create a dataset of prompts and code.
- Create a Python script to extract the dataset from the Edoh dataset.
- Create a JSONL file with the dataset.
Extracting the prompts and code from the Edoh Manim Python dataset produced edoh-dataset.jsonl.
The dataset contains 599 examples of prompts and code.
{"messages": [{"role": "system", "content": "Write Manim scripts for animations in Python. Generate code, not text."}, {"role": "user", "content": "Create a new scene named 'MyScene'."}, {"role": "assistant", "content": "from manim import * class MyScene(Scene): def construct(self): pass"}]}
{"messages": [{"role": "system", "content": "Write Manim scripts for animations in Python. Generate code, not text."}, {"role": "user", "content": "Add a circle with radius 2 and center at the origin to the scene."}, {"role": "assistant", "content": "from manim import * class MyScene(Scene): def construct(self): circle = Circle(radius=2, color=BLUE) self.add(circle)"}]}
{"messages": [{"role": "system", "content": "Write Manim scripts for animations in Python. Generate code, not text."}, {"role": "user", "content": "Create a text object saying 'Hello, World!' and add it to the scene."}, {"role": "assistant", "content": "from manim import * class MyScene(Scene): def construct(self): text = Text('Hello, World!') self.add(text)"}]}
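Each line is a chat-style record with system, user, and assistant messages. A minimal sketch for reading such a line back into a prompt and code pair is shown below; the helper name to_pair is hypothetical, and it relies only on the messages, role, and content keys visible in the examples above.

```python
import json


def to_pair(jsonl_line):
    """Extract the user prompt and assistant code from one chat-style line."""
    messages = json.loads(jsonl_line)["messages"]
    # Index the messages by role; each example above has one message per role.
    by_role = {m["role"]: m["content"] for m in messages}
    return {"prompt": by_role["user"], "code": by_role["assistant"]}


line = (
    '{"messages": [{"role": "system", "content": "Write Manim scripts for animations in Python. Generate code, not text."}, '
    '{"role": "user", "content": "Create a new scene named \'MyScene\'."}, '
    '{"role": "assistant", "content": "from manim import * class MyScene(Scene): def construct(self): pass"}]}'
)
pair = to_pair(line)
print(pair["prompt"])  # Create a new scene named 'MyScene'.
```

Applying to_pair to every non-empty line of edoh-dataset.jsonl would yield the prompt and code columns described earlier.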