IMDb is a fairly large database of movies.
Its data can be downloaded as plain text files and loaded into a SQL database with imdbpy2sql.py, a script from the Python package IMDbPY.
UBUNTU AND DEBIAN.
Simple install on Debian / Ubuntu:
sudo apt-get install python-imdbpy
Download the imdbpy2sql script from here:
http://sourceforge.net/p/imdbpy/code/ci/default/tree/bin/imdbpy2sql.py
Download plain text data files from:
ftp://ftp.fu-berlin.de/pub/misc/movies/database/
Additional mirrors are listed at:
http://www.imdb.com/interfaces/
To fetch the files with wget, create a new directory, change into it, and run:
wget -r ftp://ftp.fu-berlin.de/pub/misc/movies/database/
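A plain recursive fetch also recreates the server's directory layout locally. The sketch below is one way to limit the download to the compressed list files and keep them in a single flat directory. The function name fetch_imdb_lists and the default target ./imdb-data are illustrative and not part of IMDbPY.

```shell
# Mirror URL taken from the text above.
IMDB_MIRROR="ftp://ftp.fu-berlin.de/pub/misc/movies/database/"

# Hypothetical helper: fetch only the *.gz list files into one flat directory.
fetch_imdb_lists() {
    dest="${1:-./imdb-data}"    # assumed default target directory
    mkdir -p "$dest" || return 1
    # -r   recurse into the remote directory
    # -np  never ascend to the parent directory
    # -nd  do not recreate the remote directory tree locally
    # -A   accept only file names matching the pattern
    # -P   save the files under $dest
    wget -r -np -nd -A '*.gz' -P "$dest" "$IMDB_MIRROR"
}
```

Calling fetch_imdb_lists /dir/with/plainTextDataFiles would place the files in the directory that is later passed to imdbpy2sql.py with -d.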
Create tables and populate the database:
imdbpy2sql.py -d /dir/with/plainTextDataFiles/ -u 'mysql://root:root@localhost/imdb'
Additional documentation available here:
http://imdbpy.sourceforge.net/docs/README.sqldb.txt
CENTOS AND REDHAT.
Installing IMDbPY and SQLObject
1. Install required packages using yum:
2. Install SQLObject using Python EasyInstall:
3. Download IMDbPY from this page onto the MySQL server, extract it, and start the installation process:
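Steps 2 and 3 above might look roughly like the sketch below. It assumes easy_install is already on the PATH (on CentOS it is provided by the python-setuptools package) and that the IMDbPY archive has already been downloaded and extracted. The function name install_imdbpy and the directory argument are placeholders, not names used by IMDbPY.

```shell
# Hypothetical helper covering steps 2 and 3.
# $1: directory where the IMDbPY source archive was extracted (placeholder).
install_imdbpy() {
    src_dir="${1:-}"
    if [ -z "$src_dir" ] || [ ! -d "$src_dir" ]; then
        echo "usage: install_imdbpy /path/to/extracted/IMDbPY" >&2
        return 2
    fi
    # Step 2: install SQLObject with EasyInstall.
    easy_install SQLObject || return 1
    # Step 3: run the installer from inside the source tree, in a subshell
    # so the caller's working directory is left unchanged.
    ( cd "$src_dir" && python setup.py install )
}
```

The function refuses to run without an existing source directory, so a mistyped path does not leave SQLObject installed without IMDbPY.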
Importing Data
1. Create a directory to dump all the data files that we will download:
2. Download only the .gz files from the IMDb mirror site to /root/data:
3. Create a database in MySQL called ‘imdb’, with user ‘imdb’ and password ‘imdb’. Then GRANT that user privileges on the database:
4. Start the import process with the -u and -d flags:
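The last two steps above, creating the database and starting the import, could be sketched as follows. The database name, user, password, and data directory come from the steps themselves. The function names setup_sql and create_db_and_import are made up for this example, and the sketch assumes the MySQL root account can log in with a password prompt.

```shell
# Values taken from the steps above.
DATA_DIR=/root/data
DB_NAME=imdb
DB_USER=imdb
DB_PASS=imdb
IMDB_URI="mysql://${DB_USER}:${DB_PASS}@localhost/${DB_NAME}"

# Print the SQL that creates the database and the user,
# then grants that user full privileges on the database.
setup_sql() {
    cat <<EOF
CREATE DATABASE ${DB_NAME};
CREATE USER '${DB_USER}'@'localhost' IDENTIFIED BY '${DB_PASS}';
GRANT ALL PRIVILEGES ON ${DB_NAME}.* TO '${DB_USER}'@'localhost';
EOF
}

# Hypothetical wrapper: create the database, then start the import.
create_db_and_import() {
    setup_sql | mysql -u root -p || return 1
    imdbpy2sql.py -d "$DATA_DIR" -u "$IMDB_URI"
}
```

Running setup_sql on its own prints the statements, which makes it easy to review them before anything is sent to the server.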
Note that -d is the directory where the .gz dump files are located and -u is the connection string for the MySQL database server. You can change the connection string to target any other database supported by SQLObject, such as PostgreSQL, SQLite, Firebird, or MaxDB. Please refer to this documentation for details.
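To illustrate the connection string format, here is a small sketch that assembles URIs of the form scheme://user:password@host/database. The helper db_uri is hypothetical, and the PostgreSQL credentials and the SQLite file path are placeholder values rather than anything set up in this post.

```shell
# Hypothetical helper: build a connection URI from its parts.
# Arguments: scheme user password host database
db_uri() {
    printf '%s://%s:%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

db_uri mysql imdb imdb localhost imdb       # prints mysql://imdb:imdb@localhost/imdb
db_uri postgres imdb imdb localhost imdb    # prints postgres://imdb:imdb@localhost/imdb

# SQLite needs no user or host: the scheme is followed by an absolute file path.
# /tmp/imdb.db is a placeholder location.
SQLITE_URI="sqlite:/tmp/imdb.db"
```

Whichever string is used, it is passed to imdbpy2sql.py through the -u flag, and the matching Python database driver has to be installed.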
Output similar to the following indicates that the import has started:
Wait until it finishes, and you will have a large sample data set to play with on your MySQL server!
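Once the import has finished, a quick row count is one way to confirm that data actually arrived. This is only a sketch: it assumes the imdb database and user created earlier, and that the import produced a table named title, which is the name used in IMDbPY's schema as far as I know. The function name count_titles is made up.

```shell
# Assumed table name: title.
TITLE_COUNT_SQL='SELECT COUNT(*) FROM title;'

# Hypothetical sanity check: count the imported titles.
# -p prompts for the password; the final "imdb" is the database name.
count_titles() {
    mysql -u imdb -p imdb -e "$TITLE_COUNT_SQL"
}
```

A count of zero, or an error about a missing table, suggests the import did not complete.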