Introduction
This ETL (Extract, Transform, Load) project consolidates historical cryptocurrency price data from multiple online sources into a unified SQL database. It provides:
- 1-minute granularity price data for Bitcoin (BTC) and Ethereum (ETH).
Time ranges:
- Bitcoin: December 2014 – August 2020 (extendable).
- Ethereum: May 2016 – August 2020 (extendable).
The project uses Jupyter notebooks for data updates, ensuring scalability for additional cryptocurrencies.
Data Extraction
Primary Sources
Kaggle Datasets:
- Bitcoin historical data (bitstampUSD_1-min_data_2012-01-01_to_2020-04-22.csv).
- Ethereum historical dataset.
Coinbase Pro API:
- Used to fetch recent data (2019–2020 for BTC; April–August 2020 for ETH).
- See the API documentation for the "candles" endpoint.
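For reference, a minimal request sketch against the candles endpoint (the `fetch_candles` helper name is our own; the endpoint, parameters, and response layout follow the public Coinbase Pro docs):

```python
import requests
import pandas as pd

def fetch_candles(product="BTC-USD", start="2020-08-01T00:00:00Z",
                  end="2020-08-01T05:00:00Z", granularity=60):
    """Fetch 1-minute candles from the Coinbase Pro public API.

    The endpoint returns at most 300 candles per request, so at
    1-minute granularity each call covers about five hours.
    """
    url = f"https://api.pro.coinbase.com/products/{product}/candles"
    resp = requests.get(url, params={"start": start, "end": end,
                                     "granularity": granularity})
    resp.raise_for_status()
    # Each row is [time, low, high, open, close, volume], newest first.
    frame = pd.DataFrame(resp.json(), columns=["Unix Timestamp", "Low", "High",
                                               "Open", "Close", "Volume"])
    return frame.sort_values("Unix Timestamp").reset_index(drop=True)
```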
Limitations
- Pre-2014 BTC and pre-2016 ETH data unavailable at 1-minute granularity via free APIs.
Data Transformation
Standardized Schema
CSV files and SQL tables share these columns:
| Column Name          | Data Type      | Description                               |
|----------------------|----------------|-------------------------------------------|
| Unix Timestamp       | int64          | Seconds since the Unix epoch.             |
| Date                 | datetime64[ns] | UTC timestamp.                            |
| Symbol               | object         | Cryptocurrency-fiat pair (e.g., BTC-USD). |
| Open/High/Low/Close  | float64        | Price in the fiat currency.               |
| Volume               | float64        | Number of coins traded.                   |
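As an illustration, here is a sketch mapping the Kaggle BTC CSV onto this schema (the source column names, such as Timestamp and Volume_(BTC), assume the bitstampUSD dataset; adjust them if your file differs):

```python
import pandas as pd

# Source column names assume the Kaggle bitstampUSD dataset.
raw = pd.read_csv("bitstampUSD_1-min_data_2012-01-01_to_2020-04-22.csv")

df = pd.DataFrame({
    "Unix Timestamp": raw["Timestamp"].astype("int64"),
    # Epoch seconds are UTC, so the naive datetime64[ns] column is UTC too.
    "Date": pd.to_datetime(raw["Timestamp"], unit="s"),
    "Symbol": "BTC-USD",
    "Open": raw["Open"].astype("float64"),
    "High": raw["High"].astype("float64"),
    "Low": raw["Low"].astype("float64"),
    "Close": raw["Close"].astype("float64"),
    "Volume": raw["Volume_(BTC)"].astype("float64"),
})
```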
Cleaning Steps
- Dropped null values.
- Removed duplicate timestamps.
- Standardized date formats to match Coinbase API requirements.
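A minimal pandas pass covering these steps might look like this (continuing from the standardized frame built above):

```python
df = (
    df.dropna()                                  # drop rows with null prices
      .drop_duplicates(subset="Unix Timestamp")  # one row per minute
      .sort_values("Unix Timestamp")
      .reset_index(drop=True)
)
# Coinbase's endpoints take ISO 8601 timestamps, e.g. 2020-08-01T00:00:00Z,
# so dates are rendered in that form when building API requests.
latest = df["Date"].iloc[-1].strftime("%Y-%m-%dT%H:%M:%SZ")
```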
Data Loading
PostgreSQL Setup
Create Tables: Execute `create_tables.sql` in pgAdmin to generate the `bitcoin` and `ethereum` tables.
- Columns: `unix_timestamp`, `entry_date`, `symbol`, `open_price`, etc.
Composite Primary Keys:
- The combination of `unix_timestamp` + `symbol` guarantees row uniqueness in both tables.
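For orientation, here is a sketch of table creation from Python. The DDL is a plausible reading of `create_tables.sql` based on the column names above, not the file's actual contents, and the `config` variable names are assumptions:

```python
import psycopg2
from config import host, dbname, user, password  # hypothetical variable names

# Plausible DDL mirroring the columns listed above -- the repo's
# create_tables.sql is authoritative, so treat this as a sketch.
DDL = """
CREATE TABLE IF NOT EXISTS bitcoin (
    unix_timestamp BIGINT,
    entry_date     TIMESTAMP,
    symbol         VARCHAR(16),
    open_price     DOUBLE PRECISION,
    high_price     DOUBLE PRECISION,
    low_price      DOUBLE PRECISION,
    close_price    DOUBLE PRECISION,
    volume         DOUBLE PRECISION,
    PRIMARY KEY (unix_timestamp, symbol)
);
"""

with psycopg2.connect(host=host, dbname=dbname,
                      user=user, password=password) as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)                                 # bitcoin table
        cur.execute(DDL.replace("bitcoin", "ethereum"))  # same shape for ETH
```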
Loading Data
- Configure `config.py` with PostgreSQL credentials.
- Run the `ETH & BTC Load Into SQL.ipynb` notebook.
- Verify database tables populate correctly.
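A hedged sketch of the load step (the notebook is authoritative; the SQL column names follow the setup above, and the `config` names are again assumptions):

```python
import pandas as pd
from sqlalchemy import create_engine
from config import host, dbname, user, password  # hypothetical variable names

engine = create_engine(f"postgresql://{user}:{password}@{host}:5432/{dbname}")

# Rename the standardized columns (df from the Transformation section)
# to the SQL column names, then append.
btc = df.rename(columns={
    "Unix Timestamp": "unix_timestamp", "Date": "entry_date",
    "Symbol": "symbol", "Open": "open_price", "High": "high_price",
    "Low": "low_price", "Close": "close_price", "Volume": "volume",
})
btc.to_sql("bitcoin", engine, if_exists="append", index=False)

# Spot-check that the table populated.
print(pd.read_sql("SELECT COUNT(*) FROM bitcoin", engine))
```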
Data Updates
Steps
Run `Update_Data.ipynb` to:
- Sync CSV and SQL data.
- Fetch new data via the Coinbase API.
Note:
- Updates limited to 1-year increments per run.
- Multiple runs may be needed for multi-year gaps.
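Putting the pieces together, one update pass might look like the sketch below, reusing `fetch_candles` from the extraction section and the hypothetical `config` names (the notebook's real logic may differ):

```python
import time
import pandas as pd
from sqlalchemy import create_engine
from config import host, dbname, user, password  # hypothetical variable names

def iso(t):
    """Render epoch seconds as the ISO 8601 form Coinbase expects."""
    return pd.to_datetime(t, unit="s").strftime("%Y-%m-%dT%H:%M:%SZ")

engine = create_engine(f"postgresql://{user}:{password}@{host}:5432/{dbname}")
last = int(pd.read_sql("SELECT MAX(unix_timestamp) AS t FROM bitcoin",
                       engine)["t"][0])
stop = min(last + 365 * 24 * 3600, int(time.time()))  # one-year cap per run

start = last + 60
while start < stop:
    end = min(start + 300 * 60, stop)  # 300 one-minute candles per request
    chunk = fetch_candles("BTC-USD", start=iso(start), end=iso(end))
    # Standardize, clean, and append `chunk` as in the sections above;
    # the composite primary key rejects overlapping rows.
    start = end + 60
```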
FAQ
1. Can I add more cryptocurrencies?
Yes. The schema is extensible: standardize new data to match the existing column formats.
2. Why use 1-minute granularity?
It allows deriving hourly/daily aggregates while retaining high precision.
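For example, hourly bars can be derived in one pandas call (a sketch against the standardized frame from the Transformation section):

```python
# Roll the 1-minute rows up to hourly OHLCV bars.
hourly = (
    df.set_index("Date")
      .resample("1h")
      .agg({"Open": "first", "High": "max", "Low": "min",
            "Close": "last", "Volume": "sum"})
      .dropna()  # hours with no trades produce all-NaN rows
)
```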
3. How often should I update the data?
Monthly updates are recommended to maintain relevance.
Conclusion
This project simplifies access to highly granular crypto price data while emphasizing:
- Reproducibility via Jupyter notebooks.
- Scalability for future coins.
- Maintenance ease with incremental updates.
Next steps: Integrate additional APIs for extended historical data.