Crypto-Historical-Prices: An ETL Project for Cryptocurrency Price Data Compilation


Introduction

This ETL (Extract, Transform, Load) project consolidates historical cryptocurrency price data from multiple online sources into a unified SQL database with a single standardized schema.

The project uses Jupyter notebooks for data updates, so the workflow scales to additional cryptocurrencies.


Data Extraction

Primary Sources

  1. Kaggle Datasets:

    • Provide the bulk of the historical 1-minute price data (BTC and ETH).
  2. Coinbase Pro API:

    • Used to fetch recent data (2019–2020 for BTC; April–August 2020 for ETH).
    • Data comes from the "candles" endpoint; see the Coinbase Pro API documentation for request details.
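For reference, a candles request can be sketched as follows. The base URL and the `[time, low, high, open, close, volume]` row order follow the public Coinbase Pro docs, but the helper names here are illustrative, not code from the project's notebooks:

```python
from urllib.parse import urlencode

# Base URL for the (legacy) Coinbase Pro REST API -- check the current
# Coinbase Exchange docs for the live host before using this in practice.
BASE_URL = "https://api.pro.coinbase.com"

def candles_url(product_id, start, end, granularity=60):
    """Build a request URL for the "candles" endpoint.

    granularity=60 requests 1-minute candles; start/end are ISO 8601 strings.
    """
    query = urlencode({"start": start, "end": end, "granularity": granularity})
    return f"{BASE_URL}/products/{product_id}/candles?{query}"

def candle_to_row(candle, symbol):
    """Map one API candle ([time, low, high, open, close, volume]) onto the
    project's standardized column names (Date is derived later from the
    timestamp)."""
    t, low, high, open_, close, volume = candle
    return {
        "Unix Timestamp": t,
        "Symbol": symbol,
        "Open": open_, "High": high, "Low": low, "Close": close,
        "Volume": volume,
    }
```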

Limitations

    • The Coinbase Pro API caps how many candles a single request returns, so long date ranges must be fetched in batches.
    • Updates run in one-year increments at most (see Data Updates below).

Data Transformation

Standardized Schema

CSV files and SQL tables share these columns:

| Column Name | Data Type | Description |
|----------------------|--------------------|--------------------------------------|
| Unix Timestamp | int64 | Epoch time. |
| Date | datetime64[ns] | UTC-formatted timestamp. |
| Symbol | object | Cryptocurrency-fiat pair (e.g., BTC-USD). |
| Open/High/Low/Close | float64 | Price in fiat currency. |
| Volume | float64 | Number of coins traded. |
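Assuming the Unix Timestamp column is in seconds, deriving the Date column and coercing the remaining fields to the dtypes above could look like this (an illustrative sketch; the column names come from the table above):

```python
from datetime import datetime, timezone

# Column names from the standardized schema; the helper itself is a sketch,
# not code taken from the project's notebooks.
COLUMNS = ["Unix Timestamp", "Date", "Symbol",
           "Open", "High", "Low", "Close", "Volume"]

def standardize_row(unix_ts, symbol, o, h, l, c, volume):
    """Build one schema-conformant row; Date is the UTC datetime derived
    from the epoch timestamp (assumed to be in seconds)."""
    date = datetime.fromtimestamp(int(unix_ts), tz=timezone.utc)
    return dict(zip(COLUMNS, [int(unix_ts), date, symbol,
                              float(o), float(h), float(l),
                              float(c), float(volume)]))
```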

Cleaning Steps

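The notebooks' exact cleaning logic isn't enumerated here, but a minimal pass that would satisfy the schema above, dropping duplicate timestamps and sorting chronologically, might look like:

```python
def clean_rows(rows):
    """Sketch of a cleaning pass (assumed, not from the notebooks):
    drop rows with duplicate timestamps (keeping the first occurrence)
    and sort chronologically so CSV and SQL data line up."""
    seen, cleaned = set(), []
    for row in rows:
        ts = row["Unix Timestamp"]
        if ts not in seen:
            seen.add(ts)
            cleaned.append(row)
    return sorted(cleaned, key=lambda r: r["Unix Timestamp"])
```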


Data Loading

PostgreSQL Setup

  1. Create Tables: Execute create_tables.sql in pgAdmin to generate:

    • bitcoin and ethereum tables.
    • Columns: unix_timestamp, entry_date, symbol, open_price, etc.
  2. Composite Primary Keys:

    • The combination of unix_timestamp + symbol uniquely identifies each row within a table, so re-loading the same candle twice is rejected.
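To see the composite key in action without a PostgreSQL server, sqlite3 (used here purely as a stand-in; the real project uses create_tables.sql in pgAdmin) rejects a second row with the same (unix_timestamp, symbol) pair:

```python
import sqlite3

# In-memory sqlite3 database standing in for PostgreSQL, only to
# demonstrate the composite primary key's uniqueness guarantee.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bitcoin (
        unix_timestamp INTEGER,
        entry_date     TEXT,
        symbol         TEXT,
        open_price     REAL,
        PRIMARY KEY (unix_timestamp, symbol)
    )
""")
conn.execute("INSERT INTO bitcoin VALUES (1585699200, '2020-04-01', 'BTC-USD', 6410.0)")
try:
    # Same (unix_timestamp, symbol) pair -> rejected by the composite key.
    conn.execute("INSERT INTO bitcoin VALUES (1585699200, '2020-04-01', 'BTC-USD', 6410.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```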

Loading Data

  1. Configure config.py with PostgreSQL credentials.
  2. Run the ETH & BTC Load Into SQL.ipynb notebook.
  3. Verify database tables populate correctly.
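config.py itself is not shown in this README, so the variable names below are assumptions; adjust them to match whatever the notebook actually imports:

```python
# Hypothetical config.py layout -- keep this file out of version control,
# since it holds database credentials.
db_host = "localhost"
db_port = 5432
db_name = "crypto_db"      # assumed database name
db_user = "postgres"
db_password = "<your-password>"

def connection_string():
    """Assemble a PostgreSQL connection URI from the settings above."""
    return f"postgresql://{db_user}:{db_password}@{db_host}:{db_port}/{db_name}"
```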

Data Updates

Steps

  1. Run Update_Data.ipynb to:

    • Sync CSV and SQL data.
    • Fetch new data from the Coinbase API.
  2. Note:

    • Updates limited to 1-year increments per run.
    • Multiple runs may be needed for multi-year gaps.
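The multi-run requirement can be sketched as a date-range splitter; the 365-day span below is an assumption standing in for whatever limit Update_Data.ipynb actually enforces:

```python
from datetime import datetime, timedelta

def one_year_chunks(start, end):
    """Split [start, end) into consecutive spans of at most 365 days,
    mirroring the one-year-per-run update limit described above."""
    chunks, cursor = [], start
    while cursor < end:
        nxt = min(cursor + timedelta(days=365), end)
        chunks.append((cursor, nxt))
        cursor = nxt
    return chunks
```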

FAQ

1. Can I add more cryptocurrencies?

Yes! The schema is scalable—simply standardize new data to match existing column formats.

2. Why use 1-minute granularity?

It allows deriving hourly/daily aggregates while retaining high precision.
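As an illustration of deriving hourly aggregates from 1-minute rows (a sketch, not project code):

```python
def hourly_ohlc(minute_rows):
    """Aggregate time-sorted 1-minute candles into hourly candles.

    Each input row: (unix_ts, open, high, low, close, volume).
    Returns {hour_start_ts: (open, high, low, close, volume)}.
    """
    buckets = {}
    for ts, o, h, l, c, v in minute_rows:
        hour = ts - ts % 3600              # floor to the hour boundary
        if hour not in buckets:
            buckets[hour] = [o, h, l, c, v]  # first minute seeds the candle
        else:
            b = buckets[hour]
            b[1] = max(b[1], h)            # high is the running max
            b[2] = min(b[2], l)            # low is the running min
            b[3] = c                       # close tracks the latest minute
            b[4] += v                      # volume accumulates
    return {hour: tuple(b) for hour, b in buckets.items()}
```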

3. How often should I update the data?

Monthly updates are recommended to maintain relevance.



Conclusion

This project simplifies access to highly granular crypto price data while emphasizing a standardized schema and a repeatable update workflow.

Next steps: Integrate additional APIs for extended historical data.