Real-World Scenarios
Below are detailed "Jira Tickets" representing common tasks a Senior Python Developer might encounter. Each ticket includes the business requirement, technical constraints, and a proposed solution architecture.
The current ingestion script takes 4 hours to process a 5GB CSV file. We need to reduce this to under 30 minutes to meet the SLA.
- Process 5GB CSV in < 30 mins.
- Handle memory constraints (max 8GB RAM).
- Upsert data into PostgreSQL.
👨‍💻 Technical Solution
Strategy: Use Pandas chunking and multiprocessing.
import pandas as pd
from multiprocessing import Pool
from sqlalchemy import create_engine

def process_chunk(chunk):
    # Data cleaning logic here
    chunk['processed_at'] = pd.Timestamp.now()
    # Bulk insert using SQLAlchemy (connection details are placeholders)
    engine = create_engine('postgresql://user:pass@localhost/db')
    chunk.to_sql('table_name', engine, if_exists='append', index=False)
    engine.dispose()  # release the connections held by this worker's engine

def main():
    chunk_size = 100000
    csv_file = 'large_data.csv'
    # Read in chunks so the full 5GB file is never loaded into memory at once
    reader = pd.read_csv(csv_file, chunksize=chunk_size)
    # Process chunks in parallel across 4 worker processes
    with Pool(4) as pool:
        pool.map(process_chunk, reader)

if __name__ == '__main__':
    main()
Why this works: Reading in chunks keeps peak memory bounded by the chunk size rather than the full 5GB file, and the process pool spreads the cleaning and insert work across four worker processes so chunks are handled in parallel.
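One gap to note: to_sql(if_exists='append') only appends rows, while the ticket asks for an upsert. A minimal sketch of one common approach, PostgreSQL's INSERT ... ON CONFLICT via SQLAlchemy, is below; the table name table_name and the primary-key column id are assumptions to adapt to the real schema.

from sqlalchemy import MetaData, Table
from sqlalchemy.dialects.postgresql import insert

def upsert_chunk(chunk, engine):
    # Reflect the target table (the name is an assumption; match your schema)
    table = Table('table_name', MetaData(), autoload_with=engine)
    stmt = insert(table).values(chunk.to_dict(orient='records'))
    # On a primary-key conflict (assumed column 'id'), overwrite the other columns
    stmt = stmt.on_conflict_do_update(
        index_elements=['id'],
        set_={c.name: stmt.excluded[c.name] for c in table.columns if c.name != 'id'},
    )
    with engine.begin() as conn:
        conn.execute(stmt)

Calling upsert_chunk(chunk, engine) in place of to_sql inside process_chunk keeps the chunked, parallel structure while satisfying the upsert requirement; NaN handling and very large chunks may need extra care in practice.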
The /upload endpoint blocks the main thread while resizing images, causing timeouts under load. We need to make this non-blocking.
👨‍💻 Technical Solution
Strategy: Offload heavy processing to a task queue (Celery) and use FastAPI for async handling.
# app.py (FastAPI)
import os

from fastapi import FastAPI, UploadFile, BackgroundTasks
from tasks import resize_image_task

app = FastAPI()

@app.post("/upload")
async def upload_image(file: UploadFile, background_tasks: BackgroundTasks):
    # Save the upload to a temp file so the worker can read it later
    os.makedirs("temp", exist_ok=True)
    temp_path = f"temp/{file.filename}"
    with open(temp_path, "wb") as f:
        f.write(await file.read())
    # Offload to Celery or BackgroundTasks; the response returns immediately
    background_tasks.add_task(resize_image_task, temp_path)
    return {"status": "Processing started", "file": file.filename}
# tasks.py
def resize_image_task(path):
    # CPU-intensive work goes here
    pass
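The stub above runs inside the API process when scheduled via BackgroundTasks; to actually offload the work to a task queue as the strategy describes, tasks.py could look like the following sketch. The Redis broker URL, the 800x800 target size, and the use of Pillow are assumptions, not part of the original ticket.

# tasks.py (minimal Celery sketch; broker URL, target size, and Pillow usage are assumptions)
from celery import Celery
from PIL import Image

celery_app = Celery('tasks', broker='redis://localhost:6379/0')

@celery_app.task
def resize_image_task(path):
    # The CPU-intensive resize runs in a separate Celery worker process,
    # so the FastAPI event loop is never blocked
    with Image.open(path) as img:
        img.thumbnail((800, 800))
        resized_path = f"{path}.resized.jpg"
        img.convert('RGB').save(resized_path, 'JPEG')
    return resized_path

With this variant, the endpoint would call resize_image_task.delay(temp_path) instead of background_tasks.add_task(...), and a worker would be started separately with celery -A tasks worker.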
The background worker process consumes increasing memory over time until it crashes (OOM). Restarting fixes it temporarily.
👨‍💻 Technical Solution
Diagnosis: Most likely data accumulating in global or module-level variables (an unbounded cache, an ever-growing list), or objects kept alive by lingering or circular references.
Tools: tracemalloc and objgraph.
import tracemalloc

def start_monitoring():
    tracemalloc.start()

def check_memory():
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')
    print("[ Top 10 ]")
    for stat in top_stats[:10]:
        print(stat)

# Call check_memory() periodically in the worker loop
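To see where memory grows between iterations, rather than just the current top allocators, a small extension is to diff snapshots and track object-type growth with objgraph. This is a minimal sketch; the reporting cadence and the objgraph dependency (pip install objgraph) are assumptions.

import tracemalloc
import objgraph

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

def report_growth():
    # Which source lines have allocated the most new memory since the baseline?
    current = tracemalloc.take_snapshot()
    for stat in current.compare_to(baseline, 'lineno')[:10]:
        print(stat)
    # Which object types are increasing in count? (delta since the last call)
    objgraph.show_growth(limit=10)

# Call report_growth() every N iterations of the worker loop; the lines or
# object types that keep growing between reports point at the leak.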