Like most projects, the workflow begins with data in CSV format; internally, however, I designed the app to read and write DataFrames in pickle format. A big reason to use pickled DataFrames is that the serialized file retains various metadata: if you set dtypes to strings, integers, or category, those dtypes are retained whenever you read the data back in. With CSV files, on the other hand, you have to re-process the data back into a suitable DataFrame.
Although pickle files are pretty sweet, during the last few steps of deployment with Streamlit I ran into a brick wall of an error with the pickle protocol.
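To make the dtype point concrete, here is a minimal sketch that sends the same DataFrame through a pickle round trip and a CSV round trip. The column names and values are made up for illustration and are not taken from the app.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical example data: one category column and one 32-bit integer column
df = pd.DataFrame({
    "product": pd.Series(["a", "b", "a"], dtype="category"),
    "units": pd.Series([1, 2, 3], dtype="int32"),
})

with tempfile.TemporaryDirectory() as tmp:
    # Round trip through pickle: dtypes are stored with the data
    pkl_path = os.path.join(tmp, "example.pkl")
    df.to_pickle(pkl_path)
    from_pickle = pd.read_pickle(pkl_path)

# Round trip through CSV: only text survives, so dtypes are re-inferred on read
csv_text = df.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

print("pickle dtypes:")
print(from_pickle.dtypes)
print("CSV dtypes:")
print(from_csv.dtypes)
```

After the pickle round trip the columns are still category and int32. After the CSV round trip the category dtype is gone and the integer column is widened to int64, so the dtypes would need to be set again by hand.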
Getting this error?
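Apparently, the Pandas to_pickle() method defaults to the highest pickle protocol the interpreter offers, which is version 5 on Python 3.8 and later, and that version isn't universally supported. The server in the traceback runs Python 3.7, which cannot read protocol 5. As a result, even though a standard pickled DataFrame may work in testing on your local machine, deployment to a server is another story.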
ValueError: unsupported pickle protocol: 5
Traceback:
File "/usr/local/lib/python3.7/site-packages/streamlit/script_runner.py", line 332, in _run_script
    exec(code, module.__dict__)
File "/app/app_courts/major.py", line 56, in <module>
    run_app()
File "/app/app_courts/major.py", line 52, in run_app
    , classify=False
File "/app/app_courts/do_data/getter.py", line 124, in to_df
    df = pd.read_pickle(path)
File "/home/appuser/.local/lib/python3.7/site-packages/pandas/io/pickle.py", line 182, in read_pickle
    return pickle.load(f)
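One way to confirm this kind of mismatch is to compare the protocol a pickle stream declares with the highest protocol the running interpreter supports. The sketch below uses only the standard library; the helper name declared_protocol and the sample payload are my own, for illustration.

```python
import pickle
import pickletools


def declared_protocol(raw):
    """Return the protocol number declared at the start of a pickle byte
    stream, or None for protocols 0 and 1, which carry no header."""
    opcode, arg, _pos = next(pickletools.genops(raw))
    return arg if opcode.name == "PROTO" else None


# Hypothetical payload, pickled with protocols every Python 3 can write
payload = {"rows": [1, 2, 3]}
for version in (0, 2, 4):
    raw = pickle.dumps(payload, protocol=version)
    print("written with", version, "-> declares", declared_protocol(raw))

# A file that declares a number above this value cannot be loaded here
print("highest protocol this interpreter supports:", pickle.HIGHEST_PROTOCOL)
```

For a compressed file, the bytes have to be decompressed first (for example with bz2.open) before passing them to the helper.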
Solution to Pickle Protocol 5 Error
When faced with an error, I sometimes go with an alternative solution, i.e., something that works just as well to produce the same result. However, I had little choice with the data file because of various constraints. While searching for an answer, I discovered that the solution is quite simple: change the Pandas to_pickle() protocol from the default to version 2. When Pandas pickle is combined with BZ2 compression, the result is a super small, super handy, and very compatible data file.
# to avoid the pickle protocol error,
# change the protocol param from 5 to 2
# (compression is inferred from the .bz2 extension)
path = 'data/product_sales.bz2'
df.to_pickle(path, protocol=2)
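For a self-contained check of the fix, the following sketch writes a small DataFrame with protocol=2 to a .bz2 file, reads it back, and inspects the first two bytes of the decompressed stream. The sample data and the temporary directory are assumptions for illustration; only the to_pickle(path, protocol=2) call mirrors the snippet above.

```python
import bz2
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the real sales table
df = pd.DataFrame({
    "product": pd.Series(["a", "b", "a"], dtype="category"),
    "units": pd.Series([1, 2, 3], dtype="int32"),
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "product_sales.bz2")

    # pandas infers BZ2 compression from the .bz2 extension
    df.to_pickle(path, protocol=2)

    # A protocol 2 stream starts with the PROTO opcode (0x80) and the byte 0x02
    with bz2.open(path, "rb") as f:
        header = f.read(2)

    restored = pd.read_pickle(path)

print("header bytes:", header)
print("round trip equal:", restored.equals(df))
print(restored.dtypes)
```

Protocol 2 has been available since Python 2.3, so a file written this way can be read by any Python 3 interpreter, at the cost of the efficiency improvements added in later protocols.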