 # Reading data
 
-There are three main methods of getting data into a TensorFlow program:
+Note: The preferred way to feed data into a TensorFlow program is using the
+@{$datasets$Datasets API}.
+
+There are three other methods of getting data into a TensorFlow program:
 
 * Feeding: Python code provides the data when running each step.
 * Reading from files: an input pipeline reads the data from files
 Supply feed data through the `feed_dict` argument to a run() or eval() call
 that initiates computation.
 
+Note: "Feeding" is the least efficient way to feed data into a TensorFlow
+program and should only be used for small experiments and debugging.
+
 ```python
 with tf.Session():
   input = tf.placeholder(tf.float32)
@@ -51,6 +57,9 @@ A typical pipeline for reading records from files has the following stages:
 7. *Optional* preprocessing
 8. Example queue
 
+Note: This section discusses implementing input pipelines using the
+queue-based APIs, which can be cleanly replaced by the @{$datasets$Dataset API}.
+
 ### Filenames, shuffling, and epoch limits
 
 For the list of filenames, use either a constant string Tensor (like
@@ -405,7 +414,8 @@ This is only used for small data sets that can be loaded entirely in memory.
 There are two approaches:
 
 * Store the data in a constant.
-* Store the data in a variable, that you initialize and then never change.
+* Store the data in a variable that you initialize (or assign to) and then
+  never change.
 
 Using a constant is a bit simpler, but uses more memory (since the constant is
 stored inline in the graph data structure, which may be duplicated a few times).
@@ -461,19 +471,31 @@ You can compare these with the `fully_connected_feed` and
 ## Multiple input pipelines
 
 Commonly you will want to train on one dataset and evaluate (or "eval") on
-another. One way to do this is to actually have two separate processes:
+another. One way to do this is to actually have two separate graphs and
+sessions, maybe in separate processes:
 
 * The training process reads training input data and periodically writes
   checkpoint files with all the trained variables.
 * The evaluation process restores the checkpoint files into an inference
   model that reads validation input data.
 
-This is what is done in
-@{$deep_cnn#save-and-restore-checkpoints$the example CIFAR-10 model}. This has a couple of benefits:
+This is what is done by @{tf.estimator$estimators} and manually in
+@{$deep_cnn#save-and-restore-checkpoints$the example CIFAR-10 model}.
+This has a couple of benefits:
 
 * The eval is performed on a single snapshot of the trained variables.
 * You can perform the eval even after training has completed and exited.
 
 You can have the train and eval in the same graph in the same process, and share
-their trained variables. See
-@{$variables$the shared variables tutorial}.
+their trained variables or layers. See @{$variables$the shared variables tutorial}.
+
+To support the single-graph approach,
+@{$programmers_guide/datasets$Datasets} also supplies
+@{$programmers_guide/datasets#creating_an_iterator$advanced iterator types} that
+allow the user to change the input pipeline without rebuilding the graph or
+session.
+
+Note: Regardless of the implementation, many
+operations (like @{tf.layers.batch_normalization} and @{tf.layers.dropout})
+need to know whether they are in training or evaluation mode, and you must be
+careful to set this appropriately if you change the data source.